Welcome to session 17 of our course on Quality Control and Improvement with Minitab. I am Professor Indrajit Mukherjee, Shailesh J. Mehta School of Management. Last time we were discussing confidence intervals, and that leads to another important concept which is used extensively in quality control and improvement, known as hypothesis testing. So, I will give a brief idea of hypothesis testing here, and then we will see what types of tests are generally conducted to help us assess quality in the situations we encounter in real-life problems. We have already discussed populations and samples. In manufacturing, as I told you, the population is treated as infinite, and from it we are interested in some CTQ. What we do is take samples, measure the CTQ, and try to estimate the parameters of the population. Let us say I am interested in the mean or the variance, because both location (accuracy) and spread (precision) are important. So, data has to be collected from the population. We take an adequate number of samples and then make an assessment based on the sample observations: we calculate the sample mean and state a confidence interval for the population parameter, whether that is the population mean or the variance.
So, there will be bounds that we can determine, an upper bound and a lower bound, for the mean and also for the variance, as we have seen with confidence intervals. What is important to observe is that the statisticians have provided a measure of how often our confidence interval can be wrong. That is the level of significance, alpha, which is generally taken as 5 percent, as we saw last time. With a given alpha we can determine the confidence interval. If you want to reduce alpha, that is, reduce the chance of error in your confidence interval, then either you widen the confidence band or, to keep the band from becoming too wide, you increase the sample size. As alpha decreases, the confidence band widens. So this widening and narrowing depends on the alpha value, which defines the upper bound and the lower bound of the confidence interval. This same concept was extended to hypothesis testing. A hypothesis is nothing but a statement or claim that is to be proved or disproved. Basically we want to assess some parameter: is its value close to a certain value, or is it different from that value? Here is a simple example. The population mean is given as 25, but from the sample observations I have got a value of 20. From experience, it is expected that the population mean should be close to 25.
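The link between alpha, sample size, and the width of the confidence band described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sigma is treated as known, and the values of `xbar`, `sigma`, and `n` are hypothetical numbers, not data from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, alpha):
    """Two-sided (1 - alpha) confidence interval for a mean when sigma is known."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_crit * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical values, chosen only to show the trend
xbar, sigma = 20.0, 5.0
for alpha, n in [(0.10, 25), (0.05, 25), (0.01, 25), (0.01, 100)]:
    lower, upper = z_interval(xbar, sigma, n, alpha)
    print(f"alpha={alpha:.2f}  n={n:3d}  "
          f"interval=({lower:.2f}, {upper:.2f})  width={upper - lower:.2f}")
```

Reading the printed rows from top to bottom, the band widens as alpha decreases at a fixed n, and narrows again when n is increased at the same alpha.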
But I have taken a sample, and the X bar that we are getting is around 20. So, is 20 close to 25? Can we say that, although it is coming out to be 20, this is natural when the population mean is at 25? Let us assume a normal distribution with mu said to be 25, and I am getting an X bar of 20. Is that close enough to the mean? That is what we want to check. And I will do it only once; I cannot take repeated samples, I do not have that privilege. So, with one sample giving me this mean value, I have to prove or disprove the claim that is made: whether the mean is close to 25, or whether it is different from 25, that is, very far away from 25. This applies to any CTQ you can think of: we expect some value and we are getting some other value, so has the center really shifted? In process control charts we have seen the same thing. We define control limit lines around a center line based on the process mean mu, and then we plot the average values from new observations. The control chart should give me a signal whenever the mean is shifting. So that type of scenario also exists in control charts: I am proving or disproving whether the process is in control or out of control. Here I am taking a simple example where the data is age and the target population mean age is expected to be 25.
My expectation is that it is near 25, but I am getting 20. Is 20 close to 25? That is what I want to prove or disprove. How do I do that? By hypothesis testing. The statisticians have given us a way to decide whether I am close to 25 or different from 25. When we write a hypothesis test, we state a null statement, or void statement, which is known as H naught, and an alternate statement, which is known as H 1. Think of a person who is taken to court. The judge has to decide whether the person is innocent or guilty. So, there are two hypotheses. Initially the judge assumes that the person is innocent: that is the null hypothesis, or void statement. The alternate statement is that the person is guilty. Based on the evidence provided by the police and whatever else is gathered, we either reject the statement that the person is innocent or we do not. This is a non-statistical example of how we define the statements. Similarly, take a manufacturing example, and this one is taken from Montgomery. The CTQ I am interested in is the mean burning rate. We expect it to be near 50 centimeters per second, and we want to prove or disprove that statement. So, one is the null statement and one is the alternate statement.
So, it is H naught versus H 1, the null and alternate hypotheses, that we generally try to check. We need some evidence, so we collect sample observations. Based on the mean burning rate that we get, the X bar, and the standard deviation, we decide whether to go with the statement mu equals mu naught, where mu naught is 50. Now, my sample mean cannot be exactly equal to 50. So the idea of the confidence interval is used here as well: I define an upper bound and a lower bound assuming this statement is true, and then I see whether X bar falls within them or not. So, with the help of a test statistic, we either go with this statement or go against it. I am trying to prove whether the mean is equal to 50 or not equal to 50. The first is the null and the second is the alternate. Generally the null is written with the equality. Sometimes you will see greater than or equal to on one side and less than on the other: the null can be greater than or equal to, or less than or equal to, and the alternate will be the opposite strict condition, less than or greater than. You will see it written this way in many books. What I am trying to say is that one will be the null and one will be the alternate hypothesis, and I want to prove or disprove the statement, for example whether improvement has happened or has not happened in some quality measure. That is an important issue: when we say we have made an improvement, have we really done so, or is the change not statistically significant? That we prove or disprove using hypothesis testing.
Hypothesis testing can be two-sided. That means the alternate is a not-equal-to condition: the mean can lie on either side of mu 0, so it can be greater than or less than mu 0. Hypothesis testing can also be one-sided. In that case the alternate is a strict condition, either less than or greater than. If the alternate is the less-than condition, then the null is the greater-than-or-equal-to condition, and often it is simply written with the equals sign. For a continuous distribution it does not make much difference whether the null is written as equal to or as greater than or equal to. Now, what the statisticians have given us is this: if the mean burning rate is at 50, then I can define boundaries, an upper limit and a lower limit. The region between them is the acceptance zone, and the regions beyond them on both sides are the rejection zones. I define this with a level of significance alpha, that is, with a confidence level of 1 minus alpha. That means I can be wrong, but I am defining a zone. If my X bar, whatever sample statistic I have calculated, falls within this zone, then we can say that the mean is not different from 50.
But if an observation comes out very far away, so that it does not lie within the acceptance zone, then we will say that this is an unnatural, unexpected observation if the population mean were 50. If you are seeing an extreme observation like this, the statement mu equals 50 does not hold, and I have evidence to go with the alternate, mu not equal to 50. However, whatever decision I take in hypothesis testing is an inference. Hypothesis testing is all about inference. Earlier we were not taking any conclusive decisions, but now we want to accept or reject statements, such as whether improvement has happened or not, and we want to conclude based on statistical evidence. That is done everywhere in quality; we always have to decide. With a control chart also we have to decide between stable and unstable scenarios, and there also we are doing hypothesis testing. I have not mentioned it before, but you can think of what we do with the control limit lines, the UCL and LCL, as a hypothesis test. So, how are the upper limit and lower limit defined? They are based on the same alpha level of significance that we considered for confidence intervals. For a two-sided test, the not-equal-to condition, there is alpha by 2 on one side and alpha by 2 on the other.
So, in this case the rejection region will be on both sides. With alpha by 2 in each tail, and based on the distribution assumption, we can draw the demarcation lines, for example at 48.5. This value is obtained from the alpha level of significance and the distribution we assume, and based on that we define the upper and lower limits of the acceptance zone. The rejection zone is beyond those limits. For doing the hypothesis test, a test statistic is used. This can be the z statistic, the t statistic, the F statistic; there are many options. Generally the z distribution, the t distribution and the F distribution are the ones used to draw a conclusion about a mean or a variance. So, what is the formulation? I have to define the region, and if the value calculated from the CTQ data goes outside it, we reject the statement; otherwise we do not. And whenever we do this, as I told you, there will be an error associated with it. This is known as type 1 error, and it is easy to see. Suppose the mean really is at mu 0, let us say 50, and I have defined the zone around it. The X bar that we get can fall anywhere, and suppose my value has fallen outside the zone.
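The acceptance and rejection zones described above can be sketched as follows. The lecture gives only mu 0 = 50 and a lower demarcation near 48.5; the values `sigma = 2.5` and `n = 10` below are assumptions introduced for illustration, picked because they give limits close to that figure.

```python
from math import sqrt
from statistics import NormalDist


def acceptance_region(mu0, sigma, n, alpha):
    """Limits on X bar for a two-sided z test of H0: mu = mu0 (sigma known)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * sigma / sqrt(n)
    return mu0 - margin, mu0 + margin


def decide(xbar, mu0, sigma, n, alpha=0.05):
    """Return the test decision for an observed sample mean."""
    lower, upper = acceptance_region(mu0, sigma, n, alpha)
    return "fail to reject H0" if lower <= xbar <= upper else "reject H0"


# sigma and n are assumed values, not taken from the lecture
lower, upper = acceptance_region(mu0=50.0, sigma=2.5, n=10, alpha=0.05)
print(f"acceptance zone for X bar: ({lower:.2f}, {upper:.2f})")
print("X bar = 49.0 ->", decide(49.0, 50.0, 2.5, 10))
print("X bar = 52.0 ->", decide(52.0, 50.0, 2.5, 10))
```

Any X bar inside the printed limits is treated as consistent with mu = 50; anything outside falls in the rejection zone.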
So, I have rejected the null hypothesis and gone with the alternate hypothesis. But if it is a normal distribution, some values will always fall outside the boundaries that you have specified with a given level of alpha. For a two-sided test there is an area of alpha by 2 in the left tail and alpha by 2 in the right tail. So some values can naturally go outside, because the normal curve never touches the axis. When mu equals mu 0 but I reject this hypothesis, even though the X bar I have got may be perfectly natural, I am committing an error. This is known as type 1 error, and alpha is the probability of type 1 error. Type 1 error is rejecting the null when it is true: although mu is at mu 0, I have got a value that falls outside the acceptance zone, and that can be completely natural. That type of misjudgment can happen 5 percent of the time if I have taken alpha as 5 percent. Where my acceptance zone ends and my rejection zone begins is defined by the value of alpha I have assumed in my experimentation. Most of the time you will find that people take 95 percent for the acceptance zone and 5 percent for the rejection zone.
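The statement that a true null hypothesis gets rejected about 5 percent of the time when alpha is 5 percent can be checked with a small simulation. This is a sketch, not part of the lecture: the values of `sigma`, `n`, the number of trials, and the seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type1_error_rate(mu0, sigma, n, alpha, trials=20000, seed=1):
    """Fraction of samples drawn with mu = mu0 for which a two-sided z test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # The null hypothesis is true here: every sample comes from N(mu0, sigma)
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / std_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = type1_error_rate(mu0=50.0, sigma=2.5, n=10, alpha=0.05)
print(f"observed rejection rate when H0 is true: {rate:.3f}")
```

The printed rate should land close to the chosen alpha, which is the sense in which alpha is the probability of a type 1 error.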
The test can be one-sided or two-sided. When we are doing a two-sided test, the rejection region is on both sides. If I am doing a one-sided test with a greater-than alternate, the rejection region is on the upper side, and with a less-than alternate it is on the lower side. So, when I am trying to show that mu is greater than some value, say 50, the rejection region is the upper tail, and it will have the full 5 percent. When I am doing a two-sided test, there will be 2.5 percent on one side and 2.5 percent on the other: alpha by 2 on each side. In the one-sided case the complete alpha is on one side, because I am not interested in the other side, so all the type 1 error is placed on one side. That is the idea. So, type 1 error means we reject the null hypothesis when it is true, and this is very serious. Let us say we are making a drug. The null statement is that the drug does not have any effect, and the alternate statement is that it has an effect. If I reject the null when it is true, that means the drug is not effective but I am saying it is effective. That is a very serious condition. So, type 1 error is very serious.
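The way alpha is split between the tails in the one-sided and two-sided cases described above can be made concrete by computing the critical z values. The helper name `critical_z` is hypothetical, introduced only for this sketch.

```python
from statistics import NormalDist


def critical_z(alpha, alternative):
    """Critical z value(s) as (lower, upper); None means no rejection region on that side."""
    std_normal = NormalDist()
    if alternative == "two-sided":
        cutoff = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
        return -cutoff, cutoff
    if alternative == "greater":
        return None, std_normal.inv_cdf(1 - alpha)  # all of alpha in the upper tail
    if alternative == "less":
        return std_normal.inv_cdf(alpha), None      # all of alpha in the lower tail
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


for alt in ("two-sided", "greater", "less"):
    print(alt, critical_z(0.05, alt))
```

At alpha = 0.05 the two-sided cutoffs are about plus and minus 1.96, while a one-sided test places its single cutoff at about 1.645 in absolute value, because the whole 5 percent sits in one tail.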
So, if I am committing this kind of error and I want to minimize it, how can I do that? Increase the sample size. Mathematically we can show that if we increase the sample size, the sample mean becomes a better estimate, so for the same acceptance limits the chance of a wrong rejection falls. Think of an apple tree: the more apples you take, the closer your judgment will be to the population average. So, the easiest way to reduce the error in hypothesis testing is to increase the sample size. But you cannot increase it without limit; it should be cost-effective also. So, there is an optimal sample size calculation as well when we are doing hypothesis testing. There are two kinds of error that can happen in hypothesis testing. You can see other lectures on hypothesis testing, which are easily available, for example in NPTEL web courses where we have already given details, or on YouTube, where a detailed explanation of type 1 and type 2 error is given. I have given a brief idea: type 1 error is rejecting the null when it is true. The void condition is true, but I am rejecting it. That means there is no improvement, but I am saying that there is improvement. That is the more serious error in my opinion. Then there is type 2 error also: I fail to reject the null, that means I accept the null, when it is false. So, although the drug is effective, I say no, the drug is not effective. That is the type 2 error that we can commit.
When the null hypothesis is false we should go with the alternate, but we have not, because the data does not show it, and so our decision is incorrect. I fail to reject the null when it is false. That is known as beta error. So one is alpha error and one is beta error, and alpha error is what we should be most concerned about while doing hypothesis testing. Type 2 error is not easy to calculate, because it requires information on where the mean has shifted to. So, what I will suggest is that we define alpha and do the test based on that, and if you want to reduce the beta error, you increase the sample size. That is the simplest way. With a larger sample size, both errors can be kept small. So please remember, one is type 1 error and one is type 2 error: we make the judgment based on alpha, and we increase the sample size to reduce the beta error. So, hypothesis testing is all about testing a statement: one will be the null statement and one will be the alternate statement. The first important test that we can think of is the one-sample Z test. An example is taken where sigma is known. We saw this data last time also, when we defined the confidence interval from the sample observations, and this example is again taken from Montgomery.
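To illustrate why the beta error needs to know where the mean has shifted, and how a larger sample reduces it, here is a minimal sketch for a two-sided z test with known sigma. The shifted mean `mu_true = 52`, `sigma = 2.5`, and the sample sizes are hypothetical values, not figures from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def beta_error(mu0, mu_true, sigma, n, alpha=0.05):
    """P(fail to reject H0: mu = mu0) when the real mean is mu_true (two-sided z test)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu0) * sqrt(n) / sigma  # shift measured in standard errors
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


# Assumed scenario: the mean has actually moved from 50 to 52
for n in (5, 10, 16, 25):
    beta = beta_error(mu0=50.0, mu_true=52.0, sigma=2.5, n=n)
    print(f"n={n:2d}  beta={beta:.3f}  power={1 - beta:.3f}")
```

With alpha held at 5 percent, beta falls steadily as n grows, which is the "increase the sample size" advice in numerical form.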
In descriptive statistics we saw that this data gives a mean value of about 0.525. Based on the sample observations, X bar was calculated and the standard deviation S was calculated, and the standard error is nothing but S by the square root of n, or sigma by root n when sigma is known. The bounds can also be defined: this is the lower bound, and this is the 95 percent confidence interval that we considered. Now we want to test. Here it says that there is a known standard deviation. There can be scenarios where the population variance is known and scenarios where it is unknown. In this case the population variance is given, and I want to test the null hypothesis that the mean is equal to this value against the alternate that it is greater. So, I want to do a one-sided test. I want to test whether the mean is equal to 0.5 or differs from 0.5, because the characteristic I am interested in, the concentration, should be close to 0.5. That may be the target given by the customer, and I want to show statistically to the customer that we are close to target. For this a test statistic is used: z equals X bar minus mu naught, divided by sigma by root n. This is the test statistic used to decide whether I will go with the null statement or with the alternate statement. X bar can be calculated from the data, the hypothesized mean is 0.5, and sigma is also given.
So, this information goes into the formula, and n is also given because the number of observations is known to me. Based on the z value I have to conclude whether I will go with the null statement or the alternate statement. How do we do it in Minitab? I will show you. Beforehand I would also like to say that the z test is used assuming that the variable follows a normal distribution. The test is fairly robust, because I am converting the mean value X bar into z and using the standard normal distribution. But the underlying assumption is that the data is not highly skewed and follows a normal distribution; we are assuming that to be true when we apply the z test. Let us assume this is true and do the one-sample z test to see whether mu equals 0.5 or not. For that, another important thing we have to take note of is the p value that I mentioned. Now, this is the data set that we have; concentration is the column I am using. I go to Stat, then Basic Statistics, and then 1-Sample Z, assuming that the data follows a normal distribution. Here I select concentration, and the known standard deviation, if I go back, is 0.3486. So, I have typed that value here.
Then I have ticked "Perform hypothesis test". The hypothesis we want to test is whether the mean is 0.5 or different from 0.5. The box may not be ticked by default, so you have to tick it and then write 0.5 as the hypothesized mean. When you have done that, go to Options. The confidence level used to define the confidence interval is 95 percent. A confidence level of 95 percent means 5 percent is the level of significance. For the alternative, "mean not equal to", "mean less than" and "mean greater than" are given. I want to test the greater-than condition, so I choose the one-sided test, mean greater than, with an alpha level of significance of 5 percent, and I click OK. Under Graphs you can ask for a box plot of the data, and a histogram is also possible. Then I click OK, and what I see is that the test was performed and the corresponding z statistic was calculated. Minitab reports this z value as 0.52. But whether to reject the null or not, we will decide based on the p value, and let us remember that the p value here is 0.301. You can note it down: if the p value is more than 0.05, we cannot reject the null hypothesis. Only if the p value is less than 0.05, because 5 percent is the level of significance we are considering, will we reject the null hypothesis and go with the alternate hypothesis.
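The one-sample Z calculation that Minitab performs here can be reproduced from summary statistics. In the sketch below, sigma = 0.3486 and the hypothesized mean 0.5 come from the lecture; the sample mean `xbar = 0.525` and sample size `n = 53` are assumptions that are not stated in the transcript, chosen because they roughly reproduce the reported z of 0.52 and p value of 0.301.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(xbar, mu0, sigma, n, alternative="greater"):
    """Return (z, p_value) for a one-sample z test with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# xbar and n are assumed values; sigma and mu0 are the ones used in the lecture
z, p_value = one_sample_z(xbar=0.525, mu0=0.5, sigma=0.3486, n=53)
alpha = 0.05
print(f"z = {z:.2f}, p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The decision rule in the last line is the same one used with the Minitab output: reject the null only when the p value is below the chosen alpha.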
As the p value is more than 0.05, we cannot reject the null; that means the concentration is not significantly different from 0.5. So, in this case we will retain the null hypothesis and say that we do not have evidence that mu is very different from 0.5; statistically we cannot prove that. So, we will go with the null hypothesis. That is one thing to note. But we also have to note that before we do this Z test, the data should be normally distributed. This data set is taken from an example in Montgomery, but we can also test whether it is normal or not, just to cross-check. As I said, the Z test is quite robust: even if some small deviation from normality happens, it will not affect the outcome of the test. But we also have alternatives. As I told you in control charts, a transformation is possible. But first let us see whether the data is normal or not. I go to Stat, Basic Statistics, Normality Test, select concentration, and do the Anderson-Darling test. What we see is that the p value is less than 0.05. When the p value is less than 0.05, the data is not normal. When the data is not normal, the points deviate from the straight line that you see on the plot, and here that deviation has happened. So, we are facing a non-normal scenario with this data set. If the data is non-normal, what we can do is transform it. How do I transform the data? I go to Control Charts and choose Box-Cox Transformation.
So, what I did is a Box-Cox transformation on this data set. I selected the data column; the subgroup size is 1. In Options I specify where to store the transformed data; I will store it in C2. I click OK, and this graph comes up. This is the Box-Cox plot, and the recommended lambda that you can see is 0.5. A recommended lambda of 0.5, that is Y to the power 0.5, represents a square root transformation. So, a square root transformation was done. This can also be done with the Calculator, specifying the column where I want to store the result. So, this column is the square-root-transformed data. Then we can do the same one-sample Z test on the transformed data. I go to 1-Sample Z and select the transformed column, and instead of 0.3486 I will use its square root. If I go to the Calculator, I should enter 0.3486, not 3486, so let me correct that, and I take the square root. It is approximately 0.59. So, I type 0.59 as the standard deviation. And for "Perform hypothesis test", 0.5 also has to be converted.
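The conversions used in this step can be checked quickly. The sketch below applies the square root (Box-Cox with lambda = 0.5) to the two constants from the lecture and to a short list of hypothetical readings; the list `readings` is made up for illustration and is not the lecture's data. Note that taking the square root of the standard deviation follows the lecture's shortcut; the standard deviation computed directly from transformed data will generally be a different number.

```python
from math import sqrt
from statistics import stdev


def sqrt_transform(values):
    """Square root transformation, i.e. y ** 0.5 for each value."""
    return [sqrt(v) for v in values]


sigma_original = 0.3486   # known standard deviation used in the lecture
mu0_original = 0.5        # hypothesized mean used in the lecture

print(f"sqrt(sigma) = {sqrt(sigma_original):.2f}")
print(f"sqrt(mu0)   = {sqrt(mu0_original):.3f}")

# Hypothetical readings, only to show that the two routes to a
# standard deviation on the transformed scale need not agree
readings = [0.10, 0.25, 0.40, 0.55, 0.80, 1.20]
transformed = sqrt_transform(readings)
print(f"sqrt of stdev(readings):     {sqrt(stdev(readings)):.3f}")
print(f"stdev of transformed values: {stdev(transformed):.3f}")
```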
So, what is the corresponding value? If I take the square root of 0.5, the value is 0.707. So, the hypothesized mean I want to test is about 0.707. These are the converted values. I am doing the same test; only the CTQ has been converted. When you set the options and run the test, you get the p value again. The output window may be quite small, so let me copy it as a picture, paste it into a blank Excel sheet, and enlarge the image to show you the results. You can see the z statistic, and the p value is coming out to be 0.601. We are doing a one-sided test here: in the options I have specified the greater-than condition. You have to remember whether you are doing a one-sided test or a two-sided test. If you click OK you will get the results. So, even after transformation I am arriving at the same decision. Without the transformation the decision was to go with H naught, and with the transformation I am also going with H naught. That is if the transformation works. In case the transformation does not work, we will see what options we have in our next session. So, thank you for listening.
We will continue from here and we will see some more examples of hypothesis testing. Thank you.