Welcome to session 19 of our course on Quality Control and Improvement with Minitab. I am Professor Indrajit Mukherjee from the Shailesh J. Mehta School of Management at IIT Bombay. In the earlier session we talked about hypothesis testing: how to compare sample observations with a hypothesized population mean and judge whether the mean is equal to that value or not. That is the simplest way to explain hypothesis testing. We used the Z test and the t test for that, and we also saw what is to be done if the underlying assumption of normality fails. One option is a transformation such as the Box-Cox transformation, after which we do the hypothesis test on the transformed data. Otherwise we go for a non-parametric test, which works with the median and the concept of ranking. In either case we get a p-value and interpret it: if p is less than 0.05 we go for the alternative hypothesis, and if p is greater than 0.05 we cannot reject the null. That is the condition on which we base the analysis. Now we will go to a specific test, known as the two-sample t-test, which is very relevant to quality and can be considered the starting point of experimentation. I will give the statistical concept used for this test and then show how it is done in Minitab.
The problem I am highlighting here is that two different catalysts are used in a chemical process, and the experimenter is interested in knowing whether catalyst A or catalyst B improves the yield. I have the percentage yield reported when catalyst A was used, and the percentage yield measured on the samples when catalyst B was used. I want to check whether, in the population, the mean mu 1 and the mean mu 2 are the same or different. From the samples I get the averages, x A bar and x B bar, and I can also get the standard deviation of A and the standard deviation of B. Now I have to make a judgment. The null hypothesis is mu 1 equal to mu 2, and the alternative is mu 1 not equal to mu 2 if I am considering a two-sided test. For this we use a specific test statistic, the t statistic. The calculated value is t 0, and it is based on the sample averages. On the slide they are written as y 1 and y 2; instead of x A and x B we can write y A and y B, since y is the outcome.
I want to maximize my yield, and which catalyst to use is what I want to analyze; one is A and one is B. Look at the difference between y A bar and y B bar. If this difference is close to 0, the two are not much different, and we expect that the hypothesis test will not reject the null hypothesis. If it is very different from 0, we expect to reject the null hypothesis. That is the basic interpretation. In the test statistic this difference is calculated and then scaled by the pooled standard deviation together with the sample sizes. Here n 1 is the number of observations for catalyst A and n 2 is the number of observations for catalyst B, so we can write them as n A and n B. They can be the same or different; here they are the same. The pooled variance, S p squared, is given as (n A minus 1) times S A squared plus (n B minus 1) times S B squared, all divided by n A plus n B minus 2, and the pooled standard deviation S p is its square root. Since all this information is known, we place S p in the formula and calculate t 0, which we can also call the t calculated value.
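Written out, the pooled-variance statistic described in words above takes the following form. This is the standard textbook expression, using the lecture's symbols n A, n B, S A, S B and the sample means y A bar and y B bar; the slide itself is not reproduced in the transcript, so the exact notation there may differ.

```latex
S_p^2 = \frac{(n_A - 1)\,S_A^2 + (n_B - 1)\,S_B^2}{n_A + n_B - 2},
\qquad
t_0 = \frac{\bar{y}_A - \bar{y}_B}{S_p \sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}}
```

Under the null hypothesis and the assumptions of the test, t 0 follows a t distribution with n A plus n B minus 2 degrees of freedom.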
The t tabulated value can then be looked up, because it depends on the degrees of freedom and on alpha, the level of significance. For any hypothesis test we have to assume some alpha, and we are assuming, let us say, 0.05, or 5 percent. With the given alpha and the degrees of freedom, which is n A plus n B minus 2, we compare the calculated value t 0 with the tabulated value and make a judgment. That was the old way. Nowadays any software will report the p-value, and based on it we reject or fail to reject the null hypothesis. When I do this two-sample t-test to compare the means, mu 1 equal to mu 2 against mu 1 not equal to mu 2, some assumptions are there. The basic assumption is that the data coming from catalyst A and from catalyst B should both follow a normal distribution. So the normality assumption is required for each individual data set: one comes from a normal distribution with process average mu 1 and the other with mu 2, or we can say mu A and mu B. We are also assuming that the data are independent.
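The two decision routes mentioned above, comparing t 0 with the tabulated value and reading the p-value, can be sketched in Python with SciPy. This is only an illustration outside Minitab: the function name `pooled_t_test` and the summary statistics passed to it are hypothetical and are not the lecture's catalyst data.

```python
import math
from scipy import stats


def pooled_t_test(mean_a, mean_b, s_a, s_b, n_a, n_b, alpha=0.05):
    """Two-sided pooled-variance t-test computed from summary statistics."""
    df = n_a + n_b - 2
    # Pooled standard deviation from the two sample standard deviations
    sp = math.sqrt(((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / df)
    t0 = (mean_a - mean_b) / (sp * math.sqrt(1 / n_a + 1 / n_b))
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # "tabulated" value for a two-sided test
    p_value = 2 * stats.t.sf(abs(t0), df)     # two-sided p-value
    return t0, df, t_crit, p_value


# Hypothetical summary statistics, for illustration only
t0, df, t_crit, p_value = pooled_t_test(92.0, 93.0, 2.5, 3.0, 8, 8)
reject_by_table = abs(t0) > t_crit
reject_by_p = p_value < 0.05
print(f"t0={t0:.3f}, df={df}, t_crit={t_crit:.3f}, p={p_value:.3f}")
```

Both routes always lead to the same decision, since the p-value is below alpha exactly when the absolute value of t 0 exceeds the tabulated value.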
The set of data we have collected with catalyst A has nothing to do with the data set for catalyst B; they are independent of each other. So no correlation should exist between data set A and data set B, and that check also needs to be done. The third check is on the variance: from S A and S B, can we conclude that the variance of A is the same as the variance of B? This dictates which t statistic we use. If the variances are the same, one t statistic is used; if they are different, another t statistic is used, with different degrees of freedom. You can see any book to understand the basics of the two-sample t-test and the t statistic used in each case. So we need three checks: normality, independence of the two data sets, and whether the variances are the same or not. I have this data set in Minitab, so I will do the test as per these requirements. The data are in columns C4 and C5. I go to Stat, then Basic Statistics, because I want to check normality first. To check whether the catalyst A data are normal, I use the Anderson-Darling test again and click OK. The p-value reported is around 0.516.
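For readers working outside Minitab, a rough equivalent of this normality check can be sketched with SciPy. Note the difference: `scipy.stats.anderson` reports the Anderson-Darling statistic with critical values rather than the p-value that Minitab shows, so the sketch compares the statistic with the 5 percent critical value. The helper name `looks_normal` and the yield values are hypothetical.

```python
import numpy as np
from scipy import stats


def looks_normal(sample, level=5.0):
    """Return True if the Anderson-Darling test does NOT reject normality
    at the given significance level (in percent)."""
    result = stats.anderson(np.asarray(sample, dtype=float), dist="norm")
    # Locate the critical value that corresponds to the requested level
    idx = list(result.significance_level).index(level)
    return bool(result.statistic < result.critical_values[idx])


# Hypothetical percentage yields, for illustration only
yield_a = [91.2, 93.5, 92.1, 94.8, 90.9, 89.6, 94.0, 90.3]
normal_a = looks_normal(yield_a)
print("Normality not rejected for A:", normal_a)
```

A result of True here plays the same role as a p-value above 0.05 in the Minitab output.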
So the data seem to be normal and there is no problem; I can close this one. Similarly, I do the normality test for catalyst B, and again I get a p-value which is more than 0.05. So here also the normality assumption is not violated. Next, to check whether catalyst A and catalyst B are independent, we can look at the correlation coefficient. I select catalyst A and catalyst B, use Pearson correlation in the options, ask for the correlation matrix in the results, and click OK. Here we are getting a near perfect relationship. Sorry, I have made a mistake in the column selection, so I will change it; catalyst A and catalyst B are what is important for us. When I do that, the coefficient is approximately minus 0.3, so it is showing a negative correlation. We can also check the p-value for this. We go to Stat, Basic Statistics, Correlation again; in the results we ask for the pairwise correlation table, and then the p-value is reported. I am copying the output as a picture and pasting it in Excel so that it is visible, to show you the correlation between catalyst A and catalyst B.
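The correlation check described here, a Pearson coefficient together with its p-value, can be sketched with `scipy.stats.pearsonr`. The two lists below are hypothetical yields, not the lecture's data, and the pairing of observations follows the row order of the two columns, as in the worksheet.

```python
from scipy import stats

# Hypothetical percentage yields, for illustration only
yield_a = [91.2, 93.5, 92.1, 94.8, 90.9, 89.6, 94.0, 90.3]
yield_b = [89.4, 91.0, 90.2, 93.1, 96.8, 97.0, 91.3, 92.6]

# Pearson correlation coefficient and the p-value for H0: correlation = 0
r, p_value = stats.pearsonr(yield_a, yield_b)
significant = p_value < 0.05
print(f"r={r:.3f}, p={p_value:.3f}, significant correlation: {significant}")
```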
What you see is that the p-value is greater than 0.05. That means there is no significant correlation between the catalyst A and catalyst B data sets, sample 1 and sample 2. As we discussed, correlation can be checked with the Pearson coefficient, and whether it is significant can be seen from the reported p-value: a p-value of more than 0.05 indicates that there is no statistically significant correlation, and if it is less than 0.05 we can say that there is a significant correlation. So the data seem to be independent, and the second condition also holds. The third condition is whether the variances are the same or not, so I will do a two-variance test. It asks whether both samples are in one column; no, the samples are in different columns. For sample 1 I give catalyst A and for sample 2 catalyst B; the order does not matter. Then you go to Options. Because I have checked normality, I will use the test and confidence intervals based on the normal distribution. I can test either the variance ratio or the standard deviation ratio; let us say I am using the variance ratio. The hypothesized ratio is 1 and the alternative is ratio not equal to the hypothesized ratio, so I am doing a two-sided test, assuming a normal distribution.
If I click OK, I can ask for the summary plot and the results; these are the defaults. What I get is the F statistic. I can copy this as an image and paste it here to show you the outcome. In the results, the method is mentioned as F and the statistic value is 0.64. I am checking whether sigma 1 squared by sigma 2 squared is equal to 1 or not equal to 1. The ratio S 1 squared by S 2 squared is taken, and it comes out to be 0.64. The p-value is not significant, so this ratio is not significantly different from 1; that means we can treat the variances as the same. When the variances are the same, I go for that condition in the two-sample t-test. So for the catalyst data we have seen that each sample individually follows a normal distribution, the data are independent, and the variance of A and the variance of B are the same in the population. Then I go to Stat, Basic Statistics, 2-Sample t and apply the two-sample t-test. It asks whether the samples are in one column or in different columns; they are in different columns, catalyst A and catalyst B. Then you go to Options and tick Assume equal variances. The confidence level is 95 percent, which we are using every time.
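The F test for equal variances discussed above can be sketched as follows. SciPy has no single function for the two-sample F test, so this version builds it from the F distribution directly; the function name `f_test_equal_variances` and the data are hypothetical, and the two-sided p-value is taken as twice the smaller tail area, which is one common convention.

```python
import numpy as np
from scipy import stats


def f_test_equal_variances(x, y):
    """Two-sided F test of H0: var(x) == var(y), assuming normal data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    f_stat = np.var(x, ddof=1) / np.var(y, ddof=1)   # S1^2 / S2^2
    df1, df2 = len(x) - 1, len(y) - 1
    lower = stats.f.cdf(f_stat, df1, df2)
    upper = stats.f.sf(f_stat, df1, df2)
    p_value = min(1.0, 2 * min(lower, upper))        # two-sided p-value
    return f_stat, p_value


# Hypothetical percentage yields, for illustration only
yield_a = [91.2, 93.5, 92.1, 94.8, 90.9, 89.6, 94.0, 90.3]
yield_b = [89.4, 91.0, 90.2, 93.1, 96.8, 97.0, 91.3, 92.6]
f_stat, p_value = f_test_equal_variances(yield_a, yield_b)
print(f"F={f_stat:.3f}, p={p_value:.3f}")
```

A p-value above 0.05 would support ticking the equal-variances option in the subsequent t-test.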
That is the default. The alternative is difference not equal to the hypothesized difference, because we want to check whether the means for catalyst A and catalyst B are different or not. So I am doing a two-sided test. I can also ask for the box plot in the graphs, and then I click OK. I will copy the output and paste it here, and enlarge it for visibility. For the two-sample t-test, the p-value we are getting is 0.729, which is more than the recommended value of 0.05. So I cannot reject the null hypothesis, mu 1 equal to mu 2. That means the means with catalyst A and catalyst B do not differ statistically; whichever catalyst you use, the overall mean remains the same. So a new catalyst B that you may have developed is not giving a higher yield than catalyst A, and we can retain catalyst A. Unless we have enough evidence that the new catalyst is effective, we will not make the claim that catalyst B is better than A. The same type of testing applies to whether a drug is effective or not. We measure the effectiveness of the original drug with some measure, and then whether a second, newly developed drug gives better efficacy than drug A can be tested using this type of two-sample t-test.
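The same two-sided, equal-variance test can be run outside Minitab with `scipy.stats.ttest_ind`; a minimal sketch follows. The yields are hypothetical, so the numbers will not match the 0.729 reported in the lecture. The `equal_var` flag corresponds to the Assume equal variances tick box; setting it to False gives the unequal-variance (Welch) version.

```python
from scipy import stats

# Hypothetical percentage yields, for illustration only
yield_a = [91.2, 93.5, 92.1, 94.8, 90.9, 89.6, 94.0, 90.3]
yield_b = [89.4, 91.0, 90.2, 93.1, 96.8, 97.0, 91.3, 92.6]

# Pooled-variance two-sample t-test (two-sided by default)
pooled = stats.ttest_ind(yield_a, yield_b, equal_var=True)

# Unequal-variance alternative, for when the variance check fails
welch = stats.ttest_ind(yield_a, yield_b, equal_var=False)

reject_null = pooled.pvalue < 0.05
print(f"pooled: t={pooled.statistic:.3f}, p={pooled.pvalue:.3f}")
print(f"welch:  t={welch.statistic:.3f}, p={welch.pvalue:.3f}")
```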
What is required for a two-sample t-test is that all these assumptions are satisfied. If the normality assumption fails, one option is to transform the data, as we have mentioned, and then do the test. Keep in mind that both data sets have to be transformed in the same way, to make a fair comparison between the two. Another option is non-parametric testing. The non-parametric option is known as the Mann-Whitney test, for when the assumption fails and you do not want to rely on it. I can assure you that these t-tests are quite robust; even if a small to moderate deviation from the assumptions happens, the conclusion will be more or less the same as with the non-parametric test. The Mann-Whitney test is provided in Minitab, and it is the recommended one when the assumption fails. So I go directly to the nonparametric tests and select Mann-Whitney. Whenever I am going for a non-parametric test, the median is used rather than the mean. The first sample is catalyst A and the second sample is catalyst B. Let us assume that the distribution assumption fails; I am checking the not-equal condition. If I click OK, you get the W statistic, which is the Mann-Whitney statistic, and the p-value is reported here as well.
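A non-parametric comparison of this kind can be sketched with `scipy.stats.mannwhitneyu`. SciPy reports the U statistic for the first sample, whereas Minitab reports W, the rank sum of the first sample; the two are related by W = U + n1(n1 + 1)/2. The data below are hypothetical.

```python
from scipy import stats

# Hypothetical percentage yields, for illustration only
yield_a = [91.2, 93.5, 92.1, 94.8, 90.9, 89.6, 94.0, 90.3]
yield_b = [89.4, 91.0, 90.2, 93.1, 96.8, 97.0, 91.3, 92.6]

# Two-sided Mann-Whitney test
result = stats.mannwhitneyu(yield_a, yield_b, alternative="two-sided")
u_stat, p_value = result.statistic, result.pvalue

# Convert U to the rank-sum statistic W of the first sample
n1 = len(yield_a)
w_stat = u_stat + n1 * (n1 + 1) / 2
print(f"U={u_stat}, W={w_stat}, p={p_value:.3f}")
```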
This p-value is more than 0.05, which indicates that the medians are not different. So basically the conclusion is that both catalysts are equally effective; one does not give a higher yield than the other. The improvement in percentage yield is not statistically significant, and in the population the yield with catalyst A is the same as with catalyst B. Now look at the sample information. We have 1, 2, 3, 4, 5, 6, 7, 8 observations for one catalyst and another 8 observations for the other. So the degrees of freedom you see in the t-test output is 14, that is, n 1 plus n 2 minus 2. With that degrees of freedom, the corresponding p-value says that there is no difference between catalyst A and catalyst B. We can take different examples. This one is taken from Montgomery's design of experiments book, where a cement mortar formulation is checked. One is modified and one is unmodified: unmodified is the original formulation and modified is the new formulation. We want to do the two-sample t-test and confirm whether they are different or the same. Again we go through Basic Statistics, and let us say I want a two-sided test. First I check normality for the modified mortar. When I do the test, the p-value is 0.388, so it is normal.
For the second data set we do the same normality test, with unmodified instead of modified. The p-value I observe is approximately 0.9, which indicates that the unmodified mortar data can also be taken as coming from a normal population. So the first assumption is done. For the second assumption, we go to Basic Statistics and check the correlation between modified and unmodified. I click OK, and the p-value observed is 0.554; I am not copying this output across. The correlation between modified and unmodified is about 0.2. I told you the thumb rule that a correlation of approximately more than 0.7 is taken as strong, but we also have the p-value from the hypothesis test, and that confirms there is no significant correlation between the two data sets. Modified and unmodified are independent observations. The final check is whether the variances are the same or not, so I do a two-variance test. One sample is modified and the other is unmodified. In Options we take the 95 percent confidence level, with sample 1 variance against sample 2 variance. I am doing a two-sided test, and I tick the option based on the normal distribution, because we are making the normality assumption and have also cross-checked it. I click OK and then OK again. I get the variance test, and the F test indicates that there is no difference. Let me take this output to Excel.
These are the values from the F test for the modified and unmodified cement, that is, the variance test for the two data sets. The p-value is 0.478, which indicates that the variances are the same; the ratio of variances is not significantly different from one. So every check is done: the first, second and third tests are satisfactory. Then we go to Basic Statistics, 2-Sample t, to check whether the modified cement is different from the unmodified one. We can also do a one-sided test; by changing the alternative we can test which one is higher. With the not-equal condition, if they are different, the direction can also be read from the results themselves. So I click OK. The output we were looking at was for catalyst 1 and catalyst 2, so let us go to the output for this analysis. I copy it and paste it here, and we can see whether modified and unmodified are different or not. The p-value is 0.042, which is less than 0.05, so there is a difference: one mean is different from the other, for the modified mortar and unmodified mortar formulations. Now let us look at the means to see which one is higher, having confirmed that there is a significant difference between the two.
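The one-sided variant mentioned above can be sketched with the `alternative` argument of `scipy.stats.ttest_ind`. The strength values below are hypothetical and only illustrate the mechanics; they are not the mortar data from the lecture.

```python
from scipy import stats

# Hypothetical strength measurements, for illustration only
modified = [16.8, 16.4, 17.2, 16.3, 16.5, 17.0, 16.9, 17.1, 16.6, 16.6]
unmodified = [17.5, 17.6, 18.2, 17.9, 17.8, 18.0, 17.7, 18.1, 17.4, 17.9]

# Two-sided test: are the means different?
two_sided = stats.ttest_ind(modified, unmodified, equal_var=True)

# One-sided test: is the modified mean LESS than the unmodified mean?
one_sided = stats.ttest_ind(modified, unmodified, equal_var=True,
                            alternative="less")

print(f"two-sided p={two_sided.pvalue:.3g}, one-sided p={one_sided.pvalue:.3g}")
```

When the observed difference is in the direction of the one-sided alternative, the one-sided p-value is half the two-sided one, which is why the direction should be chosen before looking at the data.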
We paste the output here and enlarge it to see which is different from which. We have seen that the p-value indicates they are different. The modified mortar is giving a mean strength of 16.76 and the unmodified is giving a mean strength of 17.04. So the unmodified mortar is giving higher strength than the modified mortar, and they are statistically significantly different. What is measured in this data set is strength, let us say tensile strength, so whichever strength is higher, I will go for that. The results indicate that the unmodified average is higher than the modified one, and the p-value proves that they are statistically different. So if you have to decide which formulation to adopt, I will go for the unmodified, the old formulation. I will not go for the modified mortar, because it is giving lower strength than the unmodified mortar. At the population level, the unmodified one gives higher strength than the modified one. That is the physical interpretation we can take out of this. We can take another example: whether the arsenic content in Phoenix is different from the arsenic content in Arizona. Here too we first see whether the basic assumptions hold. So let us take Phoenix and do the normality test. It is just on the line; you see the p-value is exactly equal to 0.05.
So we may consider that this is not less than 0.05, and the assumption holds. For the other data set, I go to Basic Statistics, Normality Test, take the arsenic in Arizona, and do the test. Here also the result is satisfactory, and we can assume that the normality assumption is not violated as such. Then we go to Basic Statistics and check the correlation between the two data sets, Phoenix and Arizona. The p-value is also not significant, which means there is no significant correlation between the two; they are independent data sets. Let me paste the output here to confirm. The p-value for the Pearson correlation is coming out to be 0.535, which is more than 0.05, so there is no significant correlation. The correlation itself is minus 0.224. I told you that anything more than about 0.7 in magnitude is taken as strong, and here it is minus 0.224, so it should not come out to be significant. That is what is reflected in the p-value of 0.535. So in this example the data sets are independent. All checks are done, so the final test is the two-sample t-test, and here I only change the data set: the first sample is arsenic in Phoenix and the second is arsenic in Arizona.
In Options we tick Assume equal variances, taking that condition as satisfactory. The confidence level is 95 percent, the hypothesized difference is 0, and not equal is the alternative I want to test, that is, whether the means are the same or different. Graphically we can see the box plot, so I select it and click OK. The box plot gives you some idea about the data. These are the average values: the mean here is 12.9 and the mean there is around 27.5. You can see a steep slope in the line joining these two means, so the difference should be prominent; there is a sizeable difference in the location of the two means, and the medians also differ considerably. Our hypothesis test should be able to identify this difference and show that the mean arsenic content in Phoenix is significantly different from that in Arizona. The box plot reflects that fact, but to prove it we have to go to the p-value from the hypothesis test. I am copying the final test results and pasting them here, and enlarging them so that you can see. The p-value is 0.014, which is less than 0.05, and the t value is around minus 2.73, which is not close to 0. So our basic interpretation rests on this p-value being less than 0.05.
In that case what we can say is that these two means are statistically different, and you need to be cautious, since it is the content of arsenic. For Arizona the mean arsenic content is about 27, which is much higher than for Phoenix. So there is a statistically significant difference between the two. This is what we wanted to emphasize. With the two-sample t-test I can compare before improvement and after improvement. In quality, we first see the existing scenario, before we have implemented any measures. Then we do some improvement and prove that the improvement has a real effect, that is, that the result is statistically different. So the first phase of experimentation asks whether one condition is different from a second condition: does the second condition improve the CTQ values, whatever my target is, or not? To prove whether improvement has happened, we need this type of hypothesis testing, and the two-sample t-test is the most important hypothesis test used in quality. It may be only the starting point, but we need to know it. There are many other lectures where you can see the two-sample t-test; this is the way we do it in Minitab, and the interpretation I have already told you. After this we will discuss the paired t-test, which is also very relevant. Where to use the two-sample t-test and where to use the paired t-test is a difference we should know, and based on that we can go ahead with experimentation where we make real improvements.
We will also discuss the analysis of variance concept, which is a fundamental pillar of design of experiments. We will stop here and continue with the paired t-test in our next lecture. Thank you.