Welcome to session 16 on Quality Control and Improvement with Minitab. I am Professor Indrajit Mukherjee from Shailesh J. Mehta School of Management, IIT Bombay. In the previous session we started giving some basic ideas on statistics that will be used for quality control and improvement. I will only touch the parts that are relevant for our course and will not go into much detail, because you already have a sense of this from a statistics course or any other course on basic statistics.
Let me give you a hint of what we are doing in quality control. Let us say machine outputs are coming from a process; in that case we can assume an infinite population. We take some samples, which give us a snapshot of the process, and based on that we try to say what the short-term process capability is and what the long-term process capability is. Because I do not have the population information, we collect a reasonable sample and based on that we try to infer about the population. Whenever I take a sample, I estimate some values from it, which we call statistics: maybe the average value, and maybe also the standard deviation of the sample. Based on these we want to predict what the mu of the population will be and what the sigma of the population, that is the population standard deviation, will be.
So x bar and s are known as statistics, and a statistic can also follow a certain distribution. If we assume this is the target population, the individual random variable x can follow a normal distribution, and the statistic that we calculate from the sample, such as the average, can also follow a distribution, because if I take a different sample I will get a different x bar out of the population. Next time I go to the process and take, say, 10 observations, I will get a different estimate of x bar. Every time we calculate x bar it will be different. So x bar can also be considered a random variable, because the x values that we selected were selected at random; that means we have assumed there is no bias in the sample selection.
Every time I select a sample, it gives me a value of x bar and s, and these values change if I change the sample. If I keep taking more samples, they will keep being different. So the statistic that we are measuring is also a random variable. Why am I doing this? Because I want to predict the behavior of the population, and I do not have access to the whole population. One of the important statistical concepts here is point estimation.
Let us consider an apple tree from which you have taken 4 apples. Based on that you calculate the average weight and say that this average is what I expect for the whole tree. It is unlikely that the population mean will be exactly equal to it, but this is the estimate I can make, and I have taken the sample without bias. So I am predicting, and we say that this is a point estimate of the population mu. It is very rare that the average you have calculated will be exactly equal to mu, but I make an estimate: we are just saying that this x bar is representative of the population mu. This is known as point estimation of a population parameter.
Here x bar is calculated from the observations 25, 30, 29 and 34. The average of these, 29.5, is the point estimate we get for the population parameter mu. Now, this estimate (not the population parameter itself, which is fixed) will keep changing, because every time I take a sample it will be different. So the sample statistic is also a random variable, and if it is a random variable it will follow a certain distribution, maybe a popular one like the normal distribution, or any other distribution we can think of. So this statistic follows a certain sampling distribution.
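As a small illustration of the point estimate just described, the sketch below averages the four apple observations quoted in the lecture. The variable names are chosen here for illustration, and the units of the observations are not stated in the lecture.

```python
from statistics import mean

# The four apple observations quoted in the lecture (units not stated).
sample = [25, 30, 29, 34]

# The sample mean x_bar serves as the point estimate of the population mean mu.
x_bar = mean(sample)

print(f"point estimate of mu: {x_bar}")
```

A different set of four apples would give a different x_bar, which is exactly why the estimate is treated as a random variable.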
The probability distribution of a statistic is known as its sampling distribution: every time I take a sample I calculate x bar, and x bar follows a certain distribution. That is the idea of a sampling distribution.
One important theory that comes in here is the law of large numbers. It says that if you have a large number of observations, the mean of the sample gets closer and closer to the population mean. If you keep on drawing observations and recomputing the mean, the initial values will fluctuate, but the fluctuation converges towards the population mean. The more independent observations you take, the more the average tends towards the population average. That is known as the law of large numbers.
When we try to determine the probability distribution of the sample mean, statisticians have given us a formulation. If the original x follows a normal distribution with mean mu and variance sigma square, then x bar follows a normal distribution with the same mean mu and standard deviation sigma by root n. This has been proved. So the distribution of x bar is narrower than that of the original x population: the variability of the statistic decreases, because we have a factor of square root of n in the denominator. Earlier the standard deviation was sigma.
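The law of large numbers can be seen in a short simulation. This is only a sketch: the population values MU = 50 and SIGMA = 10 are hypothetical, chosen here for illustration, and the helper name `running_means` is not from the lecture.

```python
import random

def running_means(draw, n_draws, seed=0):
    """Return the running average after each of n_draws independent draws."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for i in range(1, n_draws + 1):
        total += draw(rng)
        means.append(total / i)
    return means

# Hypothetical normal population, used only for illustration.
MU, SIGMA = 50.0, 10.0
means = running_means(lambda rng: rng.gauss(MU, SIGMA), 20000)

# The running average fluctuates early on and settles near MU as n grows.
for n in (10, 100, 1000, 20000):
    print(f"n = {n:>5}: running mean = {means[n - 1]:.3f}")
```

The early running means wander noticeably, while the later ones stay close to the population mean.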
So sigma reduces: for the sampling distribution it becomes sigma by root n. This is one of the things we have to keep in mind when we analyze data afterwards. Sigma by root n is the standard deviation of the sample statistic, which here is the average value x bar. So x bar follows a normal distribution whose mean remains the same and only sigma changes to sigma by root n, because I am taking an average. I am smoothing things out, so the fluctuations will be less and the variability will be less. Sigma by root n is the formulation that statisticians have given us.
The central limit theorem is another important concept that is used. It says that the underlying x variable can follow any type of distribution, but if you take the average as the sample statistic, the distribution of that average converges to a normal distribution. This can be proved: as you increase the sample size, take the average, and then plot the averages, the distribution of the average tends to follow the normal. That is why in quality control we take a number of observations to calculate the average and then assume that the average tends towards normal. In a control chart, for example, we take a subgroup, say of size five, take its average, plot the averages, and assume that the average follows a normal distribution. The underlying idea there is the central limit theorem.
Another important concept that is required when we move forward is known as the confidence interval.
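Both ideas above, the sigma by root n standard deviation of x bar and the central limit theorem, can be checked with a rough simulation. The sketch below assumes a skewed exponential population with mean 1 and standard deviation 1; that choice, the number of repeats, and the helper names are illustrative assumptions, not something taken from the lecture.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

def sample_means(n, n_repeats, rng):
    """Means of n_repeats samples, each of size n, from an exponential population."""
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(n_repeats)]

def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric (e.g. normal) shape."""
    m = mean(values)
    sd = pstdev(values)
    return mean([(v - m) ** 3 for v in values]) / sd ** 3

rng = random.Random(1)
summary = {}
for n in (1, 5, 30):
    means = sample_means(n, 4000, rng)
    summary[n] = (mean(means), stdev(means), skewness(means))
    print(f"n = {n:>2}: sd of x_bar = {summary[n][1]:.3f} "
          f"(sigma / sqrt(n) = {1 / sqrt(n):.3f}), skewness = {summary[n][2]:.2f}")
```

As n grows, the spread of the averages should track sigma by root n and the skewness should move towards zero, which is the behavior a subgroup average on a control chart relies on.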
The confidence interval idea says that I cannot hit the population value exactly. Suppose I take some apples from the apple tree and try to predict the population average weight of the apples. I may only be able to do destructive testing, or take 4 or 5 apples, and I cannot do that several times because it is uneconomical for me. So I will take one sample and based on that I would like to predict where the population average will be.
For that, statisticians developed the concept known as the confidence interval. With one estimate of x bar, what we can say is that, with a certain confidence, I can determine a lower bound and an upper bound within which mu will lie. Let us call them the lower limit L and the upper limit U. So some bounds can be given to mu based on one estimate, which is x bar. I have only one estimate of the mean, and I have some information on the standard deviation, which can also be the sample standard deviation. If you have these two pieces of information, or estimates of them, then I can say where mu should lie. This gives you a better assessment: rather than saying x bar will be exactly equal to mu, I am saying that if x bar is this much, mu should lie between this value and that value.
The confidence interval depends on a certain value known as the z statistic, assuming that the sigma of the population is known. Most of the time it may not be known, and in that case z is replaced by the t statistic.
There are different continuous distributions. Some of the important ones used in quality assessment and design of experiments are the F distribution, the t distribution, the z distribution (that is, the standard normal distribution), and the chi-square distribution. These are the common distributions that will be used for quality control and improvement analysis.
One of the important concepts is shown here. One average can fall here, another average can fall there, and another somewhere else, but whenever you build the confidence interval you can attach a level of confidence to it. Let us say I have developed a 95 percent confidence interval from one average. If I repeat the sampling multiple times, say I take the sample 10 times and build the interval each time, I can expect the interval to contain mu about 9.5 times out of 10 on average. If it is a 90 percent interval, we can expect about 9 out of 10 intervals to contain mu. That is what the confidence interval gives: one sample, with some confidence level.
The confidence level is defined through alpha, which is known as the level of significance; the confidence level is 1 minus alpha. If you define alpha, the level of significance you want, that will define the width of the confidence interval. If you want to be more confident, you decrease alpha, and in that case the calculation will give you a wider range.
That means, if I have to increase the level of confidence, alpha has to be decreased. Alpha is the probability of committing an error, so this alpha value needs to be decreased if you want to be more confident. You need to expand the band: rather than 95 percent I want to expand it to a 99 percent band, and in that case the area in the tails goes down and alpha comes down. Alpha is basically how much I can be wrong; you can think of it as the probability of going wrong. What is the percentage chance that my interval estimate goes wrong? That is alpha.
So if you define alpha, calculate one average, and have the standard deviation either as s or as sigma, I can define a confidence zone with a lower limit and an upper limit. There can be an error between the given estimate x bar and mu, and the maximum error I expect to commit corresponds to x bar falling at one end of this zone or the other. So I am saying that mu is expected to lie within this zone, and this zone is the confidence interval of the population mu. I am giving it with some confidence: the chances are that 95 percent of the time I will be right, but 5 percent of the time I can be wrong. That is why probability is used here. This is the concept of the confidence interval.
So what you have to do is take a sample from the process, and based on that I want to infer about the population mean or standard deviation, whichever you are interested in.
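The statement that a 95 percent interval is right about 95 percent of the time, and that a higher confidence level needs a wider band, can be illustrated with a simulation. The sketch below assumes a hypothetical normal population (mu = 50, sigma = 10) with known sigma and a sample size of 5; these values and the function name `coverage` are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(mu, sigma, n, confidence, n_repeats, seed=2):
    """Fraction of z-intervals (sigma known) that contain mu, and the interval width."""
    rng = random.Random(seed)
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(n_repeats):
        x_bar = mean([rng.gauss(mu, sigma) for _ in range(n)])
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / n_repeats, 2 * half_width

# Hypothetical population and sample size, for illustration only.
cov95, width95 = coverage(50.0, 10.0, 5, 0.95, 4000)
cov99, width99 = coverage(50.0, 10.0, 5, 0.99, 4000)
print(f"95% interval: coverage = {cov95:.3f}, width = {width95:.2f}")
print(f"99% interval: coverage = {cov99:.3f}, width = {width99:.2f}")
```

Lowering alpha from 0.05 to 0.01 raises the long-run coverage, and the price is the wider band.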
For that, statisticians have given us some formulas: if this is the mean and this is the standard deviation, I can calculate the confidence interval of the population parameter, that is, where the population parameter is expected to lie with a certain confidence level. I can be wrong, but the chance of that is defined by alpha, which we are calling the level of significance. So you define the level of significance and give me the x bar value and an estimate of the variance, and then I can tell you the confidence interval within which mu, or whichever population parameter, is expected. That idea is known as the confidence interval.
Sigma can also be estimated: the estimate is s, the sample standard deviation (its square, s square, is the unbiased estimate of the population variance). With s, the lower and upper limits or bounds of mu can again be given by the formulation, only now the t statistic is used. Here alpha is the level of significance that I have already mentioned, the chance that I can go wrong, and there is also a degree of freedom, which depends on the number of observations you have taken: it is the sample size n minus 1. If you define these two, then just as we can find the z value, we can also find the t value for a given level of alpha and a given degree of freedom. So I can define the confidence interval; the only thing is that instead of sigma I have used s.
Whenever I am using sigma, the multiplier in the formula is z alpha by 2. If we have no estimate of the population variation, that is the population standard deviation expressed as sigma, then what is possible is that I plug in its estimate s. Here s is the sample standard deviation, which is calculated as the square root of the sum of squared deviations of the individual observations from x bar, divided by n minus 1, the degree of freedom. So I can replace sigma with s; the only thing is that I will use a t distribution instead of the z distribution. When the variance is unknown, the t statistic is used to find the confidence interval.
We have to remember two important things here: one is the confidence interval, and the other is the level of significance, that is, the probability that I can be wrong. If it is 5 percent, then 5 percent of the time the confidence interval that I have given can be wrong. That is the interpretation I can make. If the variance is known, we use z to define the bounds, and if the population variance is unknown, z is replaced by the t statistic. We can get the t value from the tables provided at the end of any statistics book, but Minitab will do it automatically for you: we define the sample and Minitab does the rest.
This is one of the examples that we will use to demonstrate how Minitab builds the confidence interval. This table is from an investigation of mercury contamination in largemouth bass.
A sample of fish was selected from 53 Florida lakes, and the mercury concentration in the muscle tissue was measured in ppm. These are the concentration values that we are getting. This data set is taken from Montgomery's book, and the population standard deviation is given. I want the lower bound and the upper bound for the mean; that is, I want to determine the confidence interval. Let us assume that alpha equals 0.05; that is the probability that I can go wrong. With that, we can use the formulation given in the last slide for the lower bound and upper bound calculation. From the data set we can always calculate the average x bar, sigma is known, and n, the number of observations, is also known. So I can calculate the limits directly from the formulas, and the final calculation shows that the confidence interval of mu comes out to be 0.4311 to 0.6188.
Let us do it in Minitab and see whether Minitab gives the same results. I will go to the Minitab file where the data is given. Other than the earlier file, we have another file for confidence intervals, and the data set is there. Let us open it: this is the example we are talking about, concentration in ppm. So this is the concentration data set that we have, and we want to find a confidence interval when sigma is given.
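The hand calculation for the known-sigma case can be sketched as follows. The values n = 53 and sigma = 0.3486 come from the lecture; the sample mean is taken here as 0.5250, which is an assumption (the lecture only reads the mean off the Minitab output as about 0.52, and the raw data are not reproduced in this transcript). The function name `z_interval` is also chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, alpha=0.05):
    """Two-sided confidence interval for mu when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    margin = z * sigma / sqrt(n)              # z_{alpha/2} * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# n and sigma are from the lecture; x_bar = 0.5250 is an assumed rounded value.
lower, upper = z_interval(x_bar=0.5250, sigma=0.3486, n=53)
print(f"95% CI for mu: ({lower:.4f}, {upper:.4f})")
```

With these inputs the limits land close to the 0.4311 and 0.6188 quoted above; small differences in the last decimal place would come from the rounding of the assumed mean.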
So I will go to Stat, then Basic Statistics, and then 1-Sample Z. Here I will choose the option for samples that are each in a column, because the data is not summarized, and select Concentration. I also have to give the known standard deviation, which is 0.3486, so I will type 0.3486. In Options there is a confidence level setting: 95 percent or whatever level you want, you have to mention it there. Now, hypothesis testing we have not discussed yet, so ignore that option and do not click it; I am not clicking anything there. I am just trying to figure out, if sigma is known and the data set is given, where the population mu will lie. That is the confidence zone I want to calculate, and for that I have taken a 95 percent confidence level in Options. So alpha will be 100 minus 95, that is 5 percent, which means I can be wrong 5 percent of the time.
If you click OK in Options and then OK again, you will get the output. It is difficult to read from here, so we can copy it and paste it into an Excel sheet to enlarge it. Let me go back to the output once more: if I use Copy as Picture and paste it into Excel, now it is possible to enlarge it.
Now it should be visible; it is somewhat distorted, but it is readable. The number of observations is 53, the mean of the sample observations is 0.52, the sample standard deviation is given, and the standard error of the mean is calculated as the standard deviation divided by the square root of n. You can calculate that yourself; the standard error of the mean is basically the variability of the mean. Then the 95 percent confidence interval of mu is calculated as 0.4311 to 0.6188, and my hand calculation also gives 0.4311 and 0.6188. So Minitab has calculated the same thing that I get by hand. The known standard deviation of 0.3486 is what I had to input. So it becomes easy for me to calculate the confidence interval for a given set of sample observations.
Now, look at the z value at alpha by 2. Alpha by 2 is 0.025, so I need a z table to determine the z statistic written here as z 0.025. n is known, sigma is known, x bar is known, but z alpha by 2 is not known to me. So how do I get that z value? I have a standard normal table, and the table will tell me where it is: I look for 0.975. If you have done some basic course on statistics, you will know how to read the standard normal table. The value is 1.96: a z value of 1.96 gives you a certain area under the curve.
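The table lookup described here can also be done in software. The sketch below uses Python's standard-library `statistics.NormalDist`, which is a choice made here for illustration; the lecture itself uses a printed z table and Minitab.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal: mean 0, standard deviation 1
alpha = 0.05

# z value that leaves alpha/2 = 0.025 in the upper tail (cumulative area 0.975).
z_crit = std_normal.inv_cdf(1 - alpha / 2)

# Going the other way: cumulative area to the left of z = 1.96.
area_left = std_normal.cdf(1.96)

print(f"z_(alpha/2) = {z_crit:.4f}")
print(f"area to the left of 1.96 = {area_left:.4f}")
```

The inverse lookup corresponds to finding 0.975 in the body of the table and reading the z value from the margins.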
This shaded area comes out to be 0.975, and 1 minus 0.975 gives you 0.025. That is the value we are looking for. So z values can be read from the z table, and once z is known, everything is known, and the lower bound and upper bound calculation is easy.
Now suppose you do not have the population standard deviation; that means only the sample mean and the sample standard deviation are given. Then instead of z, what we need is the t value. Statisticians tell us that to build the confidence interval in that case the z distribution cannot be used, so the t distribution has to be used. The t distribution is a specific distribution related to the z distribution. But the underlying assumption is that the data follows a normal distribution. This assumption should be verified, and only then can we give the confidence interval. The confidence interval using the t statistic comes out to be this, and the t value can be read from t tables. x bar is known, the sample standard deviation is known, and n is also known, so I can determine the interval immediately.
How does Minitab do it for you? Let us go to Minitab and see. This is the second data set we are using, which is on loads from a tensile adhesion test.
Load is the data set we are looking at. We can go to the data set, and I have already told you how to check the normal distribution assumption, so we can verify it immediately. Go to Basic Statistics, go to Normality Test, select the Load variable, run the Anderson-Darling test, and you will get the output. In the output you see that the p value is 0.836, which is more than 0.05. So we can interpret that the data set is adhering to the normal distribution assumption: as we have mentioned, if p is more than 0.05 we do not reject normality and treat the data as following a normal distribution. So the assumption holds, and we can use the t statistic to calculate the confidence interval.
For that we go to Stat, Basic Statistics, and then 1-Sample t. I have to mention Load as the data, and I will not perform any hypothesis test. In Options I am keeping a 95 percent confidence level. I click OK there and click OK again, and I get the values of the confidence interval. We can copy it as a picture, paste it in Excel, and see what values it gives. Earlier it was the z distribution; now the interval is calculated using the t distribution. The values we are getting are 12.138 and 15.289. Let us go back and check our hand calculation: 12.14, which is close to 12.138, and 15.28 against 15.289. So Minitab gives the same results as the hand calculation, and it gives you the 95 percent confidence interval directly.
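The t-based hand calculation can be sketched as below. The lecture does not state the summary values behind the interval, so n = 22, x_bar = 13.71 and s = 3.55 are assumptions made here because they are consistent with the quoted hand-calculated limits of 12.14 and 15.28. The critical value 2.080 is the standard t table entry for alpha by 2 = 0.025 with 21 degrees of freedom.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Two-sided confidence interval for mu when sigma is unknown.

    t_crit is t_{alpha/2, n-1}, read from a t table (or from software).
    """
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# ASSUMED summary values (not stated in the lecture), for illustration only.
N, X_BAR, S = 22, 13.71, 3.55
T_CRIT = 2.080   # t table value: alpha/2 = 0.025, degrees of freedom = N - 1 = 21

lower, upper = t_interval(X_BAR, S, N, T_CRIT)
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

The critical value is passed in rather than computed so that the sketch needs only the standard library; the structure is the same as the z interval, with s in place of sigma and t in place of z.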
Another option is when we want to look at the standard deviation. Similarly, we can calculate the confidence interval for the variability, that is, for the population variance, and for that we use a specific distribution, the chi-square distribution. The chi-square distribution can define the confidence interval of sigma for any given data. What we need is s, the number of observations, and the alpha value that we select in the same way; the interval uses chi-square values at alpha by 2 with n minus 1 degrees of freedom. Alpha is again the level of significance. So I can also find the confidence interval of the variance; both things are possible.
The chi-square distribution also comes from the basic normal distribution assumption: from normal random variables we define another random variable, and that one follows chi-square. We can get the chi-square values from chi-square tables. Minitab does it automatically for you, for the mean and also for the standard deviation: it will give you a confidence interval for the mean and also a confidence interval for the variance. If I have to calculate the confidence interval of the variance, I go to Stat, Basic Statistics, and then 1 Variance.
Here I will select Load, I want to estimate the variance, I will not perform any hypothesis test, and I will keep the 95 percent confidence level. If I click OK, this is the chi-square interval that it gives. If I copy this as a picture and paste it below the earlier one, what you get is the confidence interval using the standard procedure, that is, the chi-square distribution. You have to read it here: 2.7 to 5.08.
The same is also possible from Graphical Summary. If you go to Basic Statistics and then Graphical Summary, I select Load and the confidence level is given as 95 percent. If you click OK, you get the graphical summary, where you can see basically all the information together. The Anderson-Darling normality test was done, and the p value is more than 0.05, so the data seems to be normal: 0.838 is more than the alpha value that we have taken, 5 percent, so the data is adhering to the assumption. The mean, standard deviation, variance, skewness and kurtosis of the distribution are given, along with the minimum, the maximum, and all the other basic statistics. Then there is the 95 percent confidence interval of the mean. Here I have not given a known standard deviation, so it is automatically calculated based on the t distribution. It can also calculate an interval for the median and an interval for the standard deviation. The underlying formulas you can see in Minitab Help, and you can work them out by hand as well.
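The chi-square interval can be sketched in the same way. As before, the summary values n = 22 and s = 3.55 are assumptions made here (the lecture does not state them), and the chi-square critical values 35.479 and 10.283 are the standard table entries for 21 degrees of freedom at upper-tail areas 0.025 and 0.975. The function name is chosen for illustration.

```python
from math import sqrt

def variance_interval(s, n, chi2_upper, chi2_lower):
    """Two-sided confidence interval for the population variance sigma^2.

    chi2_upper = chi-square value with upper-tail area alpha/2, n-1 d.f.
    chi2_lower = chi-square value with upper-tail area 1 - alpha/2, n-1 d.f.
    """
    df = n - 1
    return df * s ** 2 / chi2_upper, df * s ** 2 / chi2_lower

# ASSUMED summary values (not stated in the lecture), for illustration only.
N, S = 22, 3.55
CHI2_UPPER = 35.479   # chi-square table: upper-tail area 0.025, 21 d.f.
CHI2_LOWER = 10.283   # chi-square table: upper-tail area 0.975, 21 d.f.

var_low, var_high = variance_interval(S, N, CHI2_UPPER, CHI2_LOWER)
sd_low, sd_high = sqrt(var_low), sqrt(var_high)
print(f"95% CI for sigma^2: ({var_low:.2f}, {var_high:.2f})")
print(f"95% CI for sigma:   ({sd_low:.2f}, {sd_high:.2f})")
```

Under these assumed inputs the limits for sigma come out near the 2.7 to 5.08 read from the Minitab output. Note that the larger chi-square value gives the lower limit, because it sits in the denominator.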
So the confidence intervals of the mean and of the standard deviation can both be seen here, based on the sample estimates: the chi-square distribution is used for the standard deviation and the t distribution for x bar. I can get both in one go; what I have done is Basic Statistics, then Graphical Summary.
If you go to Display Descriptive Statistics, give the data, and then go to Statistics, there too I can select the standard error, standard deviation, mean, median and so on; there are no missing values here. We can also see the interquartile range of the data set and the coefficient of variation, which is basically the ratio of sigma to the mean, that is, how large the standard deviation is with respect to the mean. If I click OK, and under Graphs, you can also get the box plot of the data set and a histogram with a normal curve. So this is the descriptive statistics output, this is the graph we have seen earlier, and this is the box plot of the data. The only thing is that here you will not get the confidence interval; that can be seen when you go to Basic Statistics and use Graphical Summary.
Otherwise, for the confidence interval: if sigma is known I will use 1-Sample Z, if sigma is unknown I will use 1-Sample t, and for the variance I will use 1 Variance. But when I am using t, I am assuming a normal distribution, and we can check that assumption when we work out the confidence interval of the mean based on one sample.
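The descriptive quantities mentioned here (mean, median, standard deviation, standard error of the mean, coefficient of variation) can be computed directly. The sketch below reuses the four apple observations from the earlier point-estimation example, since the load data are not reproduced in this transcript; expressing the coefficient of variation as a percentage is a choice made here.

```python
from math import sqrt
from statistics import mean, median, stdev

# The four apple observations from the point-estimation example.
sample = [25, 30, 29, 34]

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)                 # sample standard deviation (divides by n - 1)
se_mean = s / sqrt(n)             # standard error of the mean
cv_percent = 100 * s / x_bar      # coefficient of variation, as a percentage

print(f"n = {n}, mean = {x_bar}, median = {median(sample)}")
print(f"s = {s:.3f}, SE of mean = {se_mean:.3f}, CV = {cv_percent:.2f}%")
```

The standard error here is the same s by root n quantity that appears in the margin of the t interval.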
What we have tried to say in this session is that with a single x bar and s we can predict the behavior of the population: we can predict the interval where the parameter mu, or sigma, will lie. This is a unique thing given to us by statisticians, and we use it so that we do not have to do experiments time and again. In design of experiments, we have to predict in one go; in one go we have to figure out what the setting should be. So the confidence interval concept is very important.
From here we will continue with the further statistical ideas that are required, like hypothesis testing, which we will address in the next lecture. In the next lecture we will talk about the basics of hypothesis testing, which will be useful for our experimentation. I will not discuss hypothesis testing at great length; I will give you some hints about what hypothesis testing is, and based on that we will proceed in our course. This is a course on quality control and improvement, not a statistics course, so we are highlighting only what is required for our course. We can stop here and continue from here next time. Thank you for listening.