Welcome to session 15 of our course on Quality Control and Improvement Using Minitab. I am Professor Indrajit Mukherjee from the Shailesh J. Mehta School of Management at IIT Bombay. Last time we were discussing process capability and what is to be done when the data is non-normal. We explored some data sets, and I told you that there are two types of transformation available. One is the Box-Cox transformation, a power transformation with an optimal lambda: if the CTQ is represented as Y, the data is transformed as Y raised to the power lambda, and the question is which lambda value brings the data closest to normality. We can then analyze process capability on the transformed scale, and the specification limits are transformed the same way. We took some examples, and today we will extend that lecture and see how this transformation works in capability analysis when a subgroup size is involved.

So let me go to the examples. This is the ring data we were using, arranged in subgroups. We go to Stat > Quality Tools > Capability Analysis; there is also a Capability Sixpack here, which we can use when transformation is needed. I go to Normal, and in the dialog I enter the subgroup information. The lower specification, as we saw last time, is 73.965 and the upper specification is 74.035. Under Transform we say that, in case transformation is required, use the Box-Cox transformation, and click OK. In Tests, I use one point going outside the control limits as the stability criterion. Because the subgroup size is more than one, I use the R-bar by d2 concept to estimate the standard deviation, and based on that the short-term capability is calculated. In Options we can enter 74 as the target, and we can ask for the benchmark Z (sigma level) that we discussed last time, in case you want to check it.

When I click OK, the analysis comes up. The USL, LSL, and target are all transformed, and the values you see are different because Minitab used a lambda value of 5. After the lambda transformation, the Anderson-Darling test p-value is more than 0.05, which means the transformed data adheres to the normality assumption. The X-bar chart and the range chart are shown on the transformed data. In the capability results, Cp comes out as 1.17 and Cpk as about 1.13. The Cp value has not changed much, so even after transformation the data is more or less the same and very close to normality; in this case we may not have needed the transformation at all, because the analysis works well either way.
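For intuition, here is a minimal Python sketch of the same Box-Cox workflow, assuming scipy is available. The subgrouped data and seed are simulated, and the d2 constant 2.326 is the standard value for subgroups of five; note that scipy uses the (Y^lambda - 1)/lambda form of the transform, a monotone rescaling of Minitab's Y^lambda:

```python
# Minimal sketch: Box-Cox transform, then within-subgroup (short-term)
# capability from R-bar / d2 on the transformed scale.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(74.0, 0.01, size=(25, 5))   # 25 subgroups of 5, ring-style
LSL, USL = 73.965, 74.035

flat_t, lam = stats.boxcox(data.ravel())      # optimal lambda found by MLE
data_t = flat_t.reshape(data.shape)

def to_transformed(y):
    # Transform the spec limits with the same lambda as the data.
    return (y**lam - 1) / lam

LSL_t, USL_t = to_transformed(LSL), to_transformed(USL)

# Short-term sigma via R-bar / d2 (d2 = 2.326 for subgroup size 5).
Rbar = np.mean(data_t.max(axis=1) - data_t.min(axis=1))
sigma_within = Rbar / 2.326
mean_t = data_t.mean()

Cp = (USL_t - LSL_t) / (6 * sigma_within)
Cpk = min(USL_t - mean_t, mean_t - LSL_t) / (3 * sigma_within)
print(f"lambda = {lam:.2f}, Cp = {Cp:.2f}, Cpk = {Cpk:.2f}")
```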
What I am telling you is that with a small or moderate deviation from normality, the capability analysis is not much impacted. But if the data is very skewed, like the surface finish we saw earlier (surface finish is generally a skewed dimension), then those CTQs need to be transformed, and then with respect to the target we can look at the process capability analysis. The Pp, Ppk, and Cpk values are given here, and the ppm levels, total ppm within and overall, are given at the end. In the graph you get all observations and the last 25 subgroups, which show the randomness of the observations: the points are scattered and no pattern is revealed. So we can consider this analysis quite acceptable, but the Cpk value indicates that we need to improve the process if we have to go beyond 1.33; that is what we need to ensure.

Let us take another example. We close this one and take the container example, where the lower specification is given as 200. In this case I will use the other transformation. I go to Stat > Quality Tools > Capability Sixpack, select Container1 through Container5 as the observations, and enter the lower specification as 200, as mentioned last time; the upper specification stays blank. Under Transform let us go for the Johnson transformation this time and click OK. In Tests we again use one point going outside; in Estimation we follow the same estimation process; and in Options we delete the target value of 74, which does not apply here. Since the specification is one-sided, the Cp and Pp values will not come.

So we have taken this data set and asked for the Johnson transformation, and we want to see what happens. When you look at the output, Minitab has used the Johnson family of transformations. The transformation is applied to the Y characteristic, which Minitab labels as X, so wherever you see X you can read it as the CTQ data, the final outcome we measure. After transformation, the function shown is 0.033 plus a second parameter times a function of (X minus a location constant) divided by a scale constant; that is the Johnson family form, whose parameters are estimated from the data, and the final outcome is what you see. Why did Minitab transform the data? Because the data does not follow a normal distribution; how we confirm that, we will see shortly. After the transformation, the Anderson-Darling p-value is more than 0.05. We have not yet explained what a p-value is and will do so in subsequent lectures, but for now assume that if p is more than 0.05, the data adheres to normality.
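To make this concrete, here is a rough Python sketch of a Johnson-type transformation. Minitab searches the SB, SL, and SU families for the best fit, whereas this sketch fits only the SU family via scipy; the skewed container-style data and the seed are invented for illustration, so it will not reproduce the lecture's numbers:

```python
# Rough sketch: fit a Johnson SU curve to skewed data, transform toward
# normality, check with Anderson-Darling, and compute a one-sided index.
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import normal_ad

rng = np.random.default_rng(7)
x = np.exp(rng.normal(5.35, 0.05, size=100))   # skewed "container fill" data
LSL = 200.0

# scipy's johnsonsu parameters (a, b, loc, scale) play the roles of
# (gamma, delta, xi, lambda) in  z = gamma + delta * asinh((x - xi) / lambda).
a, b, loc, scale = stats.johnsonsu.fit(x)

def transform(v):
    return a + b * np.arcsinh((v - loc) / scale)

z = transform(x)          # transformed data, ~N(0, 1) if the fit is good
LSL_t = transform(LSL)    # the spec limit is transformed the same way

ad_stat, p = normal_ad(z)   # Anderson-Darling test on the transformed data
print(f"AD p-value after transform: {p:.3f}")

PPL = (z.mean() - LSL_t) / (3 * z.std(ddof=1))   # one-sided (lower) Ppk
print(f"PPL: {PPL:.2f}")
```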
In the Minitab output the transformed data was plotted, the p-value came out as 0.729, which adheres to our normality assumption, so there is no problem with normality after transformation, and the Ppk index was calculated as 0.61. This is the transformed data on which the analysis was done, and this is what I wanted to show here.

Now, this data set is in subgroups, so how do we verify normality on the original data? What I did is stack all the data into one column, C16, then go to Stat > Basic Statistics > Normality Test, give the stacked container data, choose the Anderson-Darling test, and click OK. What I found is a p-value less than 0.05, which indicates that a non-normal scenario exists. Non-normality is there, and that is why the Box-Cox or Johnson transformation is used. But you cannot guarantee that either transformation will work everywhere: sometimes one works, and sometimes both can fail. In case everything fails, we have to find some other way to calculate non-normal process capability; there are distribution-fitting options in Minitab for that, and we will come back to them at the end of this lecture.

So that is the data set we were talking about; now let us go to the slides for what we want to cover next. This is about process capability for attribute data, where we have defects or defectives. How do we calculate sigma levels or process capability in this case; is there some conversion we can do? For continuous data we have seen that the earlier methods work well. For a process with count data, that is defects, some other units are used: defects per unit and defects per million opportunities, and based on these we can state the sigma level. A normal distribution assumption is used for the conversion, but the basic idea is that we can make a sigma-level conversion from defects per unit and defects per opportunity, go back to a sigma level, and from that also calculate process capability; there is an interrelationship between all these metrics. We are assuming process stability, which is the basic assumption we adhere to, and suppose in this particular case the customer specification says the sigma level should be greater than 4. Last month we manufactured approximately 16,000 units over a period of 18 days, and 231 defects were reported. You have to define whether you are counting defects or defectives; Six Sigma methodology nowadays connects everything to defects, not defectives, so let us say these are 231 defects out of the units produced. Do we meet the specification? Can we check the sigma level of this process and then take action accordingly?
So how do we do that? The definition is that we calculate defects per unit. We had 16,810 units and 231 defects, so DPU = 231/16810, which is around 0.0137; you can keep two or three decimal places. Once we have defects per unit, we can convert it into defects per opportunity. One important terminology comes in here: opportunity. This term is explained in any Six Sigma methodology course; I will only say that the opportunities per unit have to be defined, and let us assume there is 1 opportunity per unit here. In that case the defects per opportunity stays the same fraction: DPU divided by 1 gives the same value as DPO. Then defects per million opportunities is simple mathematics: multiply by 10^6, so DPMO comes to around 13,742.

Now, how do we convert that into a sigma level? In Excel there is a conversion formula, =NORMSINV(1 - DPO) + 1.5, where the 1.5 is added to give the short-term sigma level. The function is available in Excel, so you can do it there directly. But a chart conversion is also possible: there are complete charts where you can look up the sigma level for a given DPMO. Our case is approximately 13,742. If you go to the chart, our DPMO falls between the entries 13,553 and 13,907, so the sigma level is approximately 3.70. Is it less than 4? Yes, it is. So we have not reached the sigma level defined earlier, where the customer specification said the sigma level for this specific process should be at least 4. In this way we can work out the capability for attribute data as well, using this chart; many books give this chart converting DPMO to short-term sigma. If you look at the last DPMO entry, 3.4, that is a Six Sigma process: after shifting the distribution, a DPMO of 3.4 corresponds to a sigma level of 6. Here our DPMO is about 13,742 and the sigma level is 3.7-something, below 4, so we have not yet reached the level required by the customer. The whole conversion chain is collected in the small sketch below.
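This is a small sketch of the DPU, DPO, DPMO, sigma-level chain from the example, using scipy's normal quantile in place of Excel's NORMSINV; the figures are the ones from the lecture, and the single opportunity per unit is the assumption we made above:

```python
# DPU -> DPO -> DPMO -> short-term sigma level, as in the lecture example.
from scipy.stats import norm

units = 16810           # units produced over 18 days
defects = 231           # defects reported
opportunities = 1       # assumed opportunities per unit (must be defined)

dpu = defects / units               # defects per unit, ~0.0137
dpo = dpu / opportunities           # defects per opportunity
dpmo = dpo * 1e6                    # defects per million opportunities

# Equivalent to Excel's =NORMSINV(1 - DPO) + 1.5; the 1.5-sigma shift
# converts the long-term defect rate into a short-term sigma level.
sigma_st = norm.ppf(1 - dpo) + 1.5

print(f"DPU = {dpu:.4f}, DPMO = {dpmo:.0f}, short-term sigma = {sigma_st:.2f}")
# -> DPU = 0.0137, DPMO = 13742, short-term sigma = 3.70
```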
So that is all about process capability, and we will now require some amount of statistics, which will help us deal with experimentation. When we are dealing with experimentation, some basic idea of statistics is required, and I will give you a brief on the statistics that is really important for design of experiments. I will not cover huge amounts of statistics here, only whatever is required, so that you understand the basics of hypothesis testing; you can always take other courses where basic statistics is covered in depth. This will give you an idea of what we are doing in the subsequent lectures.

So why do we want to understand what statistics is? You see it everywhere in quality: I have mentioned X-bar, I have mentioned sigma and its estimation, and so far we have blindly assumed that whatever is stated is correct. But what is the basis of that, what is the foundation? We want to understand it. Statistics, which is at the heart of data science, comes in because we do not have the population. A machine has been running for many years and many, many units have gone out; although the count is finite, it is effectively infinite, and we do not have all the data. So we take samples, a snapshot, and from that we claim the process capability is such-and-such. How are we making that inference?

Statistics can be divided into two categories. One is descriptive statistics: visualization of the data, and the summary statistics we have seen, the standard deviation and the measures of location. That is the area of statistics which helps us visualize and make some initial interpretation of the data. The second part is inferential statistics, where we make conclusions based on the available data; decision making happens in inferential statistics, while descriptive statistics gives visualization of the data.

Which statistical methods we can use depends on the type of data we have, like the attribute and continuous data we mentioned, so the category of data needs to be identified. The highest levels of data category are ratio-scale and interval-scale data; these are the data of the highest quality level, on which we can do the most statistical analysis. Ratio-scale data has a true zero point, while interval data has equal intervals on a continuous scale but no true zero. Then we have the ordinal scale, like the rating systems used when you do customer surveys.
In that case you rate on a scale of 1 to 5 or 1 to 7; that is the ordinal scale. And there is the nominal scale, like color, where we cannot arrange the data in any order; we can say the categories are different, but ordering is not possible. On the ordinal scale ordering is possible, and on the interval and ratio scales too, but on the nominal scale, like marital status or male versus female, we cannot say which is bigger than which, which comes first and which comes second; colors also cannot be ordered.

Most of the analysis we do in quality assumes the CTQ is continuous, because we assume a normal distribution, and for that the data must be continuous, on an interval or ratio scale; that is where we apply these concepts of statistics. We have to make inferences after doing experimentation: after an experiment we need to select, say, which variable is important and which is not, and which is to be changed, and that analysis and inference is only possible using statistics.

What we are basically doing in statistics is this: we do not have the population, so we take samples out of it, and based on the sample information we extend our idea about the population. We get an estimate from the samples, and based on the estimate we say what the population parameter should be: if I get a sample mean, I want to extend it to say where the population mean should be. I take only one sample, I cannot repeat the exercise many times, so with one sample I want to predict the population values. Think of a tree: if you are trying to estimate the weight of its apples, you take some apples and weigh them; this is like destructive testing, so I cannot take all the apples from the tree, and it is unnecessary anyway. We take some samples and from them estimate the population.

This is what we do in inferential statistics: we do not have information about the population, which is why we take samples and why statistics comes into the picture, and from the samples we try to estimate the population parameters, whether the standard deviation or the mean, because the mean and standard deviation are the only two things about a CTQ we want to infer. Where is the mean? How much is the standard deviation of the population? When I have done an experiment, say a simple one-time experiment, can I infer that in the population the behavior of the CTQ will be like that? So we do experiments with small samples and, based on them, infer that if I keep this setting it will work for the entire population, whatever inputs are given, and this is the process setting we can adopt in future.
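As a toy illustration of this sample-to-population step, here is a short sketch of the apple-weight example. The "population" of weights is simulated only so the sketch is self-contained; in reality we would never have it, which is the whole point:

```python
# Estimate the population mean and standard deviation from one sample.
import numpy as np

rng = np.random.default_rng(42)
population = rng.normal(loc=150.0, scale=12.0, size=100_000)  # apple weights (g)

sample = rng.choice(population, size=30, replace=False)  # one sample of 30

x_bar = sample.mean()     # estimate of the population mean mu
s = sample.std(ddof=1)    # estimate of the population sigma (n - 1 divisor)

print(f"estimated mu = {x_bar:.1f} g, estimated sigma = {s:.1f} g")
print(f"true mu = {population.mean():.1f} g, true sigma = {population.std():.1f} g")
```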
So that is why we are using statistics: we will not do an experiment each and every time and then fix the parameters. We will freeze the setting in one go, and maybe after a certain time, if we feel experimentation is required again, we will do it.

So we are talking about estimating the population from the sample information we have. The histogram you see here is built from the sample, and from this information I want to predict the behavior of the population. The population can have a normal distribution, which has two parameters, mu and sigma; these two parameters define the normal distribution through its probability density function, a somewhat complicated expression. What we do is assume that form is known: I take a sample, assume the population is normal, and ask whether I can estimate the population parameters, what the mu value should be and what the sigma value should be. That is what we are doing in statistics: making inferences from sample information.

In how many ways can we do sampling? When you pick apples from the tree, or have a huge target population, how do you select from it? There are probabilistic methods, and there are non-probabilistic methods that do not follow basic randomness; two of the latter I have shown are convenience sampling and referral sampling. Convenience sampling means that, based on my convenience, I stop anyone and ask, say, what they think about the restaurant food; I stand outside one restaurant and survey whichever customers come, and then try to infer about all restaurants. That is not recommended, but it is non-probabilistic sampling. Referral sampling means each respondent points me to the next: if I am trying to understand the philosophy of criminals, I ask one criminal and he gives me references to other criminals; I go from one point, and the person I am surveying gives me hints about who to survey next. It sometimes gives you a lead, but it is not probabilistic sampling.

There are different methods of probabilistic sampling: simple random sampling, stratified sampling, systematic sampling, and cluster sampling. Simple random sampling is what we generally do when we are doing experimentation, and we call it randomization. Systematic sampling is what we do in control chart processes: at a given time point we visit the process, and the item taken is assumed to be random. A small sketch of these two schemes follows.
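This is a minimal sketch of the two schemes most relevant to this course, simple random sampling and systematic sampling; the population of part IDs, the sample size, and the seed are all invented for illustration:

```python
# Simple random sampling vs. systematic sampling over a list of part IDs.
import random

population = list(range(1, 101))   # 100 part IDs

random.seed(0)

# Simple random sampling: every subset of size 10 is equally likely,
# which is the randomization we rely on in experimentation.
srs = random.sample(population, k=10)

# Systematic sampling: a random starting point, then every k-th item,
# like visiting the process at a fixed interval for a control chart.
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

print("simple random:", sorted(srs))
print("systematic:  ", systematic)
```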
With systematic sampling, the first starting point is chosen and then, maybe after every half an hour or so, I go to the process; that is systematic sampling. Stratified sampling you can understand as dividing the population into strata, age groups for example, and sampling within each stratum; mostly in surveys we use the stratified concept. Cluster sampling is used in polling, exit polls and things like that. Most relevant to quality are simple random sampling and systematic sampling; those are the two I would suggest are of importance to us in this course. Simple random sampling is randomization, and how do we ensure randomization? By random number generation: the samples are selected based on random numbers. If you have, say, 10 samples and ask which one is selected first, you can read the digits of a random number table: take 7 first, skip 0, then take 9, then 8, and so on, going row-wise or column-wise; whichever way you go, it is random sampling. We select the samples randomly and do the experimentation, and there is a proper reason why randomization is important in experimentation. Systematic sampling means that which item I pick is random, but the time points are fixed: at a defined interval we visit the process and take the data, as we did in the control chart techniques. So those are the two types of sampling, and what we do is estimate some parameters from the samples, say the average value X-bar. Can I extend that to where the mu value lies? For that the concept of the confidence interval is used, which we will discuss in the next session. Here we take some samples, estimate the mean, and try to state what the population parameter is; from this point we will extend after this lecture.

So, to summarize what we have covered in these lectures: process capability analysis can involve non-normal scenarios, and in that case what can be done is to transform the data to normality, transform the USL and LSL the same way, and then calculate the capabilities; that is the best way of doing it. If there is only a moderate deviation, we can assume the normality assumptions hold and make the inference without transformation, but plotting the data is always helpful to see how skewed it is. If it is not much skewed, we may not want to fit another distribution and increase the complexity, because shop floor people will not understand so much complexity. So where it is required, we state the process capability with the transformation and show that with this transformation this is the current capability; we do not compute it assuming normality, because another distribution may be the correct one. This non-normal situation can also be addressed in Minitab, where there is an option for it.
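Before we look at the Minitab dialog, here is a rough sketch of the idea behind that option: fit a skewed distribution to the data and build the index from its percentiles. This is one common approach (a percentile method of the kind Minitab offers for nonnormal capability), and the Weibull-style data is simulated just so the sketch runs:

```python
# Nonnormal capability sketch: fit a Weibull, then use its 0.135% and
# 50% percentiles for a one-sided index against a lower spec limit.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = 200 + rng.weibull(1.5, size=200) * 15   # skewed data above LSL = 200
LSL = 200.0

# Fit a 3-parameter Weibull to the observations.
shape, loc, scale = stats.weibull_min.fit(x)
dist = stats.weibull_min(shape, loc=loc, scale=scale)

p00135, median = dist.ppf([0.00135, 0.5])

# Percentile-based lower index: distance from the median to the LSL,
# relative to the lower "3-sigma-equivalent" spread of the fitted curve.
PPL = (median - LSL) / (median - p00135)
print(f"fitted shape = {shape:.2f}, PPL = {PPL:.2f}")
```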
So in Minitab we have the option to fit a distribution and then do the capability analysis based on it. If I go to Capability Analysis (or the Capability Sixpack), there is a Nonnormal option, but you have to assign some distribution: a distribution has to be fitted, then you give the specification, Minitab estimates the parameters, and the same options are available for target, capability, and benchmark. So I need to fit a distribution; if I can fit the correct one, the distribution the data actually follows, a skewed distribution in this case, then the capability analysis can be done on that basis. There are different types of distributions, and in Minitab it is possible to fit several and see which is closest to the data, and then do the capability analysis accordingly.

So that was the starting point; then we went on to attribute data and discussed how to calculate sigma levels when we have defects. Many such scenarios can arise, say an assembly operation where we ask what the sigma level is, because everywhere we want to apply the concept of the sigma level of the process. Can we define that? You have seen DPU, defects per unit, and based on that we can calculate the sigma level of the process.

Then we said that whatever inference we make in quality is based on statistics, and we want to understand some amount of statistics, because it will help us make inferences when we are doing experimentation. For that a basic idea of hypothesis testing is required, and one value we have used, the p-value, we need to understand. A brief exposure to statistics will also help us get into topics very relevant to quality, like regression and design of experiments: after doing an experiment, how do we make the inference? Some background is required for that, and I will highlight that background, but you can always take a course on statistics to understand more. Thank you for listening to this lecture; we will stop here and continue from this point with more on statistics. Thank you.