course on dealing with materials data. For the past couple of sessions we have been going through a case study on design of experiments, learning how the design-of-experiments methodology is applied. In the first session we talked about the historical aspects of design of experiments, when we can apply it, its three basic principles, and the general step-by-step guidelines for implementing it, and we introduced the case of optimizing the efficiency of titanium production through microwave plasma synthesis. In the second session we worked out the complete setup of the experiment: how we did the factor assignment and the interaction assignment in an L16 design matrix, how to construct the L16 matrix, and what the experimental order is, that is, the standard order and a random order, with replicated experiments, and we obtained the results. We did not discuss the analysis of these results in great detail, but we quickly went through it to show that the interval estimate of the predicted efficiency, under the selected factor levels, falls in the range of about 90% to about 115%. The question we then asked was: is everything okay? One can very easily say that everything is not okay, because the efficiency of any system cannot be 115%, and therefore something has gone wrong. What we wish to do in this session is to introduce, for such situations, the logit transformation of the response. We will then reanalyze the data, and this time we will go through both analysis tables in great detail. We will confirm the assumptions on the error terms, we will show the selection of factor levels that leads to the optimized settings giving the best result, and then we will talk about the prediction interval estimation and the validation trials.
So, if you recall, this was the analysis which I did not explain in great detail, and which we will do now. We came up with this 95% prediction interval for 5 validation trials, and we found that there is something wrong in it. So we asked ourselves: is everything okay? The answer is no, because there cannot be a 115% efficient system. Now I would like to emphasize one more thing. It is very important, when you do design of experiments, to keep asking yourself now and then: is everything okay? What really happens is that you get so involved in the experimentation, and then in the analysis with all the variety of numbers you deal with, that you can lose focus on the main issue of the problem. It can then very easily happen that you say, "I have done everything, and this is my prediction interval for 5 validation trials," but does it make sense? This question you have to ask time and again, to see that you are going in the right direction with your analysis. So here the question is: is everything okay? And the answer is no. What went wrong? Well, you remember we assumed a regression model in which the response is described by parameters for the 7 main effects and 5 interaction effects, so, with the constant term, there are 13 parameters.
I can simplify this model by writing y = mu + epsilon, where our assumption is that epsilon is a random error, distributed normally with mean 0 and unknown variance sigma squared. This in turn implies an assumption on the response variable y itself: y is a normal random variable with mean mu and variance sigma squared, which means we are assuming the response value can lie anywhere between minus infinity and infinity. But our response is a percentage, and a percentage lies between 0 and 100. Therefore we have violated this assumption in adopting the model, and something needs to be done about it. The answer in such a situation is the logit transformation. If your response values are percentages, that is, they lie between 0 and 100, it makes sense to apply a logit transformation before analyzing the data, and to analyze the data with the logit-transformed response. The logit transformation is given as z = logit(y) = ln(y / (100 - y)). This logit-transformed response z lies between minus infinity and infinity, so in our model assumption we replace the response y by its logit transform z and then analyze the data. After the logit transformation, these are the two analysis tables: the table of all the coefficients, and the analysis of variance table for the logit-transformed efficiency. You see that all the results have changed. The tables have been created in the same way, but the response you are working with is now logit transformed, so your effect values are different and the p values are different. Now let us see. I already told you what the t distribution is. This is the t statistic for testing the hypothesis that beta_i = 0.
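The transformation just described, z = ln(y / (100 - y)), can be sketched in a few lines of Python; the function names here are illustrative, not from any software mentioned in the lecture.

```python
import math

def logit(y_percent):
    """Logit transform of a percentage response: z = ln(y / (100 - y)).
    Maps the interval (0, 100) onto (-inf, inf), so the normal-error
    assumption on the model can hold for the transformed response."""
    return math.log(y_percent / (100.0 - y_percent))

def inv_logit(z):
    """Back-transform: recover the percentage from the logit value."""
    return 100.0 / (1.0 + math.exp(-z))
```

For example, `logit(50.0)` is 0, and `inv_logit` is the exact inverse, so an interval computed on the z scale can be mapped back to percent efficiency at the end of the analysis.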
This t statistic refers to the null hypothesis that beta_i = 0, where beta_i is each coefficient in your regression model, and the p value is the probability of the critical region when H0 is true. We say that if this probability is less than alpha, we reject H0; you remember this is what we had learned. The constant is of course just the intercept, so that is accepted, but the values shown in red are those with p less than our pre-decided alpha of 0.05. So it means that plasma gas flow rate, additional gas flow rate, feed rate, reaction chamber length and these three interactions are significant, because for them we reject the null hypothesis. It means that for these terms the betas are not 0: the parameters referring to plasma gas flow rate, additional gas flow rate, the interaction of plasma gas flow rate with additional gas flow rate, and so on. For all the other betas we accept the null hypothesis that they are 0, and therefore the significant betas are the ones that play the important role; all the others, shown in white, are not significant. In other words, your logit-transformed efficiency depends on plasma gas flow rate, additional gas flow rate, feed rate, reaction chamber length, and on carrier gas flow rate only through its interaction. The important interactions are plasma gas flow rate with additional gas flow rate, plasma gas flow rate with carrier gas flow rate, and plasma gas flow rate with feed rate; all the other factors are not important.
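The significance test applied to each coefficient can be sketched as below. For simplicity this sketch uses the standard normal in place of the t distribution, which is a reasonable approximation when the error degrees of freedom are moderately large; the lecture's tables use the exact t distribution.

```python
from statistics import NormalDist

def coeff_significant(beta_hat, se_beta, alpha=0.05):
    """Two-sided test of H0: beta_i = 0 via t = beta_hat / SE(beta_hat).
    Returns (t, p, reject_H0). Uses the normal approximation to the
    t distribution for the p value."""
    t = beta_hat / se_beta
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p, p < alpha
```

A coefficient five standard errors from zero is rejected (significant), while one at half a standard error is not; this mirrors how the red rows in the coefficient table are picked out.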
Let us look at the analysis of variance. In the analysis of variance these effects are confirmed: again taking alpha = 0.05, the F statistics tell us that the variation due to the main effects is significantly more than the variation due to error. What are we comparing here? We are comparing two variances. The adjusted mean square of the main effects is an estimate of sigma squared, and the error mean square is also an estimate of sigma squared; the main-effects mean square estimates sigma squared provided the null hypothesis, that the main-effect betas are 0, is true. If that is the case, the two estimates should be about the same. So we have two estimates of the variance: the estimate due to the main factors and the estimate due to the error, and the error mean square is always an estimator of sigma squared. If the two are the same, that is, if their ratio F is not large enough, then you say the main effects are not important. But here we find that the main effects are important. Just as with the column effect and the row effect in the analysis of variance, which you recall, an effect is important when its p value is less than alpha; the same criterion applies here, and similarly the interaction effects are also important. Although they look small, they are significant enough. Now, what is lack of fit? The lack of fit is calculated using the three left-out columns: the error due to the three columns we have not assigned, compared with the pure error. The lack-of-fit F is the ratio of these two mean squares, each of which is again an estimate of a variance. Maybe I should use a different colour; let us use green. These terms together are your lack of fit, and these two rows are the lack-of-fit and pure-error terms. What does the pure error refer to? It refers to the replicates, the replicated runs.
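The variance comparison behind every row of the ANOVA table, including the lack-of-fit row, has the same shape, which can be sketched as follows (a minimal sketch; the sums of squares and degrees of freedom would come from the table).

```python
def f_ratio(ss_factor, df_factor, ss_error, df_error):
    """F statistic comparing two variance estimates:
    MS_factor = SS_factor / df_factor  versus  MS_error = SS_error / df_error.
    Under H0 (no factor effect) both estimate sigma^2, so F should be
    near 1; a large F means the factor variation exceeds the error."""
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return ms_factor / ms_error
```

The lack-of-fit test uses exactly the same ratio, with the left-out-column mean square in the numerator and the pure-error mean square in the denominator.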
We have done two sets of 16 experiments each, to find out whether there is an error due to the experimentation itself. And here it shows that this error is not significant, because the p value is greater than alpha, which is 0.05; the statistic is not in the critical region, and therefore we cannot reject the null hypothesis. The null hypothesis here is that there is no lack of fit, and since we cannot reject it, there is no lack of fit. This means that your model adequately describes your complete data: the model is fully able to explain the logit-transformed response in terms of the selected parameters given here. So "there is no lack of fit" is your null hypothesis, it is not rejected, and the conclusion is that the logit-transformed efficiency is completely explained by these selected variables. We go to the next slide. There are also graphical methods available to detect the factors with significant effects. Here the standardized effect, which you remember is the effect divided by the appropriate standard error of the coefficient, is plotted against the normal score on a normal probability plot. If the values follow a normal distribution they should fall on this line; when a point deviates from it, it tells you that that term has a beta not equal to 0 and is significantly affecting your result. If your result were only random error, this line is what you would see, because the line shows what the effects would look like if they were due to epsilon alone. Let us try to understand this.
When you say that beta_1, beta_2, and so on are 0, it means that the variation you see in the logit-transformed efficiency is purely due to error; there is no systematic effect. When that happens, all the points should fall on this line. When they move away from the line, it means your assumption that the percentage efficiency is a completely random process is not true: there is a systematic component, and those systematic changes are due to these factors, the ones farthest from the line that represents a purely random effect. Please understand that in this whole process we are trying to separate the effect of the systematic changes in the factors from the random error, which may be caused by human error, machine error, and so on. The random error and the systematic effects are separated in this model: this part is the systematic change, and this part is the random error. So what the plot really shows is that if the variation were only random, all the data points would fall on this line; as points move farther away, it shows that the factors which have been systematically changed from the minimum level to the maximum level have an effect on your logit-transformed percentage efficiency. This can also be seen by plotting a main-effects plot and an interaction plot. These are very simple plots compared to the previous one. The plasma flow rate is varied from -1 to 1, the logit of efficiency is plotted on the y axis, and the slope of the line shows how severe, how large, the effect is. So, for example, if you look at the evaporation temperature, it hardly has an effect, the line is almost horizontal, while these factors have some effect, and you can see that these are the ones with a larger effect. The carrier gas flow rate is also almost flat, like power, so it does not show up.
When you look at the interaction plot, it plots the same thing for two factors together. On one axis is, for example, the additional gas flow rate, with separate lines for the levels of the plasma gas flow rate, and whenever these lines cross each other it means there is an interaction effect. If the lines are almost parallel, we say there is no interaction effect. You can see that these are all almost parallel, but this pair has a clear interaction effect, as you can see. So you can also assess it graphically; most software packages, and I believe R as well, can produce these plots very easily. Now, another thing we have to make sure of is that we are not deviating from our assumption on the error. The error is the difference between the logit-transformed value z and its estimate, that is, e_i = z_i - z_hat_i. If the standardized residuals are plotted against the observation order, and you remember we did this exercise during regression analysis, and the points are scattered completely at random, then we are happy that it is indeed a random error. If we plot the fitted values against the standardized residuals and it shows no pattern, that also means the errors are in random order. Finally, this is a normal probability plot: we have plotted the standardized residuals of the logit-transformed efficiency against the normal scores, and they fall on the straight line, so our assumption of normality is also correct.
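The residual checks just described all start from the same quantity, which can be sketched as below (an illustrative helper, not from the lecture's software).

```python
import math
from statistics import stdev

def standardized_residuals(observed, fitted):
    """Residuals e_i = z_i - z_hat_i on the logit scale, scaled by their
    sample standard deviation. Plotted against run order or fitted
    values, these should show no pattern if the error is truly random."""
    resid = [z - zh for z, zh in zip(observed, fitted)]
    s = stdev(resid)
    return [e / s for e in resid]
```

The same list, sorted and plotted against normal scores, gives the normal probability plot used to check the normality assumption.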
Now, finding the levels for maximum efficiency. You can do it with what are called estimate tables: you take the plasma flow rate at its different levels, and for each level you find the mean of the logit-transformed efficiency and its standard error of the mean. Remember, the standard error of the mean is the standard deviation divided by the square root of n, where n is the number of experiments at that one level; here each level appears in 8 runs times 2 replicates, that is, 16 experiments, and the table has been worked out accordingly. So this is the mean over the runs where the plasma flow rate was kept at minimum, and this is the same for the maximum. You want maximum efficiency, so you want the maximum mean: we see that the plasma flow rate at its minimum level gives the maximum efficiency. Similarly, the additional gas flow rate should also be at its minimum level for maximum efficiency. We can do the same thing for the interactions. An interaction has 4 possible level combinations, and we find that plasma flow rate at minimum with additional gas flow rate at minimum gives the maximum logit-transformed efficiency; plasma flow rate at minimum with carrier gas flow rate at maximum gives the maximum; and similarly minimum levels of plasma flow rate and feed rate give the maximum. So our selected factor levels are: plasma flow rate at low level, additional gas flow rate at low level, feed rate at low level, and reaction chamber length at high level. Among the interactions, plasma flow rate with additional gas flow rate has both factors at low level, which is consistent; the carrier gas flow rate appears only through an interaction and should be kept at high level; and the feed rate and plasma flow rate are both at low level, with their interaction also important.
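The estimate-table computation, mean per level plus its standard error of the mean, can be sketched as follows; the data used below are illustrative, not the lecture's measurements.

```python
import math
from statistics import mean, stdev

def level_summary(responses_by_level):
    """For each factor level, return (mean of logit-transformed response,
    standard error of that mean), where SEM = s / sqrt(n)."""
    out = {}
    for level, vals in responses_by_level.items():
        n = len(vals)
        out[level] = (mean(vals), stdev(vals) / math.sqrt(n))
    return out

# Illustrative data: logit-efficiency at the two levels of one factor.
summary = level_summary({"low": [2.0, 2.2, 2.4, 2.6],
                         "high": [1.0, 1.2, 1.4, 1.6]})
best_level = max(summary, key=lambda k: summary[k][0])
```

Choosing the level with the largest mean is exactly the selection rule applied factor by factor (and, for interactions, over the 4 level combinations) in the estimate tables.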
So this is what you derive by looking at the table. You remember last time we did it by looking at the sign table: if the coefficient is negative, it means the minimum level of the plasma flow rate will give you the maximum logit efficiency. Here, instead, we have done the actual calculations and found it directly. You can do it either way, but this way it becomes very clear, particularly with respect to the interactions. Then comes the point that the results we have obtained must be validated. It means we must carry out a few experiments keeping the plasma flow rate at low level, the additional gas flow rate and feed rate also at low levels, but the reaction chamber length at the high level, and of course the carrier gas flow rate at a high level. Conduct a few experiments and see that this gives you the maximum possible logit-transformed efficiency. The logit transformation is an increasing function, so if the logit-transformed efficiency is high, the efficiency is also high. We must run a few trials at the recommended factors and levels for maximum efficiency, and we should see that the efficiency values fall in the predicted 95 percent interval estimate of the maximum efficiency. So let us do the interval estimation. This is the model we have estimated: the expected value of y, which is actually the logit-transformed y. These are the coefficients, the significant values shown in red and written here, and these are the actual parameters we have put in. Remember that this is a coded design, so the plasma flow rate at minimum is going to be -1, this is going to be -1, and the reaction chamber length is at its highest, so it is going to be +1. These are not the values you calculated here; please remember these are the coded values. I think we must write it down here: these are all coded values, very important. So this term is going to take the value -1.
This will be -1 because we have to keep it at minimum, and this will be +1. This interaction will be (-1) times (-1), this one will be (-1) times (+1), and this one will be (-1) times (-1). This is how you have to calculate: you substitute the coded values. So these are the values you will take, and then how do you find the range? The range is found in this manner. Here is something I had not written; I must write it down, and it does not show well, so let us change the colour. This is F at alpha: please remember we have to take it at the alpha level, so for the 95 percent interval you take F at alpha = 0.05. This is the error variance, shown here, and this is the effective number of replicates, which is the total number of experiments divided by the degrees of freedom of the selected factors plus 1. R is the number of validation trials. Accordingly, calculating for four validation trials, the logit efficiency interval is this, and the interval estimate for the efficiency for four validation trials is 80 percent to 96 percent. The four validation trial results are 92 percent, 89 percent, 90 percent and 87 percent; their average is 89.5 percent, which is very much within the interval, and therefore we can say that our results are validated. In other words, this model is the correct model; here is the model.
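The interval computation described here, the standard confirmation-run prediction interval built from F at alpha, the error variance, and the effective number of replicates, can be sketched as below. The numerical inputs in the usage line (F value, error mean square, predicted logit value) are illustrative assumptions, not the lecture's table entries; the caller supplies F(alpha; 1, df_error) from a table.

```python
import math

def prediction_interval(z_hat, f_alpha, ms_error, n_total, df_selected, r):
    """Prediction interval for the mean of r validation trials on the
    logit scale:  z_hat +/- sqrt(F_alpha * MS_error * (1/n_eff + 1/r)),
    where n_eff = n_total / (df_selected + 1) is the effective number
    of replicates described in the lecture."""
    n_eff = n_total / (df_selected + 1)
    half = math.sqrt(f_alpha * ms_error * (1.0 / n_eff + 1.0 / r))
    return z_hat - half, z_hat + half

def inv_logit(z):
    """Back-transform an interval end point to percent efficiency."""
    return 100.0 / (1.0 + math.exp(-z))

# Illustrative: 32 runs, 7 df for the selected terms, 4 validation trials.
lo, hi = prediction_interval(2.0, 4.5, 0.05, 32, 7, 4)
```

Because the logit is increasing, applying `inv_logit` to both end points gives the interval in percent, which is how the 80% to 96% range for the validation trials is obtained.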
The logit efficiency, described in terms of the plasma flow rate kept at low level, the additional gas flow rate kept at low level, the feed rate kept at low level, and the reaction chamber length kept at high level, with the carrier gas flow rate at high level, gives you the highest efficiency of the system. We validated it by finding the 95 percent interval of the predicted value for four validation trials, and the design is validated. Thank you.