Welcome to Dealing with Materials Data. We are looking at the collection, analysis and interpretation of data from materials science and engineering. We are in module 6, which is on case studies, and this is the fourth case study, which is on design of experiments. We are going to use microwave plasma process optimization to produce nanotitania as the example case to understand design of experiments. Here the optimization is done through design of experiments, and the data and the method are described in detail in the paper by K. Murugan et al., Materials and Manufacturing Processes, published in 2011. This paper is about the synthesis of commercially important titania nanopowder from low-cost titanium tetrachloride using a microwave plasma process. What they studied and tried to optimize is the impact of the process parameters on conversion efficiency and percentage anatase in the titania powder. So they did the design of experiments so that they could optimize the parameters to get maximum efficiency and the best percentage anatase in the titania nanopowder. There are lots of R libraries to do design of experiments: DoE.base, FrF2, DoE.wrapper, and there is also a plug-in which allows you to use a GUI to do these analyses. Professor Ulrike Grömping has given lots of material, including slides and manuals. There is also a CRAN page on design of experiments and analysis of experimental data, which is what is shown here. On that page a lot of libraries and material are listed, along with what you can do with them. They also refer to Grömping's work, and so they have FrF2, for fractional factorial two-level designs, which is a very comprehensive R package. There are other packages as well which you can explore for doing design of experiments.
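To make the idea of a two-level fractional factorial design concrete, here is a minimal base-R sketch of a 16-run design in seven coded factors, replicated twice to give 32 runs. The factor names follow the paper's symbols, but the generators used for the last three factors (each one a product of three of the first four columns) are an assumption made purely for illustration. The paper's actual design matrix may be constructed differently, and a package such as FrF2 would normally build this for you.

```r
# Full factorial in four factors, coded -1/+1 (16 runs).
design <- expand.grid(PFR = c(-1, 1), AFR = c(-1, 1),
                      CFR = c(-1, 1), FR  = c(-1, 1))

# Assumed generators for the remaining three factors (illustrative only,
# not taken from the paper).
design$RCL <- design$PFR * design$AFR * design$CFR
design$PWR <- design$AFR * design$CFR * design$FR
design$ET  <- design$PFR * design$CFR * design$FR

# Two replicates of the same 16 runs give 32 experiments.
design2 <- rbind(design, design)

# Orthogonality check: X'X should be diagonal for a balanced coded design.
xtx <- crossprod(as.matrix(design))
print(xtx)
```

With this construction every pair of factor columns is orthogonal, which is what lets each main effect be estimated independently of the others.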
However, in this session that is not what we are planning to do. We want to carry out the analysis using linear model fitting and ANOVA. We are going to do it in the most direct way possible so that you understand it better, and we will use the paper by Murugan et al. to confirm that our analysis is okay and that we are getting the same results as reported in the paper. Still, it might be a good idea to explore these libraries and make some experimental designs yourself, for example to see how you decide on the design matrix. You can use these libraries and generate them yourself. The paper has replicated experiments: there are two levels, and two sets of experiments were done, but what is reported in the paper is lumped data, and only the averages are reported. We have the raw data available thanks to Professor Gokhale, who is one of the co-authors, so we are going to use the raw data, and I am going to share that raw data with you as well. You can do the analysis on the data and confirm that the results reported in the paper are what we are getting. I am not going to reproduce all the results; I am going to leave some of them out so that you can try them yourself. It might take a bit of effort, a little bit of reading, some practice with R, and some thinking about what the quantities are and how they are calculated. But you have been taught design of experiments in the other part of the course, so you should be able to take all that knowledge and use R to work out whatever is left out. In principle you should be able to reproduce all the results in the paper using the lecture you have heard on design of experiments and the code and the data that I am going to share with you. So let us do this. I am also going to keep the paper handy. This is the paper by Murugan et al., on microwave plasma process optimization to produce nanotitania through design of experiments.
It also has some interesting conclusions which are worth going through. Our aim is to take the data and reproduce the results. By results we mean the table of coefficients, the analysis of variance table, and the figures, such as the residual plots. You should also be able to produce the normal probability plot and the table of means. They might involve some effort, but that will help you understand the methodology as well as R programming better. There are two exercises, and they are repetitions of the same exercise: one is done for percentage efficiency and the other for percentage anatase. So let us reproduce them using R. The first step is to read the data. What the paper reports is the design matrix, and this is what is shown in coded form in the data that I will show you. The data on efficiency and percentage anatase, which we need, are not there in full. This might be familiar to you from the presentation; in the presentation, what was shown for these two responses was the average efficiency and anatase for the 16 experiments. Here we are showing you the full data for 32 experiments, where the second 16 are a repetition of the first 16. For each experiment, the percentage anatase obtained and the percentage efficiency are given. This is the data that we are going to use. So the first step is to read the data, and we know that the different columns of the table give you PFR, AFR, CFR, FR, RCL, PWR, ET and so on. I am going to use the same symbols because these are the ones used in the paper. So the second column is PFR, the third column is AFR, the fourth column is CFR, and so on.
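The reading step can be sketched as follows. The column layout (a run label, seven coded factors, five coded interaction columns, then the two responses), the column names and the CSV format are all assumptions for illustration, since the shared data file is not reproduced here. The two rows written to the temporary file are made-up placeholders, not values from the paper.

```r
# Hypothetical column layout: run label, 7 coded factors,
# 5 coded interaction columns, 2 responses (in percent).
col_names <- c("Run", "PFR", "AFR", "CFR", "FR", "RCL", "PWR", "ET",
               "I1", "I2", "I3", "I4", "I5", "Efficiency", "Anatase")

# Stand-in for the shared data file: two made-up rows written to a
# temporary file so the sketch runs on its own. Point `path` at the
# real file instead.
path <- tempfile(fileext = ".csv")
writeLines(c("1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,50.0,60.0",
             "2,1,-1,-1,-1,1,-1,1,-1,-1,-1,1,1,55.0,65.0"), path)

# Read without a header line and attach the column names.
d <- read.csv(path, header = FALSE, col.names = col_names)
str(d)
```

Naming the columns with the paper's symbols keeps the later model formulas readable and easy to compare against the paper's tables.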
Then there are the interactions. The interactions are between PFR and AFR, CFR, FR, and between AFR and CFR, FR. These are the five interactions which were decided to be important for this study, and that is what is given in this table. The other 7 are the main parameters, so 7 plus 5 gives 12 parameters, and the responses are percentage efficiency and percentage anatase. So let us read the data. Once we have the data in place, we can start our analysis. The first thing we want to do is the logit transformation. It is very important, and as you have learnt, if you do not do it in this case you might get wrong results, such as percentages which go above 100. After the logit transformation we fit the model and look at the fitted parameters. I am saying that the logit efficiency is a function of PFR, AFR, CFR, FR, RCL, PWR, ET and the 5 interaction parameters, and there is a constant that will show up anyway. You can see that the fit gives you the fitted parameters along with their standard errors, so now let us compare with what we have in the paper. You can see that 0.4375 is here, 0.3272 is here, 0.8854 is here, and so on. So this column corresponds to the estimates of the parameters, and this one is the standard error. It is 0.10088, which is 0.1009 to 4 decimal places. The t values are given as 4.336, minus 3.24, 8.78 and so on, and these are the p values. In the paper alpha was taken to be 0.05, so anything less than 0.05 here was considered to be important, and the conclusion that was drawn was that PFR, AFR, FR, RCL and the 3 interactions of PFR with the other quantities were statistically significant.
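The transformation and fit described above can be sketched in base R. The logit is taken here as log(p / (1 - p)) with p the percentage divided by 100, which is an assumption about the convention used. The data frame is simulated placeholder data on an assumed 16-run design replicated twice, so the printed numbers will not match the paper; only the workflow carries over.

```r
set.seed(1)  # simulated placeholder data, NOT the data from the paper

# Assumed coded design: 16 runs in 7 factors, replicated twice.
d <- expand.grid(PFR = c(-1, 1), AFR = c(-1, 1), CFR = c(-1, 1), FR = c(-1, 1))
d <- transform(d, RCL = PFR * AFR * CFR, PWR = AFR * CFR * FR, ET = PFR * CFR * FR)
d <- rbind(d, d)

# The five interaction columns named in the lecture, as products of coded factors.
d <- transform(d, I1 = PFR * AFR, I2 = PFR * CFR, I3 = PFR * FR,
                  I4 = AFR * CFR, I5 = AFR * FR)

# Made-up response in percent (arbitrary coefficients, for illustration only).
d$Efficiency <- 100 * plogis(0.4 - 0.3 * d$PFR + 0.5 * d$AFR + rnorm(32, sd = 0.5))

# Logit transform: qlogis(p) = log(p / (1 - p)), with p a proportion in (0, 1).
d$LogitEff <- qlogis(d$Efficiency / 100)

# Linear model with 7 main effects and 5 interactions; the intercept is implicit.
fit <- lm(LogitEff ~ PFR + AFR + CFR + FR + RCL + PWR + ET +
            I1 + I2 + I3 + I4 + I5, data = d)
print(summary(fit))

# Standard errors of all 13 coefficients.
se <- summary(fit)$coefficients[, "Std. Error"]
```

Because the coded columns are mutually orthogonal and balanced, every coefficient gets the same standard error, which is why a single standard-error value serves for the whole table. With 32 runs and 13 fitted coefficients, 19 degrees of freedom are left for the residuals.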
Here you can see the statistical significance marked by 3 stars, 2 stars and 1 star. Anything up to 1 star is significant at 0.05. The dot is actually 0.1, so that is a 10 percent significance level. You can see that I1, I2 and I3, which correspond to PFR with AFR, PFR with CFR and PFR with FR, are considered to be important here also. The significance codes indicate that with alpha 0.05 these 3 are significant. These 2, RCL and PFR, are significant even with alpha 0.01, so obviously they are significant at 0.05. And at the 0.001 level of significance AFR, FR and the intercept are important. That is what is given in the fitted equation: 0.4375 is the constant, then minus 0.3272 PFR, that is X1, and then the AFR term, X2, with 0.8854. The next one, CFR, is not significant, so X3 is skipped. Then X4 is given with 0.4943, and the fifth one, RCL, is important, so 0.2890 X5 is given, but X6 is not important, so it is left out. Then there are 0.2513 X1 X2, 0.2590 (or 0.2589) X1 X3, and 0.2230 X1 X4. These are the parameters which are important. So it is the same information, the same table, and the same conclusion is drawn from it. The next step is to do ANOVA on the linear fit that we have done. So let us do that. You get these values, and you can see that this is partly a reproduction of the next table in the paper. For example, the residual sum of squares is 6.188, which is the value here. They have also reported the main effects and the two-way interactions separately, with their sums of squares, and you can do that too. Here is the command to do that. This gives 40.5804, which is the same quantity that is given as 40.580, and if you go from rows 8 to 12, which are the two-way interactions, you get 7.057, which is the value given for the two-way interactions.
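The grouping of sums of squares described above can be sketched as follows: rows 1 to 7 of the ANOVA table are the main effects, rows 8 to 12 the two-way interactions, and row 13 the residuals. The data are again simulated placeholders on an assumed design, so the totals will not equal 40.5804, 7.057 or 6.188; the row positions rely on the terms being entered in the model in the order shown.

```r
set.seed(1)  # simulated placeholder data, NOT the data from the paper

d <- expand.grid(PFR = c(-1, 1), AFR = c(-1, 1), CFR = c(-1, 1), FR = c(-1, 1))
d <- transform(d, RCL = PFR * AFR * CFR, PWR = AFR * CFR * FR, ET = PFR * CFR * FR)
d <- rbind(d, d)
d <- transform(d, I1 = PFR * AFR, I2 = PFR * CFR, I3 = PFR * FR,
                  I4 = AFR * CFR, I5 = AFR * FR)
d$LogitEff <- 0.4 - 0.3 * d$PFR + 0.5 * d$AFR + rnorm(32, sd = 0.5)

fit <- lm(LogitEff ~ PFR + AFR + CFR + FR + RCL + PWR + ET +
            I1 + I2 + I3 + I4 + I5, data = d)

# ANOVA table: one row per model term, plus a final "Residuals" row.
tab <- anova(fit)
print(tab)

ss      <- tab[["Sum Sq"]]
main_ss <- sum(ss[1:7])    # main effects
int_ss  <- sum(ss[8:12])   # two-way interactions
res_ss  <- ss[13]          # residual sum of squares
print(c(main = main_ss, interactions = int_ss, residual = res_ss))
```

The three pieces add up to the total sum of squares of the response about its mean, which is a quick sanity check on the grouping.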
So the ANOVA table can also be reproduced, and of course we also want to reproduce figures 4 and 5. The first is a scatter plot of the residuals against observation order. I do not have the data in observation order, so this will be slightly jumbled up, but you can look at the presentation, where the observation order is also available, and get it in that format. The other is the fitted values versus residuals. So let us do those two plots now; they are also rather straightforward. This command is for plotting the residuals, and you get the plot. The other one is for plotting the fitted values versus the residuals, which is also a scatter plot. Let us do that; this is the plot, and you can compare it with the plot from the paper. You can see that we get the same plot, and you can of course draw a line at 0 to separate the data and see how it looks. So that is it up to the residuals. I am going to leave the normal probability plot, as well as the table of means for logit efficiency, for you to explore. Once we have done this exercise we can repeat the same thing with the anatase percentage. So let us do that. We take the logit of the anatase percentage, fit the same model again, and get the fitted parameters. The significant ones turn out to be I3, I4, PFR and the intercept, and that is also the conclusion drawn in the paper: for anatase, apart from the constant and PFR, all the p values are greater than 0.05 except for the interactions of PFR with FR and of AFR with CFR. So these are the only four which are important, and that is the same conclusion you draw from here. At alpha 0.01 it is I4 and PFR, at 0.05 it is I3, and at 0.001 it is the intercept. That is what is given in the paper: they look at it and keep the constant, X1 and these two interaction terms; PFR, that is X1, was already there. Then you can again do the same plots, the ANOVA and so on.
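The two residual plots described above can be sketched with base graphics. The fit is again on simulated placeholder data over an assumed design, and the horizontal axis of the first plot is simply the row order in the data frame, which need not be the paper's observation order.

```r
set.seed(1)  # simulated placeholder data, NOT the data from the paper

d <- expand.grid(PFR = c(-1, 1), AFR = c(-1, 1), CFR = c(-1, 1), FR = c(-1, 1))
d <- transform(d, RCL = PFR * AFR * CFR, PWR = AFR * CFR * FR, ET = PFR * CFR * FR)
d <- rbind(d, d)
d <- transform(d, I1 = PFR * AFR, I2 = PFR * CFR, I3 = PFR * FR,
                  I4 = AFR * CFR, I5 = AFR * FR)
d$LogitEff <- 0.4 - 0.3 * d$PFR + 0.5 * d$AFR + rnorm(32, sd = 0.5)

fit <- lm(LogitEff ~ PFR + AFR + CFR + FR + RCL + PWR + ET +
            I1 + I2 + I3 + I4 + I5, data = d)
res <- residuals(fit)

# Residuals against row order (a stand-in for observation order).
plot(seq_along(res), res, xlab = "Row order in the data", ylab = "Residual")
abline(h = 0, lty = 2)  # reference line at zero

# Residuals against fitted values.
plot(fitted(fit), res, xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)
```

A patternless scatter around the zero line in both plots is what one hopes to see; a trend or funnel shape would suggest that the model or the transformation needs another look.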
Those are the commands that are given here: after you have the fit you can make those plots. You can make the residual plot, you can also make the fitted values versus residuals plot, and you can compare the figure with what is given in the paper. The one in the paper is in observation order. You can see how our figure compares with it, and we are getting the same results. The other quantity to reproduce is the ANOVA table, and you can do that by reproducing table 7, or at least parts of table 7, by doing the ANOVA. You draw the same conclusions, and again you can add up the sums of squares for the main effects and the sums of squares for the interactions, and you will get the same numbers. You get 2.952 and 4.963, and in the paper it is also 2.952 and 4.963. There is also the residual, which is 3.6314 in one and 3.6316 in the other. So it is the same result, and we can see that we have reproduced it. Now this is not the only command; there is another way of doing this analysis, which is what I want to show here. This is the command aov. The logit-transformed anatase is a function of the terms on the right. The star symbol expands into the individual terms and their cross term: PFR*AFR means PFR plus AFR plus PFR:AFR, which is the cross term, and any duplicate is removed. So PFR*CFR means PFR plus CFR plus PFR:CFR, but PFR is already there, so the duplicate is removed. This is another way of writing the same interactions: PFR has an interaction with all of these, AFR has an interaction with these two, so those are the 5 interactions, and the remaining terms enter without interactions. Then you can do the fitting, and you get the same result. The results are not at all different, which is expected, except that now, instead of I1, I2 and so on, you can clearly see what the interactions are.
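The formula shorthand described above can be checked directly. The sketch below writes the model two ways, once with a star for each interacting pair and once with parentheses, and confirms that both expand to the same 12 terms. The exact formula typed in the lecture is not shown in the transcript, so both spellings are assumptions, and the data are simulated placeholders.

```r
set.seed(2)  # simulated placeholder data, NOT the data from the paper

d <- expand.grid(PFR = c(-1, 1), AFR = c(-1, 1), CFR = c(-1, 1), FR = c(-1, 1))
d <- transform(d, RCL = PFR * AFR * CFR, PWR = AFR * CFR * FR, ET = PFR * CFR * FR)
d <- rbind(d, d)
d <- transform(d, I1 = PFR * AFR, I2 = PFR * CFR, I3 = PFR * FR,
                  I4 = AFR * CFR, I5 = AFR * FR)
d$LogitAna <- 1.0 - 0.3 * d$PFR + rnorm(32, sd = 0.4)

# a*b expands to a + b + a:b; duplicated terms are dropped automatically.
f_long  <- LogitAna ~ PFR*AFR + PFR*CFR + PFR*FR + AFR*CFR + AFR*FR + RCL + PWR + ET
f_short <- LogitAna ~ PFR*(AFR + CFR + FR) + AFR*(CFR + FR) + RCL + PWR + ET

labels_long  <- attr(terms(f_long),  "term.labels")
labels_short <- attr(terms(f_short), "term.labels")
print(labels_short)

# aov() fit with named interactions instead of the I1..I5 columns.
fit_aov <- aov(f_short, data = d)
print(summary(fit_aov))

# Same model written with the precomputed interaction columns.
fit_lm <- lm(LogitAna ~ PFR + AFR + CFR + FR + RCL + PWR + ET +
               I1 + I2 + I3 + I4 + I5, data = d)
```

Both fits span the same model, so their residual sums of squares agree; the only practical difference is that the interaction rows are labelled by the factors involved.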
So this is another one-line command for you to do the same analysis and get the same sort of results, and you can do this not just for anatase but also for efficiency, so let us do that as well. You get the same conclusions, namely that these 3 interactions are important, these 4 main effects are important, and the rest are not important from the point of view of significance. So this is the paper, and we are able to reproduce most of the results; the remaining ones you will be able to reproduce yourself. It is also a good idea to understand the significance levels and the validation trials in table 9, but I am going to leave that for you to explore on your own. All the data will be available, the presentation on design of experiments is available, and this script will be available to you, so it should not be very difficult for you to reproduce them. To summarize, design of experiments is very important. You can optimize process parameters by carefully setting up the experiments and then analyzing them. Such statistically planned experiments, and the statistics that you get from them, make life easy for you in terms of optimization; otherwise there are too many parameters, and you need to take a call on how many experiments you will do, how you will change the parameters, and so on. So this is a nice way of doing it. There are of course lots of libraries in R which you can use, but I have also shown you that with what we have already learnt, linear model fitting and ANOVA, using just these 2 commands or combinations of them, you can get all the information you want.
So if you want to set up a new set of experiments for some optimization, here is a way to explore. I also strongly urge you to go through the material on design of experiments, the libraries that are available and the other ways of doing things. For example, I have not shown how to make the design matrix, but you can generate that also using R, which will help you set up your experiments. So we have looked at one two-level factorial experiment and reproduced the design of experiments analysis that was done in the paper. I hope this will help you set up more experiments of your own along these lines. Thank you.