Welcome to Dealing with Materials Data, where we are looking at the collection, analysis and interpretation of data from materials science and engineering. We are in the module on fitting and graphical handling of data, and we are using R to do fitting as well as plotting to understand data better. In this session we are going to look at calibration, fitting and hypothesis testing.

Calibration, as we discussed some time back, is essential and should be carried out periodically. After calibration is done, typically a calibration table, curve or plot is generated, and sometimes fitting is used for this. You take a material or a process for which you know what the reading should be, you measure it with your equipment, and you note what the equipment gives as the reading. Because the true value for the calibration material is known, the difference between the two tells you the correction to apply. When you then work on a new material, you can correct the measured reading to get the actual value. To do this one typically has to look at the data and generate calibration curves or calibration plots. One example is the calibration that is done on a nanoindenter if you are doing indentation experiments. In our case studies we will look at a calibration exercise to give you an idea of how it works; it typically involves fitting, and that is why it is in this topic.

The second topic is fitting. We first looked at linear fitting, then at cases that can be turned into a linear form and fitted, and also at non-linear fitting. For example, we fitted the specific heat, which goes as A T + B T^3. There are functions that can be linearized, and we have already seen some examples. If the function is an exponential, you can take the logarithm. If it is of the form A plus B times an exponential, you can take y minus A and then take the logarithm, and that will again make it linear.
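As a minimal sketch of the last linearization, assume the form y = A + B exp(c x) with the offset A known in advance. The lecture itself works in R; this illustration is in plain Python, the numbers are made up, and the helper name fit_line is hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Made-up, noise-free data from y = A + B*exp(c*x); A is assumed known.
A, B, c = 2.0, 3.0, -0.5
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [A + B * math.exp(c * x) for x in xs]

# Subtract the known offset, then take the logarithm:
# log(y - A) = log(B) + c*x is a straight line in x.
log_ys = [math.log(y - A) for y in ys]
log_B, c_fit = fit_line(xs, log_ys)
B_fit = math.exp(log_B)

print("B =", B_fit, "c =", c_fit)
```

With real, noisy data the recovered values would only be estimates, and taking the logarithm changes how the errors are weighted, so this is a way to check for linearity rather than a replacement for a proper non-linear fit.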
If it is a power law, we saw in one example that taking the logarithm gives a straight line. There are other cases too. For example, if y = A x / (B + x), you can plot 1/y against 1/x and that will be linear; this is known as the Lineweaver-Burk plot. You can also plot y/x versus y, or x/y versus x, and these also produce linear plots. The idea behind these is just to see that a linear relationship exists; after that you can do the analysis and find the parameters of the linear fit.

One more example that can be turned into a linear form is the Hall-Petch relationship for the flow stress in materials, which is related to the grain size D through a 1/sqrt(D) dependence: sigma = sigma_0 + k / sqrt(D). If you make the transformation x = 1/sqrt(D), you get sigma = sigma_0 + k x, so sigma_0 is the intercept and k is the slope. These two, sigma_0 and k, are known as the Hall-Petch parameters, and we will do one exercise with them.

We will take the data from the NIST monograph, which gives yield strength as a function of grain size at 295 K. The monograph also tells you that the fitted Hall-Petch parameters are 18.6 ± 1.7 for the intercept and 112 ± 2 for the slope, and that the standard deviation in the flow stress itself is 11 MPa. Once you know these two errors, it should be easy to calculate what the error in sigma_y should be. This is something that we have already looked at, but how do these numbers come about? We will take the same data that was used to produce these numbers and generate them for ourselves in order to understand how it works. It is a simple linear fit that we are going to do.

Fitting, as we have seen, is done for several reasons. One is to obtain parameters when the functional form is known.
For example, we are assuming that the Hall-Petch relationship holds, and we want to know what sigma_0 and k are; fitting gives you that. Another reason is to identify whether there are correlations. You can do all these transformations, such as taking the log, or plotting y/x versus y or x/y versus x, and if the plot shows a straight line then you know that the functional relationship could be of the form A x / (B + x), for example. This is where plotting and graphical analysis are very useful: you can just plot and look, and if you see a relationship you can try to find the functional form. If the functional form is already known, you can obtain the parameters. Getting parameters from data, and estimating the parameters of the underlying probability distribution, is something that we have already seen, and it is part of the fitting exercise.

But fitting can also be done to test whether the data supports a given functional form. This is hypothesis testing. For example, you can take the data and ask: is it true that the flow stress is related to the grain size as the power minus half? Testing whether the given data supports this hypothesis is hypothesis testing. So you can ask: if we fit the copper strength data, will we get an exponent of D equal to minus 0.5? We are going to deal with this in greater detail, and not just for copper: we will take many materials for which data has been collected by one of the researchers. It is available in the literature, in raw form, for everybody to download and analyse. That is something we are also going to do as part of our case study in the next module.
For now we will at least see how to fit: if you assume a given form, can we get the parameters? That is the part we are doing now, and we will continue with it in this session and the next. Hypothesis testing is something that we have already seen at some level. For example, we asked what the probability is that the mean will lie within some range. You can also have a hypothesis that the true mean has a given value or lies in a given range, and then test whether the hypothesis is true. You can make hypotheses about the variance and test them, and you can test hypotheses about means from different experiments. This naturally leads to analysis of variance, which we are going to look at in one of the sessions of this module, and from there it also leads to design of experiments and so on. We will do a design of experiments case study in the next module as well.

For now, what we are trying to do is the fitting exercise, so let us do that. Let us start R, version 3.6.1; what we want to do is the Hall-Petch fit for copper. What is the process? We first read the data, which is in CSV format: it is the copper strength versus grain size data at 295 K, taken from the NIST monograph. We take the grain size and store it in the variable capital D, and we take the yield strength and store it in the variable Ys. Small d is 1/sqrt(D), that is, D to the power minus half. Then we plot small d versus yield strength, and we also fit yield strength as a function of small d, so the intercept and the slope should give us the Hall-Petch parameters. You can see what the data looks like when yield strength is plotted against small d; the x axis here is 1 by square root of the grain size. We can then see where the fitted line is.
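The steps just described (read the CSV, form d = 1/sqrt(D), fit yield strength against d) can be sketched as follows. The session itself is done in R; this is a plain Python illustration, and the column names and the four data rows are made up for the example. They are not the NIST copper data.

```python
import csv
import io
import math

# Made-up stand-in for the CSV file; NOT the NIST copper data.
# The column names are hypothetical.
csv_text = """grain_size,yield_strength
4,70
16,45
25,40
100,30
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
D = [float(r["grain_size"]) for r in rows]       # grain size
Ys = [float(r["yield_strength"]) for r in rows]  # yield strength
d = [1.0 / math.sqrt(v) for v in D]              # d = D^(-1/2)

# Ordinary least-squares fit of Ys = intercept + slope * d.
n = len(d)
d_mean = sum(d) / n
y_mean = sum(Ys) / n
sxx = sum((x - d_mean) ** 2 for x in d)
sxy = sum((x - d_mean) * (y - y_mean) for x, y in zip(d, Ys))
slope = sxy / sxx                    # Hall-Petch slope k
intercept = y_mean - slope * d_mean  # Hall-Petch intercept sigma_0

# Residuals: what one would plot to check the quality of the fit.
residuals = [y - (intercept + slope * x) for x, y in zip(d, Ys)]

print("sigma_0 =", intercept, "k =", slope)
```

The toy rows were chosen to lie exactly on a line, so the residuals vanish here; with measured data they would scatter about zero.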
The fitted line goes through these points, so the data does seem to follow this relationship. You can also plot the residuals, and you can see that the data is spread about zero on either side. There are a few data points that are slightly further away, but most of the data lies between minus 20 and plus 20. So the residuals look random. However, if you make the normal Q-Q plot with qqnorm, you do not see a straight line, so there seems to be some problem: the error does not seem to be normally distributed.

You can also look at the fit, and you get 18.5 and 112.4, and you can call summary on the fit. It gives 18.5 with an error of 1.7 for the intercept, and 112.3672 for the slope. The fitting parameters given by the monograph are 18.6 ± 1.7 and 112 ± 2. In our case we get 18.5 instead of 18.6, the error is 1.7, which matches, and for the slope, that is, the coefficient of D to the power minus half, we get 112 with an error of 2. So we recover the Hall-Petch parameters given in the monograph.

But there is a problem if you want to know whether the exponent is actually minus half, because the Hall-Petch relationship as we fitted it assumes that. If you want to check it, one thing you can do is take the logarithm of sigma minus sigma_0 and the logarithm of D. Writing sigma minus sigma_0 as k times D to the power minus n and taking the log gives log k minus n times log D, so the slope should be minus half and log k should be given by the intercept. But we do not know sigma_0. So we treat it as a constant and, for now, just take the data as it is. This is not quite complete, but one can try the exercise.
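Written out, the log transformation described here is as follows, where n is the exponent to be tested and the Hall-Petch value is n = 1/2:

```latex
\begin{align}
\sigma &= \sigma_0 + k\,D^{-n} \\
\sigma - \sigma_0 &= k\,D^{-n} \\
\log(\sigma - \sigma_0) &= \log k - n \log D
\end{align}
```

So a plot of log(sigma - sigma_0) against log D should be a straight line with slope -n and intercept log k.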
So we have D; we take the logarithm of D and store it as y, and we store as x the logarithm of the yield strength. We can plot this and see whether it follows a straight line. Then we do the fitting, with x as a function of y, and call it fit2. The parameters come out as follows. The intercept is 4.9680, and its exponential is about 143, so k happens to be 143, whereas in the earlier fit we got 112. The exponent comes out at around minus 0.3 to minus 0.4, so it is not minus 0.5; there seems to be a slight deviation. You can also look at the fitted line: the data is slightly off the line, of course. Looking at the residuals of fit2, you see that they are spread about zero on either side, and they are probably normally distributed; we can check with qqnorm. In this case the error certainly does seem to be normally distributed.

So we assumed that the strength goes as D to the power minus half, we fitted, and we got some parameters. As far as fitting goes, that is good enough. If you look at the NIST data, it comes from several different sources, not a single one, and yet all the existing grain size versus yield strength data can be put in one single form, from which you can predict the yield stress to within about plus or minus 11 MPa, which is very good.
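The log-log check of the exponent can be sketched as below. This is a plain Python illustration rather than the R session, the data are made up (generated exactly from sigma = sigma_0 + k D^(-1/2) with invented values of sigma_0 and k), and the helper name fit_line is hypothetical. It compares fitting log(sigma) against log(D) directly with fitting log(sigma - sigma_0) against log(D).

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Made-up values, NOT the NIST copper data.
sigma0, k = 20.0, 100.0
D = [4.0, 16.0, 25.0, 100.0]
Ys = [sigma0 + k / math.sqrt(v) for v in D]
log_D = [math.log(v) for v in D]

# (a) Ignore sigma_0: fit log(sigma) = log(k) + m*log(D).
log_k_a, exp_a = fit_line(log_D, [math.log(s) for s in Ys])

# (b) Subtract sigma_0 first: fit log(sigma - sigma_0) = log(k) + m*log(D).
log_k_b, exp_b = fit_line(log_D, [math.log(s - sigma0) for s in Ys])

print("without subtracting sigma_0: m =", exp_a, "k =", math.exp(log_k_a))
print("after subtracting sigma_0:   m =", exp_b, "k =", math.exp(log_k_b))
```

On these toy numbers the fit that ignores sigma_0 returns an exponent noticeably shallower than minus half even though the data were generated with exactly minus half, while the fit with sigma_0 subtracted recovers it. This only illustrates why leaving out sigma_0 matters; it says nothing about what the copper data themselves support.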
However, if you ask a different question, namely whether the power is minus half, and try to do the fitting, you see that it is not exactly so. This is certainly approximate, because the relationship involves sigma minus sigma_0 and we have not taken sigma_0 into account. If you assume that sigma_0 is 18.5, as we got earlier, you can subtract it and then, for the remaining quantity, do a fit of the form log k plus m log D, find out what m and log k turn out to be, and see whether that actually gives you this functional form.

We are going to do this in greater detail in the next module, when we look at case studies. We will take Hall-Petch as a specific case study and check whether there is enough statistical evidence, and any physical reason, to expect the exponent to be minus half, or whether it could be something else. That is a question we are going to look at slightly later.

This data is available, and there is a lot more in this monograph: it also gives the yield strength as a function of percentage cold work, temperature, concentration of impurities and so on. So it is a good idea to explore the data and see whether there are trends, and to ask, for example, whether two sets of data are correlated. One can use fitting as the starting point for exploring the data further.

In the next session we are going to look at analysis of variance. That will bring us to the end of this module, and then we will move on to the case studies module. Thank you.