Welcome to Dealing with Materials Data, a course on the collection, analysis, and interpretation of data in materials science and engineering. We are in module 5, which is on fitting and graphical handling of data. In this session I am going to talk primarily about graphical handling, but we will also do a little bit of fitting along the way.

We have seen that if you have a set of data you can calculate its average and distribution, from which you can estimate the mean and standard deviation; and if you know something about the probability distribution from which the data is sampled, you can even estimate properties of that distribution from the data. But those are very simple things: parameter estimation, either as a point estimate or as an interval, saying, for example, that the true mean lies in a given range with a given probability.

Now we are interested in a slightly more detailed analysis, because in experiments we typically vary some independent parameter and make some measurement. For example, you can change the grain size and look at the strength, change the composition and look at the lattice parameter, or change the temperature and look at the thermal expansion coefficient. The composition, temperature, or grain size is then the independent parameter, and the measured strength, lattice parameter, or thermal expansion coefficient is the dependent variable. Here is an example: to determine the strength of pure copper, you can vary the temperature, grain size, alloying addition, or percentage of cold working. Our interest is typically in knowing how strength is related to these parameters, but we cannot read that off directly from the data, because the data contains statistical errors.
In some cases, for example between strength and grain size, you might think you already know the relationship: the Hall-Petch relationship. In other cases, say variation with temperature, composition, or percentage cold work, we may have no idea; or somebody claims the relationship is Hall-Petch and we are doing the experiment precisely to test whether it is. In all these cases we have to establish a relationship between the independent parameter and the measurement we are making. There are several scenarios: the relationship is known, or you are trying to test a proposed relationship, or you are just exploring to find whether there is a relationship at all. We want to do this qualitatively as well as quantitatively, so we can work graphically and also analytically or computationally. NIST Monograph 177, Properties of Copper and Copper Alloys at Cryogenic Temperatures, contains a whole body of data on all these aspects, which we will use extensively in this session.

To describe what we are trying to do: there are independent variables X i, and we make measurements Y i. Y i is related to X i, but there may also be other unknown parameters in that relationship, so we need to estimate them and establish the relationship.

The first thing to do is always to plot the data. Just by looking at a plot it is possible to identify trends, and if error bars are available you should always plot them, because without error bars you might sometimes read off wrong trends. Once you see the trend you can fit, and every time you do a fit it is a good idea to plot the data and the fit together. It is also a good idea to analyze the residuals. What is a residual?
If you have made a fit, the residuals tell you how far the data points are from that fit, and that is a good thing to look at. If the fit is proper, the residual is just the random error, so it should show the characteristics of a random variable.

Before we go into the copper database, we will do some very simple analysis just to show why plotting is a good idea, and how, after plotting, we go about fitting the data in R. I am going to use two data sets. The first is the variation of density and lattice parameter with composition in silicon-germanium alloys. We know that the lattice parameter should change linearly with composition; this is known as Vegard's law. We are going to check that and obtain the Vegard's law coefficients; that is the first exercise. The second is the variation of linear thermal expansion with temperature in boron nitride, in the temperature range 77 to 1289 Kelvin; this data is also taken from the literature. Here we will be a little more exploratory: in the first case it is known that lattice parameter versus composition should be a straight line, so if the plot looks like a straight line you can go ahead, fit, and get the parameters; but in the second case let us say we do not know what the relationship is, and we will explore and find out.

So let us start R and do the first exercise. We read the silicon-germanium lattice parameter and density data, store the mole percent of silicon in C, store the lattice constant in A, and plot C against A. We have plotted, and you can see that the lattice parameter changes as the composition changes; from the plot it is clear that it is a straight line.
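As a sketch, these first steps in R might look like the following; the file name and column names here are my placeholders, not the actual course files:

```r
# Hypothetical file and column names -- substitute the actual data set.
sige <- read.csv("SiGe_lattice_density.csv")
C <- sige$mol.percent.Si   # composition, mol% Si (independent variable)
A <- sige$a                # lattice parameter (dependent variable)
plot(C, A, xlab = "mol% Si", ylab = "lattice parameter")
```

If the Vegard's law behaviour holds, this scatter plot should already look like a straight line.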
Then we can ask for a straight-line fit, and that is done with the fit command: we want a linear fit between A and C, and then we plot a line using the fit coefficients. It is as simple as that: do the fit, look at the coefficients of the fitted line, and draw the line. You can see a nice line running through the points, and the error is random: the points lie on either side, quite close to the line. You can also plot the fit residuals; this is the error, and you can see the data scattered nicely on either side of zero. You can also make a Q-Q plot (qqnorm) of the residuals to check whether they are normal; remember, this is another way of knowing whether data is normally distributed. If the Q-Q plot is more or less a straight line, you know that the error is normally distributed. So this is a rather straightforward exercise.

Let us now go to the next one, which is to get the thermal expansion coefficient of boron nitride. Again we read the data, the boron nitride linear thermal expansion: we store the temperature as T, the linear thermal expansion as alpha, and in this case the standard deviation of the measurement is also given, so we store that as well. Then we plot the data and add the error bars. When we do this, R warns about a zero-length arrow; that is because one of the data points is the reference with respect to which everything else is measured, so its error bar is zero. Now, by looking at this data, one can see that it could be two straight lines: one straight line in the first region and another in the second. But suppose we do not do that and simply fit a single straight line; let us do that.
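A minimal sketch of these steps in R, assuming C and A from the previous exercise and T, alpha, and sdev for the boron nitride data (the variable names are my assumptions):

```r
# Straight-line (Vegard's law) fit of lattice parameter vs composition
fit <- lm(A ~ C)
coef(fit)                     # intercept and slope of the fitted line
plot(C, A)
abline(fit)                   # draw the fitted line through the data

plot(residuals(fit))          # residuals should scatter randomly about zero
abline(h = 0)
qqnorm(residuals(fit))        # roughly straight => normally distributed errors
qqline(residuals(fit))

# Boron nitride data with error bars drawn as vertical arrows; a point with
# sdev = 0 produces the "zero-length arrow" warning mentioned above
plot(T, alpha)
arrows(T, alpha - sdev, T, alpha + sdev, angle = 90, code = 3, length = 0.05)
```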
You can fit a single straight line, but it does not look like a very good fit for this data; it would be better to fit one straight line in the first region and another in the second. This is also clear if you look at the fit residuals: they show a trend. And if you make a Q-Q plot of the residuals, that is also not quite a straight line.

So one thing to do is to leave out the first four data points, take the remaining ones, and fit those first. We take the same data, leave out the first four points, and fit a straight line to the rest. You can see that it fits very nicely, with the data above and below the line and probably normally distributed. You can do a similar thing for the first four data points: leave out points 5 to 12, plot, and do the fitting, and you can see that it fits those points as well. And of course we can take the data and do both fits together: read the data, leave out the first four points for one fit, leave out points 5 to 12 for the second fit, plot the data, and plot both fit lines. You can see the two straight-line fits. So you can say there are two regimes: below about 600 K, for example, you can fit one linear curve, and above 600 K you can fit another.

But if you just look at the data, it is also possible that it looks like a parabolic curve. You might think, oh, this might fit a nice parabola. Is that true? Well, one can check, and that is the last exercise.
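The two-regime fitting described above could be sketched in R like this, again assuming T and alpha hold the twelve boron nitride data points:

```r
high <- -(1:4)                 # drop the first four points (keep the high-T regime)
low  <- -(5:12)                # drop points 5 to 12 (keep the low-T regime)

fit.high <- lm(alpha[high] ~ T[high])
fit.low  <- lm(alpha[low]  ~ T[low])

plot(T, alpha)
abline(fit.low, lty = 2)       # straight line through the low-T points
abline(fit.high)               # straight line through the high-T points
```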
So we have the data, and now I am going to take T squared and fit alpha versus T squared, and that is how the data is plotted now. You can already see that it is becoming more or less a straight line, so it will fit a nice straight line. And of course you can look at the fit residuals and confirm that they sit on either side of zero and are randomly scattered, and the Q-Q plot is again like a straight line, indicating that we have a better fit.

The purpose of this exercise is that whenever you get data you should always plot it and look at it; most of the time you can judge a good fit by eye. In the first case it was linear, so it was not surprising that a linear fit worked. In the second case one could clearly discern that a single straight line would not fit, and then there are several different ways to proceed. We saw two: one is to fit two different straight lines, so that most of the data follows the trend in each regime; the other is to try a higher-order fit, and in that case we simply took the square of the temperature and saw that the data does follow a nice straight line in that variable. After doing the fit you can do the residual analysis, and it is always a good idea to plot the data along with the fit: this again tells you that the errors are random and that the curve passes close to most of the points, so it might be a very good fit. In this case we found that a parabola actually fits better.

In many cases in materials science and engineering you might already know what the expected trend is; for example, sometimes it is of the Arrhenius type, in which case we know that plotting the logarithm of the quantity against one over temperature (1/T) should give a straight line, and so on.
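The parabolic fit via the T-squared transformation, as a sketch under the same assumptions about T and alpha:

```r
T2 <- T^2                      # alpha linear in T^2 is the same as a parabola in T
fit.q <- lm(alpha ~ T2)

plot(T2, alpha, xlab = "T^2")
abline(fit.q)                  # now a straight line in the transformed variable

plot(residuals(fit.q))         # scatter about zero, no trend left
abline(h = 0)
qqnorm(residuals(fit.q))       # roughly straight => a better fit
qqline(residuals(fit.q))
```

The design point here is that a single transformation of the independent variable turns a suspected non-linear relationship into an ordinary straight-line fit, so all the same diagnostics apply.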
In cases where the relationship is not a straight line, there are sometimes methods to convert it into one; even for a generic power law, for example, taking logarithms always produces a straight line. There are also other ways of exploring data. One should always plot and see: if you know the functional relationship, you can plot in exactly that form and check; if you do not, you can explore, and by looking at the plot you will most of the time be able to guess the fit correctly, or at least, once you make a fit and plot it, you will know by eye whether it is good or not. Beyond that, there are further ways of checking that your fit is good by analyzing the residuals.

This is just a start; we will continue with more fitting, linear regression, analysis of regression results, and so on. But this was a starting point: some linear fitting, and some non-linear fitting, the quadratic or parabolic fit that we saw; and we also saw that one might just as well fit the same data as two different straight lines in two different regimes. One cannot say that one fit is better than the other, except if you have physical reasons to believe one is right, for example a change in mechanism that leads you to expect a linear fit in each region with a slope change between them. These are things sometimes known from your knowledge of materials science and engineering; if not, you can do exploratory analysis, try different things, and pick whatever serves your purpose. We will continue with more fitting in the sessions to come in this module. Thank you.