Welcome to Dealing with Materials Data, a course on the collection, analysis and interpretation of data from materials science and engineering. We are in module 6, which is on case studies, and the first case study that we are going to take up is on data smoothing.

Mechanical properties are very commonly tested and measured because they are important for many materials, and the universal testing machine (UTM) is the machine typically used to get some of them. In these machines we impose a load and measure the displacement, and from these we get the stress and strain. A typical experiment consists of tensile stresses and the corresponding strains, and from these you can determine a whole bunch of mechanical properties: elastic modulus, or stiffness of the material; strength, which can be the yield strength, the ultimate tensile strength or the 0.2% proof stress, so there are lots of measures of strength; resilience, which is the area under the curve in the elastic region; ductility, which is the strain at which the material fails; and toughness, which is the area under the whole stress-strain curve.

In most machines there is a computer attached which collects the data, analyzes it after the test and gives you the modulus, strength and so on. But in this exercise, to get a better understanding of what goes on in calculating these measures, and to appreciate some of the nuances involved in getting them automatically from a machine, we are going to do this by ourselves, by hand.

I want to show typical stress-strain data. Here, for example, is data for aluminum, with stress and strain. You see that even though it is a tensile test, the stress is shown as a negative value, and what is more, there is some stress but the strain is still shown to be 0.
This happens at the beginning of the test, before things settle down, and here is the first time you see a non-zero strain. When the stress is non-zero you would expect the strain to be non-zero too, but the machine is either not able to measure it or is wrongly recording it as 0. So this is the kind of data we have. On top of that, even after this point there are several stress values for which the strain remains constant, maybe because the machine cannot distinguish between the different strains. For example, the stresses 30.56, 29.66 and 35.92 are all listed at a strain of 0.06; there is probably some change in the third decimal place that the machine is not able to measure. And so on. You will also see when we plot it that the data is a little bit noisy.

The summary of what we have seen is that the raw data is noisy, so it needs smoothing, that is, removing the noise and retaining only the underlying trend. There are also things like negative stress, stress with zero strain and so on, so the data needs cleanup. How to clean up and smooth the data so that we can carry out further analysis is what we are going to do in this case study. We are going to use data sets from stress-strain experiments on both aluminum and brass. I will do the exercise for aluminum and leave the brass data set with you so that you can do the same things and see how it works.

So our task is to clean and smooth the data and then calculate the modulus and a measure of strength. Smoothing also leads into machine learning, and there is a book on data science that describes this aspect in detail. I strongly recommend that you peruse this book. As we mentioned long back, it is freely available, and you can see that it has a chapter 29 on smoothing, which comes in the part on machine learning.
Smoothing is a technique that is also called curve fitting or low-pass filtering, and it is very useful. As you see here, there is data which has a smooth trend, and there is noise, and when you add them up the data looks like this instead of like the smooth curve. Our aim is to separate out the noise and remove it, and so get the data back to the smooth curve so that we can do the analysis on that. There are several different ways of doing it. We will do one manually ourselves, and then we will use some of the commands given in this book. In any case I strongly recommend that you go through Rafael Irizarry's book; it has more information than what we are going to discuss, which might come in handy when you smooth your own data.

So let us go back to our data and begin by plotting it. Let us open R, read the data and plot it. We have read the aluminum tensile data, which is in CSV format. We use ggplot to plot stress versus strain, we connect the points with a line, and we have labelled the plot as the aluminum stress-strain curve. Now you can see that the data, at least here for example, shows some kind of noisy behavior, and there is some noise here also. Even in the initial part there is some noise, but we are not able to see it clearly. When we draw schematic stress-strain curves we draw a linear portion and then show the deviation from linearity, but that linear portion is exaggerated so that it shows clearly. In most materials that portion is very small: elastic strains are very, very small compared to the total strain, so this is a very small portion of the curve. So we need to zoom in on this part and show it as an inset to see how it looks, because this is where we are going to get the modulus from. To do that, let us do the next exercise.
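As a rough illustration of the smooth-trend-plus-noise picture described above, here is a minimal sketch in base R. It does not use the lecture's aluminum data: the curve shape, the noise level and all variable names are assumptions chosen only to show how a smooth trend and random noise add up to a wiggly record. Base R plotting is used here instead of ggplot so that the sketch has no package dependencies.

```r
# Sketch only: synthetic data, not the aluminum data set from the lecture.
set.seed(1)                                   # reproducible noise
strain <- seq(0, 2, length.out = 401)         # hypothetical strain values
trend  <- 200 * (1 - exp(-3 * strain))        # hypothetical smooth stress trend
noise  <- rnorm(length(strain), mean = 0, sd = 2)  # hypothetical measurement noise
stress <- trend + noise                       # what the machine would record

# Noisy record in grey, underlying smooth trend in red.
plot(strain, stress, type = "l", col = "grey40",
     xlab = "Strain", ylab = "Stress",
     main = "Synthetic stress-strain curve")
lines(strain, trend, col = "red", lwd = 2)
```

Smoothing is then the attempt to recover something close to `trend` when only `stress` is available.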
So here it is seen better: there is noise, and here you cannot see it very clearly, but there is a little bit of noise here too. To make this plot we are going to use the tibble library and the ggpmisc library. We define the main plot, which is the same stress-strain curve: we have the data, and we plot strain and stress from it. Then we prepare the inset. The inset specifies the limits for the x and y axes, so it takes that portion of the curve from the main plot. Then of course we plot them together, the main plot with the inset, and that is what this command does.

As you can see, there is the main curve, and we have taken a small portion from here and expanded it. If you zoom in, you can see that what looks like a neat straight line here actually has lots of wiggles too. So what we need is a smooth curve. You can also see that the initial portion still has some problems, maybe because the stress-strain measurements are not really perfect here. This part looks like a straight line, so if you extend it back slightly it should go something like this, but the initial portion has a different slope. This is a common problem. When you put the sample in, put the grips on and start the experiment, there may be small adjustments that have to take place before proper loading happens and your measurements of load and displacement become reliable.

We have seen in the data itself that it requires cleaning up, and we now also see that there is a need for smoothing. So the exercise now is to do both the cleanup and the smoothing.
So to do that, we first read the data, and this gives you the length of the data, that is, how many data points there are. Then we go through each of the data points, and if the strain is 0 we keep track of the stress. At the beginning of the data the machine records the strain as 0 even though the stress is not 0, and because we keep overwriting the same variable, once we have gone through all the data points we know the last, and here the highest, stress value for which the strain is marked as 0. That value is stored as B. Then we take the data and keep only the rows where both stress and strain are greater than 0. We also offset the stress by subtracting B, so that when the stress becomes non-zero the strain is also non-zero and the curve starts at (0, 0). That is what is being done here, so let us do this.

You can see that B is 3.37, and from the data also we see that 3.37 is the stress at which the strain still shows 0. If you subtract 3.37 from all the stresses, you get (0, 0), then 0.01, 0.01 and so on. You can also see where the noise comes from: the stress goes to 3.38 and then back to 3.37, so after the subtraction it reads 0.01, 0.01, 0 and 0.01. This is the small wiggle that you see in the plot. Now if you look at this part of the data with head(x), you can see that the negative stresses are gone, and whenever the stress is non-zero the strain is also non-zero. There is still some small noise, as in the stress value that reads 0 here.
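The cleanup steps described above can be sketched as follows. This is a minimal sketch, not the script used in the lecture: the six-row data frame is made up for illustration, and the column names `Strain` and `Stress` are assumptions. The name `b` mirrors the offset called B in the lecture.

```r
# Hypothetical raw record; values and column names are assumptions.
raw <- data.frame(
  Strain = c(0,     0,    0,    0.01, 0.02, 0.03),
  Stress = c(-0.50, 1.20, 3.37, 3.38, 3.37, 3.38)
)

# Walk through every row and remember the stress whenever the strain is 0.
# The value is overwritten each time, so b ends up as the last (here also
# the highest) stress recorded while the strain still reads 0.
b <- 0
for (i in seq_len(nrow(raw))) {
  if (raw$Strain[i] == 0) b <- raw$Stress[i]
}

# Keep only rows with positive stress and positive strain, then shift the
# stress so that it is measured relative to b.
x <- raw[raw$Stress > 0 & raw$Strain > 0, ]
x$Stress <- x$Stress - b
rownames(x) <- NULL

print(b)
print(head(x))
```

With these made-up numbers the shifted stress alternates between 0.01 and 0, which is the same kind of small wiggle the lecture points out in the real data.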
So this is the data we now have, and using it we are going to do the analysis. The first thing to do is to take the linear portion of the curve and fit it to a straight line; from the slope of that line we can evaluate the modulus. Before we do that, here is code which does the smoothing, so let us complete the smoothing first.

Because we have edited the data a little bit, removing the negative stresses and the other portions that needed cleanup, we have to get the new size of the data, and that is what A is. Then we set M to 25. This is the size of the box that we are going to use for the smoothing. It can be different, so you can play around with it and find the number that gives you smoother data. The smoothed data is stored in the variable X, which is a data frame with one row per data point and two columns, stress and strain, holding the smoothed values. The index is a sequence that starts from 25, goes up to the length minus 25 and increases in steps of one. As you can see here, for every data point in that sequence we take the data points just before and just after it, average the stress and the strain over the chosen points, and store the averages as the stress and strain at that point. Suppose I take i to be 25, the first index. I take the 25th point, 24 points before it and 24 points after it, so including the 25th point there are 49 points. That is why the count is 2M - 1.
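The box averaging described above can be sketched like this. It is a minimal sketch under stated assumptions: the helper name `moving_average`, the synthetic straight-line data and the column names are all hypothetical, and only the window rule (index from m to n - m, with 2m - 1 points per window) follows the description in the lecture.

```r
# Centred moving average: each point is replaced by the mean of itself and
# the (m - 1) points on either side, i.e. 2 * m - 1 points in total.
moving_average <- function(v, m) {
  n <- length(v)
  idx <- seq(m, n - m, by = 1)            # points that have a full window
  sapply(idx, function(i) mean(v[(i - m + 1):(i + m - 1)]))
}

# Hypothetical noisy record standing in for the cleaned data.
set.seed(42)
strain <- seq(0.01, 2, by = 0.01)
stress <- 100 * strain + rnorm(length(strain), sd = 1)

m <- 25                                    # box size used in the lecture
smoothed <- data.frame(
  Strain = moving_average(strain, m),
  Stress = moving_average(stress, m)
)

nrow(smoothed)                             # n - 2 * m + 1 smoothed points
```

Changing `m` is the "play around with it" step: a larger box gives a smoother curve but discards more points at both ends and blurs sharp changes such as the onset of yielding.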
So we have summed the stress over these 49 points and divided by 49, and we store the result as the stress at that point, which is the 25th data point in the cleaned-up data set. Then we plot the stress and strain from this smoothed data, and we also look at the linear portion of the curve. Let us first smooth and plot the curve. You can now clearly see that the data is very nicely smoothed. Now you can look at only this portion of the curve, where I am restricting the x limit to go from 0 to 0.2, so we are restricting ourselves to a very small region, and you can see that it is a straight line.

You can also try a slightly larger limit, maybe 0.5, to see why I am plotting only up to 0.2: if you plot up to 0.5, you can see that the curve changes slope. For example, let us see the full curve: if you plot up to 2, this is the curve. If you plot up to 0.5 you see this part, and from that you know that somewhere around 0.2 to 0.25 is where the change happens. If you plot the smaller region you can also see this small initial portion. If the straight line were extended back it would go like that, but the curve goes like this, so there is some small error here. If you leave this out, the remaining portion is the straight-line, linear response from which one can calculate the modulus.

So let us do the modulus calculation. We take the first 200 points, take the strain and stress, and plot them. We also fit a straight line to them and plot the fitted line in red.
So you can see that these are the data points and this is the fitted line. Obviously there is a small region here which is to be discarded, and the rest of the data fits the straight line very nicely. If you look at the fit summary, you find that the slope is 658.57. If you look at the original data, you realize that the stress is given in MPa, so the slope is 658 MPa per unit of the strain column; taking the strain to be recorded in percent, that corresponds to a modulus of 65.8 GPa. So you can fit the straight line and get the modulus value, and of course you can also check your fit by plotting the residuals. The residuals do not look randomly distributed, so there seem to be some systematic errors. You can also do a qqnorm plot to check whether the error is random. It does not seem to be: the plot is not really a straight line, so there seems to be some deviation from linearity.

In any case, we have the data and we can also calculate the other quantities. The modulus is nothing but the slope that we have calculated, and the UTS is basically the maximum of the stress. So the modulus is 658, that is 65.8 GPa, and the UTS is 196 MPa. If you go back to the stress-strain curve that we plotted some time back, you can see that the UTS is about 200 and that the slope of the initial portion of the curve is 658, that is 65.8 GPa.

So you can carry out further analysis. Let us do one more thing: we are going to plot the stress versus strain and draw on it, in red, the line with the fitted coefficients, so you will see the stress-strain curve with the fitted line drawn on it.
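The straight-line fit and the unit conversion described above can be sketched as follows. This is a minimal sketch on synthetic data, not the aluminum record: the 200 points, the noise level and the column names are assumptions, and the strain is assumed to be in percent, which is what the conversion from a slope of 658 to 65.8 GPa implies.

```r
# Hypothetical smoothed elastic data: strain assumed in percent, stress in MPa.
set.seed(7)
strain <- seq(0.001, 0.2, length.out = 200)
stress <- 658 * strain + rnorm(200, sd = 0.5)
elastic <- data.frame(Strain = strain, Stress = stress)

fit   <- lm(Stress ~ Strain, data = elastic)    # straight-line fit
slope <- unname(coef(fit)["Strain"])            # MPa per percent strain

# 1 percent strain is 0.01, so MPa per percent times 100 gives MPa;
# dividing by 1000 converts MPa to GPa.
modulus_GPa <- slope * 100 / 1000

# Standard error of the slope, and residuals for the diagnostic plots.
slope_se <- summary(fit)$coefficients["Strain", "Std. Error"]
res <- residuals(fit)                           # inspect with plot(res), qqnorm(res)

cat("slope (MPa per percent):", round(slope, 1), "+/-", round(slope_se, 1), "\n")
cat("modulus (GPa):", round(modulus_GPa, 1), "\n")
```

The UTS needs no fit at all: it is simply `max()` applied to the stress column of the full smoothed curve.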
So let us do that. Let us remove this plot so that we are not distracted by it. Then we take the length of x, and we take the data points in fives and calculate the slope, which we store in a variable called slope. That is, we take the full data and slide a box, a bin of length 5, along it, and use it to calculate the slope; that is what this portion of the code does. It stores in a data frame called slope the strain, the stress and the slope at each strain and stress value. The calculation is just a simple difference: you take the points at i plus m and i minus m, take the difference in stress and divide by the difference in strain, so it is dy by dx, and that is what I am calling the slope. So let us do this exercise.

You can see the smoothed stress-strain data and the line that we have fitted. One measure of the strength you can already see: the deviation from linearity happens somewhere around 130, so you can say that the yield strength is about 130. We have already calculated the UTS, which is somewhere around 196. From this plot itself you can make out where the slope change happens, but you can also find it using the slope calculation that we have just done, so let us do that. Let us plot the slope and also list the first 70 values to see how it looks. You can see that initially there is some error, then there is a constant value, then there is a slope change, and then it becomes another constant value here.
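The sliding difference described above can be sketched as follows. It is a minimal sketch: the helper name `local_slope` and the idealized two-slope curve are assumptions made for illustration, and only the rule itself (difference between the points at i + m and i - m, stress difference over strain difference) follows the lecture.

```r
# Local slope by a central difference over +/- m points.
local_slope <- function(strain, stress, m) {
  n <- length(strain)
  idx <- seq(m + 1, n - m, by = 1)      # points with m neighbours on each side
  data.frame(
    Strain = strain[idx],
    Stress = stress[idx],
    Slope  = (stress[idx + m] - stress[idx - m]) /
             (strain[idx + m] - strain[idx - m])
  )
}

# Hypothetical two-slope curve: slope 600 up to a strain of 0.2, then slope 50.
strain <- seq(0.01, 1, by = 0.01)
stress <- ifelse(strain <= 0.2, 600 * strain, 120 + 50 * (strain - 0.2))

slope <- local_slope(strain, stress, m = 5)
head(slope, 3)      # early rows sit on the steep elastic part
tail(slope, 3)      # late rows sit on the flatter plastic part
```

On real data the transition between the two plateaus is gradual and noisy, which is why the slope is computed on the smoothed curve rather than on the raw record.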
From the stress-strain curve it is clear what these values correspond to. There is some initial problem, then a more or less constant value, then a change, and once the change has taken place this portion can again be considered sort of linear. Compared with the much steeper elastic part, it shows a much smaller but constant slope, and that is what is being shown. So there are some initial transients, then a constant value, then a changeover, and then another constant value. Somewhere around 0.35 or 0.4 is where the other slope comes in, so this is the transition region. We know that the linear limit is somewhere around 0.12, then there is a changeover, and at about 0.4 you get the change in slope, so the material has become plastic.

Now in this listing you can see the slope at different strains. There is the initial transient, 327, 290, 203, 158 and so on, and then it reaches a sort of constant value. We know that it should hover around 680, and that is what it is doing. After that there is a change, and somewhere between 0.3 and 0.4 the value changes; at about 0.34, 0.36, 0.38, 0.39 it becomes a sort of constant value again. At the point where the slope change has happened the stress is 132, which is what we also saw from the plot as the yield stress.

Of course there is one more way to calculate the yield stress, which is to take a line parallel to the elastic line but starting from a strain of 0.2 percent; wherever it intersects the curve gives the 0.2 percent proof stress. We are going to leave that as an exercise for you.
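The 0.2 percent offset construction is left as an exercise in the lecture, so what follows is only a possible starting point, not the intended solution. It is a minimal sketch on a made-up two-slope curve; the function name `proof_stress`, its arguments and the assumption that strain is in percent are all hypothetical.

```r
# 0.2 percent offset method: return the stress at the first data point where
# the curve falls on or below a line that has the elastic slope and starts
# at the offset strain.
proof_stress <- function(strain, stress, modulus, offset = 0.2) {
  offset_line <- modulus * (strain - offset)
  below <- which(stress <= offset_line & strain > offset)
  if (length(below) == 0) return(NA_real_)    # no intersection in the data
  stress[below[1]]
}

# Hypothetical two-slope curve (strain assumed in percent, stress in MPa).
strain <- seq(0, 2, by = 0.01)
stress <- pmin(600 * strain, 120 + 50 * (strain - 0.2))

ps <- proof_stress(strain, stress, modulus = 600)
print(ps)
```

Because the function picks the first grid point past the crossing, its answer is only as fine as the strain spacing; interpolating between the two points that bracket the crossing would sharpen it.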
So this script shows how to take the data, clean it up and smooth it. Once you have smoothed the data you can do fitting, which is something we have already learned, and from the fitting parameters you can evaluate the quantities. You also know the error in the fitting parameters, so you can tell how well you are estimating the parameter, in this case the modulus: the error in the modulus estimate also comes from the fitting exercise. Then you can do other things like measuring the UTS and measuring the deviation from linearity, which indicates the onset of plasticity, and so on.

We will also leave the brass tensile data with you. The biggest challenge that one faces is this; let us take a look. You can load the data, plot it and zoom in on the elastic part. You can clean it up and smooth it: you pick each data point and average around it, and that is how the smoothing is done. You can take the linear part, fit a straight line and get the slope, which is the modulus. You can check the fit by plotting the data with the fit, the residuals, the Q-Q plot and so on. You can find the yield stress and you can get the UTS. But how do you automate it, so that if I give some other, different data the same code will work and give me the values? That is challenging because we have used some hand-picked values: for averaging we used a bin size of 25 to smooth, and for getting the slope we used a bin size of 5. How to do all these things in an automated fashion, so that the values come out directly, is a harder problem, but one that I will leave you to play with. We will put up both data sets so that you can reproduce the results that I have shown as well as do it on brass for yourselves. Thank you.