Welcome to Dealing with Materials Data. In this course we are looking at the collection, analysis and interpretation of data, and in this module specifically at how to do descriptive data analysis using R. One aspect we need to address is errors. We have already seen how to present data when we know what the errors are. In this session we are going to learn how to analyse errors and understand how they propagate. So, this session is about errors and their propagation. The first question is: why are there errors in experiments? Typically, any experiment you do is prone to errors and uncertainties, and these errors are of different types. The first type is an accidental error, or mistake. It is not discussed in detail in this course because such errors can be avoided. To avoid mistakes, one should do experiments carefully. If mistakes happen in spite of that, one should discard the data and redo the experiment, and if you know that something is not correct you should not take the data at all. You should also periodically calibrate your equipment to make sure that the values you are reading are actually correct. If the calibration does not give you the right values, again discard the data, recalibrate and redo the experiment. It is also very important to repeat and replicate experiments. Repetition means, as we did in the case of copper conductivity, making the measurement on the same sample, say, 20 different times. Replication means going through the entire process once more. Suppose you have a 3 percent deformed sample: if you take 2 or 3 different pieces of it and do the experiment 20 times on each of them, that is repetition of the experiment.
So, you have a sample that has undergone 3 percent deformation, you take several pieces, and on each you do many measurements so that your statistics are good. If you do several such repetitions and every time you get the same mean, a similar standard deviation and so on, your experiment is repeatable and you know it is okay. Replication, however, means repeating the entire process: you take a fresh sample, deform it by 3 percent again, take several pieces and again do several experiments on each. This is also very important, and repetition and replication are one way of guarding against accidental errors or mistakes. Next, there can be systematic errors. These happen if your calibration is wrong or if you have not calibrated at all: you are measuring some quantity, but the reading is systematically higher or lower depending on how the calibration went. They can also happen if you are not careful while doing experiments, and in materials science they are very common because of impurities. For example, surface tension measurements have improved over time because people have been able to remove impurities and obtain purer and purer materials. So systematic errors can also come from constraints: you have some material, it has some impurities, there is no way you can change that, and so there is an error associated with it.
The most dangerous form of systematic error is one that arises from some unknown cause: because it is unknown, there is no way to correct for it. One way to learn about, or correct for, such unknown causes is to reproduce the results in a completely different laboratory, by a different group. If they are also able to reproduce the results, you can assume that there may be no errors of this type. One cannot rule them out completely, but if the experiment is reproducible in different places, under different conditions and in different labs, one can reasonably assume that such systematic errors are absent. The third type of error, called random error or uncertainty, is due to noise and to the limited precision of the equipment. This error cannot be avoided, and its value in any single measurement cannot be predicted. Repeated experiments and analysis give some idea of these uncertainties. As we did earlier, you take the same sample, keep everything the same, do 20 measurements, and you do not get the same number every time. This is because of random noise or the limited precision of the equipment: the equipment can only measure up to some precision, so you get numbers that differ beyond that precision. There is a joke about engineers which says: measure with a vernier caliper, mark with chalk and cut with a saw. If you are planning to mark with chalk and cut with a saw, you should not be measuring with a vernier caliper; a simple ruler will do.
In other words, when you are doing experiments you should know the different errors you are going to encounter. If it is not meaningful to measure something very accurately because you know that some later step is going to mask that accuracy, or will not preserve it, then you can save a lot of time, effort and money by doing things only up to the required accuracy and precision. This is very important. The next point about errors is this: suppose you have random errors, or uncertainty, in one of the quantities. These errors propagate through your calculations to other quantities. Take the skin depth formula that we discussed earlier; at the end of that session I mentioned that the formula is δ = 664/√(f μr σ). We can assume that f and μr are known and have no error. I have just given some numbers, which might not be realistic, but for the moment let us assume that f is 60 kilohertz and μr is 0.999994. Then any error or uncertainty in the measurement of σ will also affect δ. Let us say, for argument's sake, that σ is 100, as we took before, but with an error bar of plus or minus 2.5, so that it can lie anywhere between 97.5 and 102.5. What is the resulting uncertainty in δ? That is a question one can ask, and since this is a very simple case it can be answered very easily. Let us first simplify the expression by substituting the values of f and μr, which gives δ = 2.710777/√σ. How do we get that? R can also be used as a calculator: we evaluate 664 / sqrt(60000 * 0.999994), with f entered in hertz, and get 2.710777, which is the value given here.
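As a minimal sketch of this simplification, the snippet below does the same arithmetic in Python rather than R (the lecture itself uses R as a calculator). The variable names are illustrative; f is entered in hertz and μr is taken as given, and both are assumed to be exact.

```python
import math

# Values assumed in the lecture: f = 60 kHz (entered in Hz) and
# relative permeability mu_r = 0.999994, both treated as exact.
f = 60_000.0
mu_r = 0.999994

# delta = 664 / sqrt(f * mu_r * sigma) can be rewritten as K / sqrt(sigma),
# where K collects the known, error-free factors.
K = 664 / math.sqrt(f * mu_r)
print(f"K = {K:.6f}")
```

Collecting the error-free factors into a single constant K leaves σ as the only uncertain input, which is what makes the rest of the analysis a one-variable problem.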
So the expression for δ is 2.710777 divided by √σ. The value of σ is reported as 100, and if you calculate δ for σ = 100 you get about 0.271 mm, or 0.3 mm to one decimal place. Now you can use the extreme values, 97.5 and 102.5, substitute them into the same expression, and you find that δ turns out to be about 0.2745 mm and 0.2678 mm. In other words, these also round to 0.3 mm. If we are not going to quote δ beyond this accuracy, all three values are 0.3 mm, and the error is negligible at the accuracy to which we calculate δ. However, if we are calculating δ to more decimal places, say in microns, these values differ and the error matters. So with this very straightforward method, where we have a range and simply substitute its minimum and maximum, we know the extreme values δ can take, and therefore the range in which it lies and how large the error is. There is also another way of calculating the same quantity, namely the standard deviation of δ, written σ_δ. There are two sigmas here, so one has to be careful: σ_δ on the left-hand side is obtained by taking the partial derivative of δ with respect to the conductivity σ and multiplying it by the uncertainty in the conductivity, σ_δ = |∂δ/∂σ| Δσ. We take the modulus, so only the positive value is used. If you write δ = 2.710777 σ^(−1/2) and differentiate with respect to σ, you get a factor of −1/2 and σ^(3/2) in the denominator: ∂δ/∂σ = −(1/2)(2.710777)/σ^(3/2).
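The min/max substitution described above can be sketched as follows. This is a Python illustration, not the lecture's R session; `skin_depth_mm` and the other names are hypothetical, and δ is assumed to be in millimetres as in the text.

```python
import math

# Constant from f = 60 kHz and mu_r = 0.999994, as in the lecture.
K = 664 / math.sqrt(60_000.0 * 0.999994)

def skin_depth_mm(sigma):
    """Skin depth delta = K / sqrt(sigma), in mm as in the lecture."""
    return K / math.sqrt(sigma)

sigma_nominal, sigma_err = 100.0, 2.5
d_nom = skin_depth_mm(sigma_nominal)
# delta decreases as sigma increases, so the smallest sigma gives the largest delta.
d_hi = skin_depth_mm(sigma_nominal - sigma_err)
d_lo = skin_depth_mm(sigma_nominal + sigma_err)
half_width = (d_hi - d_lo) / 2

print(f"{d_lo:.4f} {d_nom:.4f} {d_hi:.4f}  half-width {half_width:.4f}")
```

Substituting only the two endpoints works here because δ is monotonic in σ over the interval; for a function with a turning point inside the range, the extremes of δ would not necessarily sit at the endpoints.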
That gives the expression written here: σ_δ = (1/2)(2.710777) Δσ/σ^(3/2). You know the σ you are using, which is 100, and Δσ, which is 2.5, since σ is 100 plus or minus 2.5. Substituting these values gives σ_δ ≈ 0.0034, which is the same as the half-width of the range we calculated earlier (0.2745 − 0.2711 ≈ 0.0034); over the full interval the spread is about 0.007 by both methods. There is one more way of doing uncertainty propagation, which is by simulation. Simulation is a more generic approach: it can be used even when the functional relation between the result and the other variables is not known. There is an R package called propagate that you can use to solve complex problems of uncertainty propagation, and we are going to do a few examples with it later. For now, let us take the same very simple case and see whether we can use propagate to find out how the error propagates. So let us go to R. First we have to load the propagate package with library(propagate); once the library is in place we can run this command. What is this command? It is the error propagation calculation, using the expression 2.71/sqrt(x). For error propagation we have to supply either the mean and standard deviation, or some data generated using a simulation, and the propagation itself is done using Monte Carlo simulation; we want to look at the resulting values. The output of this analysis is much more detailed than we need right now, so we will come back and take a relook at it. The mean is again 0.27, so the Monte Carlo simulation also gives 0.27, and the uncertainty reported happens to be 0.0064, which is similar to the spread of about 0.007 that we calculated earlier.
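The derivative-based formula σ_δ = |∂δ/∂σ| Δσ = (K/2) σ^(−3/2) Δσ can be checked numerically with a short sketch. Python again stands in for R; K is the constant 2.710777 computed from f and μr, the function name is hypothetical, and the finite-difference check is an addition for illustration, not part of the lecture.

```python
import math

# Constant from the previous step: K = 664 / sqrt(f * mu_r), with f = 60 kHz.
K = 664 / math.sqrt(60_000.0 * 0.999994)

def skin_depth_uncertainty(sigma, d_sigma):
    """First-order (derivative-based) propagation for delta = K * sigma**(-1/2).

    d(delta)/d(sigma) = -(K / 2) * sigma**(-3/2); taking the modulus drops
    the minus sign, so the uncertainty is (K / 2) * sigma**(-3/2) * d_sigma.
    """
    return 0.5 * K * sigma ** -1.5 * d_sigma

err_25 = skin_depth_uncertainty(100.0, 2.5)  # for sigma = 100 +/- 2.5
err_5 = skin_depth_uncertainty(100.0, 5.0)   # full width of the interval (5)

# Sanity check of the analytic derivative with a central finite difference.
h = 1e-4
fd = (K / math.sqrt(100.0 + h) - K / math.sqrt(100.0 - h)) / (2 * h)

print(f"{err_25:.4f} {err_5:.4f} {abs(fd) * 2.5:.4f}")
```

The half-width from the formula (about 0.0034) matches the half-width from the endpoint substitution, and doubling Δσ to the full interval width gives about 0.007, the total spread.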
So what is propagate? You can use the help: this is the function we use for error propagation. You have to give the expression, which is clear, and you also have to give some data, which is what we generated here. We drew 100 numbers from a normal distribution with mean 100 and standard deviation 2.5, because those are the parameters we assumed for σ, so we have to generate data of that type. This is explained in the R documentation, which probably makes it clearer: propagate requires the expression and it also requires the data, which should be either a data frame or a matrix. The data can contain the means, the standard deviations and, optionally, the degrees of freedom, or it can be sampled data generated from any of R's distributions. That is what we have done: we generated sample data from one of R's distributions. The command we have here generates 100 data points from the normal distribution with rnorm, using this mean and this standard deviation, and turns them into a data frame. The data frame column has to be named appropriately: it must be called x, the same name as the variable in the expression, because that labelling is how propagate knows that this column is x. In other words, instead of doing experiments, since we know the mean and the standard deviation, we generate pseudo-data on the computer, feed that data into the formula, find the range in which δ lies, and from those values calculate how large the error in δ is. That is what is done here, and the result is given in much greater detail; we will come back and look at some of those details later.
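The idea behind the propagate call, drawing pseudo-data for σ and pushing every draw through the formula, can be imitated by hand. The sketch below is not the propagate package and does not reproduce its output format; it uses NumPy's `Generator.normal` as a stand-in for R's rnorm, and the function name, seed and sample sizes are arbitrary choices. For Δσ = 2.5, the sample standard deviation of δ should land near the first-order value of about 0.0034, that is, a ±1 standard deviation band with a total width of about 0.007.

```python
import numpy as np

# Constant from f = 60 kHz and mu_r = 0.999994, as in the lecture.
K = 664 / np.sqrt(60_000.0 * 0.999994)

def monte_carlo_skin_depth(n, mean=100.0, sd=2.5, seed=1):
    """Draw n pseudo-measurements sigma ~ Normal(mean, sd), push each through
    delta = K / sqrt(sigma), and return the sample mean and sd of delta."""
    rng = np.random.default_rng(seed)
    sigma = rng.normal(mean, sd, size=n)  # plays the role of rnorm(n, mean, sd)
    delta = K / np.sqrt(sigma)
    return delta.mean(), delta.std(ddof=1)

results = {n: monte_carlo_skin_depth(n) for n in (10, 100, 1000, 100_000)}
for n, (m, s) in results.items():
    print(f"n = {n:>6}: mean delta = {m:.4f} mm, sd = {s:.4f} mm")
```

Running it for several n mirrors the experiment of changing the number of rnorm draws: the estimates are noisier for n = 10 and settle as n grows.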
Of course, you can also change the number of samples: let us say we change rnorm to generate 10 values and see what happens, and then 1000. As you change the number of data points there is essentially no change in these values: the mean still remains 0.27 and the uncertainty still remains at about 0.007. So this is one way of doing the analysis. Let us go back. Simulation is a more generic approach, and the propagate package allows us to do it. So far we have looked at a very simple case, where a single expression connects δ to σ: you can calculate the error directly from the range, using the formula based on the partial derivative, or even by generating data with a Monte Carlo simulation and seeing how the error propagates. But what happens if the uncertainty results from two or more independent variables? Say a quantity depends on three different variables, each with its own uncertainty: how does the error in the calculated quantity depend on the errors in those variables? Sometimes a quantity f depends on x, y and z, but the uncertainties in x, y and z are not independent; they may be related to each other. If the uncertainties of these quantities are interdependent, how does that affect the uncertainty of the quantity we are trying to calculate? And sometimes you might not even know the functional relationship between the quantity you want to calculate and the quantities it depends on, even though you know the errors in those variables. In such cases, how do you calculate the uncertainty, that is, carry out the uncertainty propagation analysis? That is the remaining part of descriptive data analysis using R, and it will complete this module on descriptive data analysis using R.
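As a preview of the multi-variable case, here is a hedged Python sketch of the standard first-order formula, in which the variance of f is approximated by J Σ Jᵀ, with J the vector of partial derivatives at the means and Σ the covariance matrix of x, y and z. The function f = x·y/z, the means, the standard deviations and the correlation of 0.8 between x and y are all hypothetical choices for illustration; this is not how the lecture or the propagate package will necessarily handle these cases. With the numbers below, the first-order estimate is about 0.944 for independent uncertainties and about 1.136 when x and y are positively correlated.

```python
import numpy as np

def f(v):
    """Hypothetical derived quantity depending on three variables: f = x * y / z."""
    x, y, z = v
    return x * y / z

def gradient(func, v, h=1e-6):
    """Central-difference estimate of the partial derivatives of func at v."""
    v = np.asarray(v, dtype=float)
    g = np.zeros_like(v)
    for i in range(v.size):
        step = np.zeros_like(v)
        step[i] = h * max(1.0, abs(v[i]))
        g[i] = (func(v + step) - func(v - step)) / (2 * step[i])
    return g

def first_order_sd(func, means, cov):
    """Linear propagation: sd(f) ~ sqrt(J @ cov @ J), J = gradient at the means."""
    J = gradient(func, means)
    return float(np.sqrt(J @ cov @ J))

# Illustrative (made-up) means and standard deviations of x, y, z.
means = np.array([10.0, 5.0, 2.0])
sds = np.array([0.2, 0.1, 0.05])

# Independent uncertainties: the covariance matrix is diagonal.
cov_indep = np.diag(sds ** 2)

# Interdependent uncertainties: assume x and y are correlated with r = 0.8.
corr = np.array([[1.0, 0.8, 0.0],
                 [0.8, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
cov_corr = corr * np.outer(sds, sds)

sd_indep = first_order_sd(f, means, cov_indep)
sd_corr = first_order_sd(f, means, cov_corr)

# Monte Carlo cross-check of the correlated case (no derivatives needed).
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(means, cov_corr, size=200_000)
mc_sd = np.std(samples[:, 0] * samples[:, 1] / samples[:, 2], ddof=1)

print(f"independent: {sd_indep:.4f}, correlated: {sd_corr:.4f}, MC: {mc_sd:.4f}")
```

The positive correlation raises the uncertainty because x and y both enter f with positive partial derivatives, so their errors tend to push f in the same direction. The Monte Carlo line needs only the joint distribution of the inputs and the ability to evaluate f, which is why simulation is the more generic route.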
So we will do that in the following parts. Thank you.