Welcome to Dealing with Materials Data. We are looking at the collection, analysis and interpretation of data from materials science and engineering. We are in the module on data processing, and in this session we are going to talk about the bootstrap method. It is one of the methods for estimating the accuracy of a quantity without knowing anything about the probability distribution. In the other methods, where we have data and we know what probability distribution it comes from, we can give better interval estimates and so on. But suppose you do not know anything about the underlying probability distribution and you cannot make any assumptions. Is there a way to get an interval estimate? That is done using the bootstrap method. It is a distribution-free method, so we are not making any assumption about the distribution from which the data is sampled. Berendsen, in his A Student's Guide to Data and Error Analysis, gives details of the bootstrap method, which I strongly recommend that you go through.
The idea in the bootstrap method is as follows. You consider the given data: there are n independent measurements of equal weight, but we do not know from which distribution these data are sampled. So this is the assumption that we are making: there are n independent measurements, all of equal statistical weight, and we simply do not know the distribution. To attach an uncertainty to the mean, you need to know the distribution of the mean itself, which means you need several such samples. The bootstrap method is a way to generate such samples from the existing data set. It does that by sampling the data with replacement, and you do this many times, for example a thousand times, as you will see in the example. From each of these resampled data sets we calculate the average, and from those averages we get the distribution for the average.
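The resampling idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used in this session: the function name `bootstrap_means`, the fixed seed, and the eight `measurements` values are all made up for the example.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample of `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Draw n values from the original data WITH replacement,
        # so some points appear more than once and others not at all.
        resample = rng.choices(data, k=n)
        means.append(statistics.fmean(resample))
    return means


# Hypothetical measurements, for illustration only.
measurements = [9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.2]
means = bootstrap_means(measurements)

# The spread of these means is what the bootstrap uses as the
# distribution for the average.
print(len(means), min(means), max(means))
```

The list `means` plays the role of the "distribution for the average": a histogram of it shows how much the mean would vary from one resample to another.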
Notice that because we are sampling from the same data with replacement, you cannot really get any more information about the mean or the standard deviation; it is the same data. So the estimates of the mean and standard deviation do not change at all. The reason we use the bootstrap method is to get confidence intervals for the given value, that is, what is the probability that the mean will lie in a given range. One also has to be careful: because we are using the same data set, the minimum value and maximum value are fixed. No data set that we generate by this random sampling can contain a value below the original minimum or above the original maximum, which means that if there is any contribution from the tails of the distribution, you will be missing out on that information.
Having said that, it is still useful to use the bootstrap to get confidence intervals, and we will show an example. We will again take the ETP copper conductivity data, but we will not assume that it is normally distributed. If you do assume that, we have already seen how to get the confidence interval using the standard normal distribution when the standard deviation is known, and using the t-distribution when it is not known. Now we want to do it using the data alone, without assuming anything about the underlying distribution. Can we still get the confidence interval? For that we have to use the bootstrap method, so let us do that. For doing that, I also recommend that you go through Chapter 3, a guide to R for bootstrap confidence intervals, by Professor Bret Larget. It is part of a statistics course and it is available. Please do take a look at it; this tutorial is based on that chapter. So let us first take our data and calculate the mean. That is the first step.
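Before moving on to the example, the caveat about the fixed minimum and maximum can be checked numerically. The Python sketch below uses 20 made-up values (not the ETP copper data) and an arbitrary seed. It confirms that every resampled value stays inside the original range, and counts how often a resample does not contain the original maximum at all.

```python
import random

rng = random.Random(1)  # arbitrary seed, for repeatability only

# Hypothetical data set of 20 values; the maximum (5.3) occurs once.
data = [4.1, 3.8, 5.0, 4.4, 4.7, 3.9, 4.2, 4.6, 4.0, 4.3,
        4.5, 3.7, 4.8, 4.9, 4.1, 4.4, 3.6, 4.2, 4.6, 5.3]
lo, hi = min(data), max(data)

# 1000 bootstrap resamples, each the same size as the original data.
resamples = [rng.choices(data, k=len(data)) for _ in range(1000)]

# No resampled value can fall outside the original range.
all_within = all(lo <= x <= hi for r in resamples for x in r)

# Fraction of resamples in which the original maximum never appears.
missing_max = sum(1 for r in resamples if hi not in r) / len(resamples)

print(all_within)
print(missing_max)
```

For a maximum that occurs once among n points, the chance that a given resample misses it is (1 - 1/n)^n, which is about 0.36 for n = 20. So the resamples never reach beyond the observed extremes, and a good fraction of them do not even reach the extremes themselves.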
So we read the data and we calculate the mean conductivity, which is 101.32. We are very familiar with this data set by now, so we know that that is the number. What we are going to do next is generate 1000 bootstrap data sets. So let us do this. What does it do? We generate 1000 data sets, the boot samples, from the data that we have. Each one has the same number of data points, 20, and it is obtained by sampling with replacement. Then we calculate the statistic, here the mean, for each of the 1000 data sets that we have generated.
Once we have done that, of course, we want to analyze what we have got, and for that we are going to use this. Let us take a look at it. First we plot the statistics from the bootstrap data. Then we calculate the error, and this strange formula here is to make sure that we report the error with the right number of significant digits. Then we state the confidence interval, that is, the range in which the mean will lie with so much probability. You can see the histogram of the data generated by bootstrapping and the distribution that it follows. You can also see that the confidence interval is 101.2 to 101.4, which is the same as what you got from the t and normal distributions. So this is the confidence interval within which the mean will lie.
To summarize: when you know the distribution from which the data comes, you can use it to estimate the interval. In addition, there are distribution-free methods, which are part of robust methods, that you can use to get an idea about the confidence interval. So this brings us to the end of this module on data processing. We will summarize it in the next session. Thank you.
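The steps of the worked example (read the data, take the mean, build 1000 bootstrap samples of 20 points, compute the error, report an interval) can be sketched as follows. This is written in Python for illustration, whereas the recommended chapter uses R, and it makes several assumptions. The 20 values are made up and are not the ETP copper data. The rounding rule (twice the bootstrap standard error, rounded up to one decimal place) is only one plausible reading of the "strange formula". The percentile interval at the end is a common alternative that this session does not mention.

```python
import math
import random
import statistics

rng = random.Random(42)  # arbitrary seed, for repeatability only

# Hypothetical conductivity-like values (20 points); NOT the lecture's data.
hypothetical_values = [
    101.1, 101.5, 101.3, 101.0, 101.6, 101.4, 101.2, 101.3, 101.7, 101.1,
    101.4, 101.2, 101.5, 101.3, 101.0, 101.6, 101.3, 101.4, 101.2, 101.5,
]
n = len(hypothetical_values)
sample_mean = statistics.fmean(hypothetical_values)

# 1000 bootstrap samples of n points each, drawn with replacement;
# keep the mean of each one.
boot_means = [
    statistics.fmean(rng.choices(hypothetical_values, k=n))
    for _ in range(1000)
]

# Bootstrap standard error: the spread of the resampled means.
se = statistics.stdev(boot_means)

# Assumed rounding rule: margin = 2 * SE, rounded UP to one decimal place,
# so the reported error never understates the uncertainty.
margin = math.ceil(10 * 2 * se) / 10
center = round(sample_mean, 1)
interval = (center - margin, center + margin)

# Alternative (not from the lecture): approximate 95% percentile interval,
# read directly from the sorted bootstrap means.
sorted_means = sorted(boot_means)
pct_interval = (sorted_means[24], sorted_means[974])

print(f"mean = {sample_mean:.2f}, SE = {se:.3f}, margin = {margin:.1f}")
print(f"mean +/- margin: {interval[0]:.1f} to {interval[1]:.1f}")
print(f"percentile: {pct_interval[0]:.2f} to {pct_interval[1]:.2f}")
```

Rounding the margin up rather than to the nearest digit is a conservative choice: the reported interval can only get wider, never narrower, than twice the computed standard error.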