Welcome to Dealing with Materials Data. In this course we are trying to understand the collection, analysis and interpretation of materials data. We have done two modules so far: the first is an introduction to R and the second is descriptive statistics using R. This is the summary of the second module, on descriptive statistics using R. We have learnt how to visualise data using the scatter plot, the dot chart and the stem-and-leaf plot. Then we have learnt how to prepare rank-based reports of data, which include the cumulative distribution, the histogram and the box-and-whisker plot. We have also learnt how to prepare summary reports of data; the mean, median, variance, standard deviation and quantiles are the quantities that one calculates in summary-based reports.
One also has to learn about the significant digits in given data. Sometimes when you do this analysis the computer returns a large number of decimal places, but beyond a point these digits are not meaningful, so we should not report them. For example, if three items together cost 100 rupees and you ask how much each one costs, we are not going to report the resulting number beyond the second decimal place, because below a paisa there is no meaningful number that you can quote. It is similar in all cases. For example, we have seen that if the conductivity measurement itself is made up to the first decimal place, then it makes sense to report the means, standard deviations, etcetera up to the first decimal place. This does not mean that you carry only one decimal place through the calculation: it is a good idea to keep the extra digits and round off only at the last step, when you report the numbers. But it is important to know the significant digits in any given scenario and to report data only up to that.
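As a small illustration of keeping the extra digits and rounding only at the reporting step, here is a minimal sketch. It is written in Python with only the standard library rather than the R used in the course, and the conductivity readings are hypothetical values made up for the example, not data from the module.

```python
import statistics

# Hypothetical conductivity readings, each recorded to one decimal place.
readings = [58.1, 57.9, 58.4, 58.0, 58.3]

# Keep full precision while calculating.
mean_full = statistics.mean(readings)
sd_full = statistics.stdev(readings)  # sample standard deviation

# Round only at the last step, to the precision of the measurement.
report = f"{mean_full:.1f} ± {sd_full:.1f}"
print(report)

# The rupee example: three items for 100 rupees, reported to the paisa.
price_each = 100 / 3
price_report = f"{price_each:.2f}"
print(price_report)  # 33.33
```

The unrounded values stay available in `mean_full` and `sd_full` for any further calculation; only the strings that are reported carry the reduced number of digits.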
We have also learnt how to report errors: you can report them in absolute terms and in relative terms. We have learnt how to present data with error bars, because in order to understand trends in any given data it is not sufficient to look at the mean values alone; you should also look at the error bars. We have learnt how to classify errors, we have learnt that errors propagate, and we have learnt how to quantify this uncertainty propagation, either by analytical calculation or by Monte Carlo simulation. Either way you can find out how the error propagates. So these are the things that we have covered.
We have also learnt a few important things in doing this module. The first is that you have to understand your data and the errors. Visualising the data is a very good way of understanding the data, and visualising with error bars is a nice way to understand the errors in the data. It is always important to pay attention to the outliers: some amount of effort is needed to understand why they are there, and it will help you improve the experiments or understand better what is happening. We saw one example where there was an outlier in the electrical conductivity measurement, and a little bit of analysis showed that the measurement methodology was not applicable for that scenario. That is why we got numbers which were not in tune, that is, not consistent, with the rest of the data.
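The two routes to uncertainty propagation mentioned in this summary, analytical calculation and Monte Carlo simulation, can be compared in a short sketch. It is written in Python rather than R. It assumes a hypothetical derived quantity q = a / b, made-up values and uncertainties for a and b, and independent, normally distributed errors. The analytical part uses the usual first-order rule that relative errors of a quotient add in quadrature.

```python
import math
import random
import statistics

# Hypothetical measured quantities: (value, standard uncertainty).
# Assumption: the two errors are independent and normally distributed.
a, da = 10.0, 0.1
b, db = 2.0, 0.02

# Analytical (first-order) propagation for q = a / b:
# relative uncertainties add in quadrature.
q = a / b
dq_analytic = q * math.sqrt((da / a) ** 2 + (db / b) ** 2)

# Monte Carlo propagation: sample the inputs, recompute q each time,
# and take the spread of the results as the propagated uncertainty.
random.seed(0)  # fixed seed so the run is repeatable
n_samples = 20000
samples = [random.gauss(a, da) / random.gauss(b, db) for _ in range(n_samples)]
dq_mc = statistics.stdev(samples)

print(f"analytical:  {q:.2f} ± {dq_analytic:.2f}")
print(f"Monte Carlo: {statistics.mean(samples):.2f} ± {dq_mc:.2f}")
```

For small, normal errors like these the two estimates should agree closely. The Monte Carlo route is the more general one, since it does not rely on the first-order approximation or on a particular error distribution.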
We have also learnt that while analysing trends it is important to incorporate the error information, and we have understood the importance of propagation of errors: if you measure some quantity and it has some uncertainty, any subsequent analysis that you do using that data also picks up that uncertainty.
We have also learnt that materials data is broadly of two types. One type can be represented by summary reports, conductivity for example. It follows a nice normal distribution, so it is sufficient to give the mean and standard deviation; that completely describes the data. This is because every measurement gives one number and the fluctuation that you see is random noise, so it can be very well described by the normal distribution. On the other hand, some materials data can only be represented in rank-based reports. You need to give things like histograms to describe these data. We found, for example, that for phase 2 it looked as if there were very many outliers: if you assume that the data is normal, you find data points up to 6 sigma from the mean, and all of them on one side only. So it is obviously not a normal distribution. It is better to describe such data using the appropriate distribution rather than to assume that it is normal, and whether or not you assume normality also has a say in how you understand further results. We found an example of this: if you are trying to calculate the uncertainty propagation using grain size data, whether you assume that the data and the errors are normal or that they are, for example, log-normal is going to make a difference to your analysis.
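The one-sided "outliers" described above are what you would expect when skewed data is judged against a normal distribution. The following sketch, in Python rather than R, uses synthetic log-normal numbers as a stand-in for grain size data. The distribution parameters, the sample size and the helper `count_beyond_3_sigma` are all assumptions made for illustration, not values or functions from the module.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Synthetic stand-in for grain size data: log-normal by construction.
sizes = [random.lognormvariate(0.0, 0.8) for _ in range(5000)]

def count_beyond_3_sigma(values):
    """Count points more than 3 standard deviations below / above the mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    low = sum(v < m - 3 * s for v in values)
    high = sum(v > m + 3 * s for v in values)
    return low, high

# Treating the raw data as if it were normal: far points on one side only.
low_raw, high_raw = count_beyond_3_sigma(sizes)

# The same check on the logarithms, which are normal for log-normal data.
low_log, high_log = count_beyond_3_sigma([math.log(v) for v in sizes])

print("raw data, beyond 3 sigma (low, high):", low_raw, high_raw)
print("log data, beyond 3 sigma (low, high):", low_log, high_log)
```

On the raw values, every point beyond 3 sigma lies above the mean, because the sizes are positive and the lower limit falls below zero. After taking logarithms, far fewer points lie beyond 3 sigma, which suggests that a log-normal description suits such data better than a normal one.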
So it is important to know these quantities, and it is also important to report them. Sometimes in the literature these quantities are not reported, and then it becomes very difficult to understand or analyse the data. And finally, wherever possible, we should of course use both rank-based and summary-based reports, because that is the most complete information that one can give. The best is to actually give the raw data, and it is recommended to give it as supplementary data. In addition, when you do any analysis, you should give the analysis methodology and as much information as possible in terms of summary-based and rank-based reports, so that a complete picture of the data is presented. Thank you.