Welcome to Dealing with Materials Data, a course on the collection, analysis and interpretation of data from materials science and engineering. We are going through the R tutorials: we had an introduction to R, then we learnt how to describe data using R, and this is the module on probability distributions. In this module we have looked at discrete distributions and at the normal distribution, which is a continuous distribution, and we are going to continue with continuous distributions. In this case we are specifically going to talk about the log-normal distribution.

A random variable x is said to follow a log-normal distribution if log of x is normally distributed. The probability density function of the log-normal is $f(x) = \frac{1}{\sqrt{2\pi}\,\beta x} \exp\left(-\frac{(\log x - \alpha)^2}{2\beta^2}\right)$ for x > 0 and beta > 0, and f(x) = 0 otherwise. If you use the change of variable y = log x, the resulting distribution is a normal distribution, with alpha taking the role of the mean and beta taking the role of the standard deviation.

Kolmogorov is the one who came up with the law of fragmentation. He showed that for a large collection of particles which result from particle fragmentation, the size distribution is log-normal. This is very important in mineralogy, geology and similar areas, where you are trying to break material and make smaller particles. In the case of grain size too, it is sometimes said that the data follow a log-normal distribution. I am going to show you one data set, ferrite grain size data from Underwood, which we will plot and see that it appears to follow log-normal. But if you use the fitdistrplus library and try to do the fitting, you will see that it is not quite log-normal, and this is common.
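To see where the factor 1/x in the density comes from, the change of variable mentioned earlier can be written out in a few lines. This is a sketch of the standard argument; the symbols $\Phi$ and $\varphi$ (the standard normal distribution function and density) are notation introduced here and are not used in the lecture.

```latex
% Assume Y = \log X is normal with mean \alpha and standard deviation \beta.
\begin{align*}
F_X(x) &= P(X \le x) = P(\log X \le \log x)
        = \Phi\!\left(\frac{\log x - \alpha}{\beta}\right), \qquad x > 0, \\
f_X(x) &= \frac{d}{dx} F_X(x)
        = \frac{1}{\beta x}\,\varphi\!\left(\frac{\log x - \alpha}{\beta}\right)
        = \frac{1}{\sqrt{2\pi}\,\beta x}
          \exp\!\left(-\frac{(\log x - \alpha)^2}{2\beta^2}\right).
\end{align*}
```

The extra 1/x is just the derivative of log x, which is why alpha and beta are the mean and standard deviation of log x and not of x itself.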
In fact, I have checked many data sets that are expected to be log-normal, and you rarely get a good fit to the log-normal. There is one more data set, from Smith and Jordan, in a paper titled "Mathematical and graphical interpretation of the log-normal law for particle size distribution analysis" in the Journal of Colloid Science. They also say that the log-normal law is excellent for particle size distribution analysis, and they describe in their paper how to gather the data and how to analyze it for the log-normal distribution. So we will take the data given in this paper and see whether it follows log-normal. We will also generate data from our model and compare the distribution we generate with the empirical data, to see what we can say about the distribution.

In R, the name for the log-normal distribution is lnorm, so dlnorm, plnorm, qlnorm and rlnorm are the function calls. You get the probability density, the cumulative distribution function and the quantile function using the first three functions, and random deviates are generated using rlnorm. We are going to use a mean (meanlog) of 2 and a standard deviation (sdlog) of 1, and we are going to generate these quantities just to check.

So we will now do the R tutorial for the log-normal distribution. In the first exercise, as usual, we are going to make three plots, and we are going to plot between 0 and 15. The first one is the log-normal probability density function and the second one is the cumulative distribution function. As we indicated, for dlnorm the meanlog is 2 and the sdlog is 1, so those are the values we are using. By default it uses meanlog 0 and sdlog 1, but you can change those values. I am also going to do the quantile plot, so there are going to be three plots. You can see that this is the density, this is the cumulative distribution function and this is the quantile plot.
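The three plots described above can be sketched roughly as follows. This is not the script used in the lecture, which is not shown in the transcript; the grid spacing, the probability grid for the quantile plot and the variable names are assumptions made here for illustration.

```r
# Parameters on the log scale, as stated in the lecture
meanlog <- 2
sdlog   <- 1

# Grid between 0 and 15; starting slightly above 0 is a choice made here
x <- seq(0.01, 15, by = 0.01)
# Probabilities for the quantile function, kept strictly inside (0, 1)
p <- seq(0.01, 0.99, by = 0.01)

dens  <- dlnorm(x, meanlog = meanlog, sdlog = sdlog)  # probability density
cdf   <- plnorm(x, meanlog = meanlog, sdlog = sdlog)  # cumulative distribution
quant <- qlnorm(p, meanlog = meanlog, sdlog = sdlog)  # quantile function

# Three panels side by side; restore the old layout afterwards
op <- par(mfrow = c(1, 3))
plot(x, dens,  type = "l", xlab = "x", ylab = "density",     main = "dlnorm")
plot(x, cdf,   type = "l", xlab = "x", ylab = "probability", main = "plnorm")
plot(p, quant, type = "l", xlab = "p", ylab = "quantile",    main = "qlnorm")
par(op)
```

Since qlnorm is the inverse of plnorm, calling plnorm on the values in quant should give back p, which is a quick way to check that the parameters were passed consistently.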
Of course, you can plot each of the plots individually to get a better idea of how they look. So this is the density function of the log-normal distribution. If you see data whose distribution looks like this, then you expect it to be log-normal. You will see many data sets that look like this, but they need not be log-normal, because there are competing distributions which describe similar kinds of data; that is what we are going to see. And this is how the cumulative distribution function goes, and this is the quantile function, which is the inverse of the cumulative distribution function.

Of course, one can generate random deviates from the log-normal distribution, and that is what we will do, and plot the data as a histogram. This generates random deviates from the log-normal distribution, again with the same mean and standard deviation, and then we have a histogram plot. You can see that the data has a long tail: it peaks close to the beginning and then goes down.

So let us take a look at a couple of data sets. The first one that I want to use is from Underwood, so let us read that data first. It is ferrite grain size versus numbers, as given by Underwood: this is the size and these are the numbers. If you plot this, you see that the data goes like this. Underwood says that this could be expected to be approximately log-normal, so let us check. We load the fitdistrplus library, take this data and check whether it follows log-normal. As you can see, the data does not really follow log-normal. Log-normal is somewhere here, and our observation lies somewhere in the beta region. This is for V1; you can also look at V2, and in fact V2 is more or less like uniform.
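The histogram of random deviates and the fitdistrplus check described above can be sketched as below. The Underwood data is not reproduced in the transcript, so the sample here is a synthetic stand-in drawn with rlnorm; the seed, the sample size and the variable names are assumptions. The plot in which the observation is compared with the log-normal line and the beta region is presumably the Cullen and Frey graph drawn by descdist from fitdistrplus.

```r
set.seed(1)  # assumption: fixed seed so the sketch is repeatable

# Synthetic stand-in for a measured size sample (NOT the Underwood data)
sizes <- rlnorm(500, meanlog = 2, sdlog = 1)

# Long right tail, peak near the left end
hist(sizes, breaks = 30, xlab = "size",
     main = "Random deviates from a log-normal")

# descdist() draws the skewness-kurtosis (Cullen and Frey) graph and
# returns summary statistics; run it only if the package is installed
if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  summary_stats <- fitdistrplus::descdist(sizes, discrete = FALSE)
  print(summary_stats)
}
```

To run the same check on real measurements, replace sizes with the numeric vector of observed sizes.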
So it is difficult to say that this data follows a log-normal distribution. There is another data set, from Smith and Jordan, like I told you, so let us load that data and see what happens. We read the Smith and Jordan log-normal data, we plot X, and then we use the fitdistrplus library to describe the size data. Here again the data seems to be in the beta region and not on the log-normal. However, if you look at the data itself, it does look very much like a log-normal distribution. So even though it looks like this, when we use fitdistrplus it says that the data is not really following log-normal: log-normal means the observation should have been somewhere here, but it falls somewhere in the beta region.

So this is a problem. It is very difficult to actually know, because there are other competing distributions, which I will also give, and something like beta can fit the data well just by changing its parameters. So it is sometimes really difficult to know which is the right distribution for the data. Even so, if you know for physical reasons that the data is expected to follow a particular distribution, that is the distribution you should use.

So we are again going to take the Smith and Jordan log-normal data. I am going to calculate the mean and standard deviation of the size data, generate random deviates from the log-normal with that mean and standard deviation, plot them, and then plot the data. Then we will see whether there is a better match. You can see that the histogram of the data I generated with the same mean and standard deviation looks like this, and our data also looks like this. Every time I run it you get a different histogram, because the random deviates are different.
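A minimal sketch of this comparison is given below. The Smith and Jordan data is not reproduced in the transcript, so a synthetic stand-in is used, and the names size, mlog and slog are hypothetical. One assumption deserves attention: rlnorm takes its parameters on the log scale (meanlog and sdlog), so the sketch estimates the mean and standard deviation of log(size). The transcript does not say which scale was used in the lecture.

```r
set.seed(42)  # assumption: fixed seed; remove it to see run-to-run variation

# Synthetic stand-in for the measured sizes (NOT the Smith and Jordan data)
size <- rlnorm(200, meanlog = 1.5, sdlog = 0.4)

# Estimate the parameters on the log scale, which is what rlnorm() expects
mlog <- mean(log(size))
slog <- sd(log(size))

# Random deviates from the log-normal with the estimated parameters
simulated <- rlnorm(length(size), meanlog = mlog, sdlog = slog)

# Side-by-side histograms of the data and the simulated deviates
op <- par(mfrow = c(1, 2))
hist(size,      breaks = 20, xlab = "size", main = "Data (stand-in)")
hist(simulated, breaks = 20, xlab = "size", main = "Simulated log-normal")
par(op)
```

Passing mean(size) and sd(size) directly as meanlog and sdlog would give deviates on a very different scale, so it is worth checking which scale the estimates are on before comparing histograms.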
So you can see that every time, the deviates that you generate seem to match the data very well, which is not surprising, because just by looking at the data you can see that it looks log-normal. In the case of grain size, fragmented particle size and so on, the distribution is expected to be log-normal, so it is always useful to see how closely the log-normal distribution describes the data.

So log-normal is an important distribution. It is used especially in areas like this, where there is reason to believe, based on theories such as Kolmogorov's law of fragmentation, that the data is expected to follow a log-normal distribution. But sometimes, for example for grain size, there are other competing distributions that will describe what is happening. We have also seen cases where the grain size distribution was very different. We saw such data while we were doing descriptive statistics, so for grain size data especially it is very difficult to say that it should always follow log-normal. In other cases where you expect log-normal, you will try to fit the data to log-normal and see, even though if you blindly try to fit the data to the available distributions, other competing distributions will show up and probably show a better fit to your data.

So it depends on your needs and purposes. If you know for sure that the data should follow a given distribution, that is what you should try to fit. If you just want a description and it does not matter which distribution you get, then of course you can explore and find the distribution that describes your data best. So this is the log-normal distribution, and like I said, I have found it very difficult to find any data that fitdistrplus will show to be log-normal.
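One way to explore competing distributions, as discussed above, is to fit several candidates and compare a criterion such as AIC. The sketch below uses fitdist from fitdistrplus on a synthetic stand-in sample. The choice of gamma and Weibull as competitors is an assumption made here; the lecture names beta, but a beta fit needs data rescaled to the interval from 0 to 1, so it is left out of this sketch.

```r
set.seed(7)  # assumption: fixed seed for repeatability

# Synthetic stand-in for a positive size sample (not data from the lecture)
size <- rlnorm(300, meanlog = 1.5, sdlog = 0.4)

if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  # Maximum likelihood fits of three candidate distributions
  fit_lnorm   <- fitdistrplus::fitdist(size, "lnorm")
  fit_gamma   <- fitdistrplus::fitdist(size, "gamma")
  fit_weibull <- fitdistrplus::fitdist(size, "weibull")

  # Smaller AIC indicates a better trade-off between fit and complexity
  aic <- c(lnorm   = fit_lnorm$aic,
           gamma   = fit_gamma$aic,
           weibull = fit_weibull$aic)
  print(sort(aic))
}
```

With real measurements the ranking can easily differ from what the physics suggests, which is the point made in the lecture: a better numerical fit from a competing distribution does not by itself overrule a physical reason to expect log-normal.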
I am not sure; maybe if you just generate random deviates and give them to fitdistrplus it will show that they are log-normal. In all other cases I have found that there are always competing distributions, and most of the time it is beta that shows up as the better fit. But it is a good exercise for you: as part of this course you should also train yourself to go look for data, or generate some such data, do the analysis, and see if you can find data that fits the log-normal distribution better. Thank you.