Welcome to Dealing with Materials Data. We are looking at the collection, analysis and interpretation of data from materials science and engineering. We are in the third module, which is on probability distributions. We have looked at several discrete probability distributions, and we are now going to consider one of the continuous probability distributions, called the normal distribution. As the name indicates, it is a very common one, and it is also a very important one for several reasons, so we are going to spend quite a bit of time understanding it. In this session we look at one aspect of the normal distribution. Often, when you repeat a measurement, you do not get the same number every time, because of random errors or thermal noise. These errors make the measured values follow a normal distribution. This is the context in which we will first understand the normal distribution. Later we will look at other contexts and at why the normal distribution is important. We have also used the normal distribution in some earlier cases, so we will revisit some of those and try to understand what we did and why it makes sense in the context of the normal distribution. Let us consider any measurement that you are making. Let x be the measurement, let $\mu$ be the mean value of the measurements after you have repeated the experiment some n number of times, and let $\sigma$ be the standard deviation of the measurements. We are assuming that the deviations from the mean are caused by random noise. This rules out data such as a grain size distribution: there, too, the measurements give you a mean and a standard deviation, but the spread comes from a distribution that actually exists in the system, not from random noise.
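To make this measurement model concrete, here is a minimal sketch in Python (the lecture itself works in R; Python's standard library is used here only for illustration). It assumes a hypothetical true value of 20 and Gaussian noise with standard deviation 2, and shows that the sample mean and standard deviation of many repeated readings settle close to those assumed values. The names TRUE_VALUE, NOISE_SD and repeat_measurement are illustrative, not from the lecture.

```python
import random
import statistics

# Hypothetical measurement model: each reading is an assumed true value
# plus independent Gaussian (random) noise. Both numbers are illustrative.
TRUE_VALUE = 20.0   # assumed true value of the measured quantity
NOISE_SD = 2.0      # assumed standard deviation of the random noise

def repeat_measurement(n, true_value=TRUE_VALUE, noise_sd=NOISE_SD, seed=None):
    """Return n simulated readings: true value plus Gaussian noise."""
    rng = random.Random(seed)
    return [true_value + rng.gauss(0.0, noise_sd) for _ in range(n)]

readings = repeat_measurement(5000, seed=1)
mu_hat = statistics.fmean(readings)     # sample mean, an estimate of mu
sigma_hat = statistics.stdev(readings)  # sample standard deviation, an estimate of sigma
print(f"mean = {mu_hat:.2f}, sd = {sigma_hat:.2f}")
```

With a few thousand readings the estimates typically land within a few hundredths of 20 and 2; with only a handful of readings they scatter much more.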
So such data need not be related to the normal distribution. We have also seen in a previous example that mixing, or convolving, two distributions can result in some other distribution. So a normal distribution can also arise as the result of some other distribution, but that is not the case we are looking at at the moment. Here we are saying that random error, noise or thermal noise leads to a normal distribution in the measurements that you make, and we say that x follows a normal distribution with mean $\mu$ and standard deviation $\sigma$. We also define what is known as the standard normal distribution, for which the mean is 0 and the standard deviation is 1. You obtain the variable z from x by the transformation $z = (x - \mu)/\sigma$: take each measurement, subtract the mean and divide by the standard deviation, and the resulting variable follows the standard normal distribution. Mathematically, the normal distribution is known as the Gaussian function, and its probability density function is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

The mean of this distribution is $\mu$ and its standard deviation is $\sigma$; that is how we built this distribution. For the standard normal distribution we replace $\sigma$ by 1 and $\mu$ by 0, and the variable $(x - \mu)/\sigma$ becomes z, so the density is

$$\varphi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$$

The standard normal variable is typically referred to as z. To work with the normal distribution in R, norm is the keyword: dnorm, pnorm, qnorm and rnorm give you the probability density, the cumulative distribution function, the quantile function and random variates, respectively. Let us take $\mu$ to be 20 and $\sigma$ to be 2 and, as we did earlier, look at each of these.
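The density formula and the standardization can be checked numerically. A minimal Python sketch, with assumed illustrative values $\mu = 20$, $\sigma = 2$ and a single measurement x = 23; the helper names normal_pdf and standardize are hypothetical, and statistics.NormalDist from the standard library is used only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def standardize(x, mu, sigma):
    """Map a measurement x to the standard normal variable z = (x - mu) / sigma."""
    return (x - mu) / sigma

mu, sigma = 20.0, 2.0   # assumed illustrative parameters
x = 23.0                # assumed single measurement
z = standardize(x, mu, sigma)   # (23 - 20) / 2 = 1.5

# Density in x equals the standard normal density at z, scaled by 1/sigma.
print(normal_pdf(x, mu, sigma), normal_pdf(z, 0.0, 1.0) / sigma)
# Cross-check against the standard library implementation.
print(NormalDist(mu, sigma).pdf(x))
```

The division by $\sigma$ in the comparison reflects the change of variable: $f(x) = \varphi(z)/\sigma$ when $z = (x - \mu)/\sigma$.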
Let us look up norm in the help. That brings up a different function, the matrix norm, which is not the one we want; we want dnorm. You have to give the mean and the standard deviation, so here the mean is 20 and the standard deviation is 2. Let x be a sequence going from 0 to 50; then you can plot dnorm(x, mu, sigma) against x. This is the normal distribution. You can of course get the cumulative distribution function just by changing it to pnorm, and here is the cumulative distribution function. For quantiles, we have to make sure that the sequence runs from 0 to 1, so we define a sequence y from 0 to 1 and apply qnorm to y. Here is the quantile function. If you want to generate random variates, you use rnorm and tell it how many random variates you want; let us say we want 20, with mean mu and standard deviation sigma. You can see that these values are centered around 20. Since sigma is 2, the range 18 to 22 is within 1 sigma and 16 to 24 is within 2 sigma; about 68% of the values should fall in the first range and about 95% in the second. You can make a histogram of these rnorm values, and if you generate more and more random numbers you will see it becoming a nice bell-shaped curve. Let us generate 1000: now the curve is very symmetric about 20 with a standard deviation of 2, and most of the values fall between 18 and 22 and between 16 and 24. So the more random numbers you generate, the nicer the normal plots look.
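For readers working outside R, Python's standard-library statistics.NormalDist offers rough counterparts of the four R functions: pdf for dnorm, cdf for pnorm, inv_cdf for qnorm and samples for rnorm. The sketch below mirrors the steps above with mean 20 and standard deviation 2, and also computes the probability within 1 and 2 standard deviations. One difference: in R, qnorm(0) and qnorm(1) return -Inf and Inf, whereas inv_cdf raises an error at 0 and 1, so the probability sequence here stops short of the endpoints.

```python
from statistics import NormalDist

# Rough standard-library analogues of R's dnorm, pnorm, qnorm and rnorm
# (a sketch, not the R functions themselves).
dist = NormalDist(mu=20, sigma=2)

xs = [i * 0.5 for i in range(101)]          # 0, 0.5, ..., 50
density = [dist.pdf(x) for x in xs]         # like dnorm(x, 20, 2)
cumulative = [dist.cdf(x) for x in xs]      # like pnorm(x, 20, 2)
ps = [i / 100 for i in range(1, 100)]       # 0.01 ... 0.99 (endpoints excluded)
quantiles = [dist.inv_cdf(p) for p in ps]   # like qnorm(p, 20, 2)
variates = dist.samples(1000, seed=42)      # like rnorm(1000, 20, 2)

# Probability within 1 and 2 standard deviations of the mean.
within_1sd = dist.cdf(22) - dist.cdf(18)    # about 0.683
within_2sd = dist.cdf(24) - dist.cdf(16)    # about 0.954
frac_sample = sum(18 <= v <= 22 for v in variates) / len(variates)
print(round(within_1sd, 3), round(within_2sd, 3), frac_sample)
```

The sample fraction frac_sample fluctuates around 0.683 and gets closer to it as the number of variates grows, which is the numerical counterpart of the histogram becoming a smoother bell curve.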
So this is one aspect. Now let us go back to the normal distribution. As I mentioned, in the context we are talking about, the normal distribution arises from errors in measurement. We have already looked at one example, the conductivity of ETP copper: we plotted the data, tried to figure out what distribution it follows, and showed that it is a normal distribution. Now that we know the probability density function of the normal distribution, can we plot the data, simulate random variates from a normal distribution, and compare the two? That is what we want to do in this exercise, and for that I am going to use this set of commands; here is the script. We read the data on ETP copper conductivity. mu is the mean and sigma is the standard deviation, both calculated from the data. z holds random variates from a normal distribution: we draw 20 numbers with the given mu and sigma, compute the probability density function at those z values, and make a data frame out of these simulated values. We are going to make 3 plots. The first is a histogram of the data, using ggplot: the data is x, the aesthetic takes conductivity as x, and we make a histogram. There was a problem reading the data at first; once that is fixed, this is the distribution you get. ggplot warns about the bins and asks us to pick a better value with binwidth, so we can give a bin width and redraw the plot, and you see the distribution that you get. Next we plot the simulated values. They have the same mean and the same standard deviation as the data, but we generated them, so this is basically a simulation, and we plot it as a line. This is the distribution that you find.
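The comparison in this exercise can also be expressed numerically rather than graphically: bin the data, then compare each bin's count with the count expected from a normal distribution that has the data's own mean and standard deviation. The Python sketch below does this. The function name observed_vs_normal is hypothetical, and since the ETP copper data are not reproduced here, the example runs on synthetic stand-in values drawn from an assumed normal distribution (mean 50, standard deviation 1), not on the measured conductivities.

```python
import math
from statistics import NormalDist, fmean, stdev

def observed_vs_normal(data, binwidth):
    """Bin the data and pair each bin's observed count with the count expected
    from a normal distribution fitted by the data's mean and standard deviation."""
    fitted = NormalDist(fmean(data), stdev(data))
    start = math.floor(min(data) / binwidth) * binwidth
    counts = {}
    for v in data:
        idx = math.floor((v - start) / binwidth)   # bin index of this value
        counts[idx] = counts.get(idx, 0) + 1
    rows = []
    for i in range(min(counts), max(counts) + 1):
        lo = start + i * binwidth
        expected = len(data) * (fitted.cdf(lo + binwidth) - fitted.cdf(lo))
        rows.append((lo, counts.get(i, 0), expected))
    return fitted, rows

# Synthetic stand-in data (NOT the ETP copper measurements): 20 draws
# from an assumed normal distribution.
sample = NormalDist(mu=50.0, sigma=1.0).samples(20, seed=7)
fitted, rows = observed_vs_normal(sample, binwidth=0.5)
print(f"fitted mean = {fitted.mean:.3f}, fitted sd = {fitted.stdev:.3f}")
for lo, obs, expct in rows:
    print(f"[{lo:.2f}, {lo + 0.5:.2f}): observed {obs}, expected {expct:.2f}")
```

With only 20 points the observed counts are lumpy compared with the smooth expected counts, much as the histogram of 20 measurements is rougher than the fitted curve.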
We can of course put both in the same plot, since ggplot allows you to do that. You can see the data that we measured and the simulated distribution that we are getting; here again ggplot complains about the bins. If you generate more and more random numbers, you will also get a distribution that looks like this. The total number of data points we have is very small, only 20, and the data still show a nice normal distribution. That is because in this case the true value of the conductivity is quite close to the mean value, and all the spread you see is due to errors. This is one of the most important reasons why we are interested in the normal distribution: random errors are typically distributed normally, and that is what will allow us to do lots of analysis, as you will see. We are going to derive many distributions from the normal distribution and use them to understand data. There are also other uses for the normal distribution in materials science and engineering; you will see it almost everywhere, but certain specific connections are very relevant to us. One of them is that the normal distribution is related to the error function, and solutions to diffusion problems are typically expressed in terms of error functions. Diffusion is a naturally stochastic problem, because it is the random walk of atoms that leads to diffusion; nucleation is another important stochastic problem. We will look at the error function in the subsequent sessions. We also used a probability scale that is based on the normal distribution, and we will try to understand that probability scale.
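The link between the normal distribution and the error function is the identity $F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$, where F is the normal cumulative distribution function. A short Python check, using math.erf and comparing with statistics.NormalDist.cdf for assumed values $\mu = 20$, $\sigma = 2$; the function name normal_cdf_via_erf is illustrative.

```python
import math
from statistics import NormalDist

def normal_cdf_via_erf(x, mu, sigma):
    """Normal CDF written with the error function:
    F(x) = (1 + erf((x - mu) / (sigma * sqrt(2)))) / 2."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

dist = NormalDist(20, 2)   # assumed mu = 20, sigma = 2
for x in (16.0, 18.0, 20.0, 23.5):
    # The two columns should agree to floating-point precision.
    print(x, normal_cdf_via_erf(x, 20, 2), dist.cdf(x))
```

At the mean the argument of erf is zero, so F equals exactly 0.5, and the symmetry of erf gives $F(\mu - a) + F(\mu + a) = 1$.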
When we tried to identify the empirical distribution of a data set, whether it is normal, lognormal, Weibull or something else, that was also based on the normal distribution at some level. We always use the normal distribution as a benchmark for understanding other distributions: skewness and kurtosis, for example, tell us how much a distribution deviates from the normal distribution in terms of symmetry and in terms of its tails. We will look at all these aspects one by one, understand the normal distribution in greater depth, and do all of that using R in the sessions to come. Thank you.
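Skewness and kurtosis can be computed from central moments. In the Python sketch below, a normal sample should give skewness and excess kurtosis close to 0, while an exponential sample (an assumed comparison case, not taken from the lecture) is clearly right-skewed. These are the simple moment-based estimators; R packages and other software may apply small-sample corrections, so their values can differ slightly.

```python
import random
from statistics import NormalDist, fmean

def skewness(data):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0 for a symmetric distribution)."""
    m = fmean(data)
    m2 = fmean([(v - m) ** 2 for v in data])
    m3 = fmean([(v - m) ** 3 for v in data])
    return m3 / m2 ** 1.5

def excess_kurtosis(data):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    m = fmean(data)
    m2 = fmean([(v - m) ** 2 for v in data])
    m4 = fmean([(v - m) ** 4 for v in data])
    return m4 / m2 ** 2 - 3.0

# Normal sample with the illustrative mean 20 and standard deviation 2.
normal_sample = NormalDist(20, 2).samples(20000, seed=3)
# Assumed right-skewed comparison: exponential sample with rate 1.
rng = random.Random(5)
expo_sample = [rng.expovariate(1.0) for _ in range(20000)]

print("normal:      skew %.3f, excess kurtosis %.3f"
      % (skewness(normal_sample), excess_kurtosis(normal_sample)))
print("exponential: skew %.3f, excess kurtosis %.3f"
      % (skewness(expo_sample), excess_kurtosis(expo_sample)))
```

For an exponential distribution the theoretical skewness is 2 and the excess kurtosis is 6, so both numbers sit far from the normal benchmark of 0.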