course on dealing with materials data. Presently we are going through sessions introducing special random variables. When we conduct experiments, the outcomes sometimes, by the very nature of the experiment, follow a certain underlying distribution, and we are trying to define these specialized distributions which occur in many of our experiments, so that we can understand the behaviour of the experimental data in a simpler manner. In the previous session we saw that if the result of an experiment is countable in nature, then the underlying distribution is of a discrete nature, and such distributions are called discrete distributions. We introduced several discrete distributions there: the discrete uniform distribution; the Bernoulli trial, which has only two outcomes, success or failure; and then three distributions derived from Bernoulli trials. If you conduct n independent Bernoulli trials and count the number of successes, that count follows the binomial distribution. If you conduct Bernoulli trials until you encounter the first success, the number of trials follows the geometric distribution. And if you conduct Bernoulli trials until you come across the r-th success, the number of trials follows the negative binomial distribution. Then we introduced the Poisson distribution, which arises when the number of trials n becomes extremely large, tending to infinity, while the probability of success p tends to 0 in such a way that the product np remains constant; calling that constant lambda, the number of successes tends to follow the Poisson distribution with parameter lambda.
Then we also introduced the hypergeometric distribution: you have N objects in total, of which m are of a certain kind, you draw a sample of size n without replacement, and you ask for the probability that the sample contains exactly x items of the type m; the distribution that arises is the hypergeometric distribution. At the end we gave an example of 3D atom probe field ion microscopy, in which all these different kinds of distributions occur naturally when estimating certain probabilities. So, now in this session we would like to have a look at the distributions which may arise when the experiment gives out data of a continuous nature, so that the underlying distributions are also of continuous type. We will introduce primarily the uniform distribution and the normal distribution, and in further slides we will introduce some other distributions, such as the chi-square, t and F distributions, which are derived from the normal distribution. But first we are going to study the normal distribution in detail. So, if there is a random variable X which takes on values between two fixed numbers a and b, and its probability density on that interval is 1 divided by the length of the interval, that is 1/(b - a), while the density is 0 otherwise, then X is called a uniformly distributed random variable between a and b. I have not written the notation, so let us write it down: in such a case we say that X is distributed uniformly on the interval (a, b). The expected value of X, or as we were previously calling it, mu, is equal to (a + b)/2, and the variance of X, which we also denote by sigma square, is given by (b - a) squared divided by 12. As I have said in the previous sessions, I will leave these derivations for all of you to try out yourselves.
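As a quick sanity check on these two formulas, here is a minimal sketch (assuming Python with only the standard library; the endpoints a and b are arbitrary illustrative values) that draws a large uniform sample and compares the sample mean and variance against (a + b)/2 and (b - a)^2/12:

```python
import random

random.seed(0)

a, b = 2.0, 10.0          # interval endpoints (illustrative choice)
n = 200_000               # sample size

xs = [random.uniform(a, b) for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

print(f"sample mean     {mean:.3f}  vs  (a+b)/2    = {(a + b) / 2:.3f}")
print(f"sample variance {var:.3f}  vs  (b-a)^2/12 = {(b - a) ** 2 / 12:.3f}")
```

With a sample this large, both estimates should land very close to the theoretical values 6 and 64/12.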
This particular distribution finds its application in random number generation. Let us see how that happens. Consider a random variable X, any random variable, I am not saying it is uniformly distributed, and let F denote its cumulative distribution function. If you want to recall, F(x) is nothing but the probability that the random variable X takes on a value less than or equal to x. Now, if we look at this CDF evaluated at the random variable X itself, that is Y = F(X), then Y is also a random variable, and this random variable is distributed as Uniform(0, 1). This can be proved; we are not going to cover the proof here, but it can be shown that the CDF of a random variable, applied to that variable itself, is uniformly distributed between 0 and 1, and therefore we can write X = F inverse of Y. So, if a random number is generated from the Uniform(0, 1) distribution, then a value of the random variable X can be obtained by applying F inverse to that uniformly distributed random variate y. Let me clarify what this is saying, since I think I have not clarified it properly. What we are trying to say is that if a random number y is generated from the uniform distribution between 0 and 1, and you take x = F inverse of y, then x behaves as a random number drawn from the distribution whose CDF is F, and this F could be any CDF that you are looking for. How do you generate this y with a uniform distribution between 0 and 1? Well, there are a number of pseudo-random number generators which generate such values of y, and therefore a random number with any desired distribution function F can be generated in this way.
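As an illustration of this inverse-transform idea, here is a minimal Python sketch (standard library only). I have chosen the exponential distribution as the target F purely because its CDF, F(x) = 1 - e^(-lambda x), has a simple closed-form inverse; the lecture itself does not single out any particular F:

```python
import math
import random

random.seed(1)

lam = 2.0  # rate parameter of the target exponential distribution (illustrative)

def exp_inverse_cdf(y, lam):
    """F^{-1}(y) for the exponential CDF F(x) = 1 - exp(-lam * x)."""
    return -math.log(1.0 - y) / lam

# Step 1: draw y uniformly from (0, 1).  Step 2: push it through F inverse.
n = 200_000
xs = [exp_inverse_cdf(random.random(), lam) for _ in range(n)]

# If the method works, the sample should have the exponential mean 1/lambda.
sample_mean = sum(xs) / n
print(f"sample mean {sample_mean:.3f}  vs  1/lambda = {1 / lam:.3f}")
```

The same two steps work for any F whose inverse you can evaluate, which is exactly why this trick is a workhorse of Monte Carlo simulation.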
So, this is the application of the uniform distribution, and it plays a very major role in any kind of Monte Carlo simulation. The next distribution we wish to introduce is the most commonly used distribution, called the Gaussian distribution or normal distribution. This is a distribution that Gauss studied around the beginning of the 19th century. Gauss found it to be an interesting function and connected it with errors of measurement. It arises in this manner: when you conduct an experiment repeatedly under exactly identical conditions, the results are not always identical; there is an error in them, and he found that this error itself follows a certain pattern, and that pattern came to be known as the Gaussian distribution. It is also closely related to the error function, and we will see the relationship of the normal distribution with the error function in the next few slides. So, if a random variable X follows a normal distribution, its probability density function takes the form 1 over (sigma times the square root of 2 pi), times the exponential of minus one half of ((x - mu)/sigma) squared, where x varies from minus infinity to plus infinity; it takes on any value in R. The parameters used here are mu and sigma, and they are the mean and the standard deviation of the distribution. So, if X follows a normal distribution with mean mu and variance sigma square, its density takes this form, and again I have not written the notation, so let us write it down: in such a case we say that X follows the normal distribution with mean mu and variance sigma square. If you take the random variable Z = (X - mu)/sigma, which is the quantity appearing inside the exponential, then this Z has the density 1 over the square root of 2 pi, times the exponential of minus Z squared over 2.
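The density just written down, and the standardization Z = (X - mu)/sigma, can be sketched in a few lines of Python (standard library only; the values of mu and sigma are arbitrary illustrative choices):

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """f(x) = 1/(sigma*sqrt(2*pi)) * exp(-0.5 * ((x - mu)/sigma)**2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

mu, sigma = 5.0, 2.0   # illustrative parameters

# The density peaks at the mean and is symmetric about it.
peak = normal_pdf(mu, mu, sigma)
left, right = normal_pdf(mu - 1, mu, sigma), normal_pdf(mu + 1, mu, sigma)
print(peak, left, right)

# Standardizing samples of X ~ N(mu, sigma^2) gives mean ~ 0, variance ~ 1.
random.seed(2)
zs = [(random.gauss(mu, sigma) - mu) / sigma for _ in range(100_000)]
z_mean = sum(zs) / len(zs)
z_var = sum(z * z for z in zs) / len(zs)
print(f"standardized mean {z_mean:.3f}, variance {z_var:.3f}")
```

The peak value is 1/(sigma*sqrt(2*pi)), and the symmetric points mu - 1 and mu + 1 give equal densities, which is the bell shape described next.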
You see that there are no parameters left in it, and this is also a normal distribution: Z is distributed as normal with mean 0 and variance 1. This is called the standard normal variate. So, let us review. We say that X follows a normal distribution if its pdf takes on this form, which has two parameters, mu and sigma; mu is its mean value and sigma square is its variance. If you take the transformed variable Z = (X - mu)/sigma, a transformation we also call standardization, then Z has the pdf of the standard normal variate, with mean value 0 and variance 1. Now, this distribution has a beautiful bell shape: the mean stands in the middle, and the curve falls away symmetrically on both sides. If your mean is 1, the curve sits centred at 1; suppose instead you have a mean of 2 with the same standard deviation, say a standard deviation of 1, then you get the same bell shape shifted along the axis. So, with the mean, the normal distribution moves from left to right depending on where mu is. With respect to sigma, you can imagine that if sigma is larger, the spread of the distribution is larger, and if sigma is smaller, the distribution becomes sharper and narrower. This distribution also has another beauty in it. Take a normal distribution, say a standard one, or any normal curve; mine is not a very nicely drawn curve, but let it be so. This is the mean mu. If you take the interval from mu minus 1 sigma to mu plus 1 sigma, then this covers about 68 percent of your data.
Please recall, we have worked with Markov's inequality and then with Chebyshev's inequality, and in both inequalities our idea was to estimate how much of the data lies in a given interval around the mean value of the data. If the distribution happens to be normal, this gets pinned down much more precisely, as we mentioned at that time. Markov's inequality, if you recall, says that for a nonnegative random variable X, the probability that X is greater than or equal to k is less than or equal to the expected value of X divided by k; it gives you an upper bound on the fraction of data points that can lie in that region. Chebyshev's inequality similarly bounds the probability that X deviates from its mean by at least k standard deviations by 1/k squared. Now, if X is normally distributed with mean mu and variance sigma square, we can say something exact: the probability that the absolute value of (X - mu)/sigma is less than 1, that is, that X lies within 1 sigma of the mean, is about 68 percent. So it gives you a very clear answer. If you take the mu plus or minus 2 sigma limits, my scale in the sketch may not be correct, please ignore that, then X lies between those limits with probability about 95 percent. And if you take all x lying between mu minus 3 sigma and mu plus 3 sigma, then this probability is actually 0.9973. It means that more than 99 percent of the data lies within the plus 3 and minus 3 sigma limits of the mean value of a normal random variable with mean mu and variance sigma square.
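These 68, 95 and 99.73 percent figures can be computed directly, since for a normal variable P(|X - mu| < k*sigma) = erf(k/sqrt(2)); a minimal check using Python's standard-library math.erf:

```python
import math

# For X ~ N(mu, sigma^2): P(|X - mu| < k*sigma) = erf(k / sqrt(2)).
coverage = {k: math.erf(k / math.sqrt(2.0)) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} sigma: {p:.4f}")
```

This prints approximately 0.6827, 0.9545 and 0.9973, the exact refinements of the loose Markov and Chebyshev bounds for the normal case.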
So, let us clarify what I wanted to say here in one go: if X follows a normal distribution with mean mu and variance sigma square, then we have a clear idea of what percentage of the data points lies between given limits around the mean. About 68 percent of the data lies between mu minus sigma and mu plus sigma; if X lies between the mean minus 2 standard deviations and the mean plus 2 standard deviations, that covers about 95 percent of your data; and almost all the data, 99.73 percent, lies within the 3 sigma limits of the mean value mu. Now we come to the next point. As I said, Gauss's error function is generally defined as erf(x) = (2 over the square root of pi) times the integral from 0 to x of the exponential of minus t squared dt. This occurs in the partial differential equations describing diffusion, and as I said it is an error function. You can see that this error function has a direct relation with the normal distribution: erf(x) actually equals the probability that a normal random variable Y lies in the range from minus x to x, when Y is a normal random variable with mean 0 and variance one half. So, now let us quickly summarize. We have introduced continuous distributions, arising from the continuous data which comes out of experiments, and we introduced two distributions here.
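This relation, erf(x) = P(-x < Y < x) for Y distributed as N(0, 1/2), can be checked numerically with a small Monte Carlo sketch (Python standard library; the test point x = 1.0 is an arbitrary choice):

```python
import math
import random

random.seed(3)

x = 1.0                  # test point (arbitrary)
sigma = math.sqrt(0.5)   # Y ~ N(0, 1/2) means standard deviation sqrt(1/2)

# Estimate P(-x < Y < x) by sampling Y and counting hits in the interval.
n = 200_000
hits = sum(1 for _ in range(n) if -x < random.gauss(0.0, sigma) < x)
frac = hits / n

print(f"Monte Carlo P(-x < Y < x) = {frac:.4f}")
print(f"erf(x)                    = {math.erf(x):.4f}")
```

The two printed numbers should agree to within the Monte Carlo noise, illustrating why the error function of diffusion theory and the normal distribution are really the same object in two notations.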
One is the uniform distribution, with an application to random number generation, and the other is the normal distribution. We said that the normal distribution has another quality: it lets you refine Markov's inequality or Chebyshev's inequality into exact statements. The 1 sigma limits around mu, that is, mu minus 1 sigma to mu plus 1 sigma, contain about 68 percent of the data; the 2 sigma limits, mu minus 2 sigma to mu plus 2 sigma, cover about 95 percent of the data; and 99.73 percent, which means almost all the data, is covered within the minus 3 sigma to plus 3 sigma limits of the mean value. Please recall, when people talk about the 6 sigma limits, they are talking about roughly 99.9999998 percent of the data, a whole string of nines, lying within the 6 sigma limits of the mean. If we talk about it in the future you can just recall this; that is why I am mentioning it. And then we established the relationship between the normal distribution and the error function. We said that the error function, as defined in physics and other theories, satisfies the following: if Y is a normal random variable with mean 0 and variance one half, then erf(x) is the probability that Y lies between the two quantities minus x and x. Thank you.
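The 6 sigma coverage mentioned above can be computed the same way as the 1, 2 and 3 sigma figures, via erf(6/sqrt(2)); a one-line check in Python:

```python
import math

# Fraction of a normal population within 6 standard deviations of the mean.
p6 = math.erf(6.0 / math.sqrt(2.0))
print(f"within 6 sigma: {p6:.10f}")
```

The result is about 0.999999998, i.e. only around 2 parts per billion of the data fall outside the 6 sigma limits.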