In one of the previous videos, when we simulated the Langevin equation, we sampled from a Gaussian distribution. There we did not develop any algorithm of our own; we used an existing, ready-made mathematical command for sampling. In this lecture I want to show you how we can develop our own program for sampling from any given distribution, in case we need to do it because ready-made commands are not available. For sampling from any given distribution, the minimum requirement is a random number generator that produces numbers distributed uniformly between 0 and 1. Once that facility is available, it is sufficient for us to be able to sample from any other given distribution. Suppose we are given a distribution f(x); to be clear, when we say distribution we mean a probability density f(x) of a random variable x. Let the domain of x be some interval a to b; it can be minus infinity to infinity, 0 to infinity, 0 to 1, anything. Since f(x) is a normalized density, we can construct the cumulative distribution F(x) equal to the integral from the lower limit a up to some point x of the probability density. Since it is cumulative, the total probability should be 1, so F(b) must equal 1, because the integral from a to b of f(x) dx is 1. So if f(x) has some shape on a to b, then the cumulative, plotted as a function of x, will be 0 before a and will rise with some shape to approach 1. This information, together with the knowledge that the maximum value of the cumulative F(x) is 1, enables us to sample from the original distribution f(x).
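Written out, the cumulative distribution used throughout is:

```latex
% Cumulative distribution of a normalized density f(x) on [a, b]:
F(x) = \int_{a}^{x} f(x')\,dx',
\qquad F(a) = 0,
\qquad F(b) = \int_{a}^{b} f(x)\,dx = 1
```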
For that we follow a sequence of steps, and I will first explain the basic logic. If I have a random number generator which produces a number r lying between 0 and 1, then this range 0 to 1 perfectly matches the y-axis of the cumulative distribution function. Hence there is a one-to-one correspondence between values of the cumulative distribution function and random numbers. Since r comes from a uniform distribution, its density is constant. If we draw a horizontal line at height r, and from its intersection with the cumulative drop a vertical line onto the x-axis, then the abscissa we obtain corresponds to a random sample from f(x). To see why, set r = F(x); then dr = F'(x) dx, and F'(x) is f(x) itself, because F is the cumulative of f. Hence the density which was uniform in dr is transferred to a density proportional to f(x) in dx. So for every number r obtained from the random number generator on the interval 0 to 1, the corresponding abscissa cut by the cumulative distribution function is a random number generated from the given distribution f(x). We now illustrate this; the procedure is given stepwise in this slide. The first step, therefore, is to select the distribution from which to sample. As an example, let f(x) = e^(-x). This is a very standard distribution, which makes it easy to compare with other algorithms, but it can be anything; it can even be an exotic function which is not tabulated, as long as we are somehow able to integrate it. In step 2, you construct the cumulative distribution F(x); the upper limit should of course be x, not infinity, so F(x) is the integral from 0 to x of f(x') dx'.
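The change-of-variables argument above can be written compactly as:

```latex
% With r uniform on [0, 1], equal probabilities dr map onto f(x) dx:
r = F(x)
\quad\Longrightarrow\quad
dr = F'(x)\,dx = f(x)\,dx
```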
Integrating from 0 to x gives F(x) = 1 - e^(-x), because the domain of x here is 0 to infinity; that is why we integrate from 0. You then plot F(x) as shown in the figure: this is the cumulative, the green line. Step 3: generate a random number from the uniform distribution between 0 and 1. Let us say one obtains 0.75, the yellow line here. Now draw a horizontal line at that ordinate, and wherever it cuts the cumulative distribution, drop a vertical line; it will cut the x-axis and some value of x will come out. So: draw a line at the given ordinate, mark the intersection with F(x), and note the abscissa of the intercept. This is one value of the random variable, say 1.5, which has been sampled from the exponential distribution. The cumulative distribution thus carries the signature of the original distribution function from which we have to sample. It is a very simple idea. Of course, I have indicated it graphically for illustration; the whole exercise can be done algebraically, since dropping the abscissa is equivalent to finding a root of the cumulative distribution function. We can go on repeating this process: run the random number program again, and you will get another number, not 0.75 but say 0.2, and correspondingly another number drawn from the exponential. You can see that if your random number is very close to 1, the corresponding x value will be very large. That is the idea: the procedure faithfully reproduces the distribution from which we have to sample.
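For this exponential example the inversion can even be done in closed form: setting r = F(x) = 1 - e^(-x) gives x = -ln(1 - r). The lecture's programs are in Mathematica; a minimal sketch in Python, used here only for illustration, might look like:

```python
import math
import random

def sample_exponential(rng=random.random):
    """Inverse-transform sample from f(x) = exp(-x) on [0, inf)."""
    r = rng()                  # uniform random number on [0, 1)
    return -math.log(1.0 - r)  # solves r = F(x) = 1 - exp(-x)

# Crude check: the mean of the density exp(-x) is 1.
random.seed(0)
samples = [sample_exponential() for _ in range(100000)]
mean = sum(samples) / len(samples)
print(round(mean, 1))
```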
As we did last time while discussing the numerical program for the Langevin equation, here also I have written the program in Mathematica, a software which is available at most institutes; but one could as well write it in any computational language, and MATLAB and other packages will also have these facilities. To execute it numerically, you first define the cumulative function, supposing it is possible to write it by some analytical formula. In this case let us take the cumulative as e^(-2x); note that since 1 - r is also uniform whenever r is, solving e^(-2x) = p is equivalent to solving 1 - e^(-2x) = p. The distribution function from which I have to sample is thus f(x) = 2 e^(-2x). Then you start a do loop; the label 2 is the beginning of that loop. Last time we used the For command; here I want to use label-based looping. You then generate a random number sampled uniformly from 0 to 1 and call it p; p plays the role of the r we mentioned in the previous slide. So, having obtained some number p, you construct the function F(x) = cumulative(x) - p, and then you find its root. You can use the root-finding command of Mathematica, which of course will straight away give you the root x0, which is one of the random numbers generated from the exponential distribution. But supposing you want to develop your own root-finding algorithm, you can use the Newton-Raphson method. The iteration formula is x_(n+1) = x_n - F(x_n)/F'(x_n). Here F'(x), the derivative of the cumulative, is (up to sign) the same as the distribution function f(x) that we mentioned. So it is a do loop within the do loop, and the second, inner do loop handles the convergence of the Newton-Raphson iteration.
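A minimal Python sketch of the same Newton-Raphson-based sampler (the function names here are my own, not the lecture's Mathematica code):

```python
import math
import random

def cumulative(x):
    """The function used in the lecture: exp(-2x)."""
    return math.exp(-2.0 * x)

def cumulative_prime(x):
    """Its derivative, -2 exp(-2x)."""
    return -2.0 * math.exp(-2.0 * x)

def sample_by_newton(p, x_start=1.0, tol=1e-3, max_iter=20):
    """Solve cumulative(x) - p = 0 by Newton-Raphson.

    Since 1 - r is also uniform when r is, solving exp(-2x) = p
    still yields samples from f(x) = 2 exp(-2x).
    """
    x = x_start
    for _ in range(max_iter):          # escape after max_iter iterations
        x_new = x - (cumulative(x) - p) / cumulative_prime(x)
        if abs(x_new - x) < tol:       # test on successive root estimates
            return x_new
        x = x_new
    return x                           # did not converge within max_iter

random.seed(1)
p = random.random()
x_root = sample_by_newton(p)
x_exact = -math.log(p) / 2.0           # exact inverse for comparison
print(abs(x_root - x_exact) < 1e-3)
```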
In this case it converges smoothly and quickly, but it may not always do so. So you put an escape criterion: if the difference between successive roots becomes less than 0.001, the loop exits; so long as the difference remains larger, it goes back again, hoping to converge. But suppose the situation is oscillatory; then one must be careful and have some escape route, say quitting after some 20 iterations, which we have not done here because this was a simple program. The inner do loop closes here; then the program prints the random number that was generated, the number of iterations it took to converge, the test of convergence (the value just before convergence and the final converged value), and a comparison with the Mathematica-command-based root finding, which is x0. You can repeat this exercise for as many random numbers as you generate, and you will get a sequence of random numbers drawn from the exponential distribution; then the outer loop closes. So we can get any number of random numbers, and this is just a sample output of about 10 numbers that Mathematica gives, or rather that we have asked it to give. These are the original random numbers which came from the uniform distribution, 0.4, 0.9, etc. In about 3 to 4 iterations the Newton-Raphson root finding converged; as a test of convergence you can see that by the second iteration the value was already about 0.268, by the third 0.269, and by the fourth it changed very little, which is why it quit. So you now have a series of numbers, all of them sampled from the exponential distribution. You will not find a direct correspondence between the uniform numbers and these numbers except via the abscissa of the curve; you can perhaps see the monotonicity, but that is all one can say.
You can generate any number of such values, hundreds of them, depending on the problem. Now let us test that the numbers we have generated are indeed samples from the exponential distribution. What I did was take those data to Origin for plotting purposes; one could do it in Mathematica itself, but it was a bit tedious, so I simply copied and pasted into the Origin software, where the data can immediately be grouped, and plotted them as a distribution. You can see very nicely that with about 1000 data points (not just those 10), grouped in intervals of 0.1, the obtained distribution is very close to the original: the fit gives 2 e^(-1.998x), with the exponent almost exactly 2. So with about 1000 data points they all follow the exponential distribution, which reconfirms that we have indeed sampled from the exponential distribution. Note that the plot looks like a straight line because the y-axis is a logarithmic (ln) scale, plotted against the x value. So that is how you get it. This is one example; one can practice with various other distributions and develop expertise in sampling from any distribution you want, and when standard software is available you can compare with the distributions from the standard software and convince yourself that your program is correct. Now that we can sample from various distributions, let us use this for a further purpose. The knowledge that we gained in sampling from a given distribution we now utilize for a numerical demonstration of the central limit theorem. While discussing the random walk and various models, the central limit theorem was at the heart of our study; it is at the heart of statistics, and we discussed its derivation at length in our first few lectures. So, what was the central limit theorem?
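The same check can be sketched in Python (the lecture uses Origin for plotting; here we only bin the samples and compare the bin densities numerically against 2 e^(-2x)):

```python
import math
import random

random.seed(2)

# Inverse-transform samples from f(x) = 2 exp(-2x).
samples = [-math.log(1.0 - random.random()) / 2.0 for _ in range(1000)]

# Group into intervals of width 0.1, as in the lecture.
width = 0.1
counts = {}
for x in samples:
    k = int(x / width)
    counts[k] = counts.get(k, 0) + 1

# The empirical density in each bin should track 2 exp(-2x)
# at the bin centre (a straight line on a log scale, slope -2).
for k in range(3):
    density = counts.get(k, 0) / (len(samples) * width)
    centre = (k + 0.5) * width
    print(k, round(density, 2), round(2.0 * math.exp(-2.0 * centre), 2))
```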
Suppose you sample IIDs (independent, identically distributed values) from any distribution, which need not be Gaussian. Then the mean calculated with a finite number of samples drawn from that distribution will hover around the true mean of the population in a Gaussian manner. That is the whole idea: if the x_i are IIDs drawn from a distribution having at least a second moment, then the mean x-bar obtained with some n samples is again a random variable, but it is distributed in a Gaussian manner; the probability distribution of x-bar, for a given n, is a Gaussian. One often writes that x-bar is normally distributed with mean value mu and standard deviation equal to the standard error SE, which is sigma divided by root n. This was basically our statement: the sample mean is normally distributed regardless of the shape of the original distribution. Let us try to visualize this by actually conducting a simulation, a numerical experiment. We write a small program, because the process is now very clear: we generate random numbers and group them as a sample, like an experiment in which each measurement collects 10 points. From such a group of 10 I calculate an average, but that is not exactly the true average. So how far from it will it be, and what shape will its distribution follow? That is what we have planned to study. Here is the program; it basically has 3 loops. Since the idea is to test the central limit theorem, we decide to sample from a distribution which is far from Gaussian. Of the 3 loops, one of course is the Newton-Raphson loop needed to sample from the given distribution using the uniform distribution.
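In symbols, the statement reads:

```latex
% Central limit theorem for the sample mean of n IID values x_i,
% drawn from a distribution with mean \mu and standard deviation \sigma:
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
P_n(\bar{x}) \longrightarrow \mathcal{N}(\mu,\ \mathrm{SE}),
\qquad
\mathrm{SE} = \frac{\sigma}{\sqrt{n}}
```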
So that is one loop. Then we need a loop to generate some n values whose average we take, and then, to understand the distribution of those averages (of samples of, say, 10 or 16 each), I need hundreds of such averages, so I need another loop to generate a large number of x-bar values. That is what this program is all about. Here, for example, the distribution from which we sample is taken as 2 e^(-2x); its cumulative is 1 - e^(-2x), its average is 0.5, and its sigma is also 0.5. So it is a program whose basic characteristics we know. Now of course we can add various print commands and so on. Here M max is, say, 11, just for demonstration; it could be hundreds. Then n max is given as 17, which basically means x-bar is calculated with 16 samples, n max minus 1. The program prints out various values, then the x-bar obtained in the k-th iteration, the average it calculates. Then we need to list them; that is what label 3 does: it calculates the random numbers, sums each set, and prints k and x-bar, and then it continues; the "go to" label 3 is where hundreds of samples can be collected. Right now, just for a few printouts, I have shown only 11 replicas of x-bar, each obtained with 16 samples. So these are the values: the number of samples with which each x-bar is estimated is 16, and the number of trials to find the distribution of x-bar is just 11 here. We know that the true mean is half, but you can see that low values like 0.25 are obtained and high values like 0.68 are also obtained; obviously there is a distribution. To really understand that distribution it is not sufficient to have 10 or 11 values, so we have gone up to about 500 data points, and that is what we have done here.
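A compact Python sketch of the same experiment (using the closed-form inverse rather than the Newton-Raphson loop, purely for brevity; the loop counts are the lecture's):

```python
import math
import random

random.seed(3)

def sample_exp2():
    """Inverse-transform sample from f(x) = 2 exp(-2x); mean 0.5, sigma 0.5."""
    return -math.log(1.0 - random.random()) / 2.0

n = 16        # samples per mean, as in the lecture's program
trials = 500  # number of x-bar replicas

xbars = []
for _ in range(trials):  # outer loop: collect many sample means
    xbar = sum(sample_exp2() for _ in range(n)) / n
    xbars.append(xbar)

mean_of_means = sum(xbars) / trials
spread = math.sqrt(sum((x - mean_of_means) ** 2 for x in xbars) / trials)

# CLT prediction: mean 0.5, standard error sigma/sqrt(n) = 0.5/4 = 0.125
print(round(mean_of_means, 2), round(spread, 3))
```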
So here are 500 simulations, each a mean of 10 samples drawn from the exponential distribution. The data are again taken to Origin, sorted, and frequency counted; one has to be a little careful about the choice of intervals, since you can get slightly different-looking shapes depending on the interval. In any case, here the whole range was divided into about 15 to 20 groups. The mean is supposed to be 0.5, and the first thing you notice is that while the exponential distribution itself would have looked like a typical falling exponential on the positive side, by virtue of the force of the central limit theorem the calculated means do have a Gaussian shape, hovering around the mean value of 0.5, and their width should be the standard error. You can test every aspect of this: repeat it with various numbers of trials (right now we did 500; try 100 or 200 and see how it approaches a Gaussian), change the sample size (it was 16 in the program; 10 is shown here), and test whether the standard deviation of the means corresponds to the standard error sigma over root n. All these tests can be carried out; we have not done them here because they are standard routine exercises which I am sure you will be able to do. So we have basically tried to understand how to generate random numbers and how to apply them to validate the central limit theorem. In the next lecture we will revisit this problem of generating random numbers and understand its applications in the random walk, and we will also apply a numerical method to solve some of the random walk models via the differential equation formalism. Thank you.