Welcome to Dealing with Materials Data. In this course we are looking at the collection, analysis, and interpretation of data from materials science and engineering. We are in the module on probability distributions, and we are going to talk about the uniform distribution. The uniform distribution can be either discrete or continuous; runif is the function call in R for generating continuous uniform variates, and if you load the extraDistr library you can also get the discrete uniform functions. These distributions are important because they can be used for simulations, and we are going to look at a couple of simulations that calculate the value of pi, for example. So what is the idea? Say you have a rectangular region with sides a and b, so you know its area, ab, and say you have a circular region in the middle with some area A. Now if you throw random dots at the rectangle, the ratio of the number of dots that land inside the circle to the total number thrown at the rectangle will basically give the ratio of the shaded area A to the rectangle's area. So you can use the uniform distribution to generate points, count how many lie inside versus how many you generated, and the ratio gives you the area fraction; knowing the rectangle's area, you can calculate the circle's area. Even for some complex shape, you can take an image, pixelate it, generate points, and check whether each generated point lands on a pixel with value 1 or 0; the ratio will then give you the area of that region. So this is a way to do the integration to get an area; that is one aspect. But suppose it is a nice regular shape like a circle, and we know what its area is: then you can use that to actually calculate the value of pi, and that is the simulation we are going to do.
So let us do those simulations and come back. For these simulations I am going to use these two R scripts. Here r squared is the square of the radius of the circle that I am considering: it is 0.25, which means the radius is 0.5. We will come back to this m later, so let us start from here. I am going to set capital N to 10000, so we generate a sequence of numbers from 1 to 10000, and we generate 10000 uniform random variates between 0 and 1 for each coordinate. Every (x, y) pair then gives you a point lying in a square of side 1, that is, a square of area 1. I am assuming my circle is centred at the centre of this square, so for every pair you generated you can find out whether it lies within the circle. The area fraction is the number of points that lie within the circle divided by the total number of points that you threw, that is, that you generated; since the square has area 1, this fraction is the circle's area. The pi value is then nothing but this area divided by r squared, because the area is pi r squared. So we generate 10000 pairs, calculate pi, and we are going to do 10 such simulations. This m is basically an outer loop that generates many such pi values from repeated simulations, and x is a vector that stores them. So see how we are running the loops: you have an index, 1 to N, and "for i in index" means i goes 1, 2, 3, and so on up to N.
Similarly, here ii is a sequence that goes from 1 to m, so "for k in ii" means k goes 1, 2, 3, and so on, and each k is one simulation: k equal to 1, for example, is the first simulation, and it stores the first pi value in x[1]; k equal to 2 is the second simulation, and it stores the second pi value in x[2]; and so on. Finally, to get the pi value, we sum all the pi values from the different simulations and divide by the total number of simulations we have done. You can also get the relative error: the estimated pi value minus pi, divided by pi, gives the relative error that we are getting from these simulations. So let us first run this simulation. Let us find out which directory we are in; we are in the right directory, so we source the script, which is in the scripts directory, uniform continuous dot R. We have sourced it, and the pi value it gives is 3.14196, and if you calculate the error you get 0.00011693. You can also look at x: these are the pi values from the ten simulations we did, and you can see they are peaking somewhere around 3.14. Of course you can do more simulations, so let us run 100 of them.
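The script itself is not reproduced in the transcript, so here is a minimal R sketch of the continuous-uniform pi simulation as described above; the variable names (r2, N, m, x) follow the lecture's narration, but the exact script may differ.

```r
# Monte Carlo estimate of pi with continuous uniform variates.
# Circle of radius 0.5 centred at (0.5, 0.5) inside the unit square.
set.seed(42)

r2 <- 0.25      # squared radius of the circle (radius 0.5)
N  <- 10000     # points generated per simulation
m  <- 10        # number of repeated simulations

x <- numeric(m) # one pi estimate per simulation
for (k in 1:m) {
  px <- runif(N)                     # x coordinates in [0, 1]
  py <- runif(N)                     # y coordinates in [0, 1]
  inside <- (px - 0.5)^2 + (py - 0.5)^2 <= r2
  area <- sum(inside) / N            # area fraction = circle area (square area is 1)
  x[k] <- area / r2                  # pi = area / r^2
}

pi_est  <- sum(x) / m                # average over the m simulations
rel_err <- abs(pi_est - pi) / pi     # relative error of the estimate
```

Increasing m (100, 1000, ...) averages out more of the sampling noise, which is exactly what the lecture does to push the error from the fourth decimal place down to about 10^-5.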
So now let us get the histogram of x: you can see that it peaks around 3.14, and because we have done more simulations this time, pi should be much better. What is the pi value? We got this, and if we calculate the error in the pi value, it is 0.00065. You can similarly go on doing more and more simulations, maybe 1000 of them; it takes some time, because it is going to do 1000 simulations, each generating 10000 pairs of points. The error is now in the fourth decimal place, and if you generate the histogram of x you can see that it now peaks exactly at 3.14. So let us do one last simulation; it is going to be very time-consuming. The error here was 0.017 percent or so, so let us see if we can get an error smaller than that using this large number of simulations. Here is the histogram of x, here is the pi value we get, and if we calculate the error, it has gone down to 10 to the power minus 5: it was 10 to the power minus 4 before, and it has become 1.5 times 10 to the power minus 5. So by doing a larger and larger number of simulations and averaging, you can get a more and more accurate calculation of pi. Now, in this case we took a 1 by 1 area and generated continuous random numbers: if you look at the random numbers that were generated, x for example, these are all continuous values between 0 and 1, and it generates lots of them, as you see here. You can also generate discrete random numbers. To do that, we are going to use the extraDistr library, which will allow us to generate discrete uniform random numbers, and we are going to do the same thing: we have an R squared which is 25, and an m of 10000, which is going to take
obviously a long time. The length L is 10, so that is the size of my simulation box, 10 by 10, and 0.5 L, that is (5, 5), is where the centre of my circle is. The circle itself has radius 0.5 L = 5, which is why R squared is 25. But now the random numbers generated are discrete, between 1 and L. We again calculate the area, but now the area fraction must be multiplied by L squared: j divided by N is the area fraction, and multiplying by the box area, which we know is L squared, gives the area. If you divide the area by R squared you get the pi value, and we are going to generate some 10000 pi values and use them to evaluate pi. So let us see how this simulation works; this is still uniform, but now discrete. With 10000 simulations it is going to take a long time, and generating discrete uniform variates is also somewhat slower than the continuous ones, but the idea is the same: instead of a circle of radius 0.5 on a 1 by 1 square, we have a circle of radius 5 on a 10 by 10 square, so the relative areas are the same. Let us now look at the histogram of x in this case. As you can see, it is not peaking at 3.14; it is peaking at 3.16, and the pi value is also 3.16, and if you calculate the error, the error is about 0.6 percent.
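Here is a sketch of the discrete version in R. The lecture loads extraDistr for its discrete uniform generator; to keep this sketch self-contained, base R's sample() is used instead, which draws the same discrete uniform variates on 1..L. The variable names and loop structure are assumptions mirroring the continuous script.

```r
# Discrete Monte Carlo estimate of pi: lattice points on an L x L grid,
# circle of radius L/2 centred at (L/2, L/2).
set.seed(42)

L  <- 10
r2 <- (L / 2)^2   # 25 for L = 10
N  <- 10000       # points per simulation
m  <- 100         # fewer simulations than the lecture's 10000, for speed

x <- numeric(m)
for (k in 1:m) {
  # discrete uniform variates on 1..L (equivalent to extraDistr's rdunif)
  px <- sample(1:L, N, replace = TRUE)
  py <- sample(1:L, N, replace = TRUE)
  inside <- (px - L / 2)^2 + (py - L / 2)^2 <= r2
  area <- (sum(inside) / N) * L^2   # area fraction times box area
  x[k] <- area / r2
}

pi_est <- mean(x)
```

With L = 10 there are exactly 79 lattice points out of 100 inside the circle, so the estimate converges to 79/25 = 3.16 rather than pi, matching the lecture's observation; the bias comes from the coarse lattice approximation of the circle and shrinks as L grows.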
So the discrete distribution is not performing very well, and there could be several reasons. One is the way the uniform variates are generated; that can have a say. We are also trying to approximate a circle using unit cells: unlike in the continuous case, points either fall inside or outside whole cells, so with a unit of 1 you only get an approximation to the circle, and that could be another reason. In other words, if you go back to the figure: the continuous case gives the smooth circle, but on a graph-sheet kind of grid you get a stepped boundary, and that does not give a very good approximation to the circle. If you want a better approximation for the circle, you should go to much larger system sizes; that will probably give a better calculation, but it will also take longer to run, and I do not know how much longer. Let us try a larger simulation with L equal to 100, source the script, and see. We are somewhere around 3.14, so the grid resolution could indeed be one of the reasons, and you need a much larger number of simulations to get better values; it is going to take time. Now you can see where it peaks: it gives 3.137, and if you calculate the error, it is now 0.13 percent, unlike the 0.6 percent that you saw before. So the approximation you make matters a lot, and with a finer grid you see better results. We are going to use this idea later to look at microstructures. As you can see, in this case we knew it was a circle, so we tried to calculate the pi value; but if it were some random shape, you would generate these uniform variates, and all you need to do is find out whether each point fell in a region which was red or a region which was blue. Typically you take microstructures and binarize them
into zeros and ones: you find out whether a point fell in region one or region zero. So you can pixelate the microstructure, and for every generated point take random numbers, see which pixel you reach, and find out whether that pixel is zero or one; counting how many zeros you hit out of the total, for example, will give you the area fraction of the zero phase, and the ones will give you the one phase. So it is an idea that you can use for the analysis of microstructures. Typical analysis of microstructure, for the sizes of particles for example, involves a different methodology: typically one draws lots of lines and finds how many times they cross features, and there are stereology techniques using which you can get some idea about the sizes, the areas, and things like that. What we are giving here is another method, a computer-based method, which will also give you the same information; for example, for estimating the area fraction of two phases you can use a similar idea. As a simple example, we just took a known shape for which the area is known, and we used that to actually evaluate the value of pi. So now let us go back to our presentation. We have used this kind of Monte Carlo simulation just for integration, but there is also something known as kinetic Monte Carlo. Monte Carlo can also be used for microstructure-evolution simulations, but a typical Monte Carlo simulation does not include any dynamics or time information. Kinetic Monte Carlo is a type of Monte Carlo simulation in which one can include information on the energy barriers on the energy surface of the system, and hence also the time information associated with the processes; kinetic Monte Carlo is actually based on Poisson processes. So what is the purpose of this session? In this session, as throughout this module, we have been looking at probability distributions, and for many different problems we have shown where they are relevant for materials science and engineering.
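The pixel-counting idea above can be sketched in R. The microstructure here is synthetic (a disc of phase 1 embedded in a phase-0 matrix, standing in for a real binarized micrograph), and all names are hypothetical; with real data you would load the binarized image into the matrix instead.

```r
# Estimate the area fraction of one phase in a binarized microstructure
# by sampling pixels at discrete-uniform random positions.
set.seed(42)

L <- 200  # image is L x L pixels

# Synthetic binarized microstructure: phase 1 is a disc of radius L/4
# centred in the image, phase 0 is the surrounding matrix.
d2 <- outer((1:L - L / 2)^2, (1:L - L / 2)^2, "+")  # squared distance to centre
micro <- ifelse(d2 <= (L / 4)^2, 1L, 0L)

# Throw N random points at the image and read off the pixel values.
N  <- 50000
px <- sample(1:L, N, replace = TRUE)
py <- sample(1:L, N, replace = TRUE)
hits <- micro[cbind(px, py)]          # phase value (0 or 1) at each sampled pixel

frac1     <- sum(hits) / N            # Monte Carlo estimate of phase-1 area fraction
true_frac <- sum(micro) / L^2         # exact pixel count, for comparison
```

Because the disc has radius L/4, the true fraction is close to pi/16, about 0.196; the sampled estimate lands within the sampling noise of that value, which is the same accuracy argument as in the pi simulations.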
But the Monte Carlo simulations for integration, their extensions to actually look at microstructure evolution, and the extension of the Monte Carlo method itself to kinetic Monte Carlo, which also estimates the time or dynamics associated with processes such as microstructure evolution or chemical reactions, all have their own statistical underpinnings. Probability distributions play a key role in understanding simulation data and results, in setting up simulations, and so on. So the take-home message from this session is that statistical tools are not just for dealing with experimental materials data but also for simulation data; in other words, it does not matter what the source of your data is, statistical methods are valid for all data. I have kept the uniform distribution for the last because it is one of those rare distributions which can be either discrete or continuous. I have also kept it for the last because it ties in with simulations, and kinetic Monte Carlo, which is related to Poisson processes, connects back to the Poisson distribution that we have discussed. If possible, later during the case studies we will also look at some of these simulation methodologies and how to do statistical analysis on their data; we will come to that at a later point. So this brings us to the end of the probability distributions module. We started with some discrete probability distributions: we looked at Bernoulli trials and the binomial, negative binomial, and hypergeometric distributions, and also the Poisson distribution, and we saw where they are important. Then we moved to continuous distributions, of which the normal is very, very important.
We spent some time understanding the normal distribution, and then there are several other continuous distributions, like the lognormal, Weibull, and exponential, for example, which have their own relevance, but there is a whole bunch of them which we did not discuss. As and when they are needed: for example, the gamma distribution, which we have not discussed, might become important, and the beta distribution, which we did not discuss, might also become important. So we will discuss some of these probability distributions later. It is very difficult to be exhaustive about all probability distributions, and the idea is that once you know how to deal with a few of them, you will be able to deal with the rest on your own. Then we looked at the uniform distribution, and in all cases we have looked at the practical importance or significance of these distributions. We also mentioned some distributions, like chi-squared, t, and F, which are relevant for confidence intervals, regression modelling, and things like that; we will discuss those distributions when we look at the analysis. So this is the summary slide for probability distributions. Next we are going to look at how to analyze, present, and understand experimental data, and when we come there we will continue with probability distributions: specifically, chi-squared, t, and F will find lots of use there, and in any case we will keep talking about other probability distributions and the use of probability distributions for the rest of this course. Probability distributions are probably the core of this course, so we have taken a lot of time to understand several of them, because we are going to use them continuously to do more and more detailed and complex analysis in the sessions to come. Thank you.