Welcome to dealing with materials data. In this course we are trying to understand the collection, analysis, and interpretation of data from materials science and engineering. We are in the third module, which is on probability distributions, and we are going to begin with some discrete probability distributions; specifically, we are going to talk about Bernoulli trials and binomial distributions. Let us consider an ideal solution of A and B, or a random solid solution of A and B. If you pick a random atom and ask the question "is it B?", then the answer is either yes or no. In other words, there are only two outcomes, and if you are looking for B atoms, then finding a B atom is a "success" and not finding one is a "failure". Success and failure are within quote marks because, if your interest is in A atoms rather than B atoms, then finding a B atom would be the failure and finding an A atom the success. Because there are only two outcomes, success is the complement of failure and failure is the complement of success. And what is the probability that the answer is yes? Since it is a random solid solution and you pick a random atom, the probability that it will be a B atom is given by the composition of the alloy. If it is a 50 atomic percent alloy, then the probability of the picked atom being B will be 50 percent, because we have assumed a random solid solution, that is, an ideal mixture. So for any random atom that you pick, the probability that it is of type B is given by the alloy composition itself. So X_B will be the probability, and this probability is the same for every trial.
In other words, we are implicitly assuming either that there is a large number of atoms from which we are picking, so that the process does not change the composition, or that we are just probing an atom, noting whether it is B or not, and leaving it in place, so that the probabilities are unchanged no matter how many times we repeat the experiment. These are the assumptions that have gone in, and we are also going to assume that the outcomes of different trials are independent: making one measurement does not affect the next measurement. Under these conditions, namely that there are only two outcomes, that the probability of success is some p which remains the same for all trials, and that different trials are independent, such a process is known as a Bernoulli trial. You can write the probability mass function for the Bernoulli trial: because we have assumed independence, probabilities multiply across trials, and for a single trial the probability of success (finding a B atom) is X_B while the probability of failure is 1 - X_B. If you label success and failure by k = 1 and k = 0, then X_B^k (1 - X_B)^(1 - k), where k is either 0 or 1, describes the result of every Bernoulli trial. If the atom is a B atom, then k is 1, so the exponent 1 - k on (1 - X_B) is 0, that factor becomes 1, and you get X_B. If it is not a B atom, then k is 0, X_B^0 becomes 1, and (1 - X_B)^(1 - 0) gives 1 - X_B, which is the probability of finding a non-B atom in an alloy of composition X_B.
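As a quick sketch, the Bernoulli probability mass function X_B^k (1 - X_B)^(1 - k) can be evaluated directly in R; the composition value 0.3 used here is just an illustrative assumption, and `bernoulli_pmf` is a helper name introduced for this example.

```r
# Bernoulli PMF: p(k) = xB^k * (1 - xB)^(1 - k), with k = 1 (success) or k = 0 (failure)
bernoulli_pmf <- function(k, xB) xB^k * (1 - xB)^(1 - k)

xB <- 0.3                      # assumed alloy composition (30 at.% B)
bernoulli_pmf(1, xB)           # probability of picking a B atom: 0.3
bernoulli_pmf(0, xB)           # probability of picking a non-B atom: 0.7

# The same result from R's built-in binomial with a single trial (size = 1)
dbinom(1, size = 1, prob = xB)
```

A Bernoulli trial is just a binomial distribution with size = 1, which is why dbinom reproduces the hand-written formula.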
So, this is a Bernoulli trial. Based on this you can now ask: suppose the alloy is not random but ordered, with specific sites occupied by A atoms and specific sites occupied by B atoms. Will the trials be Bernoulli? They will not, because if you pick a random atom, the probability of finding a B atom depends on which site the atom was picked from, and that probability differs from site to site. So an ordered alloy cannot be treated as a Bernoulli trial; a random or ideal solution can. Now suppose you conduct N independent Bernoulli trials, of which K are successful; say you look at some 100 atoms and find that about 55 of them are B. The number of successes in such an exercise of N independent Bernoulli trials follows what is known as the binomial distribution, and its probability mass function depends on K, the number of successful trials, and N, the total number of trials conducted: it is N! / (K! (N - K)!) times X_B^K times (1 - X_B)^(N - K). You can understand this because we assume all trials to be independent, so the probabilities multiply, giving X_B^K (1 - X_B)^(N - K) for any one particular sequence of K successes among N trials; and because there are many different orderings that give the same count, the factor N! / (K! (N - K)!) counts those combinations. The mean value of the binomial distribution is N times X_B, whatever probability you are assuming, and the variance of the binomial distribution is N times X_B times (1 - X_B).
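The formula above can be checked directly in R against the built-in dbinom; the numbers N = 100, K = 55, and X_B = 0.55 are taken from the 100-atom illustration and are only an example.

```r
# Binomial PMF: P(K = k) = N! / (k! (N - k)!) * xB^k * (1 - xB)^(N - k)
N  <- 100                        # number of independent Bernoulli trials
xB <- 0.55                       # assumed probability of success per trial
k  <- 55                         # number of successes observed

# Direct evaluation with choose() versus the built-in dbinom()
choose(N, k) * xB^k * (1 - xB)^(N - k)
dbinom(k, size = N, prob = xB)   # same value

# Mean and variance of the distribution
N * xB                           # mean: N * xB = 55
N * xB * (1 - xB)                # variance: N * xB * (1 - xB) = 24.75
```

choose(N, k) is R's binomial coefficient N! / (k! (N - k)!), so the two PMF evaluations agree exactly.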
You can notice that the variance is proportional to N, because of which the relative uncertainty goes as 1 over square root of N: sigma goes as root N, and you divide by the mean, which is itself proportional to N, so you get 1 over root N. This is very useful information to have, because it shows that if you do a larger and larger number of experiments, your relative uncertainty goes down. So this gives you the assurance that you can improve your accuracy by doing a large number of experiments. This is not surprising: if you look at 10 atoms and find that some 3 of them are B, you conclude that the composition is 0.3, because the mean count is N times X_B and 3 divided by 10 is 0.3. But if you look at 100 atoms maybe you will get 35, and with 1000 you may get some 367 or so; the relative uncertainty in the estimated composition shrinks as you do more trials. That is what this information on the variance gives. Now suppose you choose a random alloy consisting not of two types of atoms but of M different types. Such alloys are known; they are equimolar multi-component alloys, also sometimes called high-entropy alloys. In such cases the counts are again described by a similar distribution function, known as the multinomial distribution: binomial is for two outcomes, multinomial is for more than two. Materials of this type are sometimes made, for example, with 20 percent each of 5 different components; in that case you will get the multinomial distribution.
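The 1 over root N behaviour can be made concrete with a short sketch; the composition 0.3 and the sample sizes are assumptions chosen to echo the 10-atom example above.

```r
# Relative uncertainty of the estimated composition falls off as 1/sqrt(N):
# the sd of the success count is sqrt(N * xB * (1 - xB)), and dividing by
# the mean count N * xB gives sqrt((1 - xB) / (xB * N)).
xB <- 0.3                                     # assumed true composition
for (N in c(10, 100, 1000, 10000)) {
  rel <- sqrt(N * xB * (1 - xB)) / (N * xB)   # sigma / mean
  cat(sprintf("N = %5d  relative uncertainty = %.4f\n", N, rel))
}
# Each tenfold increase in N shrinks the relative uncertainty by a factor of sqrt(10).
```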
Now, binomial distributions are common whenever we have n independent events, each with two outcomes, success with probability p and failure with probability 1 - p, the events independent and the probability the same for every trial. Whenever this happens you will see that the result is a binomial distribution. In microstructure analysis, in the case of geometric probabilities for example, the counts will again be binomial because the individual picks are Bernoulli trials. Suppose you pick random points from a microstructure consisting of two phases, like the steel that we considered earlier, which had phase 1 and phase 2. Picking a random point then amounts to asking: is this point in phase 1? The answer is yes, it is phase 1, or no, it is not, in which case it is phase 2. And what would be the relative probability of these outcomes? That depends on the area fraction: if phase 2 has the larger area fraction, random picking will most of the time land in phase 2, and if phase 1 has the larger area fraction, it will mostly land in phase 1. So some amount of quantitative information about the microstructure can be obtained by an exercise of this sort. The same happens if you have n identical components and you pick each one to see whether it passes a quality test, for example, or whether it is in working condition. The answer again is yes or no; depending on the component chosen and its processing or properties, it might or might not pass with some probability p, and if you do n such experiments, each independent and with the success probability unchanged throughout, the number of passes will also follow a binomial distribution.
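The point-counting idea can be sketched as a simulation; the area fraction 0.35 and the number of points are hypothetical values, not data from the steel example.

```r
# Point counting on a two-phase microstructure as a binomial experiment.
# Assume (hypothetically) that phase 1 occupies an area fraction of 0.35;
# each random point then lands in phase 1 with that probability.
set.seed(42)
f1 <- 0.35                                 # assumed area fraction of phase 1
n_points <- 500                            # random points dropped on the image
hits <- rbinom(1, size = n_points, prob = f1)
hits / n_points                            # estimate of the area fraction
```

The fraction of hits estimates the area fraction, with the 1 over root N uncertainty discussed above.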
So, one comes across binomial distributions in many, many different cases; the random alloy and the geometric probability in microstructural analysis are just two examples. You might think that the random alloy example is a bit far-fetched, but we will look at a case and see that it is very relevant; there is a microscopy technique, which we are going to look at, where this has relevance. In any case, how do we deal with binomial distributions using R? There is a common theme that runs through the rest of this module: there are always four commands. In this case it is the binomial distribution, so the stem binom is common, and you have dbinom for the probability density (mass) function, pbinom for the cumulative distribution function, qbinom for the quantile function, which is the inverse of the cumulative distribution function, and rbinom for random variate generation. We have seen some of these commands earlier: rnorm, for example, we used to get random variates from a normal distribution, and rlnorm to get random variates from a log-normal distribution. Similarly, for other distributions you will have the same d, p, q, r prefixes along with the distribution name; that is how these commands work. So let us now go to R and work with some of these commands: what do the probability mass function, the cumulative distribution function, and the quantile function look like for the binomial distribution, and how do we generate random variates from it? That is what we will do now; let us go to R. Here is the first set of commands, using dbinom.
So, x is a sequence; let us say it goes from 1 to 1000 in steps of 1, and we plot x against the binomial density dbinom, where 1000 is the n and 0.5 is the probability p. This is how the distribution looks; of course you can change the probability and see how the peak shifts. That is dbinom. Next is pbinom: again the same sequence, x from 1 to 1000 in steps of 1, and now we plot x against the cumulative distribution function pbinom with success probability 0.5. This is how it looks, and of course if you change p the cumulative distribution function shifts as well, which is also expected. Now we want the quantiles, but remember that for quantiles the input should run from 0 to 1, because the quantile function is the inverse: given a probability, it is the x that we are trying to get. So the sequence, let us call it y, should go from 0 to 1 in steps of 0.01, and we plot y against qbinom(y), say for p = 0.5. This is the quantile function, the inverse of the distribution function, and of course you can evaluate it for 0.2, for 0.4, for 0.6, for 0.8, and so on. Finally, if you want to generate random variates you can do that using rbinom, so let us do that. I am going to make 3 plots, picking 100, 1000, or 10000 random variates with 0.5 as the probability value, and plot the histograms. That is what you see: this is the case where I picked 100, this where I picked 1000, this where I picked 10000. So this is with 0.5; then what happens if I do it with 0.2?
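The session described above can be reconstructed as follows; the plotting options used in the lecture are not shown, so these calls are a minimal sketch with the same sequences and parameters.

```r
# Density, cumulative distribution, and quantile function for Binomial(1000, 0.5)
x <- seq(1, 1000, by = 1)
plot(x, dbinom(x, size = 1000, prob = 0.5))    # probability mass function
plot(x, pbinom(x, size = 1000, prob = 0.5))    # cumulative distribution function

y <- seq(0, 1, by = 0.01)                      # quantiles need probabilities in [0, 1]
plot(y, qbinom(y, size = 1000, prob = 0.5))    # quantile function (inverse CDF)

# Histograms of random variates for increasing numbers of draws
par(mfrow = c(1, 3))
for (n in c(100, 1000, 10000)) {
  hist(rbinom(n, size = 1000, prob = 0.5), main = paste("n =", n))
}
```

Changing prob from 0.5 to 0.2 or 0.8 shifts and skews the plots in the way described in the text.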
So, this is what you get for 0.2; what happens for 0.8? You can sort of see that for 0.8, for a small number of variates the histogram skews like this, then for 1000 it looks like this, and by the time you reach 10000 more variates do not make much of a difference. So probably if we use some 50 variates instead, we can see the difference clearly; let us see. You can see how the distribution changes as you generate more and more random variates, that is, as you repeat this exercise more and more times. We will come back to this; there is an interesting theorem which we are going to look at later. But this is just to show you how to deal with these distributions and work with them using R. So, we have just looked at Bernoulli trials and the binomial distribution. We are going to go through each one of these distributions; there is a whole zoo of them, of which we have chosen only a few out of the very large number of distributions available. We will deal with a few of them as we go along, those of importance and relevance, along with information on where they are useful, where they should be used, and, where possible, why. That is what we are going to look at for the rest of this module on dealing with probability distributions using R. Thank you.