Now, another distribution is the beta distribution. This is another expansion of the same kind: just as the gamma distribution broadened our scope of distributions, so does the beta distribution. We denote it Beta(a, b). It has two parameters, a and b, and its PDF is f(x) = Gamma(a + b) / (Gamma(a) Gamma(b)) * x^(a - 1) * (1 - x)^(b - 1) for 0 < x < 1, and 0 otherwise. Notice that the beta distribution lives only on positive values, and in fact it is restricted to the interval (0, 1).

Now suppose you set both a and b equal to 1. What do you get? f(x) = Gamma(2) / (Gamma(1) Gamma(1)) * x^0 * (1 - x)^0. What is the value of Gamma(2)? It is 1. And what is the value of Gamma(1)? Also 1. So f(x) is 1 if x is between 0 and 1, and 0 otherwise. What does this distribution correspond to? The uniform distribution on (0, 1). So by setting a = b = 1 you have recovered the uniform distribution. In the same way you can get many other shapes by choosing a and b, and I have put some special cases here.

First, take (a, b) = (2, 2) and (a, b) = (5, 5). In these two cases I have set a and b to the same value. When a and b are both 2 the curve looks like this, and when I increase them to 5 it narrows down. The peak is always in the middle, at 0.5, and you can imagine that if I increase a and b to 10 the curve will look more and more concentrated around that point.

Now take the case where a and b are different. I have again considered two cases; you can read the values on the screen, and let me write them here. Consider this curve: here a is 2 and b is 10. And in this other curve a is 10 and b is 2.
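The shapes described above can be checked numerically. The following is only a minimal sketch using the Python standard library; the helper names beta_pdf and grid_peak are made up for this illustration and do not come from the lecture, and the peak is located by a simple grid search rather than by any formula.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x; zero outside the open interval (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1.0 - x) ** (b - 1)

def grid_peak(a, b, steps=1000):
    """Grid point in (0, 1) where the density is largest (hypothetical helper)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda x: beta_pdf(x, a, b))

# a = b = 1: the density is flat, i.e. uniform on (0, 1)
print([beta_pdf(x, 1, 1) for x in (0.1, 0.5, 0.9)])

# Equal parameters peak in the middle; b > a pushes the peak left, a > b pushes it right
for a, b in [(2, 2), (5, 5), (2, 10), (10, 2)]:
    peak = grid_peak(a, b)
    print((a, b), "peak near", peak, "height", round(beta_pdf(peak, a, b), 3))
```

Running it should show the same picture as the curves: a taller, narrower peak at 0.5 for (5, 5) than for (2, 2), and peaks on opposite sides for (2, 10) and (10, 2).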
So notice that when a and b are not the same, the curve becomes skewed: when b is large the peak is towards the left, when a is large the peak is towards the right, and when a and b are equal the peak is exactly in the middle. As a very special case, when a and b are both equal to 1 the curve is flat, with no peak, because we just showed that this is the uniform distribution. So this is again a generalization: starting from the uniform distribution, we have parameterized it and are now able to capture many different distributions.

This is often used in Bayesian statistics. As I said earlier, if I do not have any prior information I take the uniform distribution, where everything is equally likely. But if you have some prior information that smaller values are more likely than larger values, what do you do? You take b larger than a, as in this case here, where the smaller values are more likely and b is the larger parameter. On the other hand, if you have prior information that larger values are more likely, then you take a to be larger than b.

Next: even though we have talked about many different distributions and their various parameters, it turns out that most of the distributions we have seen so far can be written in a very compact way, and they all belong to one special class of distributions called the exponential family. Let us write down a generic probability density function. Say I have been given a distribution which is parameterized by theta, and I express it in this format: f(x | theta) = h(x) c(theta) exp( sum_{i=1}^{k} w_i(theta) t_i(x) ).
Now let us try to decipher what I have written here. There are four things: the h function, the c function, the w_i functions and the t_i functions. h is simply a function that depends on the point x. Notice that I have deliberately written "given theta": this PDF is parameterized, it depends on a given theta, and we have to define it for all possible values of x. The h function must be nonnegative for all x, and c(theta) must be a positive value. Each w_i is a real-valued function of theta; when I write it like this, w_i cannot depend on x, it is a function of theta only. The last one, t_i, is again a real-valued function, but it depends only on x and not on theta.

If a PDF parameterized by theta can be written like this, it is said to belong to the exponential family. We are not hard-coding what h, c, w_i and t_i should be. We are only saying that h(x) should be nonnegative for all x, c(theta) should be positive, each w_i should depend only on theta and not on x, and each t_i should depend only on x and not on theta.

Now you will see that all the distributions we talked about in the discrete case, and all the continuous ones we have talked about so far (Gaussian, gamma and beta), fall into the exponential family: they can all be expressed in this format for some choice of h, c, t_i and w_i. That is what makes the exponential family so handy: it covers a large number of the distributions we have already studied.

Now let us look at why that is the case, starting with why we say the binomial belongs to the exponential family. Let us see whether I can write the binomial distribution in the common template of the exponential family. The binomial is Binomial(n, p), so the parameters theta in this case are n and p.
So now assume that one of these two components, n, is known. Then my parameter theta is actually only p; this is just to simplify things. We already know the PMF of a binomial distribution: it is the probability that X equals x given that the parameter theta is p. In other words, the random variable takes the value x under the parameter p with probability P(X = x | p) = C(n, x) p^x (1 - p)^(n - x), where C(n, x) is n choose x. This is the probability mass function of the binomial distribution.

Now let us manipulate this. Can I write p^x as e^(x log p)? Nothing changes, right? I have written (1 - p)^(n - x) as e^((n - x) log(1 - p)) in the same way. And the product of the two exponentials I have written as a single exponential of the sum of the exponents, which is just a property of exponents. So P(X = x | p) = C(n, x) exp( x log p + (n - x) log(1 - p) ).

Now let us see that I can write it in the form I wanted. What was that? h(x) c(theta) exp( sum_{i=1}^{k} w_i(theta) t_i(x) ). Here my parameter theta is p. The exponential will map to the exponential term, but first I have to map the factors multiplying the exponential. The product h(x) c(theta) is simply C(n, x); here n is known, so the only variable is x. So I choose h(x) = C(n, x) for x = 0, 1, ..., n, this being a binomial, and h(x) = 0 otherwise. The requirement that h(x) be greater than or equal to 0 is fulfilled, since this quantity is never negative.
Now for c(theta): it has to depend only on theta, and here there is nothing left over, so I will just take c(theta) = 1. This is a positive quantity, so the requirement that c(theta) be positive is also met. Now let us focus on the terms inside the exponential. Each of the w_i has to depend only on theta, so I define w_1(theta) to be log p and w_2(theta) to be log(1 - p). Both w_1 and w_2 depend only on the parameter p; they do not depend on x. For the t functions, I take t_1(x) = x and t_2(x) = n - x. These depend only on x and not on the parameter. Notice that n is a parameter, but we have assumed it to be fixed, so the only parameter is p, and t_2 does not depend on p. Here k is 2, because I am adding only two terms in the exponent. So I have now put this probability mass function in the form h(x) c(theta) exp( w_1(theta) t_1(x) + w_2(theta) t_2(x) ), which is the required form for it to belong to the exponential family.

That was the discrete case; let us now look at one continuous case. For this we take the Gaussian. The Gaussian is parameterized by two values, mu and sigma^2, so in f(x | theta) the parameter theta is (mu, sigma^2). We know that its PDF is f(x | mu, sigma^2) = 1 / sqrt(2 pi sigma^2) * exp( -(x - mu)^2 / (2 sigma^2) ). Now let us see whether this can be expressed in the exponential family form. To do that, I first simplify the exponent by expanding the square: -(x - mu)^2 / (2 sigma^2) = -x^2 / (2 sigma^2) + mu x / sigma^2 - mu^2 / (2 sigma^2). Now, the factor 1 / sqrt(2 pi sigma^2) depends on sigma^2, which is my parameter, so it goes into c(theta). One more thing: when I expanded the square, the term mu^2 / (2 sigma^2) appeared, and I have pulled exp( -mu^2 / (2 sigma^2) ) out as well. So this entire factor depends only on the parameters mu and sigma^2 and not on x.
So I take c(theta) to be this entire quantity, 1 / sqrt(2 pi sigma^2) * exp( -mu^2 / (2 sigma^2) ), and I simply define h(x) to be 1 for all x. Notice that what I pulled out did not depend on x, while the other two terms do: x^2 / (2 sigma^2) depends on both x and sigma^2, and mu x / sigma^2 depends on x and the parameters. So whatever depends on both x and the parameters I kept inside the exponential, and whatever depends only on the parameters I pulled out, and this helped in the simplification. Now, for the part inside the exponential, I can take w_1(theta) to be 1 / (2 sigma^2) and w_2(theta) to be mu / sigma^2, and take t_1(x) to be -x^2 and t_2(x) to be x. With these definitions I am able to write f(x | theta) exactly in the required form, again with k equal to 2, which is the definition of the exponential family. So we have verified that the Gaussian also belongs to the exponential family.

Now let us quickly check whether the gamma distribution we discussed today also belongs to the exponential family. We said the gamma distribution is parameterized by alpha and lambda, and its PDF is f(x | alpha, lambda) = lambda^alpha / Gamma(alpha) * x^(alpha - 1) * e^(-lambda x) when x is greater than or equal to 0. When x is less than 0 the density is 0, and we have to handle that case by defining h appropriately. Now, what I have done here: the first factor I have kept the same, but x^(alpha - 1) I have rewritten as e^((alpha - 1) log x). Can I do this? Yes, this is just the property of exponents and logs. And e^(-lambda x) I have kept as it is. So let us see if I can come up with the h function, c function, t_i functions and w_i functions required by the exponential family. One can define c(theta) to be lambda^alpha / Gamma(alpha) and h(x) to be 1 for positive x (and 0 otherwise). Then take w_1(theta) to be alpha - 1 and w_2(theta) to be -lambda.
So here the w functions depend only on your parameters, and you can take t_1(x) to be log x and t_2(x) to be simply x. If you expand this, f(x | theta) with theta = (alpha, lambda) can be written as h(x) c(theta) exp( w_1(theta) t_1(x) + w_2(theta) t_2(x) ). So this again belongs to the exponential family.

Now, an exercise for you: check that the Beta(a, b) density belongs to the exponential family for all a and b, and see that in particular the Uniform(0, 1) density belongs to the exponential family. Please check this; do not skip it.
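As a sanity check on the three decompositions worked out in this lecture, one can compare each density computed directly against the product h(x) c(theta) exp( sum_i w_i(theta) t_i(x) ). The sketch below is only an illustration under a few assumptions: every function name is hypothetical, n is fixed at 10 for the binomial, the Gaussian uses w_1 = 1 / (2 sigma^2) with t_1 = -x^2, the gamma uses w_2 = -lambda, and the gamma check is evaluated only at a positive x. The beta exercise is deliberately left out.

```python
import math

def exp_family_density(x, theta, h, c, w, t):
    """Evaluate h(x) * c(theta) * exp(sum_i w_i(theta) * t_i(x))."""
    exponent = sum(w_i(theta) * t_i(x) for w_i, t_i in zip(w, t))
    return h(x) * c(theta) * math.exp(exponent)

N_TRIALS = 10  # binomial n, assumed known so that theta is just p

def binomial_direct(x, p):
    return math.comb(N_TRIALS, x) * p ** x * (1 - p) ** (N_TRIALS - x)

def binomial_family(x, p):
    return exp_family_density(
        x, p,
        h=lambda x: math.comb(N_TRIALS, x),
        c=lambda p: 1.0,
        w=[lambda p: math.log(p), lambda p: math.log(1 - p)],
        t=[lambda x: x, lambda x: N_TRIALS - x],
    )

def gaussian_direct(x, theta):
    mu, var = theta
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gaussian_family(x, theta):
    return exp_family_density(
        x, theta,
        h=lambda x: 1.0,
        c=lambda th: math.exp(-th[0] ** 2 / (2 * th[1])) / math.sqrt(2 * math.pi * th[1]),
        w=[lambda th: 1 / (2 * th[1]), lambda th: th[0] / th[1]],
        t=[lambda x: -x ** 2, lambda x: x],
    )

def gamma_direct(x, theta):
    alpha, lam = theta
    return lam ** alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-lam * x)

def gamma_family(x, theta):  # valid for x > 0 only, since t_1 is log x
    return exp_family_density(
        x, theta,
        h=lambda x: 1.0,
        c=lambda th: th[1] ** th[0] / math.gamma(th[0]),
        w=[lambda th: th[0] - 1, lambda th: -th[1]],
        t=[lambda x: math.log(x), lambda x: x],
    )

checks = [
    ("binomial", binomial_direct(3, 0.3), binomial_family(3, 0.3)),
    ("gaussian", gaussian_direct(1.2, (0.5, 2.0)), gaussian_family(1.2, (0.5, 2.0))),
    ("gamma", gamma_direct(1.7, (2.5, 0.8)), gamma_family(1.7, (2.5, 0.8))),
]
for name, direct, family in checks:
    print(name, direct, family, math.isclose(direct, family, rel_tol=1e-9))
```

The same helper can be reused for the beta exercise once you have picked your own h, c, w_i and t_i.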