So, let us get started with the next part. Now, we are going to talk about another possible way of estimating the parameters, a method called the method of moments. We have all studied moments, right? We know the first moment. What does the first moment give? The mean. And we know the second moment; the second moment is related to the variance. Like that, we can keep on computing the different moments. How are you going to compute moments? What is the easiest way to compute a moment? Yes, if you have the moment generating function, you can quickly compute your moments by differentiating it. This is one of the oldest methods; you will see why it is so natural, and people have been using it since the 1800s. So, what it does is this: method of moments estimators are found by equating the first k sample moments to the corresponding population moments. What do I mean by that? Suppose you have a sample x coming from a PDF. You can compute the sample mean, which is a proxy for your first moment. Can I interpret it like that? Can I interpret the sample mean as a sample estimate of my first moment? I can also go ahead, take the square of the samples and average them, and interpret that as a sample estimate of my second moment. Similarly, I can raise the samples to any power and take the average; going up to the kth power, I raise every sample to the power k, take the average, and treat that as a proxy for, a sample estimate of, my kth moment. So this is what I do with my sample: using the samples alone, I can compute m1, m2, up to mk. Now, my underlying population distribution has certain parameters, and the true mean will be related to those parameters, right? I will write that relation down, and the true second moment will also be related to the parameters; I will find that relation as well. In short, I will work out exactly how the true moments are related to the parameters. All these true moments, the first, the second, up to the kth, will be functions of my parameters theta 1, theta 2, up to theta k. Now I can equate them: I equate the first pair to get one equation, the second pair to get a second equation, and like that I equate the kth pair to get the kth equation. So, what do I have? I have this m1 computed from my sample, and I know how the true first moment depends on the parameters; by equating them, and doing the same for every moment, I get k equations, and I am going to solve these k equations to get those k parameters. My theta here consists of k components, and I need at least k equations to solve for those k unknowns, ok. That is why I am deriving these k equations by computing these k moments; the recipe is written compactly below. The exact values are given by the population moments on one side, and the values computed from the samples are the quantities on the other side, ok. Now, let us look at an example. How does this work? Again, let us take samples x1 to xn which are IID and coming from a Gaussian distribution with parameters mu and sigma square, and now I am assuming that both mu and sigma square are unknown, ok, and I am going to represent them as theta 1 and theta 2; theta 1 is mu and theta 2 is sigma square, ok. Now, what does the method of moments say? First you compute your m1, which is basically the sample mean, then compute your m2.
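Before continuing with the Gaussian example, here is the general recipe written compactly, as a restatement of what was just described in the lecture's notation, with mu-prime-j denoting the j-th population moment (in the example we have just set up, k = 2):

```latex
\begin{aligned}
m_j &= \frac{1}{n}\sum_{i=1}^{n} x_i^{\,j}
  &&\text{(sample moments, computed from the data)}\\
\mu'_j(\theta_1,\dots,\theta_k) &= \mathbb{E}\!\left[X^{\,j}\right]
  &&\text{(population moments, functions of the parameters)}\\
m_j &= \mu'_j(\hat\theta_1,\dots,\hat\theta_k),\quad j=1,\dots,k
  &&\text{(equate and solve for } \hat\theta_1,\dots,\hat\theta_k\text{)}
\end{aligned}
```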
Now, how many moments do I need to compute here? Two, right, because I have two components in my parameter. So I compute my m2 also from the samples; to compute m2, I just take the square of the samples and average them. Now, I need to compute the true values. What is mu 1 prime in this case, the true first moment? It is going to be mu itself, which is actually theta 1. And the true second moment, how is it related? It is going to be mu square plus sigma square. Why is that? Because we know that the expectation of (x minus mu) whole square equals sigma square, and if you expand that square, you find that the expectation of x square is simply sigma square plus mu square. So the true second moment is mu square plus sigma square. Now I have two equations: the first moment I got from the samples equated with the true first moment, and the second moment I got from the samples equated with the true second moment; and notice that since sigma square is denoted theta 2, the true second moment is theta 2 plus theta 1 square. Two equations, two parameters; can I solve these and get values for theta 1 and theta 2? Whatever values I get are the estimates, theta 1 hat and theta 2 hat. Notice that in this case theta 1 hat is again just the sample mean, and theta 2 hat is whatever comes out of the second equation, ok. Now, let us do a quick example for the binomial distribution, ok. Let us say I have a sample; first I compute m1, which is 1 by n times the summation of x i, i from 1 to n, and then m2, which is 1 by n times the summation of x i square. Now, what is mu 1 prime for me, the true first moment? It is the expectation of x, which is k p; and mu 2 prime (the prime is just the standard notation for a raw moment, a moment about the origin) is the true second moment. What is the true second moment of a binomial distribution? It is k p (1 minus p) plus (k p) whole square, the variance plus the square of the mean. So now I am going to set m1 equal to k p, and m2 equal to k p (1 minus p) plus (k p) whole square. Now I have two equations to solve in two parameters, ok. By the way, how are you going to solve this? You can replace k p by m1: wherever k p appears in the second equation, substitute m1 for it. Then m2 equals m1 into (1 minus p) plus m1 square, which is a function of p alone, so p hat equals 1 minus (m2 minus m1 square) by m1, and then k hat equals m1 by p hat. Everybody sees that? There is a short numerical sketch of exactly this step after this paragraph. So, you see, the method of moments is very natural, ok: all you need to do is estimate as many moments as you want, equate them to get that many equations, and then solve them. Any questions about the method of moments?
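Here is a minimal numerical sketch of that binomial calculation. The true values k = 10 and p = 0.3, the sample size, and all variable names are mine, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true values, used only to generate a sample to practice on.
k_true, p_true, n = 10, 0.3, 10_000
x = rng.binomial(k_true, p_true, size=n)

m1 = x.mean()          # sample first moment, estimates k*p
m2 = (x**2).mean()     # sample second moment, estimates k*p*(1-p) + (k*p)**2

# Equate sample and population moments and solve the two equations:
#   m1 = k*p  and  m2 - m1**2 = k*p*(1-p)  =>  p = 1 - (m2 - m1**2)/m1
p_hat = 1 - (m2 - m1**2) / m1
k_hat = m1 / p_hat

print(p_hat, k_hat)    # should land near 0.3 and 10
```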
Now, the next thing we will look at is Bayes estimators. So, in the classical approach, what did we do? We assumed that there is some underlying probability density function with parameter theta which is generating my samples. We assumed that the structure of this function f is fixed, maybe it is Gaussian or binomial or something; the parameter is unknown, but we said that the parameter itself is fixed. Now, in the Bayesian approach, what we do is assume that this theta itself is a random quantity drawn from some prior distribution, ok. So, we are going to say that theta itself is coming from some distribution; we may not know its exact value, but we assume it is being drawn from some distribution. In the classical setting we did not put any prior distribution on theta; we assumed it is something fixed, always a constant value. Here we want to put a prior distribution on it, saying that this theta itself comes from some distribution, and this prior distribution is a subjective belief about how that theta may be arising, ok. Initially we assume that the thetas come from a certain distribution, but once we observe a generated sample, we can improve our belief about which distribution the thetas are coming from; that is what we are going to update, and we call the updated distribution the posterior distribution. So, initially we make an assumption saying that potentially this theta is coming from some distribution which we call the prior distribution, but when we start observing data we want to revisit and update that distribution, and what we obtain after the update is the posterior distribution, ok. So, let us say my thetas come from some distribution which I denote p(theta), and once some theta is fixed, the samples are generated under that parameter according to the probability P(x given theta). So, theta comes from the prior; once you have a theta, your samples are distributed as per P(x given theta), and this is the distribution of your samples, ok. I call the product the joint, because it is the probability of the entire sample together with theta, and now, given your x, you may want to talk about the conditional distribution of theta itself. So, notice: you start with some distribution on theta; whatever the underlying theta is, it generates some samples; and now, using those samples, you want to update your distribution over theta, ok. Now, let us try to understand this formula, ok. This is the joint PDF of x and theta; I can write it either as P(x given theta) times p(theta), or as P(theta given x) times P(x), ok. Now, what I will do is manipulate this: P(theta given x) is obtained by bringing the quantity P(x) to the bottom, and I know that this unconditional probability of x I can write in terms of the conditional probability and then integrate over p(theta). So, recall the steps that we did at the beginning when we applied the Bayes formula: probability of A given B equals probability of B given A times probability of A, divided by probability of B, ok. The same manipulation, written for the parameter and the sample, is shown below.
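For reference, here is the continuous-parameter version of that manipulation, written compactly in the notation used above, with p(theta) the prior density and f(x given theta) the density of the sample given the parameter; this is just a restatement of the lecture's formula, not anything new:

```latex
f(\theta \mid x)
  \;=\; \frac{f(x \mid \theta)\, p(\theta)}{f(x)}
  \;=\; \frac{f(x \mid \theta)\, p(\theta)}
             {\displaystyle\int f(x \mid \theta')\, p(\theta')\, d\theta'}
```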
That is right, we did exactly this, and we are trying to do the same thing here, but conditioning on my parameter space and sample space; and since I am treating theta itself as a random quantity, I can do this, ok. We also saw that if there is a partition, say the A j's, and I want to condition on it, I can write the probability of B as the sum over j from 1 to n of the probability of B given A j times the probability of A j. This is what we called total probability, and then we used the Bayes formula; exactly that is what I am doing here, ok. I am writing it for the discrete case here, but if it is a continuous case, replace the sum by an integral and replace P by f(x given theta), ok. Now, let us try to see how to use this to build a Bayes estimator for a binomial distribution, ok. Let us say I have to deal with a binomial distribution which has parameters n and p; assume that n is known and p is unknown, and this p is what I want to estimate using the Bayesian method. Now, if I have to use the Bayesian method, as I said, I need to start with a prior distribution, ok. What could be a good prior distribution here? We know that p has to be between 0 and 1, right, but I do not know where it is. What possible prior distributions can I take? We can take the uniform, but more generally I am going to start with the beta distribution, ok. Somebody said normal; can the normal be a good prior distribution? No. Why? Yes, we want p to be between 0 and 1, and the normal puts mass on the entire range of real numbers. So, instead of the uniform, I am going to take a beta prior with parameters alpha and beta, and now I want to see the following: I have observed one sample, and I want to compute my posterior probability. What is the posterior probability? It is the probability of p given y. So, p is my parameter, and now I have observed y; initially p is assumed to follow the beta distribution, and after observing one sample I want the new distribution of p, which is my posterior distribution, ok. Let us compute that. To do that, I start with the probability of observing y and p together. That probability I can write as the conditional probability of y given p times the unconditional probability of p. Now, I know that y is binomially distributed, and you have been given p; so this is the probability of y given p, everybody agree? This is true because y is assumed to be binomial and you have already been told the parameter p, so it is simply n choose y, times p to the power y, times (1 minus p) to the power (n minus y). Now, what is the probability of this small p? That is assumed to be beta distributed with parameters alpha and beta, and notice that I should have been careful: maybe I should have written it as f, because this one is a PDF. The first factor is a PMF, because the binomial is discrete, whereas p is now assumed to be a continuous random variable, and its PDF is the PDF of the beta distribution. Everybody agree? Now, I have simply reorganized this: I have clubbed the p to the power (alpha minus 1) and the p to the power y together, and the (1 minus p) to the power (beta minus 1) and the (1 minus p) to the power (n minus y) together; that is it, I have just simplified. Now I have the numerator; I need to compute my denominator, P of y. Now, how do we compute P of y? The joint we are about to integrate is written out below.
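For reference, this is the joint just described, with the powers of p and (1 minus p) collected; again only a compact restatement of the step in the lecture, in the same notation:

```latex
f(y, p)
  \;=\; \binom{n}{y} p^{y} (1-p)^{\,n-y}
        \cdot \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,
              p^{\alpha-1} (1-p)^{\beta-1}
  \;=\; \binom{n}{y}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,
        p^{\,y+\alpha-1} (1-p)^{\,n-y+\beta-1}
```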
I have obtained my joint probability here, and I know that p takes values between 0 and 1. Did I miss anything here? Nothing, right; I just need to integrate over p between 0 and 1. So, if I integrate this quantity over p from 0 to 1, only p appears in the product, and the integral turns out to be a beta function. Recall the definitions: the gamma function is Gamma of z equals the integral from 0 to infinity of x to the power (z minus 1) times e to the power minus x, dx; and what I actually want here is the beta function, B of (alpha, beta), which is the integral from 0 to 1 of x to the power (alpha minus 1) times (1 minus x) to the power (beta minus 1), dx, and it equals Gamma alpha times Gamma beta divided by Gamma of (alpha plus beta). Our integrand is exactly in that form, right? Fine. Now, let us plug in: we have the numerator, and now we have computed the denominator. After taking the ratio of the numerator by the denominator, you see that the n choose y cancels, this Gamma of (alpha plus beta) over Gamma alpha Gamma beta cancels, and what remains is only this ratio, which is by definition the PDF of a beta distribution; can somebody verify that it is exactly Beta of (y plus alpha, n minus y plus beta)? So, what have we got? We started with a prior on p, assuming it is Beta with parameters alpha and beta, and we are saying that after I observe a sample, the distribution of p given y has changed to Beta of (y plus alpha, n minus y plus beta). The parameter alpha has gone to alpha plus y, and beta has gone to n minus y plus beta; after observing y, you have this new distribution. Now, what value do you want to take as the estimator of the parameter? One possibility is this: you have this new distribution, so take the mean value of the posterior distribution as your estimate for p. And what is the mean value of this posterior? It is (y plus alpha) divided by the sum of its two parameters, that is, (y plus alpha) divided by (alpha plus beta plus n). Now, what I have done is reorganize this: this p hat can actually be viewed as a weighted average of two quantities. Alpha divided by (alpha plus beta) is the mean value of the prior distribution, the prior information, and y by n is the new value coming from the observation I have obtained. The new estimator I am taking is basically the weighted average of these two, and the weights are n over (alpha plus beta plus n) on y by n, and (alpha plus beta) over (alpha plus beta plus n) on the prior mean. You see that these two weights sum up to 1: I am giving this much weight to my prior mean, and whatever the new observation gives me, I am giving it the remaining weight, ok. A small numerical check of this update is given below.
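Here is a minimal numerical check of that update and of the weighted-average reading of the posterior mean. The prior parameters, the observed counts, and the variable names are all mine, picked only for illustration:

```python
# Hypothetical numbers: a Beta(alpha, beta) prior on p, then y successes
# observed out of n trials of the binomial experiment.
alpha, beta = 2.0, 2.0
n, y = 20, 14

# Conjugate update described above: posterior is Beta(y + alpha, n - y + beta)
alpha_post, beta_post = y + alpha, n - y + beta

# Bayes estimate of p = mean of the posterior distribution
p_hat = alpha_post / (alpha_post + beta_post)

# The same number written as a weighted average of prior mean and data
prior_mean  = alpha / (alpha + beta)   # what the prior alone would say
sample_prop = y / n                    # what the data alone would say
w = n / (alpha + beta + n)             # weight given to the data
p_hat_check = w * sample_prop + (1 - w) * prior_mean

print(p_hat, p_hat_check)              # both equal 2/3 for these numbers
```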
So, one thing you will have noticed is that I started with a beta distribution as the prior, but the posterior also happened to be a beta distribution; when that happens, the prior and the model are said to form a conjugate family, ok. And this does not happen only with the binomial: for example, if your samples are normal and you start with a normal prior on the mean, the posterior for the mean also turns out to be normal. Which prior you choose depends on what you want to model; in my case here, p was between 0 and 1, so taking a beta prior made more sense than taking a Gaussian, and depending on the application you will choose either the beta distribution or the Gaussian, ok. So, let us stop here.