on dealing with materials data. We have come a long way in understanding the statistics needed to work with materials data, both in the theoretical part that I am trying to cover and in what has been covered in the R sessions by Professor Guru Rajan. In the next few sessions we are going to consider the case of parametric estimation.

Let us recall what we have done so far. We introduced random variables and their expectations. Then we talked about certain special discrete and continuous random variables. In the last few sessions we discussed the parametric and non-parametric cases; we are now in the realm of the population. There is a population that we are trying to estimate, judge, or understand, and we do this through a small sample that we draw from the population. We expect this sample to be a fully random sample, and that, in a small number of observations, it carries all the information we may require about the population, so that we can estimate or judge what the population is like. If we are aware that the population comes from a specific distribution function, then we need to estimate the unknown parameters of that distribution; this is the parametric case. But if the form of the distribution itself is not known, it is called the non-parametric case. As we mentioned in the previous sessions, in this particular course we are going to concentrate on the parametric case. Under the parametric case we also studied sampling statistics, in which we talked about the sample mean and the sample variance, and we showed that the expected value of the sample mean is the mean of the population, and that the expected value of the sample variance is the variance of the population.

Now we want to get into estimation of the parameters. In sampling statistics, every distribution or every population has its mean value and its standard deviation. The mean is a measure of central tendency, indicating where the data will generally be located; as we said in descriptive statistics, it is something like the center of gravity of the data. The dispersion says how the data is spread. But suppose I say that a particular data set comes from a lognormal distribution, or from a Weibull distribution, and I specify, say, that it comes from the two-parameter lognormal distribution. Then we would like to know what these parameters are, because once you define these parameters the whole distribution is known, so in a way you know the whole population. When you want to find a parameter, you again need to estimate it from the small sample drawn from the population, and therefore in these few sessions we are going to talk about parametric estimation. We will assume that the population distribution has a certain form with certain unknown parameters, and our endeavor in the coming few sessions is to estimate these parameters in various ways. So we would like to consider here the case of the point estimator.
There are other methods of estimation, for example interval estimation, which will also follow, but in today's session we are going to consider methods of estimating parameters through point estimation, in which we will cover the maximum likelihood estimator and the method of moments estimator; we will talk about interval estimation in future sessions. Then we will have estimation of the scale and shape parameters of the two-parameter Weibull distribution. We are going to give examples of the Bernoulli and normal cases, but they are very straightforward, so we would also like to give the example of the two-parameter Weibull distribution, which is a bit more involved. We will also talk about methods of evaluating point estimators, that is, unbiased estimators and robust estimators. I must mention that interval estimation and the methods of evaluating point estimators will be covered in subsequent sessions; in this particular session we plan to cover maximum likelihood estimation of parameters.

So let us start. Why estimation? As I said, let f_theta be a distribution function completely defining the population, where only the parameter theta is unknown. If we know the parameter theta, then we know the whole population; we therefore consider the case where theta is not known, and we would like to estimate theta in order to draw inference on the population. There are two types of estimators. A point estimator finds a single quantity as an estimate of the unknown parameter theta, while an interval estimator gives an interval within which the value of theta may lie, with a probability attached to it. This attached probability is called the confidence level of the interval estimator (the term significance level belongs to hypothesis testing). So, for example, we can say that theta will lie in the interval between, say, minus 3 and plus 3 with a probability of 95 percent, or that theta will lie between minus 4 and 4 with a probability of, say, 68 percent. You can have an interval estimator with different levels of confidence, and this probability has to be derived.

As I said, in this session we are going to talk about the maximum likelihood estimator, but for completeness we will also talk about the method of moments, or moment matching estimator, also known as the MME. I have particularly chosen the maximum likelihood estimator because it has become very common in many software packages, including R, so it is good to know its merits and demerits with respect to the method of moments. When we talk about evaluating point estimators we will go into further detail.

So let us start with the maximum likelihood estimator. Let X1, X2, ..., Xn be a random sample of size n from the population distribution, where only the parameter theta is unknown. Thus X1, X2, ..., Xn are independently and identically distributed as f_theta. Let f(xi; theta) be the PDF of the random variable Xi, where i varies from 1 to n.
A typical member of the random sample thus has probability density function f(xi; theta). Because the members are independent and identically distributed, the joint PDF of all n members of the sample, call it f(x1, x2, ..., xn; theta), is the product of the PDFs of the individual members, which are identical except for the value of xi. Please note the difference between capital Xi and small xi, and recall that capital Xi are the random variables while small xi are the realizations of the respective capital Xi. So small x1, x2, ..., xn are the realization of the random sample X1, X2, ..., Xn, all capital.

It is important to realize that in this joint PDF the values x1, x2, ..., xn are known; the only unknown is theta. As far as the population is concerned, we can recognize it only through the realization of the random sample, and therefore whatever information we wish to have about theta is all contained in this joint distribution function. This joint distribution function is therefore given a special name: it is called the likelihood function of the parameter theta, because it contains the likelihood with which the parameter theta can take a value. Let me repeat what I am trying to say: the population is unknown; the only thing we have is the random sample we have drawn and the realization we have observed. It is an abstract statement that a random sample of size n is drawn; in reality what we see is the actual realization, the small x1, x2, ..., xn that we observe. This is our data, and the data contains all the information possible about the unknown parameter theta, which is why this joint density function is called the likelihood function of theta.

Now we would like to find theta, and one philosophy is to take the value of theta which maximizes this likelihood, which sounds a very logical argument. So we maximize the likelihood function of theta, and the value of theta at which it attains its maximum we call the maximum likelihood estimator. It is a small mathematical fact that if the likelihood function is maximized at a point, then the log likelihood function is also maximized at the same point, and we will see that the log likelihood makes life easier when finding the maximum mathematically. The maximum likelihood estimator of theta is generally denoted by theta hat, as shown here.

Let us see how we actually do it. I am going to consider three examples in this session. The first example is that of a Bernoulli parameter. Consider n independent Bernoulli trials X1, X2, ..., Xn with p as the probability of success, where p is unknown. In that case Xi takes the value 1 if the trial is a success and 0 if the trial is a failure. This is a discrete case, so we do not have a probability density function; we have a probability mass function.
The probability mass function that Xi takes a value x, where x is either 0 or 1, is p to the power x multiplied by (1 minus p) to the power (1 minus x). Now we want to find the maximum likelihood estimator of the unknown parameter p. We must first find the likelihood function of p, which is nothing but the joint density of x1, x2, ..., xn, and we can find it by simply multiplying the densities, because they are all independent and identically distributed. So we get the likelihood function of p as the product over i of p to the power xi multiplied by (1 minus p) to the power (1 minus xi). Simplifying, it is p to the power of the summation of xi, multiplied by (1 minus p) to the power of n minus the summation of xi; the n comes from summing 1 over the n terms. Take the log on both sides; remember we said that maximizing the likelihood or maximizing the log likelihood gives the same answer for the parameter. Taking the log of the likelihood, you see how nicely it simplifies: taking a derivative of the original product is tough, while this is very simple. We get the summation of xi multiplied by log p, plus (n minus the summation of xi) multiplied by log(1 minus p). Recall that in this whole expression only p is unknown; the xi are realizations, either 0 or 1, and we already know them. Taking the derivative with respect to p, equating it to 0 and solving the resulting equation, we get p hat. Remember that the value of p which maximizes the likelihood function is called p hat; it is the MLE, the maximum likelihood estimator. That value is p hat equal to (1/n) times the summation of xi, in other words x bar, the mean value of the realization of the sample of size n.

Let us consider another example; this is a continuous distribution. We take a normal distribution with two unknown parameters, the mean and the variance of the distribution, mu and sigma square respectively. Here again the likelihood function of mu and sigma square, given the realization x1, x2, ..., xn of the sample of size n, is the product over i of 1/(sigma root 2 pi) times the exponential of minus one half of ((xi minus mu)/sigma) squared. If you take the logarithm, life becomes much easier: it is minus n/2 times log 2 pi, minus n log sigma, minus the summation of (xi minus mu) squared divided by 2 sigma square. I am not going through the derivatives, but you take the derivative with respect to mu and with respect to sigma square; please remember you have to take the derivative with respect to sigma square and not sigma, because sigma square is the single entity we want to estimate. When you take the derivatives and equate them to 0, you get two equations, and they solve beautifully: mu hat, the maximum likelihood estimator of the mean of the population, is the average of the sample, and the maximum likelihood estimator of the variance is (1/n) times the summation of (xi minus the sample mean) squared. Please note, and this is very important, that this is not equal to capital S square. Recall that capital S square is 1/(n minus 1) times the summation of (xi minus x bar) squared. So mu hat and x bar are the same, but the two variance expressions differ in the denominator: the maximum likelihood estimator of the variance is not the sample variance; the sample variance has denominator n minus 1, while the maximum likelihood estimator of the population variance has denominator n.
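Since the companion sessions of this course work in R, let me add a small sketch here. It is only an illustration with simulated data of my own choosing, not data from the lecture; it checks the two closed-form results numerically and shows that R's built-in var() uses the n minus 1 denominator, unlike the MLE of the variance.

# A minimal sketch in R, with simulated (hypothetical) data, checking the
# closed-form MLEs for the Bernoulli and normal examples.
set.seed(1)

# Bernoulli: 50 trials with true p = 0.3; the MLE of p is the sample mean.
x <- rbinom(50, size = 1, prob = 0.3)
p_hat <- mean(x)

# Normal: true mu = 10, sigma = 2.
y <- rnorm(50, mean = 10, sd = 2)
mu_hat     <- mean(y)                  # MLE of mu: the sample mean
sigma2_hat <- mean((y - mean(y))^2)    # MLE of sigma^2: denominator n
s2         <- var(y)                   # sample variance S^2: denominator n - 1

c(p_hat = p_hat, mu_hat = mu_hat, sigma2_hat = sigma2_hat, S2 = s2)
# sigma2_hat equals ((n - 1)/n) * S2, so the two differ only in the denominator.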
Let us take the next example. So far I have given you examples in which you very beautifully get a closed-form solution for the maximum likelihood estimator. Well, these are the lucky cases; this does not happen all the time. Here I would like to show you the case of the two-parameter Weibull distribution, in which at the end we will get an equation that we need to solve iteratively to find the value of the maximum likelihood estimator. Please note that I have given a reference here; it is good to read, it is freely available on the internet for downloading, and you can download that particular PDF and go through it. You will get a taste of how people tend to derive this kind of estimator. Briefly, I am going to discuss it here.

Let the random variable X have a Weibull distribution with two parameters, the scale alpha and the shape c, with the density f(x) shown here, where x > 0, alpha > 0 and c > 0. Now we would like to find the MLEs of alpha and c. Here it is easier to make one transformation so that the calculations become simpler: if you define another parameter theta as a function of alpha and c, namely theta equal to alpha to the power c, then this density function simplifies very nicely, and the simplified form is easier to work with to find the maximum likelihood estimator. Once you find it, you can always revert back and recover the actual estimator of alpha; c stays as it is, only alpha is replaced by theta.

So let X1, X2, ..., Xn be a random sample from the Weibull distribution with the changed parameters theta and c. Then the likelihood function of theta and c is simply the product of n such PDFs. Taking the logarithm, it becomes n log c minus n log theta plus (c minus 1) times the summation of log xi, minus (1/theta) times the summation of xi to the power c. Now differentiate the log likelihood with respect to theta and c; I am leaving these calculations to you, please verify them. It is important that you practice doing this; it gives you a feeling for what goes on inside, instead of memorizing it. The result is that theta hat can be expressed in terms of c hat as (1/n) times the summation of xi to the power c hat, and c hat itself must satisfy the following equation: the summation of (xi to the power c hat multiplied by log xi), divided by the summation of xi to the power c hat, minus 1 over c hat, is equal to the average of log xi. In a way, if you look at it, the first term is a weighted average of log xi, where the weights are xi to the power c hat, while the right-hand side is the simple average of log xi. This equation, based on the data, has to be solved iteratively; you can use the Newton-Raphson method, the secant method, or any method you choose, and find the root of this equation. That root is going to be your maximum likelihood estimator of c, which is c hat.
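Again as an aside for the R sessions, here is a minimal sketch of this iterative solution, on simulated Weibull data of my own choosing. Instead of coding Newton-Raphson or the secant method by hand, it uses R's built-in root finder uniroot(), and the search interval (0.1, 20) for c hat is simply an assumed bracket; any root-finding method would serve equally well.

# A minimal sketch in R of the iterative Weibull MLE, on simulated
# (hypothetical) data with true shape c = 2 and scale alpha = 5.
set.seed(2)
x <- rweibull(100, shape = 2, scale = 5)

# g(c) = (weighted average of log x, weights x^c) - 1/c - (simple average of log x);
# the MLE c_hat is the root of g.
g <- function(c) sum(x^c * log(x)) / sum(x^c) - 1 / c - mean(log(x))

c_hat     <- uniroot(g, interval = c(0.1, 20))$root  # shape estimate c hat
theta_hat <- mean(x^c_hat)                           # theta hat = (1/n) * sum(x^c_hat)
alpha_hat <- theta_hat^(1 / c_hat)                   # back to the scale: alpha = theta^(1/c)

c(c_hat = c_hat, theta_hat = theta_hat, alpha_hat = alpha_hat)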
So, once you have found c hat by solving this equation numerically (Newton-Raphson works out very well, but the secant method will also do), you plug that c hat into the expression for theta hat, and what you get is the maximum likelihood estimator of theta; from theta hat and c hat you can also recover the estimate of the original scale parameter alpha. With these two or three examples, let us summarize what we learned about the maximum likelihood estimator and what we covered in this session.

There are two types of estimators: the point estimator and the interval estimator. Under point estimation we introduced the maximum likelihood estimator, in which the likelihood function of the unknown parameter, that is, the joint distribution of the data, is maximized, and the parameter value which makes it maximum is called the maximum likelihood estimator. In the example of Bernoulli trials we found the estimator of the probability of success; please note that this estimator is the sample mean, whose expected value turns out to be the probability of success, but remember that an MLE may not have this property all the time. For the normal distribution, the maximum likelihood estimator of mu is the same as the sample mean; however, we found that the maximum likelihood estimator of the variance differs from the sample variance in the denominator. We then showed that finding the maximum likelihood estimator is not always an easy task; at times you end up with a set of equations, or a single equation, which you need to solve iteratively. In the next session we will move on to discuss the method of moments estimator. Thank you.