So, as I was mentioning, we were discussing estimating parameters: if you know the form of the density function but you do not know the values of the parameters, how do you estimate them? There are very many distributions in the world, and here I will give you the example of only one, the normal distribution, also called the Gaussian distribution, and I will tell you ways of estimating its parameters. Then the next question is: what is the meaning of estimation? Several types of estimates are possible. There is something called an unbiased estimate, something called a maximum likelihood estimate, something called a consistent estimate, and so on; depending on the use and on the type of the data, one can have several types of estimates. Since I am only going to talk about the normal distribution and not about any other, let me state the result: if you have finitely many points from a normal distribution, be it univariate or multivariate, then the mean of these finitely many points is an unbiased estimate, a consistent estimate, and also the maximum likelihood estimate of the population mean. I will repeat it and write it on the board.
If you have finitely many points from a normal distribution, univariate or multivariate, then the mean of these finitely many points is an unbiased estimate, a consistent estimate, and the maximum likelihood estimate of the population mean. These are n points coming from a Gaussian distribution, independent and identically distributed. This is a terminology you are going to find in many places: independent and identically distributed, i.i.d. Do you understand its meaning? Let me tell you. First, what it means for one variable to be independent of another. Take a coin and toss it once; you get, say, a head. Toss the same coin a second time; you get some outcome. Do you think the first outcome has any role to play in the second outcome? The answer is no. The result of the first trial has no role to play in the result of the second trial, so these two trials are independent, whatever their outcomes may be. Now the meaning of identically distributed: you are tossing the same coin, so the probability of head has not changed and the probability of tail has not changed; the distribution has not changed. In the first trial and the second trial you have the same distribution, but the trials are independent, and then we say these are independent and identically distributed random variables. Why the word "variable"? Because you are going to get a value, H or T. If instead you are going to get a vector with some number of components, then we call them independent and identically distributed random vectors.

Now let me write it the statistical way: let x1, x2, ..., xn be independent and identically distributed random vectors following a normal distribution with mean μ and variance-covariance matrix Σ, where μ and Σ are unknown. You might have a question here: I wrote random vectors, but usually what we get are observations. Say for one data set you have got 10 vectors, each of them a four-dimensional vector; then you have 10 such four-dimensional vectors, and the small n value is 10. The vectors you have got are taken to be realizations of these 10 random vectors. Let me tell you once again with an example. Say I have tossed a coin, and my first toss results in a head and my second toss also results in a head. This is one realization: my x1 value is H and my x2 value is also H. But note that my x1 value could have been T and my x2 value could have been T; the pair (x1, x2) could have been any of the four possibilities HH, HT, TH, TT. In that instance I got x1 as H and x2 as H, so these are realizations of the variables, but the possible values of x1 can be many, and the possible values of x2 can be many. I hope you have understood the subtle difference between what a variable is and the data points that you have got.

So we have independent and identically distributed random vectors following this distribution, but μ and Σ are unknown, and what we have is some realization of them. Then how do you estimate the mean μ and the variance-covariance matrix Σ on this basis? The answer is: you write x̄ (the bar denotes the mean, and it is a vector) as x̄ = (1/n) Σ from i = 1 to n of xi; you take the mean of these n vectors. In one realization you will get one mean; in another realization of these n vectors you might get another mean. Are you following? That is why x̄ is also a variable. Then comes the statement that I made: x̄ is an unbiased, consistent, and maximum likelihood estimate of the population mean. There are small n trials, so in one set of n trials you will get one set of values, and in another set of n trials another set of values; that is why this is a variable. Your natural question now is: what is the meaning of an unbiased estimate, of a consistent estimate, and of a maximum likelihood estimate? I will not go deep into it; that would take me deep into statistics, which is not my aim. I just want to state the results. About the mean there is actually no confusion: you just take the sample mean of the observations, and that is a good estimate of the population mean. But about the variance there is a small problem, which is what I was mentioning in one of the previous classes.
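The statement about the sample mean can be checked numerically. Here is a minimal sketch in Python with NumPy; the population mean and covariance below are made up purely for the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters (assumed for this example only).
mu = np.array([1.0, -2.0])                      # population mean
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])      # population covariance

# One realization: n i.i.d. random vectors from N(mu, sigma).
n = 10_000
x = rng.multivariate_normal(mu, sigma, size=n)  # shape (n, 2)

# The sample mean x_bar = (1/n) * sum of the x_i.
x_bar = x.mean(axis=0)

# x_bar is itself a random vector: a different realization of the
# n points would give a (slightly) different x_bar.
print(x_bar)  # close to mu for large n
```

Running this with a different seed gives a different x̄, which is exactly the sense in which the estimate is itself a variable.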
So this is about the mean. Now, about the covariance matrix. Let me write the vector xi componentwise, and let me spell out the distribution: a Gaussian distribution with mean μ and variance-covariance matrix Σ. In general I am using capital N for the number of features; there are capital N features, which means you are in N-dimensional space, and your number of points is small n. So the ith vector I am writing as xi = (xi1, xi2, ..., xiN). Now, this covariance matrix Σ is the covariance of what? Basically you have one vector x which has the same distribution throughout, and it is giving rise to the components: the first component x1, the second component x2, the third x3, and so on and so forth. You are trying to find the covariance between the ith component and the jth component of this vector. Since you have capital N variables, your covariance matrix will be a capital N by capital N matrix.
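To fix this notation in code: with capital N features and small n points, the data sit in an n by N array, and the covariance matrix is N by N. A small sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

n, N = 10, 4                      # small n points, capital N features
data = rng.normal(size=(n, N))    # row i is one realization x_i = (x_i1, ..., x_iN)

# np.cov expects variables in rows by default, so pass rowvar=False
# for our points-in-rows layout.
cov_hat = np.cov(data, rowvar=False)
print(cov_hat.shape)  # (4, 4): an N x N matrix
```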
So if your covariance matrix is an N by N matrix, then you are basically going to have the covariance between the jth component and the kth component. You might be wondering what this j and this k are: j runs from 1 to N and k runs from 1 to N, over the components of the original vector x = (x1, ..., xN). The ith realization is xi = (xi1, ..., xiN): for i = 1 you get (x11, ..., x1N), for i = 2 you get (x21, ..., x2N), and for i = n you get (xn1, ..., xnN). You find the covariance between the jth and kth components of the original vector x, which I represent as Cov(xj, xk), and your covariance matrix will be

Σ = [ Cov(x1, x1)  Cov(x1, x2)  ...  Cov(x1, xN)
      ...
      Cov(xN, x1)  Cov(xN, x2)  ...  Cov(xN, xN) ]

where each diagonal entry Cov(xj, xj) is nothing but the variance of the jth component. This is the population variance-covariance matrix, which is not known to us; this is what we would like to estimate. What we have got are the realizations, and from these we would like to estimate Σ. Here again you have consistent, maximum likelihood, and unbiased estimates, and now there is a small difference between the unbiased and the maximum likelihood estimate. The unbiased estimate of the covariance between the jth and kth components is represented with a hat:

Cov^(xj, xk) = (1/(n-1)) Σ from i = 1 to n of (xij - X̄j)(xik - X̄k),

where X̄j and X̄k are the means of the small n observations for the jth and kth features. Unfortunately these are also variables: for different sets of realizations you might get different means, so I had to represent them in capital letters. So for the jth variable I have taken the jth sample mean, for the kth variable the kth sample mean, and I wrote 1/(n-1): this is the unbiased estimate. Whereas if I write 1/n there instead, it is going to be the maximum likelihood estimate, the MLE, which is what I was telling you in one of the previous classes. Again: with 1/(n-1) it is the unbiased estimate, and with 1/n it is the maximum likelihood estimate. In one of the previous classes I was mentioning these two things without giving the proper mathematics; this is the proper mathematics. For a very large n, whether you write n or n-1 does not matter. I also mentioned in that class that statisticians would like to take 1/(n-1), not 1/n, and that I was telling you about 1/n because that is what people have generally learned in their childhood: when you learn mean and variance, you write 1/n there, not 1/(n-1). Anyway, now that we have done this properly, from now on I would expect you to write 1/(n-1), not 1/n. Do you know the meaning of an unbiased estimate? One minute, please; let me give you the definition. Suppose t is a function of the sample observations and you would like to estimate θ. Then t is said to be unbiased for θ if the expected value of t is actually θ.
Now, what is the meaning of the expected value of t? You are going to find the mean of t; expectation means the mean. That is, if the random variable T takes the value t with density function f(t), then E(T) is the integral of t f(t) over the range of T. So, once more: t is a function of the sample observations, θ is the parameter that is to be estimated, and t is said to be an unbiased estimator of θ if E(t) = θ. If the mean of t is θ, then you say that t is an unbiased estimate of θ. Now, I said that the sample mean is such an estimate. Forget about multiple dimensions; let us say we are in a single dimension with x1, x2, ..., xn. The mean x̄ = (1/n) Σ xi is an unbiased estimate of the population mean μ (there is no bar on μ). What is the meaning of this? You take all the realizations: this x1 is a realization of the first random variable X1, and xn is a realization of the nth random variable Xn, and these random variables are independent and identically distributed.
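The definition E(t) = θ can be checked by simulation: averaging the estimator over many independent realizations should recover the parameter. A minimal sketch for the sample mean, where the population mean θ is made up for the example:

```python
import numpy as np

rng = np.random.default_rng(2)

theta = 3.0      # population mean, the parameter to estimate (assumed)
n = 20           # points per realization
trials = 50_000  # number of independent realizations

# t = x_bar computed on each realization; E(t) should be theta.
t_values = rng.normal(loc=theta, scale=1.0, size=(trials, n)).mean(axis=1)
print(t_values.mean())  # close to theta: the sample mean is unbiased
```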
So you will get many values of this x̄: in one realization you will get one x̄, in another realization another x̄, and if you take the average over all these realizations, that is actually μ. I hope you have understood the basic feeling of an unbiased estimate. If the expected value is not θ but something else, that means our estimator is not able to estimate this thing properly: you are shifting away from the true value, there is a bias, which a statistician generally does not like. That is why statisticians prefer unbiased estimates. I hope I have answered your question. The maximum likelihood estimate is different. The intuition is: given this set of observations, which particular value of θ will give this set of observations the maximum probability? I am repeating: which particular value of the parameter θ will give these n points the maximum likelihood? That is why it is called maximum likelihood, and that is the maximum likelihood estimate. The unbiased estimate I have already told you. A consistent estimate means: as the number of points goes to infinity, does this function that you have taken actually go to the true value? If it does, then it is consistent. So: unbiased means on average you should get θ; maximum likelihood means which particular value of θ gives maximum probability, or maximum likelihood, for this sample; consistent means convergence to the true value as the number of points grows. It is not necessarily true that all these three should be the same, and in the case of variance and covariance they are not the same. Any other question? I think we will stop here for today.
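As a closing illustration, the difference between the 1/n and 1/(n-1) versions of the variance estimate can also be seen by simulation. NumPy's ddof argument switches between the two; the true variance below is made up for the example:

```python
import numpy as np

rng = np.random.default_rng(3)

true_var = 4.0   # population variance (assumed for the example)
n = 5            # deliberately small so the bias is visible
trials = 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, n))

mle = samples.var(axis=1, ddof=0)       # 1/n      : maximum likelihood estimate
unbiased = samples.var(axis=1, ddof=1)  # 1/(n-1)  : unbiased estimate

# E[mle] = (n-1)/n * true_var, i.e. biased downward; E[unbiased] = true_var.
print(mle.mean())       # close to 3.2
print(unbiased.mean())  # close to 4.0
```

For large n the factor (n-1)/n approaches 1, which is why the distinction stops mattering, exactly as stated above.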