Welcome back. After a brief break, let us continue. Statistics and mathematics together make a heady combination for those who are mathematically inclined, and the icing on the cake is that whatever we are doing has plenty of applications; it has a lot of implications for decision making. To show the outside world that our decisions are made in a scientific and impartial manner, we resort to statistics. The decisions are then not arbitrary but can be defended by sound scientific principles. That is why numerous journals insist on statistical analysis of the experimental data being reported. They do not really care about the scatter or the lack of coincidence in the experimental data; they just want assurance that a proper statistical analysis has been carried out.

Now, coming back to our lecture, we were talking about joint probability density functions. Without too much of a preamble, let us go straight to the expected value of a combination of random variables X and Y. The expected value of X times Y is given by

E[XY] = ∫_{-∞}^{+∞} ∫_{-∞}^{+∞} x y f(x, y) dx dy.

The small x is the representative of the random variable X, and the small y is the representative of the random variable Y. The value of x is not specified as a point value but lies within a certain interval, so that probabilities can be calculated; here the limits are taken as -∞ to +∞. If we did not have x and y in the integrand, the multiple integral ∫∫ f(x, y) dx dy would have been equal to 1. But since we are multiplying by x and y, the integral need not be 1; it can take any other value. Even if you multiply by only x, as in E[X] = ∫∫ x f(x, y) dx dy, the integral will not be equal to 1, because you are multiplying by x. Similarly, E[Y] = ∫∫ y f(x, y) dx dy will not be equal to 1. So what I am trying to say is that E[X] and E[Y] need not be 1 all the time.

Now we come to the covariance between the variables X and Y, represented by Cov(X, Y). To understand covariance, let us think in terms of something we already know. Instead of two different variables X and Y, suppose we have X and X. There is no sense in talking about the covariance between a random variable and itself; Cov(X, X) is rather the variance. Since we are talking about a single random variable X, it becomes

Cov(X, X) = ∫∫ (x - E[X])(x - E[X]) f(x, y) dx dy = E[(X - E[X])²],

which reminds us of the original definition of the variance for continuous probability distributions. Similarly, the covariance of X and Y may be defined as

Cov(X, Y) = ∫∫ (x - E[X])(y - E[Y]) f(x, y) dx dy.

This is very interesting: we are drawing on our knowledge of variance to understand covariance. But since we now have two different random variables X and Y, we do not call it the variance of X and Y; we call it the covariance of X and Y. This can be simplified. Expanding the product inside the integral, you will have the terms xy, x E[Y], y E[X], and E[X] E[Y]. The first term gives ∫∫ xy f(x, y) dx dy, and then you have E[X] E[Y] and nothing else. This is interesting. What really happened? The expected value of X is a value.
Since the expected value of X is defined by an integration between the lower limit and the upper limit, after the integration has been carried out and the dust has settled, you will have a number. So E[X] is a number, and E[Y] is also a number. With that background, when you multiply E[X] E[Y] by f(x, y) dx dy and integrate, E[X] and E[Y] are nothing but constants, and they multiply 1, because the area under the curve, the multiple integral ∫_{-∞}^{+∞} ∫_{-∞}^{+∞} f(x, y) dx dy, is equal to 1. I would request you to expand this particular expression and carry out the necessary steps to arrive at the final answer; I am deliberately leaving out these steps, hoping that you will do them and understand the result better. So the term E[X] E[Y] ∫∫ f(x, y) dx dy, after the integration is done, is simply E[X] E[Y], since the integral of f(x, y) dx dy equals 1.

So you should have four terms, and we have accounted for two of them. What happened to the remaining two? It is very interesting. If you look at it, -∫∫ x E[Y] f(x, y) dx dy becomes -E[Y] ∫∫ x f(x, y) dx dy, which is -E[X] E[Y]; and the corresponding term with -E[X] multiplying y again leads to -E[X] E[Y]. So you have -2 E[X] E[Y], and together with the +E[X] E[Y] term you are left with -E[X] E[Y]. The covariance between X and Y is finally

Cov(X, Y) = E[XY] - E[X] E[Y].

Rather than leaving the derivation entirely to you, I thought I would use the board for a change and do the steps myself; hopefully I have not made any mistakes here. What we do is start from

∫∫ (x - E[X])(y - E[Y]) f(x, y) dx dy

and first multiply out the two expressions in the two brackets:

∫∫ [xy - x E[Y] - y E[X] + E[X] E[Y]] f(x, y) dx dy.

Even though on the slide I have put -∞ to +∞, x can vary from -∞ to +∞ and so can y; as a general representation it has been written as -∞ to +∞ on the slide. Now we can multiply each and every term in the bracket by f(x, y) dx dy. We get the first term, ∫∫ xy f(x, y) dx dy. Then we know that E[Y] is a constant and can be taken outside the integral, leaving ∫∫ x f(x, y) dx dy. Similarly, E[X] is a constant and can be taken outside the integral from -∞ to +∞ in both variables, leaving ∫∫ y f(x, y) dx dy. And this is interesting: in the last term both factors are constants, so you get E[X] E[Y] ∫∫ f(x, y) dx dy, and that integral is equal to 1. By definition, the first integral is E[XY], ∫∫ x f(x, y) dx dy is E[X], and ∫∫ y f(x, y) dx dy is E[Y]. So we have

E[XY] - E[Y] E[X] - E[X] E[Y] + E[X] E[Y] = E[XY] - 2 E[X] E[Y] + E[X] E[Y] = E[XY] - E[X] E[Y].

This completes the derivation. Even though it looks very cluttered and highly mathematical, it is basically very simple, and it is a very important result which we will be using pretty frequently.

Those of you who are curious may wonder what will happen to the covariance between X and Y if X and Y are independent. If X and Y are independent, they do not have a combined or similar action, with one variable determining or influencing the other.
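As a quick numerical sanity check, here is a minimal Python sketch, not part of the original lecture, that estimates Cov(X, Y) both from the definition and from E[XY] - E[X] E[Y]; the distributions and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Correlated pair: Y depends linearly on X plus independent noise,
# so Cov(X, Y) = 0.5 * Var(X) = 0.5 here.
x = rng.normal(loc=2.0, scale=1.0, size=n)
y = 0.5 * x + rng.normal(loc=1.0, scale=0.5, size=n)

# Monte Carlo estimates of E[XY], E[X], E[Y]
e_xy, e_x, e_y = np.mean(x * y), np.mean(x), np.mean(y)

# Definition: E[(X - E[X])(Y - E[Y])] ... and the simplified form
cov_direct = np.mean((x - e_x) * (y - e_y))
cov_formula = e_xy - e_x * e_y

print(cov_direct, cov_formula)  # both close to 0.5
```

The two estimates agree to within sampling noise, mirroring the algebra on the board.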
So intuitively you have to ask: what will the covariance be if X and Y are independent? What will be the value — will it be -∞, 0, 1, or +∞, and can that be proved from Cov(X, Y) = E[XY] - E[X] E[Y]? If X and Y are independent, what will happen to E[XY]? Will E[XY] be E[X] E[Y]? So please look at the covariance between X and Y and think about what would happen to it, what would happen to E[XY], and what the relation between E[XY] and E[X] E[Y] would be. To save you the long wait, I am giving you the answers right away: if X and Y are independent, it can be shown that E[XY] = E[X] E[Y], and the covariance between the random variables X and Y becomes 0. Since X and Y are independent, they behave independently of each other; one does not depend on the other. The covariance between X and Y is also denoted by σ_xy.

Now let us assume that the distributions are normal, or Gaussian. Say we have two independent normal distributions. It can be shown that a linear combination of random variables based on these two populations will also be normal. The important things to note here are independence and normality. And when two random variables are independent, their covariance is 0.

So what is the significance of covariance? The covariance of two random variables X and Y indicates how X and Y vary with respect to each other; it is a measure of the linear relationship between the variables. The correlation between X and Y, denoted by ρ_xy, is defined as the covariance between X and Y divided by the square root of the product of the variances:

ρ_xy = Cov(X, Y) / √(V(X) V(Y)) = σ_xy / (σ_x σ_y).

So what we have understood is that if two random variables are independent, the covariance vanishes. Now back to the two independent normal distributions: if you linearly combine two normal distributions, the resulting distribution is also normal. If the parameters of the two normal distributions are (μ₁, σ₁²) and (μ₂, σ₂²), what are the parameters of the resulting normal distribution arising out of the linear combination of the two? What mean and what variance would it have? That is the question we have to answer now.

So we look at a general case involving n independent random variables. We will assume that all of them come from populations with the same mean μ and the same variance σ². Let us say that the variance of the random variable X is V(X) = σ². So X is drawn from a probability distribution function, say a normal one, and the variance of this random variable is σ². What then would be the variance of x̄? We know x̄ is defined as (x₁ + x₂ + ⋯ + xₙ)/n.
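Here is a minimal Python sketch, again outside the lecture proper, that checks both claims numerically: independence makes the covariance vanish, and the sum of two independent normals is again normal with the means and variances adding. The parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Two independent normal populations (illustrative parameters)
x = rng.normal(loc=3.0, scale=2.0, size=n)   # mu1 = 3,  sigma1^2 = 4
y = rng.normal(loc=-1.0, scale=1.0, size=n)  # mu2 = -1, sigma2^2 = 1

# Independence: E[XY] matches E[X]E[Y], so the covariance is ~0
print(np.mean(x * y) - np.mean(x) * np.mean(y))

# Linear combination Z = X + Y: mean ~ mu1 + mu2 = 2,
# variance ~ sigma1^2 + sigma2^2 = 5 (and Z is again normal)
z = x + y
print(np.mean(z), np.var(z))
```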
For independent random variables x₁, x₂, …, xₙ, σ_x̄² is simply the sum of the variances of the scaled terms:

σ_x̄² = V(x₁/n) + V(x₂/n) + ⋯ + V(xₙ/n).

We have already seen this in one of the earlier example sets: if you take the variance of a quantity x₁/n, you cannot take the n outside directly; it comes out as 1/n². So V(x₁/n) = σ²/n², and V(x₂/n) is again (1/n²) σ², because x₁ and x₂ come from identical distributions with the same mean μ and the same variance σ². The same holds all the way up: V(xₙ/n) = σ²/n². When you add up all these terms, you have n entities, n σ²/n², which is nothing but σ²/n.

This is a very important result. What is the implication or meaning of this result? Do not look only at the mathematics; what is the inference you get out of it? You have a sample, and that sample has a mean x̄. If I take many such samples, not all of them will have the same sample mean. Different samples will have different means, and so there will be a distribution of the sample means. This is hardly surprising, because x̄ is also a random variable, and it too is associated with a probability distribution. So we are talking about a distribution of the sample means. What is the variance of that distribution? If the random variable X came from a population of variance σ², what is the variance of x̄?

From now on, we will be shifting to a slightly higher level. Instead of talking about X, we will be talking more about x̄. We know that X is a random variable that came from a population of mean μ and variance σ². It might have been a normal distribution or not, but its mean and variance are μ and σ² respectively. Now we are shifting gears: x̄ is also a random variable. It will also have its own mean and its own variance, because it has a probability distribution; there is a distribution of the sample means, and hence that distribution has a mean and a variance. The variance of the distribution of sample means is not σ² but σ²/n.

Variance means spread. If I am taking a large number of samples, there will be a distribution of the sample means, and hence a spread. If I want to curtail that spread, if I do not want that much uncertainty and I want the values to be precise, what should I do? I should increase the sample size n. If I increase the sample size n, you can see that the variance of x̄ will reduce, so the spread of the different possible sample means will reduce. A lot of physical meaning lies behind this seemingly simple derivation. We took a linear combination of random variables, which is what we stated at the outset, and we tried to find its variance; we wrote x̄ as (x₁ + x₂ + ⋯ + xₙ)/n. It looks very simple, almost too easy to be true. Variance is an operator.
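To see the σ²/n result empirically, here is a minimal simulation sketch (the population parameters μ = 10 and σ = 3 are assumptions chosen for illustration): for each sample size, it draws many samples, records their means, and compares the spread of those means with σ²/n.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 10.0, 3.0

# For each n, draw 100,000 samples of size n, take each sample's mean,
# and compare the empirical variance of those means with sigma^2 / n.
for n in (5, 50, 500):
    sample_means = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)
    print(n, sample_means.var(), sigma**2 / n)
```

The empirical variance tracks σ²/n, and you can watch the distribution of sample means narrow as n grows.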
If you consider variance as an operator and we operate it on a combination or function of random variables, it appears to be a linear operator, because V(x̄) came out as the sum of the variances of x₁/n, x₂/n, …, xₙ/n. It looks very simple, but this is only applicable when the random variables are independent of each other. If they had not been independent, what would have happened? That would again clutter the slide or the board with more of these multiple integrals. Not all of you may have the time or patience to do that integration, though I am sure many of you would like to carry it out with pencil and paper; but it is not necessary to do all of that to understand the simple basic concepts. If the random variables are independent, the variance of x̄ can be represented as V(x₁/n) + V(x₂/n) + ⋯ + V(xₙ/n), and that eventually becomes σ²/n.

The variance of the distribution of sample means about μ is σ²/n. So the question is: where did this μ come from? We have been talking about the variance of x̄. You know that the expected value of X, where X comes from a probability distribution, is E[X] = μ. What is the expected value of x̄? We will be looking at the derivation in one of the slides as well, but I request you to write down E[x̄] yourself and see what the resulting value would be. I will just use the board again. We have been looking at the variance of x̄, where x̄ = (x₁ + x₂ + ⋯ + xₙ)/n. The expected value of x̄ is

E[x̄] = E[(x₁ + x₂ + ⋯ + xₙ)/n] = E[x₁]/n + E[x₂]/n + ⋯ + E[xₙ]/n.

Unlike the variance, where taking the constant outside the bracket produced 1/n², here it becomes simply 1/n. All these random variables come from identical distributions of mean μ and variance σ², so we have (1/n)(μ + μ + ⋯ + μ), and the expected value of x̄ is equal to μ. A very interesting result, and much simpler than the multiple integrals we did earlier, and also simpler than the variance of x̄.

So it indicates that the variance of the distribution of the sample means about μ is σ²/n. The mean of the probability distribution of the random variable X is μ; the mean of the distribution of the sample means is also μ. The variance of the probability distribution of the random variable X is σ²; however, the variance of the sampling distribution of means is not σ² but σ²/n. These are very important results, and we will be applying them in many problems from now on.

I told you that variance is not a simple linear operator; there is something more involved here. The variance of x̄ is V(x̄) = E[(x̄ - E[x̄])²]. There are two expectations, but one expectation is inside the bracket and another expectation is outside the bracket. I hope I am not expecting too much of your mathematical knowledge; this is pretty straightforward terminology.
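A small numeric sketch of the two scaling rules used above — E[X/n] = E[X]/n for the mean, but V(X/n) = V(X)/n² for the variance; the distribution and values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(10.0, 3.0, size=1_000_000)
n = 4

# Expectation is linear: E[X/n] = E[X]/n
print(np.mean(x / n), np.mean(x) / n)    # both ~2.5

# Variance picks up the square of the constant: V(X/n) = V(X)/n^2
print(np.var(x / n), np.var(x) / n**2)   # both ~9/16
```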
The variance of x̄ is the expected value of the squared deviation of x̄ about its mean. Putting it in terms of the original joint probability distribution function, we have the multiple integral of (x̄ - E[x̄])² times the joint density. When you look at this, it is a matter of expanding the terms inside the parentheses: x̄ becomes (x₁ + x₂ + ⋯ + xₙ)/n, and since the term is squared, the factor becomes 1/n². Both x̄ and E[x̄] carried a 1/n, and on squaring it became 1/n², which was taken outside the integral. So, in a form that is more convenient to handle,

V(x̄) = (1/n²) ∫⋯∫ [(x₁ + ⋯ + xₙ) - E[x₁ + ⋯ + xₙ]]² f(x₁, …, xₙ) dx₁ ⋯ dxₙ,

where f(x₁, …, xₙ) is the joint probability distribution function of the random variables x₁ to xₙ. Note that as of now I am not assuming independence between the random variables. What we do here is collect the individual deviation terms: inside the bracket that is eventually squared, we group (x₁ - E[x₁]) + (x₂ - E[x₂]) + ⋯ + (xₙ - E[xₙ]) — the deviation of the first random variable about its mean, the second deviation, and so on up to the nth — all multiplying the joint probability distribution function. When you take the square, you will have the squares of the deviations and also the cross products of the deviations. This is very important. A squared deviation times the probability density function, integrated, represents a variance, and a cross-product term times the probability density function represents a covariance. So we get

V(x̄) = (1/n²) [V(x₁) + V(x₂) + ⋯ + V(xₙ) + 2 Σ_{i<j} Cov(xᵢ, xⱼ)].

This may be a bit difficult for some to follow. You may carry out the same derivation with two random variables x₁ and x₂, and you will see that it reduces to V(x₁) + V(x₂) + 2 Cov(x₁, x₂). When you have more random variables, it is a simple extension, and you get the sum of the variances plus the sum of the cross-product terms, the covariances, divided by n². If the covariance between xᵢ and xⱼ is 0 because xᵢ and xⱼ are independent random variables — and if all the random variables are independent of each other, so that any pair among them has zero covariance — then the entire sum of covariance terms vanishes and you have [V(x₁) + V(x₂) + ⋯ + V(xₙ)]/n². So much mathematical background lies behind the simple expression for the variance of x̄.

Suppose instead you had a difference of random variables, V(x₁ - x₂). Then the covariance term would have a negative coefficient, but the actual variance terms would all keep positive coefficients, positive unity in this case. So V(x₁ - x₂) = V(x₁) + V(x₂) - 2 Cov(x₁, x₂): it is plus V(x₂), not minus V(x₂); the negative sign corresponding to x₁ - x₂ comes in only through the coefficient of the covariance.
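A minimal check of these sign rules with a deliberately correlated pair (the correlation structure below is an illustrative assumption): the variances always add, while the covariance enters with +2 for a sum and -2 for a difference.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

# Deliberately correlated pair: x2 shares a component with x1
x1 = rng.normal(0.0, 1.0, size=n)
x2 = 0.8 * x1 + rng.normal(0.0, 1.0, size=n)

cov = np.mean((x1 - x1.mean()) * (x2 - x2.mean()))

# Sum: V(x1) + V(x2) + 2 Cov; difference: V(x1) + V(x2) - 2 Cov
print(np.var(x1 + x2), np.var(x1) + np.var(x2) + 2 * cov)
print(np.var(x1 - x2), np.var(x1) + np.var(x2) - 2 * cov)
```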
This is very important and follows from a rigorous mathematical background. So, if the population distribution is normal with mean μ and variance σ², then the sampling distribution of the mean is also normal, with mean μ and variance σ²/n. We assume that the random variables constituting the random sample are independent of one another.

Now, we have been talking about a lot of variances, so we need to be sure that we have understood them properly. What is the difference between the sample variance s² and the variance of x̄? The sample variance is the variance of the particular sample you have taken, the same sample that is averaged to give the sample mean x̄. s² is the variance of the sample; V(x̄) is the variance of the sample mean. s² refers to a specific sample; V(x̄) refers to the many different samples that could have been taken. Each sample has its own sample mean, the sample means may differ from one another, and hence V(x̄) denotes the variability of the sample mean. s² is itself a random variable and is an estimator of the population variance σ². We do not know the population variance σ², so we take a sample, which consists of the values x₁, x₂, …, xₙ, and find s² from them. The sample variance is based on a single random sample of size n drawn from the population and is defined, for that particular sample, as

s² = Σ_{i=1}^{n} (xᵢ - x̄)² / (n - 1).

For the sample mean, we add up all the values and divide by the sample size to get x̄. Now, if you draw many samples from the population, you calculate the sample mean for each of those samples from its own entities, using the same formula (x₁ + x₂ + ⋯ + xₙ)/n. So you calculate the first sample's mean from the first sample's entities; the second sample will again have n entities, and using those you find the second sample mean. The first sample mean and the second sample mean need not be the same. Similarly, if you draw many such samples, those sample means will not all be the same, so there will be a distribution. The variance of that distribution of sample means is denoted by V(x̄) or σ_x̄².

So we have now distinguished between the sample variance s² and the variance of the distribution of sample means, V(x̄) or σ_x̄². How did the latter come about? We have already seen. We assumed that the samples comprise entities that are independent of each other and identically distributed. Taking a particular sample mean, we saw that V(x̄) was defined as (1/n²) times [V(x₁) + V(x₂) + ⋯ + V(xₙ) plus twice the sum of the covariances between xᵢ and xⱼ, taken in pairs]. Since all the random variables were independent of each other, all the covariances disappeared, and you had V(x̄) = Σ V(xᵢ)/n².
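To make the distinction concrete, here is a minimal sketch (the sample size and population parameters are assumptions for illustration): s² comes from one sample and estimates σ², while V(x̄) comes from the means of many samples and estimates σ²/n.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n = 10.0, 3.0, 25

# s^2: computed from ONE sample (ddof=1 gives the n - 1 divisor)
one_sample = rng.normal(mu, sigma, size=n)
s_squared = one_sample.var(ddof=1)

# V(x_bar): variance of the means of MANY samples of the same size
means = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)
v_x_bar = means.var()

print(s_squared)              # estimates sigma^2 = 9
print(v_x_bar, sigma**2 / n)  # estimates sigma^2 / n = 0.36
```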
We also made the further assumption that all these entities have the same variance, so σ₁² + σ₂² + ⋯ + σₙ² becomes n σ², and dividing by n² gives σ²/n. So, finally, to interpret the variance of x̄: many random samples may be drawn from a population, and each of them may have a different sample mean, so there is a distribution of sample means, and the variance of this distribution is V(x̄) = σ²/n. The larger the sample size, the smaller the variance of x̄. This means that if you take a large enough sample, there will be less difference between the means of the different samples you take; the distribution of sample means becomes narrower as you increase the sample size.

We now also define another term, a statistic. Any function of the random variables x₁ to xₙ is termed a statistic. Since x̄ and s² are each obtained from the sample random variables by a mathematical definition, they are referred to as statistics. Since they are also random variables, they have probability distributions associated with them. For the time being, our focus is on the sampling distribution of the sample mean x̄; later on, we will look at the distribution of the sample variance s². Each is described by its own unique probability distribution function, which constitutes the fascinating variety in the field of statistical analysis.

This more or less concludes our discussion on the distribution of sample means. We have seen what is meant by a sample, what must be done to make a sample random, and what the properties of a random sample are. If there are many random variables in the random sample, they have to be considered together mathematically in terms of a joint probability distribution function. Fortunately, in our case the random variables were independent, so some simplification of the joint multiple integration was possible. We were able to find that the expected value of the sample mean is μ and the variance of the sample mean is σ²/n. The sample size n thus plays a very important role in determining the shape of the distribution. We will conclude at this point and proceed to the next phase in a very short period of time. Thank you.