Hello, welcome back to the course on statistics for experimentalists. Today we will be looking at random samples and at the sampling distribution of the mean. As before, the prescribed textbook for this course, and especially for the current lecture, is the one written by Montgomery and Runger, and I have also referred to Ogunnaike's book on random phenomena. The notation I am following is from Montgomery and Runger's book. What have we done so far? We have looked at random variables, the discrete and continuous probability distributions, and their properties such as the mean and variance. We also looked at the median and the mode, but more frequently we will be working with the mean and variance. These form the basis for our analysis of experimental data. We have seen that experimental data may be scattered, and hence we need to find the average value of the response and also quantify the scatter. We can speak of a large population from which the experimental data are sampled. Hence, we have to look at the properties of samples: what their desirable characteristics are, how we should sample, and what their attributes are. So, let us now look at the population, random sampling and their properties. The population may be fictitious or real. We can assume that the data or the sample being collected is coming from a certain population. It may be a fictitious one. Sometimes the population may also be real, and we can directly relate to it. Let us look at a real population. The population may be the community of students who have taken, let us say, a course in chemical engineering over the last 30 or 35 years. Here I have put 20 years, but it is a popular mass transfer core course, so I guess the course would have been offered even at the beginning of the institution. So, we may be looking at the performance of students who have taken this course since it has been offered. It can also be a fictitious population.
It may represent industrial washers that are being produced by a particular manufacturing process. We do not really need to look at the entire population. Knowing about the entire population may not be practical, because it may comprise, let us say, millions of entities, and trying to record their attributes would be a complete waste of time. So, we know that there is a population, and we are going to take representative elements from it. It is important to understand the characteristic features of these populations so that we can make proper decisions. We can also pass judgment on quality, or make corrections or changes from a marketing point of view. It also helps us to set our goals, our objectives, and the settings at which the processes should be run. A population, as I said, is a very large entity, and trying to understand the properties of the entire population is a herculean task, so we need to take a sample out of the population, so that we can infer the population's characteristics from the values in the sample. From the sample's attributes, such as the sample mean and the sample variance, we try to get an idea or estimate of the population mean and variance. For example, college students majoring in mathematics across the nation may be considered a population, and the students' overall grades or performance may vary over a certain range, because not all students perform alike. So, there will be a distribution in their performances. We need not assume that the distribution of the students' grades will be normal or Gaussian; that is the simplest or most direct assumption we tend to make. However, it may not be correct. We do not know the population's distribution unless we have prior data or historical evidence. So, we really cannot assume the population's probability distribution. We also do not know the average of the entire population.
We also do not know the standard deviation of the entire population. In this particular example, we do not know the average performance or the average mark of the student population in the nation. We also do not know how their performances or grades are spread. In other words, we do not know the standard deviation or variance of the entire population. So, we have to get an idea of how these students majoring in mathematics are performing. We take a sample from the population. The sample should be a random one, and we also hope that the random sample we have picked is sufficiently representative of the entire population, so that we can understand the population better. The sample should not be biased towards a certain group. The sample should comprise independent observations coming from the same population; that is, they should come from a population with an identical probability distribution. When you collect a sample you obviously follow certain precautions. For example, in an opinion poll survey, all the sampled elements should be above the voting age, so that all are eligible to vote and we can get correct information. There is no point in asking a person for his or her preference if he or she is not eligible to vote. In some colleges and institutions, surprise quizzes are conducted, and if you want to know whether the surprise quizzes are useful from an academic point of view, then you should conduct the survey at the beginning of the course, or at the end of the course, by when the students would probably have benefited from the surprise quizzes and would be better prepared, or would at least have a mixed opinion. On the other hand, suppose you ask angry students coming out of a surprise quiz whether such modes of evaluation help to improve their exam preparation.
Most of them, or in fact all of them, would be biased towards a negative response to such a query. So, you cannot really say that this is truly representative of the collective opinion regarding the conduct of surprise quizzes. The sampled elements should be independent of each other, and each element in the population should have an equal chance of being picked. Another consideration is the sample size. What is the optimum sample size? Obviously, you cannot sample the entire population, but if you take a very large sample, you are going to estimate the response more accurately or more precisely. However, it is not very practical to collect a huge sample. Sometimes sampling from a population may also mean that you have to conduct destructive tests on the specimens, and it is not economical to destroy a large number of specimens for the purposes of sampling. So, the sample size is very important. However, it is intuitively evident that the larger the sample size, the more confident and the more precise we feel about the responses. So, we have to understand the basic features of these random samples, after which we hope to understand the population. We can obviously take more than one random sample, just to be on the safe side. For example, many opinion polls are conducted by independent news agencies, and their predictions are not identical. Coming back to the random variable: we know that a random variable is described by a probability distribution, and any mathematical combination of random variables will also result in a random variable. So, a combination of random variables may itself be treated as a random variable. For example, the sample mean and the sample variance, which are obtained from a collection of random variables, may be treated as random variables themselves. So, the sample mean is a random variable, and the sample variance is also a random variable.
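The point that the sample mean is itself a random variable can be seen in a short simulation. This is a minimal sketch, assuming a hypothetical normal population with mean 50 and standard deviation 10 (illustrative values, not from the lecture): each fresh sample gives a different sample mean, and the collection of sample means has its own distribution centered on the population mean.

```python
import random
import statistics

random.seed(0)

# Hypothetical population parameters (assumed for illustration).
population_mean, population_sd = 50.0, 10.0
sample_size = 25
num_samples = 2000

# Draw many independent random samples and record each sample's mean.
sample_means = []
for _ in range(num_samples):
    sample = [random.gauss(population_mean, population_sd)
              for _ in range(sample_size)]
    sample_means.append(statistics.mean(sample))

# The sample means scatter around the population mean, so X bar is
# itself a random variable with its own (smaller) spread.
print(statistics.mean(sample_means))   # close to the population mean 50
print(statistics.stdev(sample_means))  # much smaller than the population sd 10
```

Note how the spread of the sample means is far smaller than the spread of the population itself; this is exactly the sampling distribution of the mean that this lecture is building towards.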
Let us say that we have conducted the survey, or we have done the sampling, and we have taken n entities from the population. Let us denote them by X1, X2, and so on up to Xn. These are all random variables, and they are independent. To make further progress with random variables and random samples, we need a bit of mathematical background, which will be provided in the next few slides. Let us first look at a few definitions. Some of these definitions you might have come across earlier. I am repeating them so that you become more familiar with them, and also because there are some small differences between the various definitions which you should be aware of. If you look at a population of a finite but large size, the mean may be defined as mu equal to the sum over i from 1 to capital N of xi, divided by capital N, where capital N is the size of the population. We do not really care to sample the entire population, so we do not know the exact value of capital N. Most likely it is a very large number, but we do not worry too much about that; even though the population mean is central to all our discussions, we do not really know its exact value. So, we have a discrete collection of random variables taken from the population, and if we attribute the same probability to each and every entity in the population, the probability is simply 1 by capital N. There are capital N entities in the population, and if each has the same chance of being selected or picked, then the probability of each would be 1 by capital N. So, the mean becomes simply the sum over i from 1 to capital N of xi, divided by capital N. We assume that the random variables Xi have identical probabilities for i equal to 1 to capital N. The population mean mu may be interpreted as the average of all the measurements in the population. Let us say that mu is known to us beforehand; it is very important that we have knowledge of mu somehow. For example, it may be a hypothetical stated design specification, and that gives us the value of mu.
When mu is already known, the variance is sigma squared; since we are talking about the population, we speak of sigma squared. Just as we represented the population mean by mu, the population variance is represented by sigma squared. So, we have sigma squared equal to the sum over i from 1 to capital N of (xi minus mu) whole squared, divided by capital N. Note that we are using the entire population size capital N here. In the sample variance we use the sample size minus 1, but here we use capital N, which is the size of the entire population. This sigma squared is also based on the fact that all the xi values have identical probabilities, and the population mean mu is assumed to be known. Now let us define the sample mean. The sample is collected from a population. Obviously, we cannot sample the entire population, so we take a sample of size small n, and the small n value is much, much lower than the capital N value. In other words, the sample size is much smaller than the population size. The sample mean is defined as X bar equal to the sum over i from 1 to n of Xi, divided by n, where n is the sample size. The Xi are the entities in the random sample collected. They are assumed to be random and identically distributed, and also to have equal probability of being selected. This is a sum of random variables: we total them and divide by a constant value, the sample size n, and hence this is a mathematical function of the random variables, resulting in the sample mean X bar. Since these are random variables, a function of the random variables is also a random variable, so we may treat X bar as a random variable. It is a quantity derived from random variables, it is itself a random variable, and so it will have a probability distribution. The sample variance is defined as S squared equal to the sum over i from 1 to n of (Xi minus X bar) whole squared, divided by n minus 1.
Note that we are using capital S squared, and we are also using capital Xi and capital X bar in these definitions. Earlier, in the sample mean, we also used capital X bar. As of now, we are manipulating random variables, which are abstract entities until the experiment has been carried out or the sample has been collected. They are abstract entities until then, and we are defining another abstract entity, X bar, in terms of these random variables. So we use capital Xi here and capital X bar here. Similarly, before the experiment is conducted or the sample has been selected, we are talking about abstract entities, and so we have capital Xi and capital X bar, and this S squared also carries a capital S. Another feature is that we are using n minus 1 here. We have already seen why we should be using n minus 1 in this definition. n is of course the sample size, and we use n minus 1. It is a measure of the degrees of freedom, and when we refer to degrees of freedom we always talk about independent entities. The degrees of freedom represent the number of independent entities, and since X bar was not known beforehand, it was calculated from the sample. We use the same sample to find X bar and then try to find the variance of that particular sample. So, the n deviations Xi minus X bar are not independent of each other, and we use n minus 1. Another important point is that S squared is a measure of the deviation from the mean. Ideally, this deviation should have been measured from mu, but we do not know mu; we only have X bar, and the Xi values are closer to the sample mean than to the population mean. X bar has been so defined that the Xi values are clustered around it; it sits at the center of all these random variables, but mu need not be at the exact center, in which case the sum of (Xi minus X bar) whole squared over all the entities in the sample is likely to be smaller than the corresponding sum of the squares of (Xi minus mu).
So, the important thing is that when you try to calculate the spread based on the sample, the spread is likely to be smaller, because the Xi values are clustered around X bar and their distances from X bar are effectively smaller than their distances from the unknown population parameter mu. You are, in effect, reducing the scatter by basing the variance definition on X bar. This reduction of the scatter may give a false sense of security: you may feel that there is not much scatter in the data. To compensate for that, since the squared deviation is slightly smaller because we are using X bar instead of mu, we also reduce the degrees of freedom in the denominator and use n minus 1 instead of n. So, the numerator has decreased and the denominator has also decreased; there is a compensating effect. As a result, S squared is more reliably estimated and is a truer reflection of the population variance sigma squared. Remember, the population has the parameters mu and sigma squared; after sampling we get X bar, and we hope that X bar is sufficiently representative of mu. We also get the sample variance from the sample, and we hope that S squared is sufficiently representative of sigma squared. So, there are two reasons for using n minus 1. The first is that the sum over i from 1 to n of (Xi minus X bar) is equal to 0. This is a constraint which tells us that only n minus 1 of the deviations Xi minus X bar are really independent. If we know n minus 1 of the deviations, then, using the fact that the sum of all the deviations is equal to 0, we can find the nth deviation. We have n minus 1 deviations and this particular constraint, and from them we can find the nth deviation. So, the nth deviation is truly not an independent one. Using n minus 1 also helps to balance the effect of the decrease in the numerator, which is what I discussed a couple of slides back. Now, we come to joint probability density functions.
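Both reasons for the n minus 1 divisor can be checked numerically. The sketch below, assuming a hypothetical normal population with a known variance of 4 (chosen purely for illustration), first verifies that the deviations from the sample mean sum to zero, and then shows that, averaged over many samples, dividing by n minus 1 tracks the true variance while dividing by n systematically underestimates it.

```python
import random
import statistics

random.seed(1)

# 1) The n deviations from the sample mean always sum to zero,
#    so only n - 1 of them are independent.
sample = [random.gauss(0, 1) for _ in range(10)]
x_bar = sum(sample) / len(sample)
deviations = [x - x_bar for x in sample]
print(sum(deviations))  # zero, up to floating-point round-off

# 2) Measuring spread about x_bar instead of mu shrinks the sum of
#    squares; dividing by n - 1 compensates. Here sigma^2 = 4 by
#    construction (sd = 2), with a small sample size n = 5.
n = 5
est_n_minus_1, est_n = [], []
for _ in range(20000):
    s = [random.gauss(0, 2) for _ in range(n)]
    m = sum(s) / n
    ss = sum((x - m) ** 2 for x in s)
    est_n_minus_1.append(ss / (n - 1))  # divisor n - 1
    est_n.append(ss / n)                # divisor n

print(statistics.mean(est_n_minus_1))  # close to 4, the true variance
print(statistics.mean(est_n))          # close to (n-1)/n * 4 = 3.2, too small
```

The long-run average of the n-divisor estimate falls short of sigma squared by exactly the factor (n minus 1) over n, which is the compensating effect described above.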
Some of you may not really follow the detailed mathematics which I am going to discuss shortly. There are two ways of handling this. One, you look up your favorite book on calculus and try to understand integration and multiple integration; multiple integration is very straightforward, being a direct extension of simple single integration. The other way is that you do not have to follow the derivation, but please understand the final result; that is very important. Even if you did not follow the mathematical derivations, you can still try to understand the main conclusion which comes at the end of the derivation. A joint probability distribution involves two or more random variables. They are described, as the name implies, in a joint fashion: there are two random variables which occur together. We will first start with the joint probability distribution of two random variables, and then we will take up the extension to the more general case where p random variables are jointly described. If you look at the joint probability distribution functions, there will be a lot of similarities with the single random variable probability distribution function, so use that as the basis for understanding the multiple random variables case. The first property is that f sub XY of (x, y) should be greater than or equal to 0 for all x and y. This is a probability distribution function involving the random variables capital X and capital Y; it is a function of small x and small y, and it should be non-negative. The small x and small y represent the values taken by the random variables X and Y after the sampling or the experiment has been carried out. Just as the sum of all the probabilities should be equal to 1, here, since we are looking at continuous probability distribution functions, we have the double integral of f sub XY of (x, y) dx dy equal to 1. Earlier, in the case of a single random variable, we had the integral of f of x dx equal to 1.
Now, we have the double integral of f sub XY of (x, y) dx dy equal to 1. The function f sub XY is defined such that the probability of the random variables X and Y belonging to a region R in a two dimensional space is given by the double integral over R of f sub XY of (x, y) dx dy. So, it is just a statement defining the probability of the two random variables belonging to a particular region, and that probability is given by the double integral. The double integral represents an area: we can think of the random variable X as covering a certain length and the random variable Y as covering a certain width, so a joint distribution of both X and Y covers a region in the two dimensional space, and the probability over that region is given by the double integral of f sub XY of (x, y) dx dy. Now, we come to a very useful and interesting property of the joint probability distribution function. If the two random variables which are defined together are in fact independent of one another, and they are also identically distributed, we get the probability of (X less than or equal to x, Y less than or equal to y) as the double integral, from minus infinity to y and from minus infinity to x, of f sub X of x times f sub Y of y dx dy. So, earlier it was f sub XY of (x, y), and now it has become f sub X of x times f sub Y of y. The fact that we can split the joint density into two separate functions implies that the random variables X and Y are independent; in addition, the two functions have the same functional form, only the variable being different, here x and here y. This shows that the two random variables are independent but identically distributed. One can say that equations are like pictures: what can be said in many words can be represented in one photograph or in one equation.
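The factorization f sub XY of (x, y) equal to f sub X of x times f sub Y of y can be exercised numerically. The sketch below assumes, purely for illustration, that both marginals are standard normal densities, and checks with a simple midpoint-rule integration that the double integral of the factorized joint density over a rectangle equals the product of the two one-dimensional probabilities, and that the total probability is 1.

```python
import math

def phi(t):
    """Standard normal probability density (assumed marginal for both X and Y)."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def f_xy(x, y):
    """Joint density of independent, identically distributed X and Y."""
    return phi(x) * phi(y)

def integrate(f, lo, hi, steps=500):
    """Simple composite midpoint-rule approximation of the integral of f."""
    h = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * h) for k in range(steps)) * h

a, b, c, d = -1.0, 1.0, 0.0, 2.0  # an arbitrary rectangle in the (x, y) plane

# Double integral of the joint density over the rectangle [a, b] x [c, d].
joint = integrate(lambda x: integrate(lambda y: f_xy(x, y), c, d), a, b)

# Product of the two one-dimensional probabilities over the same intervals.
product = integrate(phi, a, b) * integrate(phi, c, d)

print(joint, product)  # the two values agree
print(integrate(phi, -8.0, 8.0))  # total probability of one marginal: ~1
```

The agreement of the two numbers is exactly the statement that, for independent random variables, probabilities over rectangles factor into products of marginal probabilities.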
So, when we are considering the joint probability density function involving p random variables, which in general need not be independent, f of (x1, x2, and so on up to xp) should be greater than or equal to 0 for all values of x1, x2, up to xp, and the total probability, the multiple integral over the whole space, should be equal to 1. Earlier we were talking about two random variables; now we go to the general case where we have p random variables, and the functional form is given in this manner. If you look at a region R in the p dimensional space, because there are p random variables, the probability value is obtained by calculating the multiple integral over the p dimensional region R of the joint probability distribution function f of (x1, x2, and so on up to xp) dx1 dx2 and so on up to dxp; note that this is a single function of x1, x2, up to xp, not a product of separate terms. So, what is the real meaning of this? We know that the random variable will take a particular value, but when we are dealing with continuous probability density functions, the probability at a particular point, whether for a single random variable or for many random variables, is equal to 0, just as the weight of a conical block of wood at a single point is 0. Only when we consider a slice of the conical block, with its surface area and thickness, can we talk about a volume, and then, knowing the density, about its weight. Similarly, we consider a region R in the whole space and ask what the probability is of the random variables X1, X2, up to Xp falling within that region R, which is described by coordinates for x1, x2, up to xp. The lower limit may be minus infinity, or it may be some other lower boundary. So, we are talking about the p dimensional space formed by all these random variables, and in this p dimensional space we can have a lower boundary and an upper boundary.
This is very interesting. However, we may not really be solving problems with these multiple integrals; we just have to understand the concepts. We can talk about the expected value of a particular random variable Xi, and that is given by mu sub Xi equal to the expected value of Xi, which is the multiple integral, ranging from minus infinity to plus infinity, of xi times f of (x1, x2, and so on up to xp) dx1 dx2 and so on up to dxp, where xi is the sampled or experimental value corresponding to that particular random variable. An important point is that we do not put any numerical value here; we do not put 5 or 0.5 or anything. We are just asking, for the particular random variable Xi, which may be X1, X2, or Xp, what the expected value is, given this joint probability density function. So, we plug in the variable xi corresponding to the random variable capital Xi, multiply by the joint probability distribution function, and carry out the integration, the limits of which are from minus infinity to plus infinity. Since we have carried out the integration over the whole range, as a result of this exercise we will get a numerical answer, provided the function is well defined. Similarly, we can find the variance of Xi. Whatever we did for the single random variable, we are now doing again for a single random variable, but now described in terms of its association or combination with the other random variables X1, X2, up to Xp. So, the variance of a particular random variable Xi is obtained by again carrying out the multiple integration between the lower boundary and the upper boundary; here, in the general case, from minus infinity to plus infinity. The xi here is the value of the random variable Xi we are looking at.
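Rather than evaluating the multiple integral analytically, the same expectation can be approximated by sampling, which makes the idea concrete. This is a minimal sketch, assuming for illustration that p equals 3 and the joint distribution is three independent Uniform(0, 1) coordinates, so that E(Xi) is 1/2 and V(Xi) is 1/12; averaging one coordinate over draws from the joint distribution integrates the other coordinates out automatically.

```python
import random

random.seed(2)

# Draw points (x1, x2, x3) from the joint distribution.
# Assumption for illustration: three independent Uniform(0, 1) coordinates.
N = 200_000
points = [(random.random(), random.random(), random.random())
          for _ in range(N)]

# Expected value of the second coordinate, E(X_2): average x2 over draws
# from the *joint* distribution; no explicit multiple integral is needed.
e_x2 = sum(p[1] for p in points) / N

# Variance of X_2: average of (x2 - E(X_2))^2 over the same draws.
v_x2 = sum((p[1] - e_x2) ** 2 for p in points) / N

print(e_x2)  # close to 1/2
print(v_x2)  # close to 1/12
```

This mirrors the definitions on the slide: the integrand singles out one variable (or its squared deviation), while the joint density governs how the points are distributed over the whole p dimensional space.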
Here xi is not a numerical value; we write it simply as small xi, subtract mu sub Xi, which is the expected value of capital Xi, take the whole square, and multiply by the joint probability distribution function. In the simplest case, where we had only one random variable and one probability distribution, we had written the integral of (x minus mu) whole squared times f of x dx. Now we write (xi minus mu sub Xi) whole squared, where mu sub Xi was obtained as the expected value of Xi, and multiply by the joint probability distribution function to get the variance V of Xi. Again, since you integrate from minus infinity to plus infinity, this will be a numerical value. The independence of p random variables is discussed in the current slide. The random variables X1, X2, up to Xp are independent if and only if f of (x1, x2, and so on up to xp), where the small letters are the values taken by the capital random variables, is given by the product of the individual probability distribution functions of X1, X2, up to Xp; and the important thing for us is that, in a random sample, they are also identically distributed. In other words, the functional forms are the same; only the arguments of these functions are different. So, this is a joint probability distribution function, and if the random variables forming it are independent of one another, we can write it as the product of the individual probability density functions. This is a very useful result. Now, we will look at the product of two random variables. Just to make yourself familiar, you may look at these equations. The expected value of X is the double integral, from minus infinity to plus infinity, of x, the particular value of capital X, times f sub XY of (x, y) dx dy; we are talking about a joint probability density function involving the two random variables X and Y. Similarly, the expected value of Y is the double integral, from minus infinity to plus infinity, of y times f sub XY of (x, y) dx dy. Now, this is the important thing.
Here, we write it as a product of the two random variables, XY. There is nothing to worry or panic about. We simply put in the value of capital X, which is small x, and the value of capital Y, which is small y, take the product x times y, multiply by the same joint probability distribution function, carry out the double integration, and report the final answer. So, we have E of X here, E of Y here, and E of XY here. We will take a small break here. We have looked at the population and at the sample, and we have seen that by looking at the properties of the sample we try to infer the properties of the population. For the sample's properties to be a good representation of the population's properties, we need the sample to be truly random: each element in the population should have an equal probability of being picked, and the sampled elements should also be identically distributed. Then, we saw that a sample involves multiple random variables, and that a function of multiple random variables is also a random variable. Since we are talking about a collection of random variables, we also needed a brief mathematical background on the joint distributions of random variables. Fortunately for us, the sample comprises independent random variables, so the probability density function for the joint random variables case gets considerably simplified. We also assume that they are identically distributed, in which case the individual probability distribution functions are also identical. So, considerable simplification is enabled by virtue of our requirements. We then looked at the expected value of the random variables under a joint probability distribution, and we found that, just as we can define the expected value of X and the expected value of Y, we can also define the expected value of a combination of the random variables X and Y. We will proceed after a small break at this point.
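Before the break, the expectation of a product is worth one numerical check. For independent X and Y, the factorization of the joint density implies E(XY) equals E(X) times E(Y). The sketch below verifies this by Monte Carlo, with X and Y assumed (purely for illustration) to be independent normals with means 2 and minus 1.

```python
import random

random.seed(3)

# Illustrative assumption: X ~ Normal(2, 1) and Y ~ Normal(-1, 0.5),
# drawn independently of each other.
N = 100_000
xs = [random.gauss(2.0, 1.0) for _ in range(N)]
ys = [random.gauss(-1.0, 0.5) for _ in range(N)]

e_x = sum(xs) / N
e_y = sum(ys) / N

# E(XY): average the product x*y over draws from the joint distribution.
e_xy = sum(x * y for x, y in zip(xs, ys)) / N

print(e_x, e_y, e_xy)  # e_xy is close to e_x * e_y
```

For dependent random variables this product rule fails in general, which is why the independence assumption built into our random samples is so convenient.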