Hello, welcome back. In today's lecture, we will be solving a few problems. The reference I used for one of the problems is the book by Ramachandran and Tsokos, Mathematical Statistics with Applications, published by Academic Press in 2009. It has an interesting set of both examples and problems. The topics covered in this example set are properties of random samples, applications of the central limit theorem, maximum likelihood estimation of parameters, and the method of moments. The first example: two random samples come from two different populations, P1 and P2. The two samples are also of different sizes, 9 and 25. If the two sampling distributions of the mean are to have the same standard deviation, what should be the ratio of their respective population standard deviations? So we are asked to find the ratio of the population standard deviations such that the two unequal-sized samples give sampling distributions with the same standard deviation. Depending upon the size of the sample, you can have different sampling distributions. You also know that the sampling distribution of the mean is centered on the population mean mu itself but has a smaller spread, with variance sigma squared by n, where sigma squared is the variance of the population from which the random sample was taken and n is the size of the sample. Using this information, we can do the following. I have given a table here. In this table, you can see the population parameters listed for P1 and P2: mu1 and mu2, the two population means, and sigma 1 and sigma 2, the two population standard deviations. A population would hypothetically be of infinite size, or at least very, very large. When you go to the samples, the sampling distributions of the means will have mean values of mu1 and mu2 for sample 1 and sample 2, corresponding to the two populations from which they were taken.
The standard deviations are sigma 1 by root n1 and sigma 2 by root n2, and the question is what the ratio of sigma 1 to sigma 2 should be such that these two are equal. So the question is very simple. We set sigma 1 by root n1 equal to sigma 2 by root n2. Root n1 is root of 9, which is 3, so the left-hand side is sigma 1 by 3. What is n2? It is 25, and root of 25 is 5, so the right-hand side is sigma 2 by 5. So sigma 2 by sigma 1 is 5 by 3, which is 1.67, or equivalently sigma 1 by sigma 2 is 0.6. The ratio of the two population variances, sigma 2 squared by sigma 1 squared, is 25 by 9. Rather than doing the mental mathematics, let me do it on the calculator: 25 by 9 is 2.777 and so on, which we can round to 2.78. So the second population variance was 2.78 times the first population variance, but the variance of the second sampling distribution was identical to that of the first, because the second sample size was also 2.78 times the first. When you normalize the variances of the two populations by the sample sizes, they are equal in this case, because the larger sample taken from the second population balanced out its higher variance. Let us go to the next example. Again it is a simple example. You have two random sample means, x1 bar and x2 bar. They come from two independent normal populations, N of mu1, sigma 1 squared and N of mu2, sigma 2 squared. The two samples are also of different sizes, namely n1 and n2. Find the mean and variance of the following linear combinations: x1 bar minus x2 bar, and x1 bar plus x2 bar. Very nicely, the problem statement gives us all we require. It says that the two parent populations are normal and that they are independent of one another. When you take a random sample out of each of these two populations, we have to get the random sample means, and that is easy. For the first sample you add the observed values, x1 plus x2 plus so on to xn, and divide by n, which here is n1, and x2 bar is obtained in the same way from the second population.
Again you add up all the attributes or values of the random sample elements and then divided by that particular sample size. So you will get sample mean 1 and then you will also get sample mean 2. The important result is suppose you take random variables x1, x2 they come from independent normally distributed populations then a linear combination of x1 and x2 would also be a normal distribution. That is an important result. Now we are having x1 bar and x2 bar. x1 bar in turn is formed by taking the elements of the first sample adding all the attributes of those sample elements dividing it by the sample size. Similarly do for the second random sample. So now you are going to combine these two. So rather than thinking of them as x all the elements divided by n1 then all the elements of the second random sample divided by n2 you think of x1 bar and x2 bar as random variables themselves and they are coming from two independent populations. So the distributions of x1 bar and x2 bar are independent of each other and if you think on these lines it is easier to proceed further. So now we have to find the mean and variance of the two linear combinations. Why I gave this example is we encountered such cases very frequently even in doing different kinds of problems. So the following linear combinations of random variables will also be normal distributions as the two random variables are independent and normally distributed. So these would also be normal distributions. So this would be one normal distribution. This would be another normal distribution. What are the mean and variances of such normal distributions for the two cases? So expected value of x1 bar-x2 bar would be expected value of x1 bar-expected value of x2 bar that would be mu1-mu2 and that is represented as mu of x1 bar-x2 bar mu of the probability distribution formed by x1 bar-x2 bar. 
Again you have expected value of x1 bar plus x2 bar that would be expected value of x1 bar plus expected value of x2 bar that is equal to mu1 plus mu2 which is represented by mu of x1 bar plus x2 bar. So the linear combinations of the probability distributions of x1 bar and x2 bar would also result in a normal distribution which is centered at mu1-mu2. Well you can ask what would happen if mu1 is greater than mu2? No problem it is a positive value. If mu1 is less than mu2 it is a negative value. So what? Let the resulting probability distribution be centered on a negative value. There is no harm in that. So again if you look at expected value of x1 bar plus x2 bar that would be E of x1 bar plus E of x2 bar which is mu1 plus mu2. So when I am taking a linear combination of independent random variables which are normally distributed I am going to get a resulting probability distribution which is also normally distributed and having the mean at the sum of the means of the 2 probability distributions I am adding. So this is again quite straightforward. Let us look at the variance. The variance is quite interesting. The expected value was sign dependent depending upon what was a sign used here. But when you look at the variance of x1 bar minus x2 bar is variance of x1 bar plus variance of x2 bar. Variance of x1 bar plus x2 bar is variance of x1 bar plus variance of x2 bar. So the negative sign or positive sign does not matter. The negative sign or positive sign would really matter when you look at the covariance. And here in the first case it will be minus covariance of x1 bar and x2 bar. Here it will be plus of covariance of x1 bar x2 bar. But the covariance will vanish because x1 bar and x2 bar are independent. So we simply have variance of x1 bar plus variance of x2 bar in both the cases. So summarizing the results from this example we have the random variable x1 bar having a mu1 as mean and sigma1 squared by n1 as variance. 
So the standard deviation would be sigma1 by root n1. For x2 bar, the second case, I think it is better if I go back a little bit. What is x2 bar? It is the mean of the random sample taken from the second population. The second population is normally distributed. You take a sample of n2 elements, add the values of these elements and divide by n2, and you get x2 bar. Similarly, you can take many such random samples from the second population, and each one would have a different average value. So they will form a distribution of the sample means. This distribution of the sample means would be normal, with mean mu2 and variance sigma2 squared by n2. What is mu2? It is not only the mean of the sampling distribution but also the mean of the parent population from which the random sample was taken. Sigma2 squared is the variance of the second parent population, and the variance of the distribution of the sample means taken from the second population will be smaller, given by sigma2 squared by n2. So the standard deviations would be sigma1 by root n1 for the first case and sigma2 by root n2 for the second case. x1 bar minus x2 bar, a linear combination of the two random variables, would have a mean of mu1 minus mu2; we saw a couple of slides back that the mean is sign dependent. The variance would be the variance of x1 bar, which is sigma1 squared by n1, plus the variance of x2 bar, which is sigma2 squared by n2, so they are added up. The standard deviation would be the square root of sigma1 squared by n1 plus sigma2 squared by n2. When you take x1 bar plus x2 bar as the linear combination of the two random sample means, it will be distributed around mu1 plus mu2 at the center, with a variance given by sigma1 squared by n1 plus sigma2 squared by n2, and the standard deviation is again the square root of sigma1 squared by n1 plus sigma2 squared by n2.
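The summary above can be checked numerically. The following is a minimal sketch, not part of the lecture: the function name `combine_sample_means` and all numerical inputs are assumptions chosen for illustration. The standard deviations 3 and 5 with sample sizes 9 and 25 echo the first example, so each sample mean has a standard error of 1; the population means 10 and 12 are arbitrary.

```python
import math

def combine_sample_means(mu1, sigma1, n1, mu2, sigma2, n2, sign):
    """Mean, variance and standard deviation of x1_bar + sign * x2_bar.

    Assumes the two sample means are independent, so the covariance term
    vanishes and the variances add for sign = +1 and sign = -1 alike.
    """
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    mean = mu1 + sign * mu2                      # the mean is sign dependent
    variance = sigma1 ** 2 / n1 + sigma2 ** 2 / n2  # the variance is not
    return mean, variance, math.sqrt(variance)

# Hypothetical inputs (not taken from the lecture).
diff = combine_sample_means(10.0, 3.0, 9, 12.0, 5.0, 25, sign=-1)
total = combine_sample_means(10.0, 3.0, 9, 12.0, 5.0, 25, sign=+1)
print("x1_bar - x2_bar (mean, variance, sd):", diff)
print("x1_bar + x2_bar (mean, variance, sd):", total)
```

With these inputs the two combinations differ only in their means, while the variance and standard deviation are the same for both, which is the point made in the summary.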
These results apply when x1 bar and x2 bar are independent of each other. So, importantly, I would like to reemphasize that the two random sample means are independent and normally distributed, as the samples were taken from two independent normal populations. Hence the linear combination of these random variables also obeys the normal distribution. Since it obeys the normal distribution, we can express it in the standard form so that we may use the probability tables. When you express them in the standard normal form it becomes quite straightforward: x1 bar may be normalized by subtracting mu1 and dividing by sigma1 by root n1. Let me just correct that typo on the slide, where it reads root n. So we have z1 is equal to x1 bar minus mu1 divided by sigma1 by root n1, and z2 is x2 bar minus mu2 divided by sigma2 by root n2. If you look at x1 bar minus x2 bar, you can treat it as another random variable with mean mu1 minus mu2 and standard deviation square root of sigma1 squared by n1 plus sigma2 squared by n2. The combination x1 bar plus x2 bar may be expressed in the same way, as shown here, with mean mu1 plus mu2 and the same standard deviation. This is a very nice way of putting it in a compact form, and then we may use the standard normal probability tables to do the necessary calculations. Now let us look at example 3. The problem statement goes like this. From historical data, the yields of power from a nuclear reactor supplied by XYZ company are normally distributed. The reactor supplied by this company is operated in several plants around the world. The population standard deviation based on the process design specification is 0.7 gigawatts. The average power output from 6 random measurements taken at a plant using this reactor is 2 gigawatts. However, the XYZ company had guaranteed an average power output of 2.3 gigawatts from its reactors.
Obviously the client organization using this reactor is getting an average power output of 2 gigawatts and it is concerned because it is supposed to produce 2.3 gigawatts but it is producing only 2 gigawatts and that may lead to loss okay. And when the company is contacted the company says do not worry the thing is normal it is only a random fluctuation or a random variation. Even if you have taken the means the difference is because of random fluctuation. But the company said if it is random fluctuation on the positive side if we had got 2.6 gigawatts that would have been nice but we are getting only 2 gigawatts whereas you are promising 2.3 gigawatts. So there is an issue here and we have to see what is the probability of the average power output from the plant being 2 gigawatts even though the actual mean value is 2.3 gigawatts. Coming again what we have to do is there is a distribution of the sample means and the mean value is 2.3 gigawatts. So from this sampling distribution of the means probability distribution what is the probability of picking up a sample with the mean power output of 2 gigawatts. If the probability is quite high then the probability of occurrence of such kind of events is quite high. So we can only attribute it to random effects we cannot say anything more. However if the probability of picking up a sample of mean power output of 2 gigawatts is pretty low from a sampling distribution of the mean of 2.3 gigawatts then we have to question the supplier okay. So we have to look at the sampling distributions of the means. Since we are talking about the mean power output we are referring to the sampling distributions of the means and they also have a probability distribution. 
So the population mean is given as 2.3 gigawatts. The sample mean is x bar; I am using small x bar because the sample has been taken and its value is known, and that is only 2 gigawatts. The population standard deviation based on the design specification is 0.7 gigawatts, having the same units as the mean power output, and the sample size is only 6. It is given that the population is normal, and the value of sigma is also known, which makes life easier for us. So we have to find the probability of the mean power output being less than or equal to 2 gigawatts from the given data. I am normalizing x bar again: x bar minus mu divided by sigma by root n, which is 2 minus 2.3 divided by 0.7 by root 6. Let me check it out. I should be doing minus 0.3 into root 6 divided by 0.7, and I am getting minus 1.04978, so minus 1.05 is okay. So what is the probability that x bar would be less than or equal to 2? That is equivalent to asking what is the probability that the standard normal variable z is less than or equal to minus 1.05, and that probability is 0.147. So the probability of the sample mean being lower than or equal to 2 gigawatts is rather high, at about 0.15. The company is saying the mean power output is 2.3 gigawatts; it is not stopping there, it is also saying that the standard deviation of the normal distribution is 0.7 gigawatts. Now we are talking about the sampling distribution of the means, and this distribution is again centered at 2.3 gigawatts, with a spread given by 0.7 by root 6. What is 0.7 by root 6? It is 0.286. So the standard deviation of the sampling distribution of the mean is 0.286 gigawatts, and the plant is getting only 2 gigawatts. So consider the probability of this occurrence, namely a sample mean of 2 gigawatts or lower when the sample is taken from a sampling distribution of the means centered at 2.3 gigawatts with a standard deviation of 0.286 gigawatts.
The probability comes to 0.147, which is rather high. So you really cannot question the supplier, because 0.15 is a reasonable chance of occurrence for this kind of event. If you do the plotting with Minitab, this is a normal distribution centered at 2.3 gigawatts and having a standard deviation of 0.7 divided by root 6, which is 0.286 gigawatts. So this is the spread, and I am looking at the probability of occurrence of 2 gigawatts or lower from this distribution, which is the area under the curve in the shaded region, and it comes to 0.147. Moving on to the next example: the plant contests the manufacturer's claim, saying that the claimed population standard deviation of 0.7 gigawatts is rather large. Hence, by mutual agreement, that standard deviation is not used, but more measurements, namely 41, are carried out. So the 0.7 gigawatts is thrown out of the window, and you are no longer even assuming that the population is normally distributed; that is not mentioned in this problem statement, whereas in the previous problem statement it was given that the population was normally distributed. But you are taking a large sample of size 41. The sample mean now comes to a slightly higher 2.1 gigawatts, but the sample standard deviation is 0.85 gigawatts, which is even higher than the design specification value of 0.7 gigawatts. What is the probability of observing this mean output or lower? That is, what is the probability of getting a sample mean of 2.1 gigawatts or lower? That is what we have to find now. So the conditions have slightly changed: the population mean mu is 2.3 gigawatts, the sample mean x bar is 2.1 gigawatts, the sample standard deviation s is 0.85 gigawatts, and the sample size is 41. We are no longer using the population standard deviation of 0.7 gigawatts.
So we are not supposed to use sigma, but we can use s, the sample standard deviation. Using s is permitted because the sample size is quite large, and we can continue with the normal distribution according to the central limit theorem. The central limit theorem says that, irrespective of the shape of the population probability distribution, if a large sample is taken, typically greater than 30, then the resulting sampling distribution of the means is approximately normal. In the present case we do not have to worry about the parent population, because the sample size is quite large, so the central limit theorem will apply and the sampling distribution of the means is going to be normal. Since sigma is not available for use, the s value may be substituted for sigma in the calculations, and the large sample size of 41 justifies that. So the calculations are quite straightforward. Instead of using sigma, we use s: we have x bar minus mu divided by s by root n, and that is 2.1 minus 2.3, that is minus 0.2, into root 41 divided by 0.85, which comes to minus 1.5066, or minus 1.51. So the probability of x bar being less than or equal to 2.1 is equivalent to the probability of z being less than or equal to minus 1.51, and the probability has now reduced considerably, to 0.066. So the results show that a sample mean power output less than or equal to 2.1 gigawatts may occur only 6.6 percent of the time; the probability value is 0.066. We stop here and let the two parties take it from here. Showing this on the normal probability distribution, here we have a standard deviation of 0.13275. How did that come about? That was s, 0.85, divided by root 41, and 0.85 by root 41 is 0.13275. The mean value is hypothesized, or taken, as 2.3 gigawatts, so that is what we have here. The probability of occurrence of 2.1 gigawatts or lower is given by the area of the shaded portion, and that is 0.066.
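The two reactor calculations can be reproduced without tables. This is a minimal sketch under the lecture's stated numbers; the helper names `normal_cdf` and `prob_sample_mean_at_most` are assumptions introduced here, and the standard normal CDF is computed from the error function in Python's standard library rather than read from a chart.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_sample_mean_at_most(x_bar, mu, spread, n):
    """Return (z, P(sample mean <= x_bar)) for a normal sampling distribution.

    `spread` is sigma when the population value is used, or the sample
    standard deviation s when it is substituted for sigma (large n).
    """
    standard_error = spread / math.sqrt(n)
    z = (x_bar - mu) / standard_error
    return z, normal_cdf(z)

# Example 3: sigma = 0.7 GW known, n = 6, observed mean 2.0 GW.
z3, p3 = prob_sample_mean_at_most(2.0, 2.3, 0.7, 6)
# Example 4: s = 0.85 GW used in place of sigma, n = 41, observed mean 2.1 GW.
z4, p4 = prob_sample_mean_at_most(2.1, 2.3, 0.85, 41)

print(f"example 3: z = {z3:.3f}, probability = {p3:.3f}")
print(f"example 4: z = {z4:.3f}, probability = {p4:.3f}")
```

The probabilities come out slightly different from table lookups at the rounded z values, because the code uses the unrounded z.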
So the two probability distributions are plotted as shown in this figure generated from Minitab. The first one is centered at 2.3 gigawatts and has a standard deviation of 0.2858. How did this 0.2858 come about? The design specification of sigma was 0.7 gigawatts and the sample size was 6 in the first case, so we have 0.7 divided by root 6, which is 0.2858. The second distribution shown has a smaller spread and is also centered at 2.3 gigawatts. It is based on a sample standard deviation of 0.85 gigawatts, which is higher than 0.7 gigawatts, and still the spread is smaller because of the larger sample size. Instead of using sigma by root n, we are using s by root n2, where s is 0.85 gigawatts and n2 is 41, so 0.85 divided by root 41 is 0.1327, which is less than half of the earlier spread value of 0.2858. So you can see a smaller spread here, and the probability value also declined. The next problem: you are given a random sample from a parent population described by a complicated probability distribution function, where f of x is beta into gamma x e power x cube minus 1 by sin x. Beta is an adjustable constant such that the probability distribution is a valid one. You know what that means: a continuous probability density function is described by a smooth curve, and the area under such a curve should be equal to 1. So we adjust the parameter beta such that this is a valid probability distribution function. Let the mean and standard deviation of the distribution be phi and psi, which means the variance of this distribution is psi squared.
If the sample size was chosen as 64 find the mean and standard deviation of the sampling distribution of the means what is the form of the sample mean distribution and what is the probability that the sample mean will be within 0.15 standard deviations of the population mean. So since the sample size is quite large at 64 which is greater than 30 the sampling distribution of the means will be normally distributed according to the central limit theorem regardless of the shape of the parent population distribution. So now the problem is quite straight forward the mean of this distribution of the sample means will be phi and the standard deviation will be psi by root 64 which is 0.125 psi. So 1 by root 64 is 1 by 8 which is 0.125. So the standard deviation of the sampling distribution of the means would be 0.125 psi. This distribution may be represented by a normal distribution of mean phi and variance which will be square of this 0.01563 psi squared okay 0.125 squared let us confirm 0.125 squared is 0.015625. So that is fine. What is the probability that the sample mean will be within 0.15 standard deviations from the population mean. So the problem can be expressed in the following way probability of the value of the random sample being 0.15 sigma distant from the population mean. So probability of mu which is the population mean and also the random sample probability distribution mean mu – 0.15 sigma less than or equal to x bar less than or equal to mu plus 0.15 sigma. So the random sample which we take may have a value either lower than mu or higher than mu and it may lie either on the right hand side of mu or on the left hand side of mu. 
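The interval probability just set up depends only on the sample size, because the population standard deviation cancels when the sample mean is standardized. A minimal sketch of that calculation follows; the function name `prob_within_k_sigma` is an assumption made here, and the error function stands in for the standard normal table.

```python
import math

def prob_within_k_sigma(k, n):
    """P(|x_bar - mu| <= k * sigma) when x_bar ~ Normal(mu, sigma**2 / n).

    Standardizing gives |Z| <= k * sqrt(n); sigma drops out entirely.
    """
    z = k * math.sqrt(n)
    # For a standard normal Z, P(-z <= Z <= z) = erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2.0))

p = prob_within_k_sigma(0.15, 64)
print(f"z limit = {0.15 * math.sqrt(64):.2f}, probability = {p:.4f}")
```

For k = 0.15 and n = 64 the limits are plus and minus 1.2, and the probability is roughly 0.77.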
So now it is easy to normalize. How do we normalize? We subtract mu from x bar and divide by sigma by root n, and we do the same on the other two sides of the inequality. Then we get minus 0.15 sigma divided by sigma by root n on the left and plus 0.15 sigma divided by sigma by root n on the right. With n equal to 64, root n is 8, so this works out to the probability of minus 1.2 less than or equal to z less than or equal to 1.2, where z is a standard normal random variable, and this comes to 0.77. That can be read off from the standard normal probability charts. I hope you are now comfortable using these charts and can figure out how we get this 0.77; I will just illustrate it on the board. You have the standard normal distribution, which has a mean value of 0 and variance sigma squared equal to 1, and we have to find the area under the curve between minus 1.2 and 1.2. What we can do is take the probability of z less than 1.2 minus the probability of z less than minus 1.2. So first we find the area under the curve to the left of 1.2, and then from this area we subtract the area to the left of minus 1.2, and we get the required probability. If I remember right, the area to the left of 1.2 is around 0.88, so the area to the right of 1.2 would be 0.12, and by symmetry the area to the left of minus 1.2 would also be 0.12. So 0.88 minus 0.12 is 0.76. I am just doing it from memory, and you can see the answer from the tables comes to 0.77. Let us move on to the next problem. Here we have the Pareto distribution, which is quite an interesting function. This was the problem I had taken from the Ramachandran and Tsokos book. f of x is equal to a by x power a plus 1 for x greater than or equal to 1, and is equal to 0 for x less than 1. The parameter a is referred to as the shape factor. What is the maximum likelihood estimator of the parameter a based on the random sample x1, x2, so on to xn? Some of you may ask: we do not know the value of a, and we do not know whether this is a valid probability density function.
So finding the area under the curve from 1 to infinity a by x power a plus 1 dx should tell us the value of a so what is the additional need for finding the value of a I leave it to you okay the hint is you cannot find out a using this method for the simple reason that no matter what value of a you plug in there the integral 1 to infinity will be equal to 1 and you do the integration you can find out this will be x power minus a minus 1. So it will be minus 1 by x power a and the a would cancel out and so when you go from 1 to infinity it would be 1 minus 0 1 power a is always going to be 1. I request you to do the integration yourself and confirm that no matter what the value of a is the a will cancel out and so this area under the curve will always be equal to 1. So let us move on to the actual problem we have to define the maximum likelihood function we are using the method of maximum likelihood parameter estimation method to find out what a is the Pareto probability density function is expressed only in terms of a single parameter theta it is represented as f of x comma theta. Let us take a random sample and once the values are known we will denote them by x 1 x 2 so on to x n. So the likelihood function of the sample for the single parameter case is L of theta is equal to f of x 1 theta into f of x 2 theta so on to f of x n theta. So we have to estimate this parameter by maximizing this relationship. So first let us get the relationship L of theta is equal to f of x 1 theta into f of x 2 theta so on to f of x n theta and that would be a by x 1 to the power of a plus 1 into a by x 2 to the power of a plus 1 so on to a by x n to the power of a plus 1. So L of theta is equal to a power n because I am doing it n times and this is a product of all the x values to the power of a plus 1 and when we take natural logarithm on both sides we get ln of L is equal to ln of f of x 1 theta into f of x 2 theta so on to f of x n theta. 
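The normalization argument and the likelihood described above can be written compactly. This is a sketch in standard notation rather than the lecturer's slide: theta is taken to be the shape parameter a, a is assumed positive, and every observed value is assumed to satisfy x_i greater than or equal to 1.

```latex
\int_{1}^{\infty} \frac{a}{x^{a+1}}\,dx
  = \Bigl[-x^{-a}\Bigr]_{1}^{\infty}
  = 0 - (-1) = 1 \qquad (a > 0),

L(a) = \prod_{i=1}^{n} \frac{a}{x_i^{\,a+1}}
     = \frac{a^{n}}{\Bigl(\prod_{i=1}^{n} x_i\Bigr)^{a+1}} .
```

Since the integral equals 1 for every positive a, the normalization condition carries no information about a, which is why the sample is needed to estimate it.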
So ln L is equal to ln of a power n divided by the product of the xi to the power of a plus 1, with i running from 1 to n. We can split this into two parts: ln of a power n becomes n ln a, and the other part becomes minus ln of the product of xi to the power of a plus 1. So again this is quite simple: you get ln of L is equal to n ln a minus a plus 1 into sigma i equals 1 to n of ln xi. How did this get simplified? You know that the log of a product of entities is the sum of the logs of those entities. The power a plus 1 is common, so you can pull a plus 1 out in front, and then you get sigma i equals 1 to n of ln xi. The next step is to differentiate this function with respect to a and equate it to 0. When you differentiate with respect to a, the first term becomes n by a, and in the second term we had a plus 1 with no a inside the sum, so that becomes simply minus sigma ln xi. Setting n by a minus sigma ln xi equal to 0, the estimated parameter a hat is given by n divided by sigma i equals 1 to n of ln xi. So quite simple. Let us move on to the next problem. Use the method of moments to find the parameter estimators of the following probability distribution function: f of x is equal to 1 by b minus a for x between a and b, and is equal to 0 otherwise. So we have to estimate both a and b using the method of moments. The first moment, E of x, is obtained from the distribution in the following manner: the expected value of x is the integral from a to b of x dx by b minus a, and the integral of x is x squared by 2. So we get b squared minus a squared divided by 2 into b minus a, which is b plus a into b minus a divided by 2 into b minus a. The b minus a will cancel out, so we have b plus a by 2. The second moment, the expected value of x squared, is given by the integral from a to b of x squared dx by b minus a. The integral is x cube by 3, which gives b cube minus a cube by 3. So you have b cube minus a cube divided by 3 into b minus a, and b cube minus a cube is b minus a into b squared plus b a plus a squared. The b minus a cancels, leaving a squared plus a b plus b squared by 3, and that is what we have here. These distribution moments may be equated with the first and second sample moments.
And when we do that, we get m1 as 1 by n sigma, and the index here should read i equals 1 to n, not a equals 1 to n. Let me just correct that typo. Here we go. So m1 is equal to 1 by n into x1 plus x2 plus so on to xn, and that is equal to a plus b by 2. m2 is equal to 1 by n into x1 squared plus x2 squared plus so on to xn squared, and that is equal to a squared plus a b plus b squared by 3, which is the same as what we had. So we have 2 equations in 2 unknowns. The unknowns are a and b. The moments m1 and m2 are computed from the sample, so those are not unknowns. We can write m1 is equal to a plus b by 2, and m2 is equal to a plus b whole squared minus a b, all divided by 3, because a plus b whole squared minus a b is a squared plus 2ab plus b squared minus ab, which is a squared plus ab plus b squared. When you solve these for a and b, you get these 2 relations. I leave the quadratic equation solving to you. Hope you get the same answers as I did. So that completes the illustrative problems for today. There are lots of books on statistics and probability which have many interesting problems. I request you not only to solve these problems independently but also to look up the problems in various books and try to solve them without any assistance, either from these lectures or from the worked-out examples in those books. Try to solve them on your own, and if you are getting the correct answer, well and good, nothing more has to be said. But if you are finding some difficulties and are not able to get the correct answer, go through the lecture material again. See where exactly you have not understood, correct your concepts, and then hopefully you will be able to work out these kinds of problems correctly. The important thing is not the actual numerical solving but the interpretation, the assumptions made, and the concepts being applied in these kinds of problems. So thanks for your attention. We will see you in the next class.
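For checking answers to the last two problems, here is a minimal sketch of both estimators. The function names and sample values are assumptions made for illustration. The closed form used for the uniform case, m1 minus and plus the square root of 3 times (m2 minus m1 squared), is what solving the two moment equations yields; the relations shown on the lecture slide are not reproduced in the transcript, so compare against your own working.

```python
import math

def pareto_shape_mle(sample):
    """MLE of a for f(x) = a / x**(a + 1), x >= 1: a_hat = n / sum(ln x_i)."""
    if any(x < 1 for x in sample):
        raise ValueError("Pareto observations must satisfy x >= 1")
    log_sum = sum(math.log(x) for x in sample)
    if log_sum == 0:
        raise ValueError("all observations equal 1; the estimate is undefined")
    return len(sample) / log_sum

def uniform_method_of_moments(sample):
    """Method-of-moments estimates (a_hat, b_hat) for Uniform(a, b).

    Solves m1 = (a + b) / 2 and m2 = (a**2 + a*b + b**2) / 3, which give
    m2 - m1**2 = (b - a)**2 / 12, so the half-width is sqrt(3 * (m2 - m1**2)).
    """
    n = len(sample)
    m1 = sum(sample) / n                   # first sample moment
    m2 = sum(x * x for x in sample) / n    # second sample moment
    half_width = math.sqrt(max(0.0, 3.0 * (m2 - m1 ** 2)))
    return m1 - half_width, m1 + half_width

# Hypothetical data (not from the lecture).
a_hat = pareto_shape_mle([math.e, math.e ** 2])
a_mom, b_mom = uniform_method_of_moments([1.0, 2.0, 3.0])
print("Pareto shape estimate:", a_hat)
print("Uniform endpoint estimates:", a_mom, b_mom)
```

Note that the method-of-moments endpoints need not bracket every observation in a sample, which is a known limitation of this estimator and not a bug in the sketch.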