Okay, welcome back. This is the second video lecture for our course CL202, again on module 6. We will try to finish module 6 today and then go to tutorial sheet number 8 and solve some problems. You can try out these problems, and then we can start the next module, which will be on hypothesis testing.

As a quick recap, we have been seeing how to do interval estimation for the two main parameters of interest, the mean and the variance of a population, using the finite number of samples that we have. In the last class we finished five interval estimates, five different cases. Three of them were one-population estimates, and the other two were for two populations, where we were trying to estimate the difference of the means of the two populations. We have always assumed normality in our work so far; unless otherwise specified, that is the default in this course.

Now we move to a different distribution, the binomial distribution. You will recall that the binomial distribution is concerned with successes, as we mostly called them, in n Bernoulli trials. Call the Bernoulli variables x1, x2, and so on. x1 takes the value 1 if the trial is a success and 0 if it is a failure, and the probability of success, that is the probability that x1 equals 1, is denoted by small p. If there are n Bernoulli trials x1, x2, up to xn, then the number of successes obtained in those n trials is x1 plus x2 and so on up to xn.
Every time you get a success you put a 1 there, so by writing this sum you are actually counting the number of successes. Now, an interesting question is what proportion of the trials were successes. Add all of these up and call the total xt; that is the number of successes obtained. The ratio of the number of successes to the total number of trials, xt over n, is called the proportion. This is a variable of interest because, in some sense, it tells you the value of p, the probability of success. You have n samples x1 to xn and you want to estimate p, so an obvious choice is to compute this ratio, which we call the sample proportion, p hat. As before, the hat indicates that it comes from the sample: it is a statistic for estimating the true p underlying the population. We want to know the value of p, and one way of finding it is to conduct these Bernoulli trials and compute the statistic p hat. That is what we are going to talk about.

You will remember that right at the beginning of this module we talked about an example where 75% of all chemical engineering students prefer a closed book exam to an open book exam, with a margin of error of 8.5%. We would write that as 75 plus or minus 8.5%.
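As a small illustration of the sample proportion described above, the sketch below counts the successes in a list of 0/1 Bernoulli outcomes and divides by the number of trials. The function name and the four outcomes are made up for illustration; they are not data from the lecture.

```python
def sample_proportion(outcomes):
    """Return p-hat = (number of successes) / (number of trials)."""
    if not outcomes:
        raise ValueError("need at least one trial")
    if any(x not in (0, 1) for x in outcomes):
        raise ValueError("each outcome must be 0 (failure) or 1 (success)")
    successes = sum(outcomes)  # adding the 1s counts the successes
    return successes / len(outcomes)


# Hypothetical poll of four students: 1 = prefers closed book, 0 = open book.
votes = [1, 1, 0, 1]
p_hat = sample_proportion(votes)
print(p_hat)
```

Because every outcome is 0 or 1, the result always lies between 0 and 1, which is the property of p hat noted in the lecture.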
This is how you see results reported, particularly during election time. When they do exit polls they report two numbers: the estimate, here 75, and a margin of error, here 8.5. Very often they will also tell you the sample size, though many pollsters try to suppress that number. In this case the 75 is nothing but p hat: the total number of successes divided by the total number of trials. These are Bernoulli trials, so there are only two outcomes. The question could be: are you going to vote for an open book exam or a closed book exam? If you vote for a closed book exam, we consider it a success and count it as 1. When you add these counts over all the students you have polled, you get 0.75, or 75 written as a percentage. You will realize that p hat can only be a number between 0 and 1. Now we have to understand where the margin of error comes from. From what we have done so far you already know it is related to the confidence interval, and that is what we will discuss in today's class.

The first thing to realize is that all five cases we discussed before had to do with a normal distribution. Now we are moving away from the normal distribution, but you will see that we come back to it.
As we discussed, consider a binomial distribution where the probability of success is p. If there are n trials and x of them are successes, then x is really the sum over all those n trials. This distribution has a mean of np. We have done this many times in class: write x as x1 plus x2 up to xn and run the expectation operator over each of those random variables. The expectation of each variable is p, so the expectation of x, the sum of x1 to xn, is np. Similarly, you will recall that the variance of each xi is p into 1 minus p, and since the trials are independent the variances add up, giving n times p into 1 minus p. You are well aware of this result.

Now we make the switch and connect this to a normal distribution by appealing to the central limit theorem: as n increases, the binomial variable x is approximately normally distributed with mean np and variance np into 1 minus p. So we will stop using the binomial distribution and start approximating it with a normal distribution. Then it is the standard trick: subtract the mean and divide by the standard deviation, and the result follows the unit normal distribution. The variable is x minus np, divided by the standard deviation, which is the square root of np into 1 minus p.
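The mean and variance quoted above can be checked directly from the binomial probabilities, and the standardization is a one-line formula. The following is a minimal sketch; the helper names and the values n = 100, p = 0.8 and x = 84 are illustrative assumptions, not taken from the slides.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial count X of successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mean_var(n, p):
    """Mean and variance computed by summing over the exact pmf."""
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    var = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, var


def standardize(x, n, p):
    """(x - np) / sqrt(np(1 - p)), approximately unit normal for large n."""
    return (x - n * p) / math.sqrt(n * p * (1 - p))


n, p = 100, 0.8  # illustrative values
mean, var = binomial_mean_var(n, p)
print(mean, n * p)            # both should be close to np
print(var, n * p * (1 - p))   # both should be close to np(1 - p)
print(standardize(84, n, p))  # how many standard deviations 84 is above np
```

The sums agree with np and np(1 - p) up to floating-point rounding, which is all the central limit argument needs as its starting point.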
Once you can do that, you already know that for a normal distribution with mean mu and variance sigma square you can write this probability statement. We saw this in the last class, and I hope none of you have doubts about it. I will hopefully hold a doubt solving session, so if you have doubts you would like to discuss, please do attend. We know that you can always write a two-sided 100 into 1 minus alpha percent confidence interval. The upper point is z of alpha by 2, by the definition of a quantile point, and because the distribution is symmetric the lower point is minus z of alpha by 2. Now that I can think of x in terms of a normal distribution, I know that mu is equal to np and the standard deviation is the square root of np into 1 minus p, and that gives me this statement. With some algebraic manipulation I should be able to write it in this particular fashion.
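To make the quantile points concrete, here is a minimal sketch that computes z of alpha by 2 from the unit normal and the corresponding central range for a normal variable with mu = np and standard deviation sqrt(np(1 - p)). It uses Python's standard `statistics.NormalDist`; the function names and the values n = 100, p = 0.8 are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def z_alpha_by_2(alpha):
    """Upper quantile point: P(Z > z) = alpha / 2 for a unit normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def central_range(mu, sigma, alpha):
    """Range that holds a Normal(mu, sigma^2) variable with probability 1 - alpha."""
    z = z_alpha_by_2(alpha)
    return mu - z * sigma, mu + z * sigma


alpha = 0.05
z = z_alpha_by_2(alpha)
lower_point = NormalDist().inv_cdf(alpha / 2)  # equals -z by symmetry

n, p = 100, 0.8  # illustrative values
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
lo, hi = central_range(mu, sigma, alpha)
print(z, lower_point)
print(lo, hi)
```

For alpha = 0.05 the quantile is about 1.96, and the lower point is its negative, as the symmetry argument says.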
However, there is a problem: p is unknown, and it now appears in a non-linear way, since it is also inside the standard deviation term. You will recall that we used this kind of statement in the case where the standard deviation was known. Here the standard deviation is not known, so we play a trick: we replace p with p hat, the sample value, in the standard deviation term only. Having done that, I can rearrange the statement so that only p is in the middle and p hat appears at both ends. As I have claimed all along, this is no longer a probability statement, because p is a deterministic parameter, not a random variable. So I get my interval for the proportion: p hat plus or minus z of alpha by 2 times the square root of p hat into 1 minus p hat over n. When we said 75 percent with an 8.5 percent margin of error, you can now see the connection: the plus or minus term is the margin of error, and the sample value you got is the 75 percent.

Let's look at an example quickly. You have a sample of 100 transistors, randomly chosen from a large batch, to determine if they meet the standards. There are only two options: each transistor either meets the standard or it doesn't, so each is a Bernoulli trial. You have 100 such variables, and because each transistor is chosen randomly you assume these Bernoulli variables are independent of each other. If 80 of them meet the
standards, then you want to construct a 95 percent confidence interval. You know that p hat is 80 out of 100, which is 0.8. For 95 percent, alpha by 2 is 0.025, and z of 0.025 is 1.96. Put in these values and you get the interval 0.72 to 0.88. That is the confidence interval; you would say that the proportion of transistors that meet the standards is 80 percent plus or minus 8 percent, so it goes from 72 to 88.

Now, just as in case one, we can ask: what should the sample size n be so that the confidence interval has a given size? In the last example p hat was 80 percent, the lower end was 72 percent and the upper end was 88 percent, so 88 minus 72, or 16 percentage points, is the range of that interval. You might have reasons to report a proportion estimate where the range is much smaller. What I have been calling the range is the width of the interval. If the desired width is a number b, the corresponding question is how many samples I need so that the width is b. Say I did not want 16 percent, I wanted only 10 percent; how many transistors would I need to check? As you can imagine, wanting more accurate information means the number of samples has to go up, and that is precisely what happens here. Setting the width, 2 times z of alpha by 2 times the square root of p hat into 1 minus p hat over n, equal to b and rearranging in terms of n, you can show that n equals 4 times z of alpha by 2 squared times p hat into 1 minus p hat, divided by b squared. You can further show that that number can
be bounded from above. One way to rationalize this is to note that the maximum value of p hat into 1 minus p hat is one fourth. Think about it: p hat is a ratio between 0 and 1, so this product can take a value only up to one fourth, which it reaches at p hat equal to one half. So if the number of samples is at least z of alpha by 2 squared divided by b squared, the width of the interval is no more than b, whatever p hat turns out to be. That gives you a bound on the number of samples for a desired width b.

That was the last slide. You can go through this in greater detail and make sure that, in most cases, you can derive these quantities with just a pen and paper: for example, how you get the upper bound here, or how you get this proportion interval. If you draw a picture like this and start deriving, you can come up with these results without having to remember all of them. So that was case six, and with this we have finished chapter seven of Ross.

What I will attempt to do now is solve some of the tutorial problems from the book. I hope you all have the book, Probability and Statistics for Engineers and Scientists. We have put up only the problem numbers on Moodle, and you should attempt them. This is tutorial eight, and these are the problem statements. I will have to figure out how to do this through a video lecture. What I will probably do is read the problem out from the book, then go to the lecture notes which you have and discuss it with respect to them, and if possible try to solve it. I have solved problem number seven and problem number 36 using R, and those codes have been uploaded. The problem number
seven code is also here. Problem number 36 I have not included here, but that R file is uploaded on Moodle. So let's look at some of these problems, starting with problem number one. I have this rudimentary board behind me, and I will use it to the extent I can; I hope the board is visible to you, because I am going to use it extensively for the rest of this class. I hope voice is not going to be a problem; this time I am using a headset.

Problem number one says: let x1 to xn be a sample from a distribution, so these are iid, whose density f of xi is e to the power of minus, bracket, xi minus theta, when xi is greater than or equal to theta, and zero otherwise. You have to determine the maximum likelihood estimator of theta. I have discussed this in class; there are three steps that I have always discussed with you.

The first step is to write the joint density. Because these are independent and identically distributed, you can write f of x1 to xn as the product of the individual densities. When I multiply them out the exponents add up, so it is e to the power of minus the summation of xi minus theta, i going from 1 to n, provided that x1 is greater than or equal to theta, and so on up to xn greater than or equal to theta; otherwise it is zero. Note that all these inequalities can be represented very simply by saying that theta is no greater than the minimum of x1 to xn. So I can say
theta is less than or equal to the minimum of x1 to xn; that is another way of writing it. So you have your joint density.

The second step is to find the likelihood function. The likelihood is L of theta given the realizations, in the notation we have used: x1 equal to x1 star, x2 equal to x2 star, and so on. It is simply the same function with each xi replaced by xi star, its numerical value. I will skip writing that out.

The third step is to find the maximum of this function, which gives the maximum likelihood estimate. There is an exponential here, so instead of maximizing the likelihood I will maximize the log likelihood. The log of the likelihood is minus the summation of xi star minus theta, i going from 1 to n, when theta is less than or equal to the minimum of x1 star to xn star; beyond that the likelihood itself is zero. How do I find a maximum? Usually by taking the derivative of ln L of theta with respect to theta and setting it equal to zero. But in this case the log likelihood is linear in theta, so when you take the derivative there is no theta left in the equation to solve for. A simpler way is to look at it graphically, and I will see if I can do that.
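The three steps for this problem can also be checked numerically. The sketch below evaluates the log likelihood, written in the algebraically equivalent form n times theta minus the sum of the observations, and treats it as minus infinity once theta exceeds the smallest observation, where the likelihood is zero. The function names and the four sample values are hypothetical, chosen only to show the shape of the function.

```python
import math


def log_likelihood(theta, xs):
    """ln L(theta) for the density exp(-(x - theta)), x >= theta."""
    if theta > min(xs):
        # At least one observation lies below theta, so the likelihood is zero.
        return -math.inf
    # -sum(x_i - theta) = n*theta - sum(x_i): linear in theta, slope n > 0.
    return len(xs) * theta - sum(xs)


def mle_theta(xs):
    """An increasing line cut off at min(xs) is maximized at the cutoff."""
    return min(xs)


sample = [2.3, 1.7, 3.1, 2.0]  # hypothetical realizations x1*, ..., x4*
theta_hat = mle_theta(sample)
for theta in (1.0, 1.5, theta_hat, 1.8):
    print(theta, log_likelihood(theta, sample))
```

The printed values rise with theta up to the smallest observation and then drop to minus infinity, so setting a derivative to zero would never locate this maximum; it sits on the boundary of the allowed range.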
To draw this graphically, I rewrite the log likelihood as ln L of theta equal to n times theta minus the summation of xi star. Because the xi star are numerical values, the summation is a fixed number, so this is a linear equation, like y equal to mx plus c, with positive slope n. The only thing is that this equation holds only on the domain where theta is no greater than the minimum; once we pass that point, the likelihood becomes zero. So after I have all my realizations, I find the minimum of x1 star to xn star, which is the threshold, and draw the straight line for theta up to that threshold. It has a positive slope, and after the threshold the likelihood drops to zero. This is your likelihood function plotted against theta. The tube light is making a little shine on the board, so I will write it again here: ln L of theta is equal to n theta minus the summation of xi star, and that is what I have plotted.

So the question is where the maximum is, and the answer is of course at the minimum value of x1 star to xn star. I will say theta hat, the maximum likelihood estimate based on n samples, is equal to the minimum of x1 star to xn star. If I write it as an estimator and not an estimate, I put the random variables there instead; if I put in actual values and give you a number, it becomes an estimate. I will write it as an estimator for now. We have always written random variables as capitals, so by that convention I
should also write this as a capital Theta. That is how you would calculate your maximum likelihood estimate.

Now let me look at the next problem, problem number three. It is fairly straightforward, and I think we have done most of it in class, so I am not going to work it out fully. Problem three says: let x1 to xn be a sample from a normal population with mean mu and variance sigma square; determine the MLE of sigma square when mu is known. This is an interesting problem. In the class, or in your notes, we found the maximum likelihood estimators for mu and sigma square when both are unknown. We noted that mu hat maximum likelihood is 1 by n times the summation of the random variables capital Xi over all i, and that sigma square hat maximum likelihood is 1 over n times the summation of Xi minus mu hat maximum likelihood, the whole square. Recall that the sample variance S square is 1 over n minus 1 times the summation of Xi minus X bar, the whole square. Incidentally, X bar and mu hat are the same: the sample mean is the maximum likelihood estimate of the population mean. But while those two are the same, the sample variance and the maximum likelihood estimate of the population variance are different. We had shown in class that the expected value of S square is sigma square. That is very nice. You remember that n minus 1 into S square by sigma square has a chi square distribution with n minus 1 degrees of freedom, so at least the mean
value of that distribution is where we want it: if I were to plot the distribution of S square it would be something related to a chi square, and its expected value is equal to sigma square. That was very good to know. However, we had seen that for the maximum likelihood estimate this is not the case: the expected value of sigma hat square ML is not equal to sigma square. We made the argument that we gave away one degree of freedom in estimating the mean, and so this became a biased estimator. In fact, you should be able to show this: sigma hat square ML is n minus 1 over n times S square, so its expected value is n minus 1 over n into sigma square, which is not equal to sigma square except in the limiting case when n tends to infinity. So it has nice large sample properties but not so nice small sample properties.

This problem explores the situation where mu is already known. You don't have to estimate it; the true population mean is given to you. What effect does that have? When you write down your likelihood function you again do the three steps. Step one is the joint density. Step two is the likelihood function, and then of course the log of the likelihood. Step three is to take only del by del sigma square and equate it to zero. Why only that? Because mu is known, so you don't have to do del by del mu. You should be able to carry this out, and then they want you to find the expected value of sigma hat square when mu is already known. You take mu as a known value, so you
don't have to estimate it. In this case you should be able to show that the expected value of sigma hat square, when mu is known, turns out to be sigma square. I urge you to do this problem; it is an interesting one. Compare it with our original derivation of the expected value of the maximum likelihood sigma hat square, where we did not assume that mu is known, and you will see the difference. When mu is known, the maximum likelihood derivation gives you an unbiased estimator, whereas when mu was not known the estimator was biased. So that was problem number three.

Let's look at problem number five, another problem on maximum likelihood estimation. I will just discuss this one with you. It says: suppose that x1 to xn are normal with mean mu1, and y1 to yn are also normal but with mean mu2. So they are not identically distributed; one is one normal and the other is a different normal. If I sketch it, the x's are distributed around mu1 and the y's around mu2; the variance we will see later. Then you have a third population, w1 to wn, also normally distributed, and in this case the mean is mu1 plus mu2, so the distribution of w is centered on the sum of the two means. There are n variables in each of the three samples. It says that they all have a common variance, so let me write that variance as
sigma square. It also tells me that you can assume all three sets of random variables to be independent. The question is to find the maximum likelihood estimators of mu1 and mu2, so I have to find mu hat 1 ML and mu hat 2 ML, and you have to use all of these samples in making that estimate. Again I will just discuss this with you; it is a fairly straightforward problem.

We have those three steps, and we never go away from them. The first step is to find the joint density. Here you know that the variables are independent, so the joint density f of x1 to xn, y1 to yn, w1 to wn, given mu1, mu2 and sigma square, is the product of the individual densities. I want to bring the parameters into the notation while writing the density. This is a multivariate density of 3n variables, so the constant in front is 1 by 2 pi sigma square, raised to the power 3n by 2. Then comes the exponential. They all have the same variance, so I can write minus 1 by 2 sigma square, into a curly brace. The means are different, so inside the brace I have the summation of xi minus mu1, the whole square, plus the summation of yi minus mu2, the whole square, with i going from 1 to n in each. The third term is the summation over the wi, where I have to subtract out the mean mu1 plus mu2, so it is wi minus mu1 minus mu2, the whole square, and I'll close
the curly brace. That is the joint density; that was step one.

In step two we substitute the numerical values for the data, and instead of calling it a density we call it a likelihood. The data are now fixed numbers, and the parameters are treated as the unknowns. So step two consists of calling this L of mu1, mu2, sigma square, given all the measurements.

What is the third step? We realize that there is an exponential, so take the ln of this, and then take the derivative with respect to the parameters. Remember that your theta here is mu1, mu2 and sigma square. You have been asked to find only the maximum likelihood estimates of mu1 and mu2, so take del ln L by del mu1 and set it equal to zero, take del ln L by del mu2 and set it equal to zero, and solve for mu1 and mu2. It turns out that setting del ln L by del sigma square equal to zero is not needed. If you go back to the derivation for the simple one-population case, you will note that when calculating the maximum likelihood estimate of the mean you don't need the value of the variance, but when calculating the maximum likelihood estimate of the variance you do need the mean. So here you don't need that third equation. I encourage you to try this out; it is fairly straightforward.

Let me move on to the next problem, problem number nine. These three problems had to do with maximum likelihood; we now move towards problems where confidence intervals are needed. I really encourage you to use R to the extent possible so that you can start
doing serious problems. Otherwise you will end up doing problems only for the exam, but given any serious problem in real life you should be able to solve it, and for that you need to learn some software. By the way, I still plan to take my fifth and sixth quizzes as programming quizzes. R is freeware, so download it if you haven't. I have put three tutorials on R on Moodle, along with a lot of other information, and I have also uploaded this problem.

This is problem 7.9 from our textbook by Ross. It says that the PCB concentration of a fish caught in a lake was measured by a technique that is known to result in a measurement error. PCB is a toxin. As our rivers and water bodies get polluted, fish are known to take up a lot of this pollution, including heavy metals, and when we consume the fish we ourselves end up getting those high, if not lethal, doses, and we have to deal with the consequences. In this case they are trying to measure the PCB concentration of a fish, using a technique whose measurement error is known to be normally distributed with a standard deviation of 0.08 ppm, or parts per million. Suppose the results of 10 independent measurements of this fish are given by these values. The first question is to give a 95 percent confidence interval for the PCB level of this fish.

Whenever you get a problem of this type, you have to figure out which of the six cases we have looked at it fits. If it does not fit, you make an assumption and try to make it fit. For all the parametric interval estimates that we have been talking about, we will for the most part assume everything to be normally distributed; if it isn't, try to appeal to the central limit
theorem, and make sure you have a large enough number of measurements. In this case, as you see, the error is assumed to be normally distributed, so that part is taken care of. We've also been given a standard deviation of 0.08 ppm, and this is the standard deviation of the population. How do I know? Well, to begin with, they haven't used the words sample standard deviation. Also, they have taken 10 independent measurements, so I can calculate the sample standard deviation and see whether it matches 0.08, and it does not. So I know that this is case one of those six cases. Case one, if you recall, was where you had to find an interval estimate for the mean when the population variance was known. Let me discuss this on the board. Please recall that case one was when you're trying to find a confidence interval for mu when sigma square is known, and let me write this as a 100 into (1 minus alpha) percent confidence interval. We've done this derivation; please go back and look it up. You have those 10 different measurements of the fish, and you can calculate x bar from them. Then the confidence interval in case one runs from x bar minus z of alpha by 2 into sigma by root n (remember that sigma over here is 0.08 parts per million) up to the upper limit, x bar plus z of alpha by 2 into sigma by root n. Here is the thing: even if you use software, you have to know the underlying theory to be able to use it effectively, else it's just a black box.
So I will go over to R. I've put enough comments for you to follow. As the first step, I have entered those 10 different values and put them in a vector, or array, x. So these are those 10 independent
measurements of the fish, the PCB measurements. There's a very nice command in R called summary; it applies to almost any variable, and it gives you some information about that data set. I have used c, which stands for combine: you write c and put the values of the vector inside. You can also read data sets from the internet or from a file; the tutorial (it is not my tutorial, I picked it up from the web) talks about how to do this, and it is good to develop some facility with it. I'm calling the summary variable sum, and we will look at it. Maybe I can just run it as we go along: if I press run, it runs just the highlighted statements. You can see that these were the numbers, x, as given to us, and the summary produces this output. It says that the minimum value is 10.1 and the first quartile is 10.85. Remember what the first quartile is: 25 percent of the probability lies to the left of that quartile number. Of course, all of these use the sample and not a probability distribution, so when I say first quartile I mean the first quartile of the sample, and when I say median I mean the sample median, which is 11.4. The mean is 11.48. The third quartile, which has 75 percent on its left (again, to be precise, the sample third quartile), is 12.35, and the maximum value is 12.5. So if you look at the range, the minimum value of PCB was 10.1 and the maximum was 12.5.
Now you've been asked to calculate a 95 percent confidence interval, so you know that alpha is 0.05, and as shown on the board I will have to find the value of z of 0.025, because it is z of alpha by 2. So let's define alpha, and this is sigma, which
is the population standard deviation. You can use these print commands and so on. Then there is the command for finding the mean; the summary already gave you the mean, but you have a command called mean, standard deviation is sd, and variance is var. It's very intuitive; if you've used statistical tools in any software, these are very similar. The number of data points n is the length of that array, which is length of x. So let's try to run this. Was there anything on the top? Yes, this is a variable I've entered; you are free to use the equals sign instead of this assignment arrow, so you can say sigma equals 0.08. So it says the population standard deviation is given as 0.08. Then, in order to calculate the interval, I need x bar, as I discussed on the board, and x bar turned out to be 11.48, while the sample standard deviation turned out to be 0.864. The number of samples n, which I've not printed out, is 10; we entered 10 data points. So now we only have to do that calculation, and the most important thing you have to learn is how to calculate this z of alpha by 2. I'll discuss right now how you could do it in R; we have discussed this a few times, but it is important that you can also do it from a book. For me here, z of alpha by 2 is really z of 0.025. Let me first finish this R part, and then I can go to reading it off the table in a book. So here is the formula; I've just written it here: x bar plus minus z of alpha by 2 into sigma by root n. I calculate this second part separately; it's often known as the margin of error, so I'll just call it the error. And I have used qnorm. So think again about what qnorm does. You will recollect that qnorm actually corresponds to the, let me go
back here. When you use qnorm, by default it works with the lower tail probability. So if I give it qnorm of 0.025, it returns the point for which the probability on the left is 0.025. For me, z of alpha by 2 means that the probability on the right is 0.025. So there are two ways. Either, since 0.025 is the probability on the right, I ask for the value where the probability on the left is 1 minus 0.025; then I give the mean and the standard deviation, and I leave the default, which is lower.tail equal to TRUE. That is one way of doing it: with the lower tail probability, which is the default, it gives you the number that has probability 1 minus 0.025 on its left. Or you give 0.025 and say that lower.tail is equal to FALSE. In 7.36, when this is encountered again, I have used that notation, so you should be clear on both. Let me minimize this and go back to my R window. Here the mean is 0 and the standard deviation is 1. So what is z of 0.025? This is something that you should know without having to look it up, and we have seen that this number is 1.96. You can see that value here, 1.96. If you go back to the board where I had written z of alpha by 2, it is 1.96, and therefore on the left side the point will be minus 1.96. So you multiply that by sigma by root n and then do the plus and the minus to get the confidence interval. Here we've calculated the left confidence limit as x bar minus the error and the right confidence limit as x bar plus the error; this is a two-sided confidence interval. Then we've tried to print it, so let me just run this.
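The two ways of getting z of alpha by 2 described above, and the interval built from it, can be sketched roughly as follows. This is an illustrative sketch in Python, not the R script used in the lecture: it assumes the standard-library `statistics.NormalDist` in place of qnorm, takes x bar = 11.48, sigma = 0.08 and n = 10 from the lecture, and all variable names are made up for the example. `NormalDist` has no upper-tail switch, so the second way uses the symmetry of the normal curve instead of lower.tail equal to FALSE.

```python
from math import sqrt
from statistics import NormalDist

# Values quoted in the lecture for the PCB problem (Ross 7.9).
alpha = 0.05     # 95 percent confidence
sigma = 0.08     # known population standard deviation, ppm
x_bar = 11.48    # sample mean of the 10 measurements
n = 10

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Way 1: lower-tail quantile at probability 1 - alpha/2.
z_from_left = std_normal.inv_cdf(1 - alpha / 2)

# Way 2: upper-tail area alpha/2; by symmetry this is minus the
# lower-tail quantile at alpha/2.
z_from_right = -std_normal.inv_cdf(alpha / 2)

# Margin of error and the two-sided confidence limits.
error = z_from_left * sigma / sqrt(n)
lcl = x_bar - error
ucl = x_bar + error

print(f"z(alpha/2) = {z_from_left:.6f}")
print(f"95% CI for mu: ({lcl:.2f}, {ucl:.2f})")
```

Both routes should give the same z value, about 1.96, which is a quick check that the tail convention has been handled consistently.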
X bar was 11.48 (I can run it by clicking this), so the left confidence limit turned out to be 11.43 and the right confidence limit turned out to be 11.53. That is how you would solve it.
You should also be able to read the value from a table. Let me make sure I am projecting the screen; yes, I am. So let me open up the table. This is the z distribution table. You have to be extremely careful when using a z table: it will always specify on the top whether it gives the left tail or the right tail. Here they specify the area from minus infinity. If I can use a pencil: x is here, and they are giving the area to the left of x, so it really corresponds to the cumulative distribution function. This is the area that is tabulated. Before we look up z of 0.025, let me just discuss this table. You have a quantile number, and it is given up to two decimals. The row gives the first decimal and the column gives the second: the row 0.0 with the column 0.01 is the point 0.01, so in the row 0.1 this entry is the probability to the left of 0.11, this one is the probability to the left of 0.12, and this one over here is the probability to the left of 1.02. So these are the x values, and you can read them off up to two decimals. Now, for us the quantity of interest is z of 0.025; is that right? Yes, 0.025 is half of 0.05. And 0.025 is a probability, on the right. Since this table gives the left probability, the probability I look for should be 0.975, because 1 minus 0.025 is 0.975. So I now
have these probabilities inside the table, and I'm going to find where 0.975 is. I find that there is a 0.975 over here, so I look at the row and it is 1.9 something; the second decimal is on the top, so the value is 1.96. So it tells me that z of 0.025 is 1.96: on the right of 1.96 the probability is 0.025, and on the left of 1.96 it is 0.975. And you'll see that this is very close to what we got in R, which was 1.959964, essentially 1.96. So this is how you should read the table.
The second question in this problem was to find the 95 percent lower confidence interval and the 95 percent upper confidence interval. Again I'll go to the slide where these formulas are given; you should be able not only to read them off but also to derive them, is what I expect. When sigma square is known you have this. The two-sided interval was question a; the one-sided upper interval is question b and the one-sided lower interval is question c. While we needed z of 0.025 earlier, here you will need z of 0.05. Shall we go to the table and find it once again? I need the value of x in the table where the probability on the right side is 0.05, which means that the probability on the left side is 0.95. Let me come down; 0.95 is somewhere here, between these two entries. This is what I meant in the last class when I told you that sometimes you would need to interpolate: 0.95 lies bang in between the entries for 1.64 and 1.65, and a linear interpolation gives 1.645. So z of 0.05 is 1.645. The idea of this interpolation is extremely straightforward: you have only discrete data from a continuous distribution, so you have been given this number and this number, and the desired number lies between them.
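The table lookup for z of 0.05 can be sketched as a small linear interpolation. This is a hedged illustration in Python, not part of the lecture's R script: the helper `interpolate_quantile` and the variable names are hypothetical, the two table entries (0.9495 at 1.64 and 0.9505 at 1.65) are the ones quoted in the lecture, and the comparison value comes from the standard-library `statistics.NormalDist`. The last lines reuse x bar = 11.48, sigma = 0.08 and n = 10 from the PCB problem to form the two one-sided bounds.

```python
from math import sqrt
from statistics import NormalDist


def interpolate_quantile(p, lower, upper):
    """Linearly interpolate the quantile for left-tail probability p
    between two neighbouring table entries given as (z, probability)."""
    (z_lo, p_lo), (z_hi, p_hi) = lower, upper
    return z_lo + (p - p_lo) * (z_hi - z_lo) / (p_hi - p_lo)


# Table entries on either side of 0.95, as read off in the lecture.
z_table = interpolate_quantile(0.95, (1.64, 0.9495), (1.65, 0.9505))

# Value computed directly, for comparison with the interpolated one.
z_exact = NormalDist().inv_cdf(0.95)

# One-sided 95 percent bounds for the PCB problem (sigma known).
x_bar, sigma, n = 11.48, 0.08, 10
margin = z_table * sigma / sqrt(n)
bound_from_below = x_bar - margin   # mu is above this with 95% confidence
bound_from_above = x_bar + margin   # mu is below this with 95% confidence

print(f"interpolated z(0.05) = {z_table:.4f}, computed = {z_exact:.4f}")
print(f"one-sided bounds: {bound_from_below:.3f} and {bound_from_above:.3f}")
```

The interpolated and computed values differ only in the fourth decimal, which is why a straight line between neighbouring entries is good enough for a table with steps this small.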
The desired number was here. So what do you do? You just think of this as a straight line and you do a linear interpolation, which is reasonable for small increments as in this table. In this case 0.95 lies right in the middle of 0.9495 and 0.9505, so we know the answer will be midway, that is 1.645.
Let's go back to the problems. We have looked at 1, 3 and 5, and 9 I just discussed with you; let's look at 11. Let x1 to xn plus 1 be a sample from a normal population having an unknown mean and variance 1. Let x bar n be the average of the first n of them. What is the distribution of xn plus 1 minus x bar n? This problem uses your understanding of probability and connects it to statistics. It's an interesting problem, so I'll just discuss it and you can take a shot at it. It says that you have n plus 1 samples, not n: you have x1 to xn and then you have this xn plus 1, and they all belong to a normal distribution with an unknown mean mu, but its variance is 1. Now consider x bar n. You should remember what the n indicates: you are calculating the sample mean using n data points. We're not using all n plus 1 but only n data points, and so I have i going from 1 to n. So if the sample mean is defined as the average of the first n of them, then what is the distribution of xn plus 1 minus x bar n? If you were live in front of me I would have loved to hear it from you; you should be able to answer this question. As a hint, remember that x bar n is also normally distributed, with mean mu and variance sigma square by n, and you know that xn plus 1 is also normally distributed, with mean mu and variance 1. If I give you two variables x and y and I tell you both are normal, then what
can you tell me about x minus y? We saw that this was the magical property of normal variables. If, for example, x and y had a uniform distribution, you could not say that x minus y is uniform; in fact we have shown that it is not, it is a triangular distribution. But if xn plus 1 is normal and x bar n is normal, then the combination is also normally distributed. So the first question is: what is the distribution of xn plus 1 minus x bar n? We solved this problem in the first part of this course, where we showed that any linear combination of independent normal random variables is also a normal random variable. You should be able to calculate the mean using expectations: it is the expectation of xn plus 1 minus the expectation of x bar n, so it becomes mu minus mu, which is 0. And what happens to the variance? x bar n is independent of xn plus 1, because xn plus 1 does not depend on the value of x bar n and x bar n does not depend on xn plus 1. Since the two are independent, the variances add up, and so the variance will be 1 plus sigma square by n, which here, with sigma square equal to 1, is 1 plus 1 by n. If you feel that we are using known properties and haven't really shown that x minus y is normally distributed, I suggest that you go back to the first part of this course and see that we have solved these kinds of problems. The simplest way is to look at the moment generating function of xn plus 1 and that of x bar n, and find the moment generating function of the difference; you will see that it has the form of the moment generating function of a Gaussian, and then you can read off the mean and the variance. The second part, b, is: if x bar n is 4, give an interval that with 90 percent confidence will contain the value of xn plus 1.
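The argument for the first part of problem 11 can be written out as below. This is a sketch using only the facts stated in the lecture (independence of xn plus 1 and x bar n, and population variance 1); the notation is chosen for the example.

```latex
\begin{aligned}
X_{n+1} &\sim N(\mu,\, 1), \qquad
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\!\left(\mu,\, \tfrac{1}{n}\right), \\
E\!\left[X_{n+1} - \bar{X}_n\right] &= \mu - \mu = 0, \\
\operatorname{Var}\!\left(X_{n+1} - \bar{X}_n\right)
  &= \operatorname{Var}(X_{n+1}) + \operatorname{Var}(\bar{X}_n)
   = 1 + \frac{1}{n}, \\
X_{n+1} - \bar{X}_n &\sim N\!\left(0,\; 1 + \tfrac{1}{n}\right).
\end{aligned}
```

If part b is read as asking for a prediction interval for xn plus 1 (an assumption, since the lecture leaves the reading of the question open), the same result would suggest 4 plus minus z of 0.05 times the square root of 1 plus 1 by n, with z of 0.05 equal to 1.645.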
That is, given a numerical value of x bar n, which is 4, you are to give an interval that with 90 percent confidence will contain the value of xn plus 1. So you can give an interval for the mean mu, but they are asking for an interval that will contain the value of xn plus 1. The question is not very clear to me; I will look at it and probably upload the solution on Moodle.
Let me move on to problem number 13. It says a sample of 20 cigarettes is tested to determine the nicotine content, and the average value observed was 1.2 milligrams. Compute a 99 percent two-sided confidence interval for the mean nicotine content of a cigarette if it is known that the standard deviation of a cigarette's nicotine content is sigma equal to 0.2 milligrams. I'll just discuss this; the problem is extremely straightforward and very similar to the fish problem, so you can take a shot at it. They have given you the following information. You have 20 cigarettes and you want to determine the nicotine content. I will call x the nicotine content of each cigarette, so the average they got with the 20 samples, x bar, was 1.2 milligrams. You are supposed to compute a 99 percent two-sided confidence interval for the mean, given that the standard deviation of the nicotine content is sigma equal to 0.2 milligrams. So this is a case where the standard deviation of the population is known, and it goes back to being case one. For 99 percent, alpha is 0.01, so the two-sided interval will be x bar plus minus z of 0.005 into sigma by root n. You have to find z of 0.005, and you can go to the table and determine its value. Let me see if I have it with me. For problem number 13, the value is 2.575.
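Problem 13 can be finished numerically along the following lines. This is a minimal sketch in Python rather than the R used in the lecture, assuming the standard-library `statistics.NormalDist` for the normal quantile; the inputs (n = 20, x bar = 1.2, sigma = 0.2) are the ones stated in the problem, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Problem 13: population standard deviation known (case one).
n = 20
x_bar = 1.2      # average nicotine content, mg
sigma = 0.2      # known population standard deviation, mg
alpha = 0.01     # 99 percent two-sided interval

# z of alpha/2 = z of 0.005: left-tail probability 1 - 0.005.
z = NormalDist().inv_cdf(1 - alpha / 2)

margin = z * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"z(0.005) = {z:.3f}")
print(f"99% CI for mu: ({lower:.3f}, {upper:.3f}) mg")
```

The computed quantile is about 2.576; a table read to two decimals with interpolation gives 2.575, and the difference does not change the interval at three decimals.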
That is the value I have; please check it, since I'm picking it up from some notes, and you know how to check it. So that's straightforward; that was problem number 13.
Let's go to the next problem, problem number 14, which is connected to problem number 13. It says: suppose that the population variance is not known in advance of the experiment. So this builds on case one: you do not know sigma, the population variance, which is the more common case. There are very few situations where you can claim that you know the true population variance but not the mean. Here they do not know it, but from those 20 samples they have also calculated the sample variance, and they have found that the sample variance is 0.04 milligram squared. They want you to compute again a 99 percent confidence interval, so alpha is equal to 0.01. You know that if you do not have the true sigma and instead use the sample standard deviation, then the standardized quantity no longer has a normal distribution; it has a t distribution with n minus 1 degrees of freedom. So this will be a case where you use t of 0.005 comma 19. Remember it was t of alpha by 2 comma n minus 1, so there are 19 degrees of freedom here. We have often said that one degree of freedom is used up because x bar is used in determining the sample variance, and so we reduce by one degree of freedom. What you need to know in this case is how to calculate t. In MATLAB or R it is straightforward; I don't have to look up anything, I just call qt. Recall that in R the quantile function for the t distribution is qt. So to find that value I would type qt; I hope it gives me help. It gave me help. Let me choose my lower tail as FALSE. If I choose my lower tail as FALSE I can write
this number directly; otherwise I have to write 1 minus that number. So it is 0.005, comma, degrees of freedom 19. I have ncp over here; that is a non-centrality parameter, and we don't need to state it. And we set lower.tail equal to FALSE. Oh, you cannot see my window; well, let's look at this number and then we can go there. The number I've got is 2.86. We had seen the corresponding number for the z distribution; I don't recall now what it was, perhaps we read it off the table. So the first point I wanted to make is that t of 0.005 comma 19 is 2.86. z of 0.005 is the same as t of 0.005 comma infinity, that is, if you had infinitely many observations, and the z distribution is much narrower, so I know that z of 0.005 is going to be less than 2.86. We calculated it just now (it was the 2.575 of problem 13), so it is indeed less than 2.86. That is the first point I wanted to make.
The second point is that I wanted to look at a table, so that you are comfortable reading it off the table. So I come here: this was for the z distribution, this is chi square, and this is t. As I discussed with you earlier, for every degree of freedom you have a different probability density function, so it is not possible to give a full table like the z table for each one. (The z table is the t distribution with infinite degrees of freedom, as n approaches infinity.) So here they give very coarse-grained information. The way they have given the values is this: the degrees of freedom are here, and the value of alpha is over here. So unlike the z table, where the x axis of the distribution was given along the row and column headings and the probability was inside, here the probability is on the top, and there is just this much: very little information, because they have given one table for many, many degrees of freedom. But this is usually enough, because most of the time people are trying to find a 95 percent
confidence interval, or a 90 percent or a 99 percent confidence interval, and so you'll see that these tables are most of the time easy to use. So let's come to our problem. We want to find t of 0.005, so our alpha by 2 is this value, and this is the column I will read from. My degrees of freedom are 19, so the row is here, and so I have this value, 2.861. Remember again that this table is not giving you the left tail probability; the parameter alpha on the top is defined as the right tail probability. So 0.005 means that the probability in the right tail is 0.005. When you go back to the z table, see that the probability tabulated there is not alpha but 1 minus alpha, because they are giving you the probability from the left end, from minus infinity up to that quantile. So there's a difference in reading the z table and the t table; please take care of that. So I have 2.861, and that is how you would read it. In R we had got 2.86, which is very close. So now you should be able to use that value to calculate the interval, x bar plus minus t of 0.005 comma 19 into s by root n. This was problem number 14.
Problem number 15 says that, in the setting of problem 14, you want to compute a value c for which we can assert with 99 percent confidence that c is larger than the mean nicotine content of a cigarette. I want you to pay special attention to that word, assert. There are different ways that you can look at this data. One is: I have these 20 observations and I can build a confidence interval. The other way is by making a statement, which is known as an assertion. We will go into this in great detail in the next module, where we will do hypothesis testing, which is basically where somebody makes a statement
and you want to use data to find out whether the data is consistent with that statement or not. That is something we'll do in hypothesis testing in our next module, which I will upload after finishing my lecture today, so you should get ready for that as well. In this case they would like you to compute a value c for which you can say with 99 percent confidence that c is larger than the mean nicotine content. So you want an upper bound on mu: mu should be less than c, and you want to claim this with 99 percent confidence. Note that c is an upper bound on mu. And remember that when we were discussing the one-sided lower and the one-sided upper confidence intervals, I was always saying that a lower confidence interval gives you an upper bound. It went like this: we started by writing our statement in terms of probability, and then we flipped it in order to find the confidence interval. So we have a bound, and we want to be able to say with 99 percent confidence (this is not a probability now, it's only a confidence statement) that mu is going to be less than this. Remember that this is nothing but an upper bound: mu goes from minus infinity to c, that is your lower confidence interval, and c is the upper bound. So we can go back to see what that value was. This is again case two, and in case two we have seen the one-sided confidence intervals. This is the lower confidence interval; that means mu goes from minus infinity to this term, x bar plus t of alpha comma n minus 1 into s by root n, and this term gives you the upper bound. So we can say with 99 percent confidence that mu will be less than this quantity, and note that the t value is now t of 0.01 comma 19, because the whole of alpha is in one tail. So again you can go to your table, where, just as 0.005 was given to us, the alpha of 0.01 is also given.
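Problems 14 and 15 can be worked through together as in the sketch below. It is written in Python rather than the R used in the lecture, and since the Python standard library has no t quantile function, the two t values are simply typed in as constants: 2.861 is the table value for alpha by 2 = 0.005 with 19 degrees of freedom read off in the lecture, and 2.539 is the table entry for alpha = 0.01 with 19 degrees of freedom. The variable names are illustrative.

```python
from math import sqrt

# Problems 14 and 15: population variance unknown (case two).
n = 20
x_bar = 1.2          # average nicotine content, mg
s = sqrt(0.04)       # sample standard deviation from sample variance 0.04

# t quantiles for 19 degrees of freedom, taken from the t table
# (right-tail areas 0.005 and 0.01), not computed here.
t_0005_19 = 2.861
t_001_19 = 2.539

# Problem 14: two-sided 99 percent interval, x_bar +/- t * s / sqrt(n).
margin = t_0005_19 * s / sqrt(n)
ci = (x_bar - margin, x_bar + margin)

# Problem 15: value c with mu < c at 99 percent confidence
# (one-sided, so the whole of alpha = 0.01 sits in one tail).
c = x_bar + t_001_19 * s / sqrt(n)

print(f"99% two-sided CI: ({ci[0]:.3f}, {ci[1]:.3f}) mg")
print(f"99% upper bound c: {c:.3f} mg")
```

Compared with problem 13, the two-sided interval is slightly wider, because 2.861 replaces the z value of about 2.576, while the one-sided bound c sits inside the two-sided upper limit because it uses the smaller quantile 2.539.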
The alpha of 0.01 is given to us in the table, so I go back and see: this is the column for alpha of 0.01, and the row for 19 corresponds to 2.539. So that is t of 0.01 comma 19, and I think you should be able to finish from there. That was problem number 15.
Maybe I should speed up a bit; there are many more unsolved problems. Then you have problem 16, which let me skip. It's an interesting problem, but it is not related to problem number 11 as I first thought, so I'll leave 16 to you. Take a shot at it and we can discuss it when we meet for the doubt-solving session. Let me go ahead and move to problem number 32. Problem 32 is the one related to problem number 11. In problem 11 you had to find the distribution of xn plus 1 minus x bar n. Here you need not only that but the distribution of the sample variance also. It's a very similar setup: you have n plus 1 samples, you have calculated the average using the first n samples, and you have to find the distribution of xn plus 1 minus x bar n, divided by sn times the square root of 1 plus 1 by n. In this case you should be able to realize that this is basically a standard normal variable divided by the square root of a chi square variable over its degrees of freedom, and since sn is computed from n samples, the chi square has n minus 1 degrees of freedom. You should be able to show that this is a t variable with n minus 1 degrees of freedom. With what I have covered in class, I will leave this for you to show. There is a part c and a part d. Part c says give the prediction interval for xn plus 1. This is very similar to the part of problem number 11 where you wanted an interval for xn plus 1, which I said I will do with you separately, or I'll upload the solution key if I can figure it out. So it says give the prediction interval for xn plus 1 in this case as
well. Let me move forward to problem number 36. This is a problem on finding an interval estimate for the population variance. I have done this one in R as well, and that is uploaded on Moodle; you can take a look at it. Problem 36 says that the capacities of 10 batteries were recorded, so n is equal to 10, and the capacities are in ampere hours. Part a: estimate the population variance, sigma square. So what is the point estimate of a population variance? It is simply the sample variance, 1 over n minus 1 times the sum of xi minus x bar, the whole square. Or you could use the maximum likelihood estimator of the variance, if you state clearly that that is what you're using. Both are point estimates: the sample variance is unbiased, while the maximum likelihood one is biased. Part b is to compute the 99 percent two-sided confidence interval. The formula is given to you in your module, where we talk about interval estimates for the variance; the two-sided confidence interval was given by this. So in this case you have to find chi square of alpha by 2 comma n minus 1 and chi square of 1 minus alpha by 2 comma n minus 1. Remember that here the two points are no longer symmetric, so it doesn't work as it did when we were estimating the mean; you need to get both these quantile points from the table. So let me go to R and open up 7.36. Let me check that my desktop is showing; okay, there you go. So in this case the problem statement is
over here: the capacities in ampere hours of 10 batteries were recorded; estimate the population variance. You use the c command to build the x vector and you summarize it. The estimate of the population variance is simply the sample variance, s square, so I've called it s square. You could also compute x bar, the standard deviation, and the length of the data, that is, the number of samples you have. So let me just run this part. The sample variance s square turned out to be 32.23 ampere hour squared. The sample mean was 144.3, the mean of the data points given here on the top. The sample standard deviation, which is the square root of the sample variance, is 5.68, and the number of data samples we entered is 10. Now, this is what I just showed you in the module 6 handout: these are the two ends of the two-sided confidence interval. Here you have to find the chi square points, and in R that is extremely straightforward. You have alpha, so you call qchisq with alpha by 2, give the degrees of freedom, and say lower.tail is FALSE; that gives you the quantile point with alpha by 2 on its right. For the right confidence limit you need the other value, 1 minus alpha by 2 comma n minus 1, again with lower.tail FALSE, so the argument is the right side probability. With those you can calculate the left and right confidence limits. I need to run this here. So while the sample variance was 32 point something, my left confidence limit is about 12.3 and my right confidence limit of the two-sided interval is 167.211. This is a 99 percent confidence interval, so alpha is 0.01. This is definitely not a case where things are symmetric, like we are used to so far.
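The interval for the variance in problem 36 can be reproduced roughly as follows. This is a sketch in Python rather than the lecture's R script, and because the Python standard library has no chi square quantile function, the two quantile points are entered as constants: 23.589 and 1.735 are the chi square table values for 9 degrees of freedom with right-tail areas 0.005 and 0.995. The sample variance 32.23 is the rounded value quoted in the lecture, so the upper limit comes out marginally different from the 167.211 obtained from the unrounded data. Variable names are illustrative.

```python
# Problem 36: two-sided 99 percent confidence interval for the variance.
n = 10
s2 = 32.23           # sample variance, (ampere hour)^2, as quoted

# Chi square points for n - 1 = 9 degrees of freedom, from the table.
chi2_right_0005 = 23.589   # right-tail area alpha/2 = 0.005
chi2_right_0995 = 1.735    # right-tail area 1 - alpha/2 = 0.995

# (n - 1) s^2 / chi2(alpha/2)  <  sigma^2  <  (n - 1) s^2 / chi2(1 - alpha/2)
lower = (n - 1) * s2 / chi2_right_0005
upper = (n - 1) * s2 / chi2_right_0995

print(f"99% CI for sigma^2: ({lower:.2f}, {upper:.2f})")
print(f"distance below s^2: {s2 - lower:.2f}, above s^2: {upper - s2:.2f}")
```

The two distances printed on the last line make the asymmetry explicit: the interval extends far more above the point estimate than below it.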
The point estimate, the sample variance, was 32, and the interval is not symmetric about it. You can figure that out by looking at the formula as well: it is not like the previous case, where you had x bar plus something and minus something. Here you have n minus 1 times s square divided by a chi square point; s square is multiplied by n minus 1 so that, divided by sigma square, it is a chi square variable. I think that is sufficient for us to see.
Let me now look at the table; we have not yet looked at the chi square table. How much was n? This is problem number 36, so n was 10, and you needed a 99 percent confidence interval. So you need these two points: chi square of 0.995 comma 9 and chi square of 0.005 comma 9. The chi square variable is only positive, there are no negative values, so the density looks something like this. The point on the right is chi square of 0.005 comma 9: the probability on its right is 0.005, that is 0.5 percent. The point on the left leaves 0.5 percent on its left, so the probability on its right is 0.995, and it is chi square of 0.995 comma 9. I could have got those values from R; let's just look at one of them. For chi square of 0.005 I just need to evaluate this, so I'll run it, and it turned out to be 23.59; let me write that on the board. And let me find the other quantile point: that was 1.735. Now, we should also be able to read this off from a table, so let's look at a chi square table. As with t, there are two arguments: one is the probability and the other is the degrees of freedom. What I drew on the board was a chi square distribution with 9 degrees of freedom; if I draw it with 10, it's a different curve. So let's look at the first one, chi square 0.005
with 9 degrees of freedom. So n here is 9 degrees of freedom, and if I draw this line I can see that 0.005 is here. Oh right, so that is the value: 23.589, and the value that I have on the board from R was also very close to that. The other was chi square of 0.995 with 9 degrees of freedom, and that value is this, 1.735, and that is what I had written on the board as well, okay. So this is how you can read off the chi square table, or you can get it from R, as the situation may be, okay. So I have looked at problem number 36. Let me look at some more problems. So this was case 3, so we've now solved a problem to do with case 3. Let's look at problem 38. So I will go over here, let me look at problem 38. Oh, I think I forgot to minimize when I was doing the R, so you probably didn't get to see the chi square that I did. I'm sorry about that. Okay, I can just go to R to show you how I had got those values. So you see, I had calculated it in this particular way: I used qchisq, gave the value of the probability, alpha by 2, then the degrees of freedom n minus 1, and I did lower tail is false, so I'm specifying the probability on the right side, and it gave me a value of 23.589. And when I did this with 1 minus alpha by 2, which is essentially 0.995, it gave me 1.735, and that is what I had also written on the board and what we saw from the table. I think I also missed showing you the table, so let me quickly show you that. So here was the chi square table. n was the degrees of freedom, so I had come down to 9, or n minus 1 for us; it is 9 degrees of freedom. When alpha by 2 was 0.005, this gave me 23.589, as we just saw, and when it was 0.995 it gave me 1.735, okay. So this is how you read off the table. And I had drawn that chi square distribution on the board and written these values, so you can see this is 1.735 and this is 23.59. All right, so the next problem that I was trying to
attempt was 38, and these problems have to do with two samples, so two populations, okay. Remember, case four and case five were two-population parameters, the difference in the means. So this appears to be two samples. The problem says: the following are independent samples from two normal populations, both of which have the same standard deviation. So let's say population X and population Y, okay, both are normal populations. So X is distributed as N(mu one, sigma square) and Y as N(mu two, sigma square); the variances are the same. They are telling us the samples obtained: here there are one, two, three, four, five, so I'll say n is equal to five, and they give you five different samples. Here the number of samples obtained is three, so m is equal to three. Okay, so you have x1 to x5 and you have y1, y2 and y3, okay, those were the three samples that you have obtained. And they want you to find an estimate for sigma; the sigma over here and here are the same. So this is a situation where you have two different populations and you want to find an estimate of sigma square, which is identical for both, using all five samples here and all three samples here. If you recall, in the last class we had talked about a pooled estimate when we had two populations, the pooled variance if I can call it that, okay. We had seen that, if s1 square is the sample variance of these n samples and s2 square is the sample variance of those m samples, then sp square was n minus one times s1 square plus m minus one times s2 square, divided by the degrees of freedom, which was n plus m minus two. So it's an extremely straightforward problem. This problem comes under case five, if I remember, where you had two different populations and the variances were unknown. Case four was when the variances were known, so you had sigma one square here and sigma two square here, and case five was when the
variances were unknown but you knew that they were identical, so you could pool both of these samples together in order to find a pooled estimate, okay. And in that case, this is case five, x bar minus y bar is normally distributed with mean mu one minus mu two and variance sigma square into 1 by n plus 1 by m, and since sigma square is unknown you replace it with sp square, so that the standardized variable follows a t distribution with n plus m minus two degrees of freedom, okay. So you could use this to standardize and then build the confidence intervals for mu one minus mu two; that is what we had done. Maybe we'll just look at our slides for case five to quickly jog our memory. Yeah, so this was when sigma one square and sigma two square are unknown, and what we had done was this: we had found a pooled estimate, okay. And we had used the logic that if you add two independent chi square random variables, with n minus one and m minus one degrees of freedom, then the sum is also a chi square random variable with the degrees of freedom adding up, so it will become n plus m minus two. So that was problem number 38. Let's look at problem number 40. I hope I have, no, I missed. You know, I think I should bring it to a close, I'm making a lot of errors. Though I was looking at the slide, I missed showing you that slide. So I was talking to you about this, where you have case five, okay, where sigma one square and sigma two square are unknown but they are equal, and when they are equal we can find this pooled variance as I had written on the board, okay. So that is what I had tried to do. All right, let me look at problem number 40. Okay, so problem number 40 is an interpretation problem. I'll leave it for you to look at. It's a case where x1 to xn is a sample from a normal population: explain how to obtain a 100 into 1 minus alpha percent confidence interval for the population
variance sigma square when the population mean is known, okay. So this is a case where you are finding the confidence interval for the population variance sigma square. I have already erased that out, but let me check with the slides. Yeah, so let's go to that; it is case number three, the interval for sigma square. So they're asking you this: when you had built this interval for the population variance, you had assumed that you do not know the value of mu. In fact, in building this you have not used the value of mu. So question 40 asks, what if you knew the correct value of the population mean mu, then how would you obtain the confidence interval for sigma square? So you say that mu is known. Under this assumption, what will change over here? This is a very nice problem, again checking fundamentals. So I will suggest that you start by doing the standard diagram which I have been following, where you draw the underlying distribution. Now that was completely incorrect, I have to draw a chi square variable, okay. And you wanted to find, let us say, a two-sided confidence interval, so you had a lower side and an upper side, and you wrote down that probabilistic statement, that the probability of L being less than or equal to s square, which is less than or equal to U, is equal to 1 minus alpha, okay. After this you standardize s square: you know that n minus 1 into s square by sigma square gives you the chi square distribution. Only in this case the mu is known, so think about this distribution. Maybe I'll use the board. So you remember that we had been discussing this: we said that n minus 1 into s square by sigma square is a chi square variable with n minus 1 degrees of freedom. So recall why we used n minus 1, why did we not say n, and you should be able to
show that it was because we did not know the mean, and so one degree of freedom had gone away in getting an estimate for the mean. In this problem you know the mean, and when you know the mean you should be able to show that the sum of x i minus mu whole square, divided by sigma square, is a chi square variable with n degrees of freedom, okay. So that will be the difference, and by using this information you should be able to build a confidence interval estimate for sigma square in the case when the mean mu is known. So you don't take away that degree of freedom, because you already knew the value of the mean, and so you make use of that degree of freedom, okay. Let's quickly go to problem number 42. It's been a long lecture, so maybe I will not do them but I'll just discuss. So problem 42 is an example of case 4, where there are two populations given to you. So you have samples x1 to xn, okay, and y1 to ym. They've given you x bar, they have given you y bar, they have also given you s1 and they have given you s2. They have assumed that sigma 1 is equal to sigma 2, so this is not case 4, this is going to be case 5, and they want you to find the 99 percent confidence interval for mu 1 minus mu 2. So this is extremely straightforward. This is case 5, so you will find the pooled estimate sp, or the pooled variance sp square, like we discussed in the class, and then you should be able to find that 99 percent confidence interval, okay. Just what I discussed before I erase this out: you know that x bar minus y bar, standardized using the pooled variance sp square, follows a t distribution with n plus m minus 2 degrees of freedom, centred on mu 1 minus mu 2, and you know how to calculate that pooled variance like we just discussed. So that was problem number 42. Now problem number 43 is connected to problem number 42, where we will assume that you know the variances, so you don't need to make use of these sample variances. So sigma 1 and sigma 2 are given to you, and these are given to
you as sigma 1 square is 4 and sigma 2 square is 5. So this goes into case 4, okay, where you will not use the pooled estimate; you know that these are not equal. In case 5 we had assumed that these two were equal and we had made use of the pooled estimate. In this case we know that they are not, and so this goes into using case 4. So let me just show you case 4 in the slides, because this will be the first time in the example problems that we'll be looking at case 4. So this is case 4. This sigma 1 square is 4, this sigma 2 square is 5, the number of samples are given to you, and you know what x bar and y bar are. So this turns out to be a normal distribution, and the point you will have to find is this z of alpha by 2, so it will be z of 0.005, okay, which we have also computed before. So that was problem number 43; it belonged to the category of case 4. Next is 49, okay, so probably this is the last problem I look at. In problem 49 we want to estimate p, so this is case 6, which is the variance, sorry, which is the probability of success. Let me just read out the question: to estimate p, the proportion of all newborn babies that are male, the gender of 10,000 newborn babies was noted. If 5106 of them were male, determine a 90 percent confidence interval for p. So it's an extremely straightforward problem. So you have x i, the gender of the i-th newborn, okay: x i is 1 if the i-th newborn is male and 0 if it is female. Now x total is equal to x 1 plus x 2 and so on till x 10,000. So each x i is a Bernoulli variable, and x total becomes a binomial variable, and it was seen that the number of males they obtained was 5106. So the specific p hat value, I will call it p hat star to say that it's a numerical value, is 5106, which is this x t, divided by 10,000, okay, so it is 0.5106. Now that is a point estimate. We saw in the beginning of this class that p hat is an unbiased estimator of p, where p is the
probability that x i equals 1, the probability of success, okay. So generally we expect that it should be 0.5, you know, an equal probability that the child is a boy or a girl. But when you took a sample of 10,000 you got a value of 0.5106. So you have to be able to give me a confidence interval, which is a 90 percent confidence interval. And you know this: we likened it to a normal distribution, and we did p hat plus minus z of alpha by 2 into the standard deviation, and the variance that we saw was p into 1 minus p by n, so the standard deviation is its square root, okay. It's only that, because we don't know this p, we had used p hat in its place, as if the variance were known, which had allowed us to write it in this way. So p hat star is given to you, and you can calculate z of 0.05, right, alpha by 2, because alpha over here is 0.1. So you can calculate that and you will be able to get the answer. I hope I have, yes, okay. So I'm going to leave the last two problems to you to solve. I've discussed them in some sense, but this was only a discussion; you should actually try to solve them. I will upload the next module, module seven, on hypothesis testing, so please watch out for updates on Moodle. I hope Moodle is now available on the internet and that you're able to access it from wherever you are. So stay safe, and I'll see you the next time.
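As a companion to problem 36, here is a minimal Python sketch of the two-sided interval for sigma square, using the quoted values s square = 32.23, n = 10 and alpha = 0.01. The lecture gets the chi square points from R with lower tail set to false. Python's standard library has no chi square quantile, so the helpers `chi2_cdf` and `chi2_upper_point` are hand-rolled stand-ins (a series for the CDF plus bisection), and all function names are assumptions made for this illustration.

```python
import math


def chi2_cdf(x, k):
    """P(X <= x) for a chi-square variable with k degrees of freedom."""
    if x <= 0.0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_upper_point(tail, k):
    """Point x with P(X > x) = tail, i.e. the 'lower tail is false' convention."""
    target = 1.0 - tail
    lo, hi = 0.0, k + 10.0
    while chi2_cdf(hi, k) < target:  # widen the bracket if needed
        hi *= 2.0
    for _ in range(100):  # plain bisection on the CDF
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_ci(s2, n, alpha):
    """Two-sided 100(1 - alpha)% interval for sigma^2 when the mean is unknown."""
    df = n - 1
    lower = df * s2 / chi2_upper_point(alpha / 2, df)      # divide by the larger point
    upper = df * s2 / chi2_upper_point(1 - alpha / 2, df)  # divide by the smaller point
    return lower, upper


s2, n, alpha = 32.23, 10, 0.01  # values quoted in the lecture for problem 36
print(chi2_upper_point(alpha / 2, n - 1), chi2_upper_point(1 - alpha / 2, n - 1))
print(variance_ci(s2, n, alpha))
```

The interval is not symmetric about s square because the point estimate is divided by two different chi square points rather than shifted by plus and minus a margin.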
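For problem 40, one way to write down the known-mean case is sketched below. The notation is an assumption chosen to match the lecture's convention, where chi square alpha comma n is the point with probability alpha to its right. The only change from case 3 is that the sum of squares is taken about the known mu and the degrees of freedom are n instead of n minus 1.

```latex
\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma^2} \sim \chi^2_{n}
\quad\Longrightarrow\quad
P\!\left( \chi^2_{1-\alpha/2,\,n} \;\le\; \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma^2} \;\le\; \chi^2_{\alpha/2,\,n} \right) = 1 - \alpha

\sigma^2 \in \left( \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{\chi^2_{\alpha/2,\,n}},\;
                    \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{\chi^2_{1-\alpha/2,\,n}} \right)
```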
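The pooled variance used in problems 38 and 42 can be sketched in Python as follows. The transcript does not give the sample values for problem 38, so the data below are hypothetical placeholders with n = 5 and m = 3, and the function name `pooled_variance` is an assumption made for this illustration.

```python
from statistics import variance


def pooled_variance(xs, ys):
    """sp^2 = ((n - 1) s1^2 + (m - 1) s2^2) / (n + m - 2)."""
    n, m = len(xs), len(ys)
    s1_sq = variance(xs)  # sample variance, divisor n - 1
    s2_sq = variance(ys)  # sample variance, divisor m - 1
    return ((n - 1) * s1_sq + (m - 1) * s2_sq) / (n + m - 2)


# Hypothetical data, NOT the values from problem 38.
xs = [16.0, 17.0, 18.0, 19.0, 20.0]
ys = [14.0, 16.0, 18.0]
print(pooled_variance(xs, ys))
```

Each sample variance is weighted by its own degrees of freedom, so the larger sample contributes more to the estimate of the common sigma square.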
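For problem 43 (case 4, known variances), a minimal Python sketch of the interval for mu 1 minus mu 2 might look like this. Only sigma 1 square = 4, sigma 2 square = 5 and the 99 percent level come from the lecture. The sample means and sample sizes below are illustrative placeholders, not values from the transcript, and the function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci_known_var(xbar, ybar, var1, var2, n, m, alpha):
    """Two-sided interval for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z of alpha by 2
    half_width = z * sqrt(var1 / n + var2 / m)
    diff = xbar - ybar
    return diff - half_width, diff + half_width


# var1, var2 and alpha follow the lecture; xbar, ybar, n, m are hypothetical.
print(diff_means_ci_known_var(xbar=120.0, ybar=130.0, var1=4.0, var2=5.0,
                              n=36, m=64, alpha=0.01))
```

No pooling and no t distribution appear here, because nothing about the variances has to be estimated from the samples.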
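The newborn problem (problem 49) can be worked through with a short Python sketch, assuming the normal approximation described in the lecture with p hat substituted for the unknown p in the variance. The counts 5106 and 10,000 and the 90 percent level come from the problem statement; the function name `proportion_ci` is an assumption made for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, alpha):
    """Approximate two-sided interval for p: p_hat +/- z * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)        # z of alpha by 2
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)  # p_hat stands in for the unknown p
    return p_hat - half_width, p_hat + half_width


lower, upper = proportion_ci(successes=5106, n=10_000, alpha=0.10)
print(lower, upper)
```

Under these assumptions the interval comes out at roughly 0.502 to 0.519, so the value 0.5 falls just below the lower limit.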