As Salaamu Alaikum. Welcome to lecture number 36 of the course on statistics and probability. Students, you will recall that in the last lecture I discussed with you in detail the confidence interval for mu and the confidence interval for mu 1 minus mu 2. In today's lecture I will begin with the confidence intervals for p and p 1 minus p 2, and then we will go on to a very interesting and important topic, the determination of sample size. So, let us begin with the confidence interval for p, the proportion of successes in a binomial population. As you now see on the screen, for a large sample drawn from a binomial population the confidence interval for p is given by p hat plus minus z alpha by 2 into the square root of p hat into 1 minus p hat over n. In this formula, as in the case of the confidence interval for mu, z alpha by 2 will be equal to 1.96 for 95 percent confidence and 2.58 for 99 percent confidence. Of course, if we want only 90 percent confidence then z alpha by 2 comes out to be 1.645. Students, the formula is p hat plus minus z alpha by 2 into the square root of a certain quantity, and according to what I explained in the last lecture, this is the statistic plus minus z alpha by 2 into the standard error of our statistic. Perhaps you will recall that when I discussed with you the sampling distribution of p hat, I did convey to you that the standard error of p hat, in other words the standard deviation of the sampling distribution of p hat, is the square root of p into q over n. Since p is unknown, we estimate this by the square root of p hat into q hat over n, and this is exactly what we have in the expression that I just presented. So you have seen that the overall format is very similar to what we were discussing in the last lecture. Let us now apply this formula to an example. As you now see on the slide, let us look at a survey of teenagers who have appeared in a juvenile court three times or more. A survey of 634 of these shows that 291 of them are orphans, that is, one or both parents are dead.
What proportion of all teenagers with three or more appearances in court are orphans? The estimate is to be made with 99 percent confidence. In order to solve this problem, the first thing to note is that n is 634, which is a large sample size, so the formula that we are discussing is valid in this particular situation. Next, considering being an orphan as success, p hat comes out to be 291 over 634, and that is equal to 0.459. As such, q hat, the proportion of failures in the sample, comes out to be 1 minus 0.459, which is equal to 0.541. Now, substituting all these values in the formula of the confidence interval, we obtain 0.459 plus minus 2.58 into the square root of 0.459 into 0.541 over 634. Of course, we put the value 2.58 in place of z alpha by 2 because we want 99 percent confidence. Solving this expression, we obtain 0.408 as the lower limit and 0.510 as the upper limit of our 99 percent confidence interval for p. Students, let us try to interpret this result. This is the problem in which we are talking about teenagers who have appeared in the juvenile court three times or more, and we are trying to see what proportion of them are orphans. Our result is 41 percent to 51 percent. This is the 99 percent confidence interval for p, the true proportion of orphans among teenagers of this type. Now, note that this interval has a particular width, 41 percent to 51 percent, at a very high level of confidence, and the sample size was 634, which is quite a large sample. If we want the interval to be narrower, we can reduce the level of confidence, because 99 percent is a very high level. Even 95 percent is all right, and if we do that, then of course our z alpha by 2 will become 1.96, and when we compute we will get a relatively narrower confidence interval. Let us now look at another important point. The sample size here is 634, which is large.
With a sample this large, we can generally apply this formula. Students, you may remember that when I told you that the binomial distribution can be approximated by the normal distribution, I presented a rule of thumb: if both np and nq are greater than or equal to 5, then we can apply the normal approximation to the binomial. In this case, both n p hat and n q hat are much larger than 5 (they are 291 and 343), and so this particular formula can be applied. Let us now consider another example. After a long career as a member of the city council, Mr. Scott decided to run for mayor of the city. The campaign against the present mayor has been very strong, with large sums of money spent by each candidate on advertisements. In the final weeks, Mr. Scott has pulled ahead according to the polls published in a leading daily newspaper. But in order to check the results, Mr. Scott's staff conducts their own poll over the weekend prior to the election. The results show that in a random sample of 500 voters, 290 will vote for Mr. Scott. Develop a 95% confidence interval for the population proportion that will vote for Mr. Scott. Can Mr. Scott conclude that he will win the election? You have seen that this is a very interesting problem and quite a real life problem. First of all, let us see what success is. In the previous problem too, the first thing we had to identify was success, and there the fact that the teenager is an orphan was being regarded as success, because, as I have already explained to you, success is a technical term. Here voting for Mr. Scott is success and voting against Mr. Scott is failure. With this definition of success, it is very easy for us to compute p hat, the proportion of successes in our sample. As you now see on the screen, since 290 people out of 500 have favored Mr. Scott, p hat is equal to 290 over 500, and that is equal to 0.58.
Of course, 0.58 is a point estimate of p, the true proportion of people in the population who will be voting for Mr. Scott. Now, as far as the 95% confidence interval for p is concerned, we have the formula p hat plus minus z alpha by 2 into the square root of p hat into 1 minus p hat over n, and substituting all the values, that is 0.58 plus minus 1.96 into the square root of 0.58 into 0.42 over 500, our interval comes out to be 0.537 to 0.623. In other words, the 95% confidence interval for p is 54% to 62%. Students, you remember the question was whether Mr. Scott can conclude that he will win the election. Now, the point estimate is 58% and the 95% confidence interval is from 54% to 62%. All these figures are higher than 50%, and if more than 50% of the people vote for him, then obviously he will be winning the election. So this means that, on the basis of this survey that his staff has conducted on their own, he can reasonably conclude that he will be winning this election. Let us now consider another example. As you now see on the screen, a group of statistical researchers surveyed 210 chief executives of fast growing small companies. Only 51% of these executives had a management succession plan in place. A spokesman for the group made the statement that many companies do not worry about management succession unless it is an immediate problem. However, the unexpected exit of a corporate leader can disrupt and unfocus a company for long enough to cause it to lose its momentum. Use the survey figure to compute a 92% confidence interval to estimate the proportion of all fast growing small companies that do have a management succession plan. Students, in this problem we are interested in the proportion of companies that do have this plan in place, and we want to construct a 92% confidence interval. So the first thing to note is that if, in our diagrammatic version of this situation, we are putting 92% of the area in the middle, it means that we have to place 4% of the area to the left and 4% to the right.
So in this case the value of z, as you now see on the slide, according to the area table of the standard normal distribution, comes out to be 1.75. Hence we can use this value to compute our confidence interval. The formula of course is p hat plus minus z alpha by 2 into the square root of p hat into q hat over n. Now we have n equal to 210 and p hat equal to 0.51, as we noticed a short while ago. Hence substituting all these values in our formula, the 92% confidence interval comes out to be 45% to 57%. Now that we have discussed how to construct the confidence interval for p, let us proceed to the confidence interval for p1 minus p2. In this situation we may be interested in determining the difference between the proportion of successes in one particular population and a similar proportion in another population. Suppose we are interested in determining the difference between the proportion of smokers in Karachi and the proportion of smokers in Lahore. So we will be talking about p1 minus p2, where 1 stands for Karachi and 2 stands for Lahore, and there are many other interesting situations where we are interested in the difference between proportions in two populations. So, as you now see on the screen, for large samples drawn independently from two binomial populations, the confidence interval for p1 minus p2 is given by p1 hat minus p2 hat plus minus z alpha by 2 into the square root of p1 hat q1 hat over n1 plus p2 hat q2 hat over n2. Once again it is the point estimate plus minus z alpha by 2 into the standard error of the point estimate: p1 hat minus p2 hat plus minus z alpha by 2 into the square root of some big quantity, and the square root of this big quantity is the standard error of p1 hat minus p2 hat. Of course I could have derived all of these for you, but I would not want to do that. I would like you to develop a sense of the basic pattern without necessarily having to go into the lengthy mathematical derivation.
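The single-proportion interval used in the three examples above (orphans, Mr. Scott, the chief executives) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `z_critical` and `proportion_ci` are made up for this note, and z alpha by 2 is taken from the standard library's normal distribution rather than from the area table, so the last digit may differ slightly from the table values 1.96, 2.58 and 1.75.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval (hypothetical helper)."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def proportion_ci(successes, n, confidence):
    """Large-sample confidence interval for a binomial proportion p."""
    p_hat = successes / n
    q_hat = 1.0 - p_hat
    # Rule of thumb from the lecture: n*p_hat and n*q_hat should both be >= 5.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("sample too small for the normal approximation")
    margin = z_critical(confidence) * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


orphans = proportion_ci(291, 634, 0.99)   # juvenile court example
scott = proportion_ci(290, 500, 0.95)     # Mr. Scott's poll
print("orphans:", orphans)
print("scott:  ", scott)
print("z for 92% confidence:", z_critical(0.92))
```

Rounded to three decimals, the two intervals agree with the limits worked out in the lecture, and the 92 percent critical value rounds to the 1.75 read from the table.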
Let us now apply this particular formula to an example. As you now see on the slide, in a poll of students in a large university, 300 of 400 students living in students' residences, that is hostels, approved a certain course of action, whereas 200 of 300 students not living in students' residences approved that particular course of action. Estimate the difference in the proportions favoring the course of action and compute the 90 percent confidence interval for this difference. Now, in order to solve this problem, first of all let us denote by p1 hat the proportion of students favoring the course of action in the first sample, that is the sample of resident students. Also let p2 hat be the proportion of students favoring this particular course of action in the second sample, that is the sample of students who are not residing in students' residences. Then according to the data that we have available to us, p1 hat is equal to 300 over 400, and that is 0.75. Also p2 hat, which is 200 over 300, comes out to be 0.67. Therefore the difference in the two sample proportions is 0.75 minus 0.67, and that is 0.08. Now the required level of confidence is 90 percent. Therefore z alpha by 2 is 1.645, and hence our 90 percent confidence interval for p1 minus p2 is 0.08 plus minus 1.645 into the square root of 0.75 into 0.25 over 400 plus 0.67 into 0.33 over 300. Solving this expression, we obtain the lower limit of our confidence interval as 0.023 and the upper limit as 0.137. Rounding these, we can say that the lower limit is 2 percent and the upper limit is 14 percent. Let us interpret this result. We are saying that the difference in the proportions for the two categories of students lies somewhere between 2 percent and 14 percent, and this statement is being made with 90 percent confidence. Now you have seen that this is quite a wide interval.
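As a check on the arithmetic of this example, here is a minimal Python sketch of the interval for p1 minus p2. The function name `two_proportion_ci` is hypothetical, and the sample proportions are passed in as the rounded values 0.75 and 0.67 used in the lecture so that the output matches the slide.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(p1_hat, n1, p2_hat, n2, confidence):
    """Large-sample confidence interval for p1 - p2 (independent samples)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # Standard error of p1_hat - p2_hat: the "big quantity" under the square root.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Residence students: 300 of 400; non-residence students: 200 of 300 (0.67 rounded).
lower, upper = two_proportion_ci(0.75, 400, 0.67, 300, 0.90)
print(round(lower, 3), round(upper, 3))
```

Using the unrounded value 200/300 for p2 hat instead of 0.67 shifts both limits upward by a few thousandths, which does not change the interpretation.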
At one end you have 2 percent, which means the difference is very small, and at the other end you have 14 percent, and that is not very small. And note, students, that our level of confidence was not that high. It was not 99 percent or 95 percent. It was only 90 percent, and still our interval is somewhat wide. This means that the sample sizes of 400 and 300, although they are large, are still not large enough to give us an interval as narrow as we would have liked in this particular problem. So this is the kind of situation that we are dealing with all the time when we are doing interval estimation. See, the different formulas are developed on the basis of scientific and mathematical reasoning. But it cannot be said that they will cover every kind of situation and that we will never encounter any problem. All these procedures that you are studying are very useful in many situations, but their limitations should be recognized as well. Only then will you be able to apply these and many other mathematical formulae properly in a real life situation. All right, let us now start another very interesting concept, and that is the determination of sample size. See, students, we have been talking for so long about estimation, and each time we have said that the sample size was so much. If you have to conduct a study of this type, the first question that will arise is: how large a sample should I take from this particular population? This is the first question, and everything else comes afterwards. So, I will now present to you a method of determining the sample size when we want to achieve a desired level of precision with a desired level of confidence, and I will present it to you first with reference to the estimation of mu.
As you now see on the screen, in deriving the confidence interval for mu, we have the expression: the probability that x bar minus mu lies between minus z alpha by 2 into sigma over square root of n and plus z alpha by 2 into sigma over square root of n is equal to 1 minus alpha. Now, this statement implies that the maximum allowable difference between x bar and mu is z alpha by 2 into sigma over square root of n. Students, there is no need to go into further detail here. We are saying that the probability that this quantity lies between these two limits is 1 minus alpha. Now, these two limits between which x bar minus mu lies are the extremes: we can allow x bar minus mu to go as far as this extreme or that extreme. That is why we can make the statement which I have given, and I would like to request you to have another look at the same slide that you just saw. The maximum allowable difference between x bar and mu is represented by the modulus of x bar minus mu, and it is equal to z alpha by 2 into sigma over square root of n. Now, the quantity modulus of x bar minus mu is also called the error of the estimator x bar, and it can be denoted by small e. Thus, the error bound for estimating mu is given by z alpha by 2 into sigma over square root of n. In other words, if our level of confidence is 1 minus alpha and we want the error in estimating mu by x bar to be no more than e, then we need n such that the following equation is satisfied: e is equal to z alpha by 2 into sigma over square root of n. Now, if I bring square root of n to the left hand side and take e to the right hand side, I obtain square root of n is equal to z alpha by 2 into sigma over e, and if I take the square of both sides of this equation, I obtain n is equal to z alpha by 2 sigma over e, whole square. Students, let us try to understand this formula that we have developed. It is n equal to z alpha by 2 sigma over e, whole square.
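Written out in symbols, the steps just described can be set down as follows. This is simply the spoken derivation put into LaTeX, with e denoting the error bound as in the lecture.

```latex
\begin{align*}
P\!\left(-z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \bar{x}-\mu \le z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) &= 1-\alpha \\
e = \max\,\lvert \bar{x}-\mu \rvert &= z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{z_{\alpha/2}\,\sigma}{e} \\
n &= \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^{2}
\end{align*}
```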
In this, look at the components on the right hand side: sigma, the population standard deviation, which at the moment we are assuming is known, and in addition z alpha by 2 and, in the denominator, e. Now, note that z alpha by 2 and e are both in your own control. z alpha by 2 is the number that corresponds to your desired level of confidence, and e is the maximum error that you are willing to allow in your estimation process. You will determine these yourself. And this is the way we can determine the required sample size in such a situation. But what if sigma is unknown? Obviously, most of the time the population standard deviation is not known. Students, in such a case we will estimate sigma by s, and this s will be computed from a pilot sample, that is, from a study which is conducted prior to the actual study. We will replace sigma by s in this formula and find n. So, let us now apply this concept of determination of sample size to an example. As you now see on the screen, a research worker wishes to estimate the mean of a population using a sample sufficiently large that the probability will be 95 percent that the sample mean will not differ from the true mean by more than 25 percent of the standard deviation. How large a sample should be taken? In order to solve this problem, the first thing to note is that the error bound e, which is also the modulus of x bar minus mu, is equal to 25 percent of the standard deviation, that is 25 sigma over 100, and this is equal to sigma over 4. Also, because we want a 95 percent level of confidence, z alpha by 2 is equal to 1.96. Hence, substituting these values in the formula for n that we derived a short while ago, we find that sigma in the numerator cancels with sigma in the denominator and we are left with 1.96 into 4, whole square, and this comes out to be 61.4656. Hence the required sample size is 62, and it is important to note that 61.4656 is rounded upward, because obviously the sample size cannot be fractional.
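The calculation in this example can be sketched in Python as follows. The function name `sample_size_for_mean` is hypothetical; `math.ceil` implements the "always round upward" rule, and sigma is set to an arbitrary value because it cancels when the error bound is stated as a fraction of sigma.

```python
from math import ceil


def sample_size_for_mean(z, sigma, e):
    """Smallest n satisfying n >= (z * sigma / e)^2, rounded upward."""
    return ceil((z * sigma / e) ** 2)


sigma = 1.0          # arbitrary: it cancels because e is a fraction of sigma
e = 0.25 * sigma     # error bound: 25 percent of the standard deviation
n = sample_size_for_mean(1.96, sigma, e)
print(n)  # 62
```

When sigma is unknown, the same function can be called with s from a pilot sample in place of sigma, as described in the lecture.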
Students, note that here the answer, 61.46, was a little less than 61.5, so you could say that according to the ordinary rules of rounding it should have been rounded down to 61. But please remember that whenever we are determining the sample size by this process, we will always round upward. We should not round it down. We would like to play safe, and so the procedure is that we round upward. All right, that was with reference to mu, and of course a similar process can be applied with reference to the estimation of p, the proportion of successes in a binomial population. So, as you now see on the slide, the large sample confidence interval for p is given by p hat plus minus z alpha by 2 into the square root of p hat into q hat over n. This implies, according to a logic similar to what was presented a short while ago, that the error bound e is equal to z alpha by 2 into the square root of p hat into q hat over n. Therefore, solving for n as before, we obtain n is equal to z alpha by 2 whole square into p hat q hat over e square, and it should be noted that the values of p hat and q hat are not known, because the sample of course has not yet been selected. Therefore we use an estimated p hat which we obtain from a pilot sample. Let us now apply this concept to an example. In a random sample of 75 axle shafts, 12 have a surface finish that is rougher than the specifications will allow. How large a sample is required if we want to be 95 percent confident that the error in using p hat to estimate p is less than 0.05? Students, perhaps you are thinking that this seems to be a very complicated situation. But if we analyze it step by step, then the first thing is that we are talking about p and p hat. This means that we are talking about success and failure. Success is whatever outcome we are interested in. Here we are saying that there are some axle shafts whose surface finish is rougher than what is acceptable.
This is success. If the finish is rougher, that is success, and if it is all right, then that is failure. Do remember that success and failure are technical terms. Now, in the sample of 75 axle shafts that we have selected, there are 12 which can be classified as success, i.e. the finish is too rough. And we are asking: if we want to be 95 percent confident that the error in estimating the true proportion of shafts which are rougher than the allowed limit is less than or equal to 0.05, then what should our sample size be? You will say that this explanation is too long, but that is all right. Everything follows from what we want to do. First note that we have already drawn the sample of 75 axle shafts. So we will regard this as the pilot sample, and for the actual study, which is what we want to talk about now, we want to determine the sample size with 95 percent confidence and an error bound of 0.05. Hence, as you now see on the screen, the p hat that we have obtained from our pilot sample is 12 over 75, and that is 0.16. Also the error bound e, which is the modulus of p hat minus p, has been specified as 0.05. Also, because we want to be 95 percent confident, z alpha by 2 is equal to 1.96. Substituting all these values in the formula that was presented a short while ago, we obtain n is equal to 1.96 over 0.05, whole square, multiplied by 0.16 into 0.84, and solving this expression n comes out to be 206.52, which upon rounding upward yields n equal to 207. Students, you saw that the first sample was of size 75, but that was the pilot sample that we drew in order to have an estimated value of p hat, and when we put that in this formula, the required sample size for the actual study has come out to be 207. So we will draw a sample of at least 207, and then we will have an error bound of 0.05 and a level of confidence of 95 percent. I said at least, because if you take a sample bigger than this then it is even better.
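The axle shaft calculation can be sketched the same way. The function name `sample_size_for_proportion` is hypothetical, and the pilot estimate of p hat is the 12 out of 75 quoted in the lecture.

```python
from math import ceil


def sample_size_for_proportion(z, p_hat, e):
    """Smallest n satisfying n >= z^2 * p_hat * q_hat / e^2, rounded upward."""
    q_hat = 1.0 - p_hat
    return ceil(z ** 2 * p_hat * q_hat / e ** 2)


p_pilot = 12 / 75    # pilot sample: 12 rough shafts out of 75
n = sample_size_for_proportion(1.96, p_pilot, 0.05)
print(n)  # 207
```

Tightening the error bound is costly because e is squared in the denominator: halving e to 0.025 roughly quadruples the required sample size.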
Your level of confidence might increase or your error bound might decrease, so it will be even better. But you can say that the minimum required sample size to have at least this much confidence and at most this much error is 207. Students, you have seen that inferential statistics is interesting. You are able to draw inferences and conclusions about real life phenomena and about entire populations on the basis of sample data, in a scientific and proper way of course. As I said in lecture number 31, statistical inference can be carried out in two main ways. One is estimation and the other is hypothesis testing. Under the category of estimation, we talked about point estimation, one single value that estimates the parameter, and next we talked about interval estimation. Now, students, we are going to start the other very interesting and important area of statistical inference, and that is hypothesis testing. What do I mean by hypothesis testing? I mean that I would like to test a certain hypothesis that I already have in my mind. Estimation is the situation in which we assume that we have no idea about the phenomenon and we are trying to estimate mu or p or whatever it may be. We want to do that with an open mind, based on the information in the sample that we have drawn. We want to estimate an unknown quantity that we do not have any information about. Students, hypothesis testing on the other hand is the situation where we already have an idea, let me say a hunch, about that particular phenomenon. For example, suppose that we are talking about the people of a particular African country who, let us say, are generally regarded as very tall. We might have this assumption in our mind that the mean height of the adult males of this particular African country must be at least so much. Now this is an idea in our mind, that the mean height is at least that much.
In this situation, we would like to test this hypothesis based on sample data. We draw a random sample from that country, we measure the heights of the people in the sample, and we find x bar. Now our x bar will either support our hypothesis or it will fail to support our hypothesis. If our x bar is close to the hypothesized value of mu, then of course we would like to accept our hypothesized value. But if there is a lot of difference between our x bar and the hypothesized value of mu, then obviously we would not be inclined to accept our hypothesis. So, students, this is the crux of the matter. We will see whether our sample data supports our hypothesis or provides evidence against it. This procedure is a very mathematical and a very sound procedure, but it involves a number of steps, and I would like you to concentrate on the various points one by one. It is possible that you will not be clear about all the points in the first attempt. But if you look carefully at each point, then this concept will become clear to you. Now that I have talked about the hypothesis, let us express it in a formal way. As you now see on the screen, we first of all define the null and the alternative hypothesis. A null hypothesis, generally denoted by the symbol H naught, is any hypothesis which is to be tested for possible rejection or nullification. Examples of null hypotheses are: the given coin is unbiased; a drug is ineffective in curing a particular disease; there is no difference between the two teaching methods. Students, you have noted that the three examples I have presented pertain to situations that we may be interested in in real life. But these three statements were not presented in a very mathematical way. Generally, when we write a null hypothesis, we express it in a mathematical way.
For example, take the first example, that this particular coin is unbiased. If we want to express it mathematically, then we will write H naught: p is equal to half. The probability of head is going to be half, and if we are regarding head as success, then of course this probability is to be called p, and so we say p is equal to half. Similarly, as you now see on the screen, other examples of null hypotheses expressed in a numerical way are H naught: mu is equal to 62 inches and H naught: mu is equal to 130 pounds. The first example is that mu is equal to 62 inches. This hypothesis pertains to a population of women, and we are saying that our hypothesis is that their mean height is 62 inches. Similarly, the second one would pertain to a population of women again, let us say, and we are saying that their mean weight is 130 pounds. This is our null hypothesis, that is, the hypothesis that we want to test. Students, against this we have the other hypothesis, which we will accept if we reject the null hypothesis, and that other hypothesis is called the alternative hypothesis. As you now see on the slide, an alternative hypothesis is any other hypothesis which we are willing to accept when the null hypothesis H naught is rejected. The alternative hypothesis is customarily denoted by H 1 or H a. For example, if our null hypothesis is H naught: mu is equal to 62 inches, then our alternative hypothesis may be H 1: mu is not equal to 62 inches, or H 1: mu is less than 62 inches. Students, between these two examples of alternative hypotheses that I have just presented to you, there is a big difference which is very important. The first time I said that our alternative hypothesis may be that mu is not equal to 62 inches. It is a simple thing: the null is saying that mu is 62 inches and the alternative is saying that no, it is not 62 inches.
The second time I said something different. I said that the null is saying that mu is 62 inches, but the alternative is that the mean height of this particular population of women is less than 62 inches. Students, the difference between these two alternative hypotheses is important because it tells us whether we are going to do what is called a two-tailed test or what is called a one-tailed test. I had told you that there are many concepts and steps involved in this. Now look at each and every thing gradually, and do not panic or worry that you cannot understand this. Just go along slowly but surely, and inshallah you will understand this very interesting concept. Now I will go to the next important concept, and that is the level of significance. As you now see on the slide, the probability of committing type one error is called the level of significance of a test. Students, now the question arises: what is the meaning of type one error, the error whose probability is the level of significance? For this I would like to draw your attention to all the possible situations that you can have in a hypothesis testing procedure. As you now see on the slide, there are two situations as far as the actual truth is concerned and there are two situations as far as the decision that we are going to adopt is concerned. See, the table in front of you has a box head on the top that gives the two decisions, one of which will be made. Either you will accept H naught or you will reject H naught. The stub of the table gives the actual truth: either the null hypothesis that you have hypothesized is actually true, or it is actually false. Now, when you combine the two situations you are seeing on the side with the two situations you are seeing on the top, you have four possible situations.
First, H naught is actually true and you also accept H naught. This means that you are taking a correct decision and there is no error. In other words, there is no mistake. Now, if you look at the bottom right hand corner of the table, you find the situation where H naught is actually false and, based on your sample data, you also decide to reject H naught. So, if H naught was really false and you also decide to reject H naught, then it is obvious that once again you are taking a correct decision and there is no error. But, students, the other two cells of this table present situations where you are taking wrong decisions. Suppose H naught is actually true, but you say that you have to reject H naught because your sample value does not agree with what has been hypothesized. Then you are rejecting a hypothesis which is actually true. This is a wrong decision, and this particular error is called type one error. If H naught is actually false but you decide to accept H naught based on your data, then once again, students, you are taking a wrong decision. This kind of error is called type two error. Students, the probability of committing type one error is denoted by alpha, whereas the probability of committing type two error is denoted by beta. The first one, alpha, is also called the level of significance of the test. Students, today we have begun hypothesis testing: the formulation of the null and the alternative hypothesis and the concept of the level of significance. Next time we will be continuing with the concept of hypothesis testing and we will discuss various other concepts. In the meantime I would like to encourage you to attempt quite a few questions pertaining to interval estimation. My best wishes to you, and until next time, Allah Hafiz.