Assalamu alaikum. Welcome to lecture number 44 of the course on statistics and probability. Students, you will recall that in the last lecture I discussed with you analysis of variance and experimental design. Towards the end of that lecture we began the discussion of the randomized complete block design, another very important design, and you will recall that I presented an example whose detailed discussion we will take up in today's lecture. As you now see on the slide: in a feeding experiment, four types of rations were given to animals that were divided into five groups of four each, and the following results were obtained. Students, you can see a bivariate table. The top row represents the four types of rations and the first column gives the five groups into which the animals were divided. So, if we study the first column of data, the values 32.3, 34.0, 34.3 and so on represent the gains in weight under ration A: the first value pertains to the animal in group 1, the second to the animal in group 2, and so on. We are required to perform an analysis of variance and to state our conclusions. Students, in this problem the weight gain due to the rations is the variable of interest, and we are assuming that the type of ration is one of the factors determining weight gain. The purpose is to increase the weights of the animals, and we are comparing the different rations in this regard. Now, what is the hypothesis testing procedure? Very similar to what we did last time, but, students, the essential difference is that last time we were talking about the completely randomized design, and analysis of variance in that situation is called one-way analysis of variance.
This time we do not have only treatments, that is, the four kinds of rations; we have also categorized the animals into five groups, and technically each group is called a block. We are now talking about the randomized complete block design. And, as I explained last time, when we feel that our experimental material is not homogeneous, that is, not similar, and there are some dissimilarities, then we do blocking: the relatively homogeneous material goes into one block, material of a different kind goes into another block, and so on. So, let us now do the hypothesis testing procedure step by step. As you now see on the slide, step one is the formulation of the null and the alternative hypothesis, and you will be interested to note, students, that in this particular situation we are in a position to form H naught and H A not only for the treatments but also for the blocks. Our primary interest, of course, is in the treatments: H naught says mu A is equal to mu B is equal to mu C is equal to mu D, and the alternative is that not all the treatment means are equal. Similarly, we have the null hypothesis H naught dash saying mu 1 is equal to mu 2 is equal to mu 3 is equal to mu 4 is equal to mu 5, and H A dash says that not all the block means are equal. We will consider these two sets of hypotheses one by one. As I said, our primary interest is in the treatments: which of the four rations is able to produce the maximum weight gain, that is what we want to know. The null hypothesis says that on the average the weight gain is the same for A, B, C and D, and the alternative says that at least one of them is different from the others.
Now, what about the other set of hypotheses, H naught dash and H A dash? I explained the idea last time, students: if we want to compare fertilizers and we sow the crop in different fields, it is possible that the fields close to the canal have a fertility different from the soil which is further away from the canal; you will remember this point. In the same way, the grouping of the animals was obviously done according to some criterion. The two-way ANOVA that we are going to perform in this example enables us to test this hypothesis, which, as you saw on the screen, says that mu 1 is equal to mu 2, equal to mu 3, equal to mu 4 and equal to mu 5. That is, the null says that the weight gain is the same for all the groups, and the alternative is that at least one of the groups is different from the others. So, what does this mean? It means that if, after we perform the analysis, we accept the null hypothesis that the weight gain is the same for all five groups, then that grouping was actually not required. Because if all five groups show the same weight gain, then in that respect the animals are homogeneous and we did not need to divide them into groups. But if we reject the null and accept the alternative, which says that at least one of them is different from the others, then we can say that this kind of grouping or blocking was justified. The second step of the hypothesis testing procedure is the level of significance, and as you now see on the screen, we may take alpha as 5 percent, as we usually do. The third step is the test statistic. In order to test H naught versus H A the test statistic is F equal to mean square treatment over mean square error, and in order to test H naught dash versus H A dash the test statistic is F equal to mean square block over mean square error.
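For readers who want to see the model behind these two F ratios: the lecture does not write it out explicitly, so treat the following as background, stated in the conventional notation for a randomized complete block design.

```latex
% Standard additive RCBD model (conventional notation; background only)
x_{ij} = \mu + \beta_i + \tau_j + \varepsilon_{ij},
\qquad i = 1,\dots,r \ \text{(blocks)}, \qquad j = 1,\dots,c \ \text{(treatments)}
```

Here mu is the overall mean, beta i is the effect of block i, tau j is the effect of treatment j, and epsilon i j is the random error. The treatment null hypothesis is equivalent to saying that all the tau j are zero, and the block null hypothesis to saying that all the beta i are zero.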
The fourth step is the computation of the test statistics, and for this purpose we have to carry out quite a lengthy computation. The numbers in brackets inside the body of the table represent the squares of the various values in our data set. The row under the data values and their squares is t dot j, the sums of the various columns: t dot 1, t dot 2 and, generally, t dot j. Similarly, the column to the right of the data values is denoted by b i dot, and these are the sums of the various rows: b 1 dot, b 2 dot and, generally speaking, b i dot. Students, here we note that t stands for treatment and b stands for block: the columns represent the treatments, so their totals are t dot 1, t dot 2 and so on, while the rows represent the blocks, the groups into which the animals have been divided, so their sums are b 1 dot, b 2 dot and so on. Going back to the slide once again, the row under t dot j is t dot j square, and similarly the column to the right of b i dot is b i dot square. In addition, at the bottom of the table you see sigma over i of x i j square, the column sums of the squared values, and on the right side sigma over j of x i j square, the row sums of the squared values, because the squares appear in the body of the table. Now when you add these sums, whether you do it along the bottom row or along the last column, the grand total of the squares of the data values comes out to be 21725.22. Similarly, the sum of all the data values comes out to be 656.4, as you can see on the slide, and sigma b i dot square is equal to 86258.04, whereas sigma t dot j square comes out to be 108387.48. Students, this is a fairly long calculation, but once you have obtained all these sums, the rest is fairly simple.
The formulae are similar to those we discussed last time with reference to the completely randomized design; the difference is that in this problem we will compute not only the treatment sum of squares but also the block sum of squares. As you now see on the slide, the total sum of squares is equal to the double summation of x i j square minus t dot dot square over n, and this comes out to be 182.17. The treatment sum of squares is given by sigma over j of t dot j square over r, minus t dot dot square over n, and the answer is 134.45. I want to remind you that t dot dot square over n is called the correction factor, and it is involved in the formulae when we apply the shortcut formula. Now we are also going to compute the block sum of squares, and as you now see on the screen, the block sum of squares is equal to sigma over i of b i dot square over c, minus t dot dot square over n, and the answer is 21.46. It should be noted that c represents the number of observations per block, in other words the number of columns. Having computed the total SS, the treatment SS and the block SS, students, the error sum of squares is given by the total sum of squares minus the sum of the treatment sum of squares and the block sum of squares, and substituting the various values that we just obtained, the error sum of squares comes out to be 26.26. Students, you will remember that in the last lecture, when we did a completely randomized design example, we carried out elaborate calculations like the ones we just saw for this example; but what was the purpose of those calculations? The ANOVA table. In this example we will construct an ANOVA table that is a kind of extension of the one we had last time, and as you now see on the slide, the columns of the ANOVA table are just as before: source of variation, degrees of freedom, sum of squares, mean square and F.
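As a quick arithmetic check, the four sums of squares can be reproduced from the aggregate totals stated above. This is a minimal Python sketch; the variable names are mine, not the lecture's.

```python
# Sums of squares for the RCBD example, from the totals stated in the lecture.
sum_sq_values = 21725.22    # grand total of the squared observations
grand_total = 656.4         # t.. , the sum of all 20 data values
n = 20                      # total number of observations
r, c = 5, 4                 # r = number of blocks (rows), c = number of treatments (columns)
sum_t_dot_j_sq = 108387.48  # sigma over j of (t.j)^2, the squared column totals
sum_b_i_dot_sq = 86258.04   # sigma over i of (b i.)^2, the squared row totals

cf = grand_total ** 2 / n                  # correction factor, t..^2 / n
ss_total = sum_sq_values - cf              # total sum of squares
ss_treat = sum_t_dot_j_sq / r - cf         # each column total is a sum of r values
ss_block = sum_b_i_dot_sq / c - cf         # each row total is a sum of c values
ss_error = ss_total - ss_treat - ss_block  # error SS by subtraction

print(round(ss_total, 2), round(ss_treat, 2), round(ss_block, 2), round(ss_error, 2))
```

This reproduces 182.17, 134.45, 21.46 and 26.26, agreeing with the values on the slide.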
The first source of variation is between treatments, that is, between the rations; the second is the variation between blocks, that is, between the various groups into which the animals have been divided. The overall variation is represented by the term total, and the error variation represents the variability that exists within each treatment-block combination that we have. Students, consider this last source of variation that I mentioned: the error variation represents the variation within each treatment-block combination. Overall we have 20 different situations, 20 different treatment-block combinations: treatment A with group 1, treatment A with group 2, and so on, all the way up to treatment D with group 5. If you do this experiment again and again, you will get different results in spite of the fact that the experimental condition is the same: the same treatment, the same block, yet one animal will show one weight gain and another animal a different one. This is called the within variation, and the combined effect of this kind of variation over all the combinations is what we have in this row of our ANOVA table. Going back to the slide, students, we note that the degrees of freedom corresponding to the overall variation is 20 minus 1, that is 19: the total number of treatment-block combinations minus 1, in other words R into C minus 1, where R represents the number of rows and C represents the number of columns. After all, students, do we not have 5 rows and 4 columns, 5 blocks and 4 treatments? So, as I said, the degrees of freedom for the total is given by R into C minus 1, that is 5 into 4 minus 1, that is 20 minus 1, that is 19. The degrees of freedom for treatments is 4 minus 1, that is 3. Why? Because we have 4 treatments, 4 columns. So C minus 1, in other words, is the degrees of freedom for treatments.
The degrees of freedom for blocks is 5 minus 1, that is R minus 1, and that is equal to 4. And last but not least, the error degrees of freedom are obtained by subtracting the sum of 4 and 3 from 19; in other words, 19 minus 3 minus 4 gives us 12 as the error degrees of freedom. Students, these formulas for degrees of freedom, as you can see, are extremely simple. The only thing that you have to keep in mind is that if you have taken the treatments column-wise, then obviously the degrees of freedom for treatments will be C minus 1. If you take the treatments along the rows, then the degrees of freedom for the treatments will become R minus 1, and since the blocks will then be along the columns, the degrees of freedom for the blocks will become C minus 1. So it is fairly simple. Let us go back to the slide and concentrate on the third column, that of the sum of squares. As we computed a short while ago, the total sum of squares is 182.17, the sum of squares for treatments is 134.45, that for blocks is 21.46, and, subtracting 134.45 plus 21.46 from 182.17, the error sum of squares is 26.26. The column of mean squares is very easily obtained by dividing the values in the sum of squares column by the corresponding degrees of freedom. Therefore 134.45 divided by 3 gives us 44.82 as the treatment mean square, 21.46 divided by 4 gives us 5.36 as the block mean square, and 26.26 divided by 12 gives us 2.19 as the error mean square. Students, this is exactly the purpose for which we were doing those calculations: to obtain our test statistics. As I said earlier, in this case we are able to test two sets of hypotheses, and correspondingly we have two F values. Coming back to the slide, F1, which is the ratio of the treatment mean square to the error mean square, that is 44.82 divided by 2.19, comes out to be 20.47.
And F2, which is the ratio of the block mean square to the error mean square, comes out to be 5.36 over 2.19, equal to 2.45. Students, of course you remember that the next step is the critical region, and this time, interestingly, we have two different critical regions. The reason is that F1 follows the F distribution having (3, 12) degrees of freedom, whereas F2 follows the F distribution having (4, 12) degrees of freedom. Why did I say that, students? Do you not remember? According to what we did last time, the degrees of freedom of the F that we compute are those corresponding to what we have in the numerator of our F, which gives the first degrees of freedom, and those corresponding to what we have in the denominator, the error mean square, which gives the second degrees of freedom. So, as you now see on the slide, since the level of significance in each case is 5 percent, the critical region for testing H naught against H A is given by F greater than or equal to F 0.05 with (3, 12) degrees of freedom, and consulting the F table, this value is equal to 3.49. Similarly, the critical region for testing H naught dash against H A dash is given by F greater than or equal to F 0.05 with (4, 12) degrees of freedom, and this is equal to 3.26. The sixth and last step is the conclusion. Since our computed value of F1, that is 20.47, exceeds the critical value 3.49, we reject the null hypothesis regarding the treatment means and conclude that there is a difference among the means of at least two of the treatments; that is, the mean weight gains corresponding to at least two of the rations are different. On the other hand, since our computed value of F2, that is 2.45, does not exceed the critical value 3.26, we accept the null hypothesis regarding the equality of the block means, and thus we can conclude that the blocking, that is, the grouping of the animals as it was done in this experiment, was actually not required.
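The ANOVA table and the two F tests can be completed mechanically from the sums of squares. A minimal sketch, with the critical values 3.49 and 3.26 taken from the F table as quoted in the lecture:

```python
# Mean squares and F statistics for the two-way ANOVA of the RCBD example.
ss_treat, ss_block, ss_error = 134.45, 21.46, 26.26
df_treat, df_block, df_error = 3, 4, 12    # c - 1, r - 1, and 19 - 3 - 4

ms_treat = ss_treat / df_treat   # about 44.82, the treatment mean square
ms_block = ss_block / df_block   # about 5.36, the block mean square
ms_error = ss_error / df_error   # about 2.19, the error mean square

f1 = ms_treat / ms_error   # tests H0: equal treatment means; about 20.5
f2 = ms_block / ms_error   # tests H0': equal block means; about 2.45

print("reject H0" if f1 >= 3.49 else "accept H0")    # F 0.05 (3,12) = 3.49
print("reject H0'" if f2 >= 3.26 else "accept H0'")  # F 0.05 (4,12) = 3.26
```

Working from the rounded mean squares 44.82 and 2.19, the lecture reports F1 = 20.47; carrying full precision gives about 20.48, which makes no difference to the conclusions.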
Students, you have noted that the null hypothesis has been accepted, that is, mu 1 equals mu 2 equals mu 3 and so on up to mu 5. This means that the blocking we had done, the groups we had made, were actually not required. But an experiment has now been performed and the data has been obtained, so we can use this information in the future: if we want to do this kind of experiment again, then in the light of this result we will not group the animals in this manner next time. But what about the hypothesis in which our primary interest lies? You remember, we are basically interested in comparing the four rations, and we want to know which one of them gives the highest weight gain. Students, as you have noted, our computed value was much larger than the critical value; therefore we rejected the null hypothesis and concluded that the four rations are not all similar with regard to weight gain. Naturally this question arises: which one is the best? The answer to this question, students, can be obtained by applying what is called the least significant difference test, or the LSD test. As you now see on the screen, according to this procedure we compute the smallest difference that would be judged significant and compare the absolute values of all the differences of means with this smallest difference. This smallest difference is called the least significant difference, or the LSD, and is given by LSD equal to t alpha by 2 with nu degrees of freedom, multiplied by the square root of twice the mean square error divided by r, the number of observations per treatment. It should be noted that nu stands for the error degrees of freedom. Students, as with many other procedures in this course, we are not doing the detailed mathematical derivations; similarly, here too we will not go into the details.
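Written compactly, the formula just stated is:

```latex
\mathrm{LSD} \;=\; t_{\alpha/2,\,\nu}\,\sqrt{\frac{2\,\mathrm{MSE}}{r}}
```

where nu is the error degrees of freedom, MSE is the error mean square, and r is the number of observations per treatment.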
But the point that I want to convey to you is that this is basically the t test, or let me say something derived from the t test that I have already discussed with you when we were comparing two population means. The situation, students, is that we have four rations and we want to compare their four means. We have decided, according to the analysis of variance that we conducted, that not all four of them are equal. So the next thing is to compare them pairwise: A with B, A with C, A with D, and so on. If we start doing this analysis, then we will have to do the t test again and again: testing mu A equal to mu B versus mu A not equal to mu B, then testing mu A equal to mu C, and so on and so forth. The LSD procedure is a procedure by which all this lengthy work is summarized, and we decide which means are different and which are not. So, let us apply it to this example. As you now see on the screen, the least significant difference is equal to t alpha by 2 with nu degrees of freedom times the square root of twice the MSE over r, and in this problem MSE is equal to 2.19, r is equal to 5, and nu, the error degrees of freedom, equals 12. Since our level of significance is 5 percent, t alpha by 2 with nu degrees of freedom is t 0.025 with 12 degrees of freedom, and, students, looking up the t table and then performing all the calculations, the LSD comes out to be 2.04. Students, understand what this result means: this is the least difference between any two treatment means that would be judged significant. After all, we have named it the least significant difference. This means that, with the help of our data values, we will compute the means of all the treatments, and of course what we will be getting are not mu A, mu B, mu C or mu D; those are the unknown parameters about which all this inference is to be drawn.
But of course, we will obtain x bar A, x bar B, x bar C and x bar D, and after that we have to compare the differences between these respective x bars with the least significant difference that we have just computed. As you now see on the slide, the totals of the first, second, third and fourth columns are 172.1, 173.9, 168.5 and 141.9, and of course these are exactly the same quantities that we have already computed: t dot 1, t dot 2, t dot 3 and t dot 4. But this time our interest is not in the totals but in the mean values, and hence, dividing 172.1 by 5, we obtain x bar A equal to 34.42. Similarly, dividing 173.9 by 5, x bar B is 34.78, and so on. In order to apply the LSD test, students, we should arrange these sample means in ascending order of magnitude. Doing that, x bar D comes first because it is the smallest value, 28.38; it is followed by x bar C, which is 33.70; this is followed by x bar A, 34.42; and last but not least we have x bar B, equal to 34.78. This is for convenience: because of this ordering we are very easily able to see which means are significantly different from each other and which are not. So, as you now see on the slide, we will be drawing lines under pairs of adjacent means, or sets of means, that are not significantly different. In this example you can see that the line starts under x bar C and goes up to x bar B. Students, the absolute difference between 33.70 and 34.42 is less than 2.04, the least significant difference that we computed a short while ago. Not only this, but the difference between 34.78, x bar B, and 33.70, x bar C, is also less than 2.04, the LSD. And since the difference between x bar A and x bar B is also less than the LSD, the only sample mean which is standing out as being significantly different
from the rest is x bar D. We started this procedure from x bar D, but we found that the absolute difference between x bar D and x bar C is greater than 2.04, and therefore we were not able to draw a line underneath these two, which would have said that they are not significantly different. But after that, the differences among x bar C, x bar A and x bar B are all less than the LSD, and therefore we can say that these three are not significantly different. Students, this is not as difficult as you may be thinking. We proceed like this: we start from the first mean and try to draw the line from there; if it cannot be drawn from the first mean because the difference is larger than the LSD, then of course we just don't draw it, and we start from the second one, extending the line as far as it will go; after that we can also start from the third, and go as far as we can. So I would like to encourage you to attempt quite a few questions on this particular topic, which you can find in your own textbook as well as in other books. But, students, please keep one thing in mind: the occasion for applying the LSD test does not even arise if we accept the null hypothesis that we had in the analysis of variance. That is, if our first null hypothesis, that mu A is equal to mu B, equal to mu C, equal to mu D, had been accepted, then the question of applying the LSD test would simply not arise. It applies only if you reject that null hypothesis, so that you say to yourself: this means the four rations are not all alike, and then naturally the question arises which one is better and which one is worse. Students, in this particular example, which ration is better and which is worse? Do you know?
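The whole LSD comparison can be sketched in a few lines of Python. The tabulated value t 0.025 with 12 degrees of freedom, 2.179, is assumed from a standard t table; MSE, r and the four sample means are those of the example above.

```python
import math

# LSD test for the four ration means of the RCBD example.
mse, r = 2.19, 5
t_crit = 2.179                         # t(0.025, 12 df) from a standard t table
lsd = t_crit * math.sqrt(2 * mse / r)  # least significant difference, about 2.04

means = {"D": 28.38, "C": 33.70, "A": 34.42, "B": 34.78}  # in ascending order

# Compare every pair of sample means against the LSD.
names = list(means)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        a, b = names[i], names[j]
        diff = abs(means[a] - means[b])
        verdict = "significant" if diff > lsd else "not significant"
        print(f"{a} vs {b}: |difference| = {diff:.2f} -> {verdict}")
```

Every difference involving D exceeds the LSD, while the differences among C, A and B do not, which is exactly the underlining pattern described above.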
According to what we did just now, C, A and B are not significantly different. The only one which is significantly different from the rest is ration D. But haven't you noticed that it occurs at the left end when we arrange the four in ascending order? This means that it is not the best one; rather, it is the one with the poorest performance with regard to weight gain. So if we want to make a recommendation to the farmers about which ration to give their animals in order to increase their weights, what should our recommendation be? Students, we may recommend C, A or B, because our statistical analysis has shown that they are not significantly different with regard to weight gain. After that, depending on availability, price, or other such criteria, the farmers can decide whether they want C, A or B. But surely, on the basis of the analysis that we just conducted, we will not recommend ration D: after all, it has stood out as being significantly different from the rest, and with not the best but the poorest performance. Students, this brings us to the end of our discussion regarding analysis of variance and experimental design. The next concept that I am going to discuss with you is the chi-square test of goodness of fit. You might recall that in lecture number 28, when we were discussing probability distributions, we did an example in which we fitted the binomial distribution to real data. You will remember that we computed the probabilities given by the binomial formula, then found the expected frequencies, and after that, when we compared the observed frequencies with the expected ones, we said that there does not seem to be much discrepancy between the two and that our fit appears to be a good fit.
But you will also remember that I told you at the time that there is a formal procedure by which we can judge whether or not our fit is good, and that is exactly what I am going to discuss with you right now. Before we go to the test, let us very quickly review the chi-square distribution, on the basis of which we will be conducting this test. As you now see on the slide, the chi-square distribution is a continuous distribution ranging from 0 to infinity. The number of degrees of freedom determines the shape of the chi-square distribution. Generally speaking, the chi-square distribution is positively skewed, but the skewness decreases as nu increases, and we can say that the chi-square distribution tends to the normal distribution as the number of degrees of freedom tends to infinity. Having reviewed the basic properties of the chi-square distribution, students, let us proceed to the chi-square test of goodness of fit, and I would like to do this with the help of exactly the same example that we considered in lecture number 28. As you now see on the slide, the example read: the following data has been obtained by tossing a loaded die 5 times and noting the number of times that we obtained a 6; fit a binomial distribution to this data. The data was x values 0, 1, 2, 3, 4, 5, where x of course represents the number of 6's in 5 tosses of the loaded die, and the column of frequencies read 12, 56, 74 and so on, so that the total was 200. In order to fit a binomial distribution, first of all we find the sample mean x bar, and for this purpose we find sigma f x, which is 398, and dividing by sigma f, which is 200, our x bar comes out to be 1.99. Now, putting x bar equal to n p and noting that n is equal to 5, we obtain p equal to 0.398. Hence the binomial formula is 5 C x times 0.398 raised to x times 0.602 raised to 5 minus x.
Students, obviously we will be substituting the values x equal to 0, 1, 2 and so on in this formula, and in this way we get all the probabilities; after that we will multiply each of them by 200, the total number of times this experiment was repeated, and that will give us the expected frequencies. So, as you now see on the screen, the probabilities are 0.07907, 0.26136 and so on, and the expected frequencies come out to be 15.8, 52.5, 69.1 and so on. Now our interest lies in determining whether the expected frequencies are quite close to the observed ones, or whether there is a considerable amount of discrepancy between the two. It appears that there is not a tremendous amount of difference between the observed frequencies O i and the expected frequencies, which are denoted by E i. So why not adopt the proper formal procedure? The chi-square test of goodness of fit has exactly the same pattern, the six steps that we have in any hypothesis testing procedure. As you now see on the slide, the null hypothesis in this particular situation is that the fit is good, and the alternative is that the fit is not good. Students, the underlying mathematics is such that this is the way we should formulate our null and alternative hypotheses in this situation. The point is that we always begin by assuming that H naught is true, and the assumption that the fit is good is exactly what allows us to proceed with the steps that follow. The second step is the level of significance, and we can set it at 5 percent. The third step is the test statistic, and as you can see, the test statistic in this particular situation is given by the summation of the quantity O i minus E i, whole square, divided by E i, and it can be mathematically proved that if H naught is true, then this statistic follows the chi-square distribution having k minus 1 minus r degrees of freedom.
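The fitting step described above can be sketched in a few lines of Python; the estimate p = 0.398 comes from the moment equation x bar equal to n p, exactly as in the lecture.

```python
from math import comb

# Fit of the binomial distribution to the loaded-die data (n = 5, 200 repetitions).
n, total = 5, 200
x_bar = 398 / 200  # sample mean number of sixes, 1.99
p = x_bar / n      # estimated p, 0.398

# Binomial probabilities P(X = x) = 5Cx * p^x * (1 - p)^(5 - x), and expected frequencies.
probs = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
expected = [total * pr for pr in probs]

for x, (pr, e) in enumerate(zip(probs, expected)):
    print(f"x = {x}: probability {pr:.5f}, expected frequency {e:.1f}")
```

The first two probabilities come out as roughly 0.0791 and 0.26136, matching the slide up to rounding, and the expected frequencies begin 15.8, 52.3, 69.1, and so on; the slide's second value, 52.5, differs slightly, presumably a rounding artifact.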
Students, k is the number of x values that we have: 0, 1, 2, 3, 4, 5, so k is equal to 6. And r is the number of parameters that we estimate from the sample data; for the binomial distribution the parameters are n and p. Here n is already known, because we know that the loaded die is being tossed 5 times, so n is equal to 5. But do you remember that in our procedure p was estimated from the sample data? Why? Because our equation was x bar equal to n p. The real equation is mu equal to n p, but because mu was not available we replaced it with x bar, and p was obtained from it. So that is an estimated value of p, and this is the one parameter that we are estimating from the sample data; hence, in this question, r is equal to 1. Students, I said earlier that our statistic, sigma of O i minus E i whole square over E i, follows the chi-square distribution having k minus 1 minus r degrees of freedom; with k equal to 6 and r equal to 1, that is 6 minus 1 minus 1, that is 4, degrees of freedom. But having said all this, students, there is a very important point that I need to convey to you: whenever an expected frequency is less than 5, we have to combine some of the x values so that the combined expected frequency becomes greater than or equal to 5. As you now see on the slide, in this example the expected frequency against x equal to 5 is 2.0, which is less than 5, and hence we combine the x value 4 with the x value 5; adding the two expected frequencies pertaining to these two x values, the expected frequency of the combined category comes out to be 17.1. Also, the observed frequency of the combined category comes out to be 1 plus 18, that is 19, and thus the effective number of categories that we now have is no longer 6 but 5. As such, our statistic will follow the chi-square distribution having 5 minus 1 minus 1, that is 3, degrees of freedom.
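The combining of categories and the computation of the statistic can be sketched as follows. A caveat: the lecture reads out only the first few observed frequencies (12, 56, 74, ...) and the combined last category (19); the value 39 for x = 3 below is reconstructed so that the stated totals hold (sigma f = 200, sigma f x = 398), so treat the full column as illustrative.

```python
from math import comb

# Chi-square goodness-of-fit statistic for the binomial fit to the die data.
# Only 12, 56, 74 and the combined 19 are read out in the lecture; 39 and the
# split 18 + 1 are reconstructed from the stated totals (the split does not
# matter once the last two categories are merged).
observed = [12, 56, 74, 39, 18, 1]
n, total, p = 5, 200, 0.398

expected = [total * comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

# The expected frequency for x = 5 (about 2.0) is below 5, so merge x = 4 and x = 5.
obs = observed[:4] + [observed[4] + observed[5]]
exp = expected[:4] + [expected[4] + expected[5]]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
df = len(obs) - 1 - 1  # k - 1 - r with k = 5 categories and r = 1 estimated parameter

print(f"chi-square = {chi_sq:.2f} on {df} degrees of freedom")
```

With full-precision expected frequencies this gives about 2.72; the lecture, working from the rounded expected frequencies shown on the slide, gets 2.69. Either way the value is well below the critical value 7.82, so the conclusion is unchanged.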
The fourth step is the computation of the test statistic and for that purpose we construct the columns of O i minus E i and O i minus E i whole square divided by E i. Now, the sum of this last column is equal to our test statistic and in this problem it is equal to 2.69. Students, as you know the fifth step is the determination of the critical region and I would like to point out that the underlying mathematics of this test is such that it should always be a right tailed test. Therefore, looking at the chi-square table under 0.05 against 3 degrees of freedom we obtain as you now see on the screen the critical value 7.82. The last step is the conclusion and students since our value 2.69 is less than the critical value therefore, we accept H naught and conclude that the fit is good. In today's lecture we discussed the randomized complete block design and two way analysis of variance. Then we went on to the LSD test and last but not the least we discussed the chi-square test of goodness of fit. Students, in the next lecture I will be discussing with you the chi-square test of independence. Best of luck and until next time, Allah Hafiz.