Well, I hope that you had a nice little break. Let us get on with the problem solving. We left the fertilizer problem at the point where we were discussing the analysis of variance table. We have two sources of variation, treatment and error, and then we have the total. We have 2 degrees of freedom for the treatment and 12 degrees of freedom for the error. The sums of squares are 44.13 for treatment and 286.8 for error, and the total is 330.93. If we look only at the sums of squares, the error sum of squares is much higher than the treatment sum of squares, by more than 6 times, so we need to normalize them by their degrees of freedom. Once we do that, we find the mean squares are rather comparable, and their ratio, the F value, comes to 0.92. So, even without doing any F test, we can look at the mean square values and conclude that the treatment mean square is not dominating or considerably higher than the error mean square; both of them are comparable. So, whatever differences I observe because of treatment variation are comparable to the noise, and hence I can immediately say that the fertilizers are not different from each other. What is this P value of 0.424? Before I get to that, it is also good to independently verify your calculations. I have used Minitab to check whether my calculations are indeed correct, and since the numbers are matching pretty well, I am happy with my calculations. So, what is this P value of 0.424? Our alpha value was specified up front at 0.05. If the F value was so high (0.92 is not good enough) that it fell in the region beyond $F_{\alpha,\,a-1,\,a(n-1)}$, where a-1 and a(n-1) are the numerator and denominator degrees of freedom, then the statistic falls in the rejection region. So, we find the F value and compare it with $F_{\alpha,\,a-1,\,a(n-1)}$. In our case alpha is 0.05, a-1 is 3 fertilizers minus 1, which is 2, and a(n-1) is 3 into (5 repeats minus 1), which is 12. So, this becomes $F_{0.05,\,2,\,12}$.
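The arithmetic behind the table can be retraced with a minimal sketch like the one below. It uses only the sums of squares quoted in the lecture. The critical value 3.89 for $F_{0.05,\,2,\,12}$ is a standard-table value that the lecture does not quote, so treat it as an assumption, and the variable names are purely illustrative.

```python
# One-way ANOVA table for the fertilizer example, rebuilt from the
# sums of squares quoted in the lecture.
a, n = 3, 5                        # 3 fertilizers, 5 repeats per fertilizer
ss_treat, ss_error = 44.13, 286.8

df_treat = a - 1                   # numerator degrees of freedom
df_error = a * (n - 1)             # denominator degrees of freedom
ms_treat = ss_treat / df_treat     # mean square = sum of squares / df
ms_error = ss_error / df_error
f0 = ms_treat / ms_error           # observed F statistic

# Assumption: F(0.05; 2, 12) taken from a standard F table, not from the lecture.
F_CRIT = 3.89

print(f"MS_treat = {ms_treat:.2f}, MS_error = {ms_error:.2f}, F0 = {f0:.2f}")
print("reject H0" if f0 > F_CRIT else "do not reject H0")
```

Dividing by the degrees of freedom is what makes the two sources comparable: the raw sums of squares differ by a factor of more than six, while the mean squares are nearly equal.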
So, we have one F value, which is 0.92. Let us say this is the F distribution and this is $F_{\alpha,\,a-1,\,a(n-1)}$, the critical value. My value of 0.92 is lying somewhere here. The region to the left of the critical value is the acceptance region and the region beyond it is the rejection region. I do not have the critical value with me, but I am saying that 0.92 is falling in the acceptance region. How may I say that? I also have the P value, P = 0.424. What is this P value telling me? It is the actual area under the curve beyond the F value that we observed. So, this area is 0.424, whereas the area beyond the critical value for alpha equal to 0.05 is 0.05. Obviously, to have an area as large as 0.424 to its right, the F value has to sit to the left of the critical value, so that more area is covered. So, the F value is not lying in the critical region, that is, it is not lying in the rejection region, and I have to accept the null hypothesis. What is this 0.05 and what is this 0.424? The alpha value of 0.05 tells us that the probability of the type 1 error is 0.05. If you choose alpha as 0.05, then you are permitting a 0.05 chance of wrongly rejecting H0 when it is true. In our case the P value is 0.424. That means that if H0 is actually true, there is a 0.424 chance of getting an F value at least as large as ours, so to reject H0 on this evidence you would have to tolerate a probability of wrongly rejecting it as high as 0.424. Obviously, with this high probability you cannot reject the null hypothesis. In other words, you have to accept the null hypothesis and say that there is no difference between the fertilizers in influencing the crop yield. You are permitted a value of 0.05 and you are getting a value of 0.424. On the other hand, if I had got a P value of 0.001, then that F value would be firmly lying in the rejection region, and I would say that there is only a 0.001 chance of wrongly rejecting the null hypothesis. So, this is very important.
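To make the "area under the curve" reading of the P value concrete, here is a rough sketch that integrates the tail area numerically with the standard library only. The function name `f_upper_tail` and the use of Simpson's rule are my own choices for illustration, not something the lecture prescribes. The sketch relies on the standard identity that the upper tail of an F distribution equals a regularized incomplete beta function, and it assumes a positive F value and at least 2 denominator degrees of freedom.

```python
import math

def f_upper_tail(f_value, df_num, df_den, steps=2000):
    """Approximate P(F > f_value) for an F(df_num, df_den) variable.

    Uses P(F > x) = I_z(df_den/2, df_num/2) with z = df_den / (df_den + df_num*x),
    where I_z is the regularized incomplete beta function, and evaluates the
    beta integral on [0, z] with Simpson's rule (steps must be even).
    Assumes f_value > 0 and df_den >= 2.
    """
    a, b = df_den / 2.0, df_num / 2.0
    z = df_den / (df_den + df_num * f_value)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    h = z / steps
    total = integrand(0.0) + integrand(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return (h / 3.0) * total / math.exp(log_beta)

# Fertilizer example: observed F of about 0.92 with 2 and 12 degrees of freedom.
p_value = f_upper_tail(44.13 / 2 / (286.8 / 12), 2, 12)
print(f"P value = {p_value:.3f}")   # the lecture reports 0.424
```

Because the observed tail area is far larger than alpha = 0.05, the observed F must lie to the left of the critical value, which is the same conclusion as comparing F directly with the critical value.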
So, if the P value is pretty high and it is greater than the specified alpha value, you accept the null hypothesis. If the P value is very, very low and it is lower than the specified alpha value, then you reject the null hypothesis. It depends on where the F value is lying, whether it is lying in the acceptance region or in the critical region. Why do we need to specify the P value when you know the critical value and you know whether your statistic is lying in the acceptance region or in the rejection region? When the P value is also given or estimated, you know by what margin the null hypothesis was accepted or rejected. For example, when the P value was 0.424, you could accept the null hypothesis pretty comfortably. If the P value had come to 0.051 or 0.049, then you know that the decision was a very close call. So, this will tell you whether you want to choose a different alpha value to come to a firmer conclusion. So, the P values are also very important, and I am sure that you would have come across these P values reported in several research papers. So, this analysis indicates that there is not enough evidence from the experimental data to contradict the initial hypothesis that the tomato yield does not depend on the brand of fertilizer. I am not sure how many of you followed this long sentence. Simply put, as far as this data can tell, the tomato yield does not depend on the fertilizer brand used. So, you can tell the farmer: look, your data does not support that the fertilizers are having an effect. Well, the farmer has tried a lot and invested a lot of effort in growing the tomatoes from 5 different fields. Let us hope that he makes sufficient profit. So, the next problem is to obtain a 95% confidence interval for the different treatment means. Another important and interesting exercise would be to find the 95% confidence interval for the difference between the pairs of treatment means.
Even though we have concluded that the fertilizers are not being effective, we can also construct the confidence intervals as a matter of exercise and the reason for that is quite simple. There may be certain problems where there may be a difference between the treatments. So, by constructing the confidence intervals, you can find out which of the difference in treatments are significant in which of the cases one fertilizer is different from the other. For the present problem, we have already concluded that the fertilizers are not different in their effect on improving the crop yield. So, the fertilizers are pretty much the same. So, now we are going to construct the confidence intervals for the different treatment means mu A, mu B, mu C. The treatment mean for fertilizer A, treatment mean for fertilizer B, treatment mean for fertilizer C that we are going to do. Then we are also going to construct a confidence interval for mu A-mu B, mu A-mu C, mu B-mu C and so on. These confidence intervals are very important and useful because if there was indeed a difference between the treatments, then we can identify using the 95% confidence interval for the difference between the treatment mean pair. So, let us see how to do it. The important thing is what is the value of S you are going to use. The simple suggestion is instead of S, you please use mean square error to the power of 0.5, okay. That is the square root of the mean square error. The square root of the mean square error is also referred to as the standard error, okay. And please note that the mean square error was based on A into n-1 degrees of freedom which means we calculated the mean square error based on 3 into 4, 12 degrees of freedom. We included all the repeats across all the treatments when calculating the mean square error. By pooling the mean square error, we are getting a better estimate of the standard error. So, the degrees of freedom should be A into n-1, okay. This is an important thing. 
Since we are talking about individual treatment means, you may be tempted to use n-1 as the degrees of freedom for each treatment mean. But the standard error was based on the pooled estimate involving all the repeats across all the treatments. So, we have to use a(n-1) as the degrees of freedom in the t tests as well as in the confidence intervals that are to follow. So, the formula for the confidence interval on a treatment mean is pretty simple: $\bar{y}_{i\cdot} - t_{\alpha/2,\,a(n-1)}\sqrt{MS_E/n} \le \mu_i \le \bar{y}_{i\cdot} + t_{\alpha/2,\,a(n-1)}\sqrt{MS_E/n}$. So, this is the confidence interval for the treatment mean, and we know the value of alpha, which is 0.05. The t value is based on a(n-1) degrees of freedom. So, there is nothing more that needs to be said; n of course is the number of repeats, which is 5 per treatment. So, we put in the value of 5 here. Please do not put 5 into 3, 15. Please put the number of repeats per treatment. And here, luckily, the number of repeats is the same for all the treatments. So, all we now need to do is look up the t table and find the value $t_{0.05/2,\,3(5-1)}$, that is $t_{0.025,\,12}$, and that value comes to 2.179. You may verify that. Then we need the square root of mean square error by n. For the mean square error you use 23.9, which is what you found from the analysis of variance table. So, you plug in 23.9. Since it is a mean square, you have to take it inside the square root, and n is equal to 5. So 23.9 by 5 is approximately 5, and the square root of 5 is a little over 2. The exact value is 2.186, so the answer looks correct. We just multiply the t value of 2.179 by the square root of MSE by n, which is 2.186, and we get 4.764. And all we need to do is take the treatment mean value, which for the first mean was 32.2, if I remember correctly.
So, 32.2 - 4.764 is less than or equal to mu 1, which is less than or equal to 32.2 + 4.764. That comes to 27.44 ≤ mu 1 ≤ 36.96. So, this is the 95% confidence interval for the first treatment mean, for fertilizer A. Similarly, we can do it for the other 2 treatments. All you have to do is make sure that you put in the correct treatment mean value here, and you will get 23.24 ≤ mu 2 ≤ 32.76, and for the third mean you get 25.24 ≤ mu 3 ≤ 34.76. We can also do a t test. The null hypothesis is mu i - mu j = 0, or mu i = mu j. The alternate hypothesis is mu i - mu j is not equal to 0; in other words, there is a difference between the treatment means. So, we get t0 as $(\bar{y}_{i\cdot} - \bar{y}_{j\cdot} - 0)/\sqrt{\sigma^2/n + \sigma^2/n}$. We have already gone through the reasoning for the use of sigma squared by n plus sigma squared by n. I hope you can recollect the discussion; if not, I request you to go back to the earlier lectures and look at them. We are using 0 here because the null hypothesis states that there is no difference between the treatment means. So, whatever value the null hypothesis states, you plug in here. I could also hypothesize that the difference between the treatment means is exactly 2, okay, 2 kilograms. That is a speculation that can also be tested. If you can say that mu i is equal to mu j, I can also say that mu i is equal to mu j plus 2; I can say anything. So, if I had said mu i - mu j = 2, then the alternate hypothesis would have been mu i - mu j not equal to 2, and instead of putting 0 I would have put 2 here. So, do not get confused if the problem statement is slightly different from what you are normally used to. If you understand the concepts, then you can handle any problem.
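The three treatment-mean intervals quoted above can be reproduced with a short sketch. The t value 2.179 and the mean square error 23.9 are the lecture's numbers. The means 28.0 and 30.0 for fertilizers B and C are taken as the midpoints of the quoted intervals, which is an inference on my part, and the helper name `mean_interval` is hypothetical.

```python
import math

T_CRIT = 2.179      # t(0.025; 12), as read from the t table in the lecture
MS_ERROR = 23.9     # pooled mean square error from the ANOVA table
N_REPEATS = 5       # repeats per treatment (not the total of 15)

def mean_interval(ybar):
    """95% confidence interval for one treatment mean using the pooled MSE."""
    half_width = T_CRIT * math.sqrt(MS_ERROR / N_REPEATS)
    return ybar - half_width, ybar + half_width

# Assumption: 28.0 and 30.0 are inferred as the midpoints of the quoted intervals.
means = {"A": 32.2, "B": 28.0, "C": 30.0}
for name, ybar in means.items():
    low, high = mean_interval(ybar)
    print(f"fertilizer {name}: {low:.2f} <= mu <= {high:.2f}")
```

Every interval has the same half-width because the pooled standard error and the number of repeats are the same for all three fertilizers.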
So, we are getting the t0 value. Actually, this should be small t0, for the simple reason that we are using the treatment mean values. So, let me make the correction here; I hope it is not too difficult. Yes, it is small t, and this is, I hope, not in bold. So, I have made a couple of corrections. The sample values are known, so you can calculate the statistic immediately. As the definition of an abstract random variable, I would have to use capital T0; once the values are known, the random variable also takes a value, so I should use small t0. Another thing is that I removed the bold. The bold looked better for highlighting the importance of the formula, but bold also has another significance: sometimes it represents vectors or matrices. I do not want you to confuse this boldface font in the formula with matrices, so I am just using the normal font here. Instead of sigma squared I can use MSE, and since it is sigma squared by n plus sigma squared by n, it is 2 sigma squared by n, or 2 MSE by n, right? So, once I get the t0 value, I can check the hypothesis. I can also construct the confidence interval for the difference in means. Again these things are in bold; I thought I had changed them into normal font, but I had not. Now I have done it. So, I can carry out the t test in the usual way and then see whether my t0 value is lying in the acceptance region or in the rejection region. Then you can either accept or reject the null hypothesis. I leave that to you as an exercise. Let us move on to the confidence intervals. Again I am using a(n-1) degrees of freedom, and then I have this formula. I plug the numbers into this formula, and for the first treatment mean difference, mu 1 minus mu 2, I am plugging in y bar 1 dot minus y bar 2 dot. That is what I have done here: y bar 1 dot is 32.2 and y bar 2 dot is 28.
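For reference, the test statistic and the interval being described can be written out as follows. The lecture points at "this formula" on the slide without reading it aloud, so the second expression is the standard form implied by the surrounding discussion, an assumption rather than a transcription.

```latex
% Test statistic for H0: mu_i - mu_j = 0, with sigma^2 estimated by MS_E
t_0 = \frac{\bar{y}_{i\cdot} - \bar{y}_{j\cdot} - 0}{\sqrt{2\,MS_E/n}}

% 100(1 - alpha)% confidence interval for the difference of two treatment means
\bar{y}_{i\cdot} - \bar{y}_{j\cdot} - t_{\alpha/2,\,a(n-1)}\sqrt{\frac{2\,MS_E}{n}}
\;\le\; \mu_i - \mu_j \;\le\;
\bar{y}_{i\cdot} - \bar{y}_{j\cdot} + t_{\alpha/2,\,a(n-1)}\sqrt{\frac{2\,MS_E}{n}}
```

With the lecture's numbers for fertilizers 1 and 2, the statistic is roughly 4.2 divided by the square root of 2(23.9)/5, which is about 1.36 and well below the table value of 2.179.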
So, that is the difference this is the t value and then you have 2 into mean square error by n. And if you look at it again everything is in bold let me correct it put in a normal font so that you do not confuse it with matrices I do not know why you should confuse but avoid the eventuality. So, the important thing is one lower end of the confidence intervals is negative and the other end of the confidence interval is positive so mu 1 minus mu 2 is going from a negative value to a positive value. So, it has something like this what this actually tells you is the difference between mu 1 minus mu 2 is not significant okay mu 1 and mu 2 are pretty much the same the reason for that is suppose you go to a station to catch a train and you ask the person between what time and what time the train is expected to come when may I expect the train useful answer may be within 15 to 20 minutes or 15 to 30 minutes I am happy. So, I can get to the correct platform in time and so on but on the other hand if I get a reply saying that the train has left 5 minutes back or the train may come in another 20 minutes then you are totally confused has the train already left or is it going to come in another 20 minutes. So, it is like having a confidence lower bound of negative value and the confidence upper bound of positive value then you really do not attach any significance to that statement of train has already left in 5 minutes or it is expected in 20 minutes. So, similarly you do not have any reason to give any significance to these confidence bounds and you say pretty much mu 1 is comparable to mu 2. So, we can do the same thing for the second and third treatment means let me again do the usual thing of converting it into normal font control I control B. So, that has taken care of that. So, you can see that again for mu 2 and mu 3 you are having a negative lower bound and a positive upper bound and so you have to conclude that mu 2 is comparable to mu 3. 
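The pairwise intervals can be sketched in the same way. The only change from the single-mean case is the factor of 2 inside the square root. As before, the mean 30.0 for the third treatment is inferred from the intervals quoted in the lecture rather than stated outright, and the variable names are illustrative.

```python
import math
from itertools import combinations

T_CRIT = 2.179      # t(0.025; 12) from the lecture
MS_ERROR = 23.9     # pooled mean square error
N_REPEATS = 5       # repeats per treatment

# Assumption: the mean of treatment 3 (30.0) is inferred from the quoted intervals.
means = {1: 32.2, 2: 28.0, 3: 30.0}

half_width = T_CRIT * math.sqrt(2 * MS_ERROR / N_REPEATS)
intervals = {}
for i, j in combinations(means, 2):
    diff = means[i] - means[j]
    intervals[(i, j)] = (diff - half_width, diff + half_width)

for (i, j), (low, high) in intervals.items():
    # An interval that straddles zero means the difference is not significant.
    verdict = "not significant" if low < 0 < high else "significant"
    print(f"mu{i} - mu{j}: [{low:.2f}, {high:.2f}]  {verdict}")
```

Checking whether zero lies inside the interval is equivalent to the two-sided t test at the same alpha, which is why the intervals and the test cannot disagree when both are computed correctly.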
Let us do it for the third and first treatment means, and here again we find -8.94 ≤ mu 3 - mu 1 ≤ 4.54. So, for all the combinations, mu 3 minus mu 1, mu 2 minus mu 3 and mu 1 minus mu 2, we had a lower bound which was negative and an upper bound which was positive. Hence all the treatment differences were insignificant and all the fertilizer treatments were similar to one another. This is the same conclusion we came to by using the analysis of variance and the F test. So, again, it makes sense to confirm your conclusion by different means. If you are working in a company and your boss asks you to carry out this exercise, and you show him the results by 2 or 3 different means, he is going to be pretty impressed. Also, if I have made a mistake in the ANOVA, I can catch the mistake in the confidence interval, or if I have made a mistake in the confidence interval, I can immediately identify it. The ANOVA told me that there is no difference between the treatment means. If I get, for example, plus 2.54 and plus 10.94 as the bounds, then mu 1 minus mu 2 is indeed different from zero, and then I know something is wrong: the earlier F test told me that there is no difference between the treatments, and now this confidence interval is showing me a significant difference. So, something must be wrong somewhere. I will just go and look at my calculations, and then I will find that instead of putting minus 2.54 I have put plus 2.54. I will make the correction and then I will be happy. So, please try to do the problem in different ways and make sure that you get the correct answers.
A disclaimer: all problems, including the following, are original and fictitious; they are not based on any real situations. In a school there are 4 teachers who teach mathematics, and there are 4 sections of class 10 in that school. Well, 4 sections seems to be on the smaller side; I have seen schools where there are even as many as 10 sections.
Anyway, we will take only 4 sections of class 10 in that particular school, and the headmaster of the school wants to find if there is any difference caused in the average marks obtained by students due to the teaching methodologies of the different teachers. So, let us look at the problem statement closely. There are 4 different teachers, and they have 4 different teaching methodologies; usually no teacher teaches in the same way as another teacher. So, all 4 of them would have their own style of teaching, their own style of examination and so on, and the headmaster wants to find if there is any difference in the average marks obtained by the students due to the different teaching methodologies of their teachers. So, he carries out a blocked design; maybe the headmaster is a statistician. The students in each section are tested and the marks recorded after each teacher completes his time in that section. So, each teacher teaches, carries out an exam and collects the marks. Choose an alpha value of 0.05. The first question is to identify the factor or factors and the blocks. The second question is: will the headmaster ask these 4 teachers to teach all the sections, or assign 1 teacher per section? Will he say teacher A go to section 1, teacher B go to section 2, teacher C go to section 3, teacher D go to section 4, or will he send all the teachers to all the sections? And what precaution will the headmaster observe when assigning the teachers to the different sections? State the null and alternate hypotheses. The results are given partially in the following ANOVA table; complete it. So, you do not have to do any back-breaking calculations with the calculator and make mistakes; well, I may make mistakes when I am doing these kinds of calculations. Luckily for us, the ANOVA table is partially filled and given to us. We just have to complete the ANOVA table and, importantly, say what the headmaster concludes.
Well, the problem does not end there; it continues. Present the ANOVA table if blocking had not been used. It is not a difficult problem: if blocking was not there, what would the ANOVA table look like? And if the headmaster had not considered blocking, what conclusion might he have drawn? So, this is the ANOVA table we are given: source of variation, sum of squares, degrees of freedom, mean square, F, and $F_{0.05,\,df_1,\,df_2}$. So, everything is laid out, and fortunately all the sums of squares are given. It is not difficult to find the degrees of freedom. What are the treatments and what are the blocks? The blocks are the different sections; you have 4 blocks, or 4 sections. So, the degrees of freedom for the blocks would be 4-1, which is 3. For the treatments you again have 4 teachers, so you have 4-1, 3 degrees of freedom for the treatments. Then the error. Let us first look at the total degrees of freedom. The total degrees of freedom would be 15, because the total number of data points is 4 x 4 = 16, minus 1 for the global average. So, 16-1 is 15 degrees of freedom for the total. You have 3 here and 3 here, so 3 plus 3 is 6, and you have 15 here, so 15-6 is 9. So, there are 9 degrees of freedom for the error, 3 degrees of freedom for the blocks and 3 degrees of freedom for the treatments. There is a typo here; let me correct it and put blocks instead of box. So, that is what we have. We know the sums of squares and we know the degrees of freedom, so we can calculate the mean squares. With the mean square for treatments and the mean square for error, we can find the F value and compare it with $F_{0.05,\,df_1,\,df_2}$. The numerator degrees of freedom, df 1, would be the degrees of freedom for the treatments, and the denominator degrees of freedom, df 2, would be the degrees of freedom associated with the error.
So, identify the factors that is only one factor namely the mathematics instructor the 4 different teachers contribute to 4 different levels of mathematics instruction or 4 treatments of mathematics instruction identify the blocks the 4 classes need not be identical to one another one class may be having very mischievous students another class may be having more number of studious students third section I do not know may be having students who have come from different school and so they have absolutely no idea what is going on the 4th section may be a mix of everyone anyway. So, we have to then conclude that the sections are definitely not identical to each other how can each section be identical. So, we have to consider them as blocks. So, when a teacher is teaching in each of these 4 classes I should say 4 sections let me be precise he is instructing in 4 different blocks. So, the classes or sections I just converted the classes into sections to be more precise the sections themselves may contribute to additional systematic variability. So, we are going to have 4 different blocks 1 block corresponding to 1 section next question is quite interesting will the headmaster ask these 4 teachers to teach in all the sections or assign 1 teacher per section well it is easy and tempting for the headmaster to assign teacher a to section 1 teacher b to section 2 teacher c to section 3 teacher okay teacher c to section 3 teacher d to section 4 that is easy, but rather than doing that he should ask his 4 instructors to go and teach in each of the 4 sections. So, each block or each section is receiving instruction from all the 4 teachers okay please note this each section will receive instruction from all the 4 teachers. If he assigns 1 teacher per class then it is possible let me again make the correction if he assigns 1 teacher per section right. 
So, if he assigns 1 teacher per section then it is possible that despite the differences between the teachers the students performances may get averaged out. Hence the headmaster may not be able to distinguish between the methodologies adopted by the 4 teachers. So, what I am trying to say is if the instructor go and teaches only 1 particular section then the headmaster may not be able to distinguish between the teaching methodologies of the 4 teachers. For example, the teacher adopting the best methodology may be going and teaching in the class where the students are unable to follow because they have come from a different school. Then the students performance may not be as good. Then the teacher which is who is using an outdated methodology or teaching a very yeah an outdated methodology goes and teaches in a section having very bright students then despite that teaching methodology the performances may be good and it may be comparable to the previous section I just talked about. So, then you cannot really distinguish between the first and the second teacher. So, this may lead to I mean erroneous conclusions. So, to avoid this we do the concept of blocking where we ask all the teachers to go and teach in all the sections. Now the third subdivision to the question is what precaution will he observe when doing these tests. He has carried out the blocking now he has to do the randomization. So, each teacher will teach in all the 4 sections in a random sequence while covering the portions or syllabus okay. You cannot send teacher A to section 1 then teacher B then teacher C then teacher D. So, you cannot have A, B, C, D in the same sequence in all the sections. So, you have to randomize the sequence in which the teacher goes to each section. So, instead of teacher 1, teacher 2, teacher 3, teacher 4 every time okay. So, I have made a small change. I have put sections as A, B, C, D and teachers as 1, 2, 3, 4. 
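The randomization step described here could be sketched as follows. The lecture only says that the order of teachers should be randomized within each section; drawing an independent shuffle for each section, the function name `randomized_schedule` and the seed value are my own assumptions for illustration.

```python
import random

def randomized_schedule(sections, teachers, seed=None):
    """Give every section all the teachers, each in its own random order."""
    rng = random.Random(seed)      # a seed makes the schedule reproducible
    schedule = {}
    for section in sections:
        order = list(teachers)     # every block receives every treatment
        rng.shuffle(order)         # random teaching sequence for this block
        schedule[section] = order
    return schedule

schedule = randomized_schedule(["A", "B", "C", "D"], [1, 2, 3, 4], seed=2024)
for section, order in schedule.items():
    print(f"section {section}: teachers in order {order}")
```

Independent shuffles can occasionally give two sections the same sequence. If the headmaster wants every sequence to differ, the draw can simply be repeated for that section.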
Earlier I was talking as sections 1, 2, 3, 4 and teacher A, teacher B, teacher C, teacher D. But please make the switch. So, each section is A, B, C, D like that one section A, one section B that is more correct 10A, 10B, 10C and 10D and teacher 1, teacher 2, teacher 3, teacher 4 okay. So, to avoid any biased conclusions the headmaster may not even know the names of the teachers. He may simply call them by T1, T2, T3, T4 alright. So, T1, T2, T3, T4 for the first section. For section B it is some other sequence. For section C it is a different sequence. Section D it is again a different sequence. So, he has randomized the order of teaching in each section. So, state the null hypothesis. The average performance of the students in mathematics is the same with all the 4 teaching methodologies okay. All teachers are sincere. So, rather than gauging the teaching skills of the teacher it is better to evaluate the teaching methodology of the teacher. So, that if one methodology is found to be better than the other then the teachers may be asked to follow that particular effective methodology. So, I would like to state that I am not comparing the teachers but I am rather comparing the teaching methodologies adopted by the 4 teachers. So, here when we state the null hypothesis the average performance of the students in mathematics is the same with all the 4 teaching methodologies. H0 would be mu T1 equals mu T2 equals mu T3 equals mu T4. The alternate hypothesis would be at least one teacher's instruction methodology is different from the others. So, we have to next complete the ANOVA table. So, I will leave the ANOVA table in front of you. You compare with your answers and see if they match. So, I hope you got the numbers correctly. I am not going to spend too much time here. We have gone through this several times. 
The degrees of freedom for the error would be (a-1)(b-1), which is (4-1) into (4-1), which is 3 into 3, which is 9. The mean square for treatments would be 38.5 divided by 3. If you take it as 39, 39 by 3 is 13, so 12.83 looks correct. The mean square error is 8 divided by 9, which is 0.89. And if you look at the F value, it comes to 14.42, whereas the critical F value, if you may call it that, is only 3.86. So, the actual F value is much higher than the critical F value. So, obviously the F value is lying in the rejection region, and you have no qualms in rejecting the null hypothesis. If somebody asks you whether you rejected the null hypothesis by a very comfortable margin or only narrowly, you can report the P value. The P value is 0.00088, which means that the P value is very, very small, and so the probability of a type 1 error is also very small. The probability of wrongly rejecting the null hypothesis is about 0.001, so 1 in 1000 or even lower than that. So, you feel comfortable that the different teaching methodologies are indeed having an impact on the students' average marks. So, let us see which is the better teaching methodology and then adopt it uniformly across all the sections by all the teachers. So, where are we now? The tail area of 0.05 corresponds to the F value of 3.86, and the tail area of about 0.001 corresponds to the F value of 14.42. So, what does the headmaster conclude? Based on this P value, and seeing that the F value is lying well in the rejection region, he rejects the null hypothesis. He concludes that there is indeed a difference between the teaching methodologies. Well, it is a pity that we do not really know the marks obtained by the students in the different sections under the different teaching methodologies. But on the other hand, we did not have to do any calculations. So, if you get something, you have to lose something. So, we do not have the marks in front of us.
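Completing the blocked ANOVA table comes down to a few divisions, sketched below with the sums of squares and the critical value 3.86 quoted in the lecture. The variable names are illustrative. Note that working with unrounded mean squares gives an F of about 14.44, slightly different from the lecture's 14.42, which comes from dividing the rounded values 12.83 and 0.89.

```python
# Randomized block design: 4 teachers (treatments) x 4 sections (blocks).
a = b = 4
ss_treat, ss_error = 38.5, 8.0     # sums of squares quoted in the lecture

df_treat = a - 1
df_block = b - 1
df_total = a * b - 1
df_error = df_total - df_treat - df_block   # same as (a - 1) * (b - 1)

ms_treat = ss_treat / df_treat
ms_error = ss_error / df_error
f0 = ms_treat / ms_error

F_CRIT = 3.86                      # F(0.05; 3, 9) as quoted in the lecture

print(f"df_error = {df_error}, MS_treat = {ms_treat:.2f}, MS_error = {ms_error:.2f}")
print(f"F0 = {f0:.2f} ->", "reject H0" if f0 > F_CRIT else "do not reject H0")
```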
We have 2 F values: 14.42, corresponding to a P value of about 0.001, and 3.86, corresponding to a tail area of 0.05, both with 3 and 9 degrees of freedom. So, we can see that the reported probability, or the area under the curve beyond the F value of 14.42, is pretty low, 0.0008751, which is matching well with 0.00088. So, we have validation of our calculations. So, it is well into the critical, sorry, it is well into the rejection region. The actual F value is greater than the critical value, and hence we reject the null hypothesis and conclude that there is a difference between the teaching methodologies. If blocking had not been used, what we would do is combine these two. The sum of squares of blocks would have gone into the sum of squares of error, so 82.5 plus 8 would have become 90.5. The degrees of freedom for the blocks would have been added to the degrees of freedom for the error, so you would have got 12. So, 90.5 and 12, okay. The treatment sum of squares and treatment degrees of freedom, 38.5 and 3, would not have been altered. The total sum of squares and the total degrees of freedom are also unchanged. Let us see what happens. Now, when we compare the treatments to the error, we get a very surprising and even shocking result: an F value of 1.7 and a P value of 0.22. It is a shocking result because the F value is now lying in the acceptance region. That means you have to accept the null hypothesis. The probability of making a type 1 error, that means wrongly rejecting the null hypothesis when it is actually true, would be 0.22 if you rejected. So, based on the test carried out without blocking, you would have accepted the null hypothesis and concluded that there is no difference between the teaching methodologies.
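The effect of dropping the blocks can be checked with a small sketch that pools the block sum of squares and degrees of freedom into the error term, exactly as described above. The critical value 3.49 for $F_{0.05,\,3,\,12}$ is a standard-table value that the lecture does not quote, so it is an assumption here, as are the variable names.

```python
# Sums of squares and degrees of freedom from the blocked ANOVA table.
ss_treat, df_treat = 38.5, 3
ss_block, df_block = 82.5, 3
ss_error, df_error = 8.0, 9

# With blocking: treatments are compared against the small residual error.
f_blocked = (ss_treat / df_treat) / (ss_error / df_error)

# Without blocking: block variation is absorbed into the error term.
ss_error_pooled = ss_error + ss_block
df_error_pooled = df_error + df_block
f_unblocked = (ss_treat / df_treat) / (ss_error_pooled / df_error_pooled)

F_CRIT_3_9 = 3.86    # F(0.05; 3, 9), quoted in the lecture
F_CRIT_3_12 = 3.49   # F(0.05; 3, 12), assumed from a standard F table

print(f"blocked:   F0 = {f_blocked:.2f}, reject H0: {f_blocked > F_CRIT_3_9}")
print(f"unblocked: F0 = {f_unblocked:.2f}, reject H0: {f_unblocked > F_CRIT_3_12}")
```

The treatment mean square is identical in both analyses. Only the denominator changes, from about 0.89 to about 7.54, and that alone moves the F value from the rejection region into the acceptance region.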
All the teachers are doing an excellent job and let them continue with their own unique style of teaching to the detriment of the students' performances, okay. So, this shows the importance of blocking. Without blocking, the tests were not that sensitive and so you had to accept the null hypothesis. By using blocking, you made the tests more sensitive and you are also able to detect the difference between the teaching methodologies and hence you could come to a good conclusion. So, you can also go and tell the management if you are working in an industry or in any job that the proposed modifications are definitely different. So, there is a change created. The next important question is whether the change which was proposed actually improved the performance or decreased the performance. That is another thing and again tests are available to compare between the different treatment means and see which one is better than the other. We saw that in the case of the confidence intervals in the previous example. So, this completes our discussion on the example problems. I hope that you not only understood the concept but also enjoyed doing the problems. Enjoying the problems is very important just as doing them correctly. Let us now move on to design of experiments involving two or more factors. Now, we are getting into the business end of the design of experiments. Thank you for your attention. Looking forward to meeting you again.