After dealing with materials data and the basics of statistics, the very, very basics of statistics, we now look at an application. We will see that analysis of variance is a natural process of doing it: it is not some process that has been devised to fit into the problem; rather, the problem itself gives rise to this particular methodology. That is how we are going to introduce analysis of variance, and then we will develop it more formally as time goes on. Suppose we want to develop a chemical standard. To begin with, we must involve more than one chemical testing laboratory to test the samples, and then we need to test statistically that all samples are giving the same values. If we find that we did the test in different chemical laboratories and all the laboratories' results are statistically the same, then we can take the combined set of all the chemical analyses that have been done in all the laboratories and come up with a standard.
This procedure is called a round robin test in statistics, where we expect at least three participants. All participants carry out tests under uniform conditions: the facility is uniform, the calibration is as per each participant's norm, and uniform samples are given for testing. Then the data is further analyzed to see that the statistical variation in test results is not due to the different laboratories but is due to simple statistical variation, and that there are no extreme values in it. We finally come to a standard by putting all the data together, where all the participants are statistically the same, and in this case at least we remove the outliers. Please remember our earlier discussion on outliers, which are the extreme values: there we had said that blindly removing the outliers is not a solution, but in this particular case the accepted way has been to remove the outliers and then calculate the chemical standard. So let us look at the present case. There is an alloy having 12 alloying elements, and we have to develop the standard for 8 major elements, given here as aluminum, chromium, cobalt, hafnium, tantalum, molybdenum, titanium and tungsten. It was decided that everyone should do the analysis through a process called inductively coupled plasma optical emission spectroscopy, ICP-OES. Three experienced laboratories were chosen for this analysis, because analyzing 8 elements in a 12-alloying-element alloy needs considerable experience. Let us call them lab A, B and C. The calibration method was left to the laboratories: they had to follow their own individual calibration standards and methods. The sample solutions were prepared by the manufacturer and distributed to each laboratory, and each laboratory had to report results on 3 replicated analyses.
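The pooling step described here, combining all the laboratories' results and screening extreme values before fixing the standard, can be sketched with a simple z-score rule. Both the numbers and the cutoff below are purely illustrative; the lecture does not prescribe a specific outlier test at this point.

```python
import numpy as np

# Hypothetical pooled results (ppm) from all labs for one element,
# with one suspicious extreme value at the end.
pooled = np.array([5.51, 5.52, 5.515, 5.505, 5.525, 5.53, 5.54, 5.52, 9.0])

# z-score of each value against the pooled mean and standard deviation.
z = (pooled - pooled.mean()) / pooled.std(ddof=1)

# Illustrative cutoff of 2.5: with only nine values a z-score can never
# exceed (n - 1) / sqrt(n) ~ 2.67, so the common |z| > 3 rule would never
# fire on a sample this small.
kept = pooled[np.abs(z) <= 2.5]
```

After screening, the remaining values would be pooled to compute the interval estimate for the standard.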
I am going to say more about replication and repetition, because there is a very common misunderstanding about the two, but we will talk about it when we talk about design of experiments. Here, 3 replicated analyses means not 3 observations of the same analysis: you have to do the analysis itself 3 times. That is what is called 3 replicated analyses. Here I am giving you the table of results for the 8 elements, in ppm. These are the results: laboratory A is in green, the second is in magenta and the third is in yellow, for ease of understanding. These are the chemical analysis values they found using the same ICP-OES analysis. How do we do statistical analysis on this? Well, what are we looking for? Number one, we want to see that all laboratories perform the same; in other words, that statistically they are close. If that is the case, we would like to pool all the data and see if there are any unusual extreme observations in it. If there are no unusual observations, then we find the statistical limits within which the analysis would lie most of the time, so we will find an interval estimate of that. The emphasis is on the closeness of all the laboratories, on unusual observations, and on the statistical interval within which the values lie most of the time. We are not going to perform the whole analysis here. We are going to look into the first part of it, the closeness of all the laboratories, because that refers to analysis of variance. So this is our data. Typically this data is denoted as Xij, where subscript i refers to the ith laboratory and subscript j refers to the jth observation of the ith laboratory.
So for example, if you look at this value 5.530, it is X23: the second laboratory, third observation. Likewise this is X22, this is X23, this is X13, and in this way all the Xij values are identified. Now let us think in a very logical way about what the statistical methodology should be. Let us define a few quantities. Let us call Xi. the laboratory average: we are averaging over the three observations of a single laboratory i. So, as before, if we take the average of the first laboratory's three observations it will be X1.; the average of the second laboratory's observations will be X2.. Please remember I am not taking the whole matrix of observations; I am looking at each element separately, so let us not be confused about it. So Xi. is the laboratory average, and X.. is the grand average. Then I define the within-laboratory sum of squares, which I call the sum of squares within, SSW: within each laboratory we take (Xij - Xi.) squared, sum over the observations j, and then sum over all three laboratories. It is called the sum of squares within because the deviations are taken within each laboratory. Its degrees of freedom will be 3 times (3 - 1) = 6: within each laboratory there are three data points and one average has been calculated, which gives 3 - 1, multiplied by the three different laboratories.
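As a concrete sketch, the laboratory averages, the grand average and the within-laboratory sum of squares can be computed as follows. The 3x3 table of values here is invented for illustration; it is not the lecture's actual data.

```python
import numpy as np

# Hypothetical replicate values (ppm) for one element; rows are the
# laboratories A, B, C and columns the three replicated analyses,
# so x[i, j] corresponds to X_ij.
x = np.array([
    [5.510, 5.520, 5.515],   # lab A
    [5.505, 5.525, 5.530],   # lab B
    [5.540, 5.520, 5.530],   # lab C
])

lab_means = x.mean(axis=1)   # X_i. : one average per laboratory
grand_mean = x.mean()        # X..  : grand average over all nine values

# Sum of squares within: squared deviations of each observation from
# its own laboratory's average, summed over all labs and replicates.
ssw = ((x - lab_means[:, None]) ** 2).sum()
df_within = x.shape[0] * (x.shape[1] - 1)   # 3 * (3 - 1) = 6
```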
Then we have the between-laboratories sum of squares, or sum of squares between, SSB, which is 3 times the sum of squared differences between each laboratory average and the grand average. The laboratory average represents the laboratory, and the grand average represents all three laboratories put together; that is why it is called the between-laboratory sum of squares. Its degrees of freedom are 3 - 1. Now, what we say is: if the variation between the laboratories and the variation within the laboratories are comparable, then you can say that there is no statistical difference between the laboratories. Every laboratory will have its own variation, and between the laboratories there will be a variation; if these two variations are comparable, there is no statistical difference between the laboratories. As we say, we always talk of rejecting the hypothesis, so in other words: if this ratio is very large, meaning the between-laboratory variation is much larger than the within-laboratory variation, then we say that the laboratories are statistically different. I think logically this argument is quite appealing. This can be written in a table. For example, for the element aluminum we have the between-labs sum of squares given here, with 3 - 1 = 2 degrees of freedom. The mean square is the sum of squares divided by its degrees of freedom; that is why we call it a mean square. The within-laboratory sum of squares is so much, with 6 degrees of freedom, and its mean square is so much. If you take the ratio of these two mean squares it is 0.864. This is what I call the F value; don't worry about the p value for the time being. Likewise I have shown you only four of the elements, and you can see how the values can be larger: this one is 4-point-something, but look at titanium, which has the largest value.
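Putting the pieces together, one row of such an ANOVA table can be computed as below, and the F ratio compared against the upper-alpha critical value of the F distribution. The 3x3 table is again invented for illustration; only the structure of the calculation follows the lecture.

```python
import numpy as np
from scipy import stats

# Hypothetical 3 labs x 3 replicates table (ppm) for one element.
x = np.array([
    [5.510, 5.520, 5.515],   # lab A
    [5.505, 5.525, 5.530],   # lab B
    [5.540, 5.520, 5.530],   # lab C
])
m, n = x.shape               # m = 3 laboratories, n = 3 replicates

lab_means = x.mean(axis=1)
grand_mean = x.mean()

ssb = n * ((lab_means - grand_mean) ** 2).sum()   # between labs
ssw = ((x - lab_means[:, None]) ** 2).sum()       # within labs

msb = ssb / (m - 1)          # mean square = SS / degrees of freedom
msw = ssw / (m * (n - 1))
f_ratio = msb / msw          # a large F suggests the labs differ

# Upper-alpha critical value of F with (m - 1, m(n - 1)) degrees of freedom.
alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, m - 1, m * (n - 1))
labs_differ = f_ratio > f_crit
```

The identity SSB + SSW = total sum of squares makes a handy numerical check on the arithmetic.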
So you can say that here there is some probability that the laboratories are different, or at least that one laboratory is different from the rest of them. This is called analysis of variance, and the table made here for each element is called the analysis of variance table. It is a very simple, logical argument for deciding whether the values given by two or more laboratories are the same or not: it finds the variation between laboratories, it finds the variation within laboratories, and if the between-laboratory variation is much larger than the within-laboratory variation, then there is a chance that at least one laboratory is different from the rest of them.

Now let us go to the theory. This is called analysis of variance. We consider m populations; now we are generalizing, since here we had three laboratories, and in the general case we say we have m populations with a sample of size n from each, where here we had samples of size 3. Then Xij represents the jth observation from the ith population, and we assume for the time being that Xij is normal with mean μi and a common variance σ². We want to test the hypothesis that all the means are equal, versus the alternative that not all means are equal. This is the hypothesis we wish to test. Once again, please remember that the normality assumption is only for convenience, to derive the distribution of the F statistic; otherwise it is not really necessary. What I said in the previous example, that a large F ratio indicates the laboratories are statistically different, is enough. However, in this case we assume normality. We again go through the notation. Xi. is the summation over j = 1 to n of Xij, divided by n; this is like the laboratory average, the ith group average or population average. X.. is the grand average, which is the summation of Xij over j = 1 to n and i = 1 to m, divided by nm. Then the within sum of squares, SSW, the sum of squares within the groups, is (Xij - Xi.) squared, summed over i and j. The between-group sum of squares SSB is n multiplied by the summation over i = 1 to m of (Xi. - X..) squared. This is the same procedure that we used previously. Now we have two estimates of the unknown variance σ². Since Xij is distributed as normal with mean μi and variance σ², SSW divided by σ² is distributed as chi-square with m(n - 1) degrees of freedom, and therefore the expected value of SSW / [m(n - 1)] is σ². Also, if H0 is true, if all the means are the same, then the between-group sum of squares divided by σ² is distributed as chi-square with m - 1 degrees of freedom, and therefore the expected value of SSB / (m - 1) is also σ². So you see, this is what we said in the example: the variation between the groups, between the laboratories, and the variation within the laboratories, if you assume that all laboratories are the same, estimate the same quantity σ², and therefore the ratio of the two is the test statistic for the hypothesis that all means are equal. If the hypothesis is true, this ratio should be small; if it is not true, the ratio is going to be larger. Under the alternative hypothesis the between-group sum of squares, divided by its degrees of freedom, is going to be much larger than the within-group sum of squares divided by its degrees of freedom, and therefore the critical region is: reject H0 if F is sufficiently large. Remember that F is the ratio of two chi-square variables, each divided by its degrees of freedom, and the two are independent of each other, because all observations are taken independently and therefore the between-group and within-group sums of squares are independent of each other; you can show this
statistically as well. Therefore this ratio follows an F distribution, the ratio of two independent chi-squares each divided by its degrees of freedom, with m - 1 numerator degrees of freedom and m(n - 1) denominator degrees of freedom. Now, you see, here we have assumed that all populations have samples of the same size. Suppose instead we have Xij where i represents the population and j the observation, and in the ith population the number of observations is ni; there is no common n. In that case only one formula changes: the between-group sum of squares becomes the summation over i = 1 to m of ni (Xi. - X..) squared. SSW remains the same, so the within-group sum of squares divided by σ² is chi-square with (Σ ni) - m degrees of freedom, and the between-group sum of squares, which is not shown, is also chi-square, with m - 1 degrees of freedom. Therefore the between-group sum of squares divided by m - 1 estimates the same population variance σ², and the within-group sum of squares divided by its degrees of freedom also estimates the same σ². So you take the ratio, and you reject the null hypothesis if this ratio is large. Because the ratio follows an F distribution, "large" means larger than the upper-α critical value of F with m - 1 and (Σ ni) - m degrees of freedom, which is the (1 - α) quantile. Once again, if you draw the graph, F is a skewed distribution: the area beyond this critical value is α, and the area below it is 1 - α.

So even if there are unequal sample sizes for the populations, the statistic still follows the F distribution, and we are still comparing the between-group and within-group sums of squares. We find that if all the populations are the same, then the ratio of the between-group and within-group sums of squares, each divided by its appropriate degrees of freedom, should be small; if the ratio is large, it means the between-group sum of squares is large, and therefore you reject the hypothesis. So let us quickly summarize. We started with the case study of developing a chemical standard as our introduction to analysis of variance. We found that analysis of variance is the case of comparing two estimates of σ², the between-group and within-group sums of squares, to test the hypothesis of equal population means. We discussed two cases: a balanced design, when each population has the same sample size, and an unbalanced design, when the populations have unequal sample sizes. As a whole this case is called one-way analysis of variance, because we have taken into account only one factor, the difference between the groups. Thank you.
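To close, the unbalanced (unequal sample size) case summarized above can be sketched the same way. The three groups below, with 4, 3 and 2 observations, are invented for illustration.

```python
import numpy as np

# Hypothetical unbalanced data: lab A ran 4 analyses, lab B three, lab C two.
groups = [
    np.array([5.510, 5.520, 5.515, 5.518]),
    np.array([5.505, 5.525, 5.530]),
    np.array([5.540, 5.520]),
]
m = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Between-group SS now weights each squared deviation by its group size n_i.
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group SS is unchanged in form.
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = m - 1        # m - 1
df_within = n_total - m   # (sum of n_i) - m
f_ratio = (ssb / df_between) / (ssw / df_within)
```

The partition SSB + SSW = total sum of squares still holds in the unbalanced case, which makes a handy check.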