Hi, I'm Zor. Welcome to UNISOR education. This lecture is part of the advanced mathematics course for high school students. I recommend that you go to UNISOR.com to watch this lecture, because it's just one part of the whole course and the lectures are interdependent. So take the course.

In this lecture I would like to continue talking about statistical evaluation of Bernoulli experiments. In particular, I would like to use something we have studied before, called sample variance, to get a much better evaluation of the variance and therefore a more precise evaluation of the margin of error. That is important, because it lets us either evaluate the margin of error more precisely or, for a given margin of error, evaluate more precisely how many experiments we need to achieve it. The difference from my old approach is that there I was evaluating the number of experiments or the margin of error very crudely, based only on the number of experiments, without regard to what kind of experiments I had and without regard to the data. So I expect the new evaluation to be much more precise, and that's actually how people do it in real life more often than the crude evaluation we were talking about before.

I will talk about exactly the same three problems that were presented a couple of lectures ago, which I solved using, as I said, a crude methodology. Let me remind you what that crude methodology was. Suppose you have a random variable xi which takes the values 1 and 0 with probabilities p and 1 minus p. To evaluate p with a certain margin of error, we conduct n experiments, and the results of these experiments, x1, ..., xn, are also 1s and 0s, right? So their sum divided by the number n is actually the frequency of occurrence of this particular event, right?
Because when the event occurs x is equal to 1 and when it doesn't occur x is equal to 0, so the sum of the 1s is the number of occurrences, and that number divided by n is the frequency. And frequency is what we used to define the probability, right? This probability p is a kind of limit of the frequency as the number of experiments goes to infinity.

So we have conducted n experiments, and we understand that if n goes to infinity the frequency would probably go to p. But since we are not in an infinite situation, we are in a finite one, the question is how close the frequency is to p.

To evaluate this, we considered independent random variables xi1, ..., xin, each distributed identically to xi, and the random variable eta equal to their sum divided by n. So the frequency we observed is basically a single value of this random variable eta, right? Now, if eta has a very small variance, then with high probability the frequency lies in the vicinity of the mean of eta. The mean of eta is obviously p, because the mean of every xi is p, so it's n times p divided by n, which is p. That's why we can say that this particular value of eta is probably close to its mean value if the variance is small.

The variance of eta we calculated before: it's p times (1 minus p) divided by n. So the variance goes to 0 as n goes to infinity, and it's quite reasonable to assume that with large n the variance is small. And variance is basically a measure of how far from the mean the values of our random variable lie. We then approximated eta with a normal variable: since eta is a sum of independent identically distributed variables, its distribution is very close to normal, so we can consider a normal variable with the same mean and the same variance. And then I made a very crude evaluation of this variance: since p times (1 minus p) is always less than or equal to one
quarter when p is between 0 and 1, the whole variance is at most 1 over 4n. So now I know an upper bound for the variance of our almost normal variable, and I know its mean. Since the variance in this case is evaluated from above using only n, I can use something like the 2 sigma rule and say that with probability, what was it, 0.9545, my eta lies within 2 sigma of its mean. Because sigma was evaluated from above, this probability is at least 0.9545.

So that's the evaluation of the margin of error and the certainty level. We can say that our random variable lies within this margin of error from its mean, which is p. In other words, whatever frequency I have calculated from my experiments, with probability 0.9545 it is close to the p which I would like to evaluate, with a margin of error equal to 2 sigma. Now, if the variance is at most 1 over 4n, then sigma, the standard deviation, which is the square root of the variance, is at most 1 over 2 square roots of n, and 2 sigma is at most 1 over the square root of n. So that's my evaluation based on this relatively crude upper bound for the variance.

This bound is good if p is very close to one half; if p is equal to one half, p times (1 minus p) is exactly one fourth, right? But if p is small, this becomes a very, very crude evaluation, and it results in the necessity to conduct a very large number of experiments to achieve the margin of error we desire.

So let me now switch to a more precise evaluation of the variance using the sample, and you will see what kind of results we get. We will get much better results with small p's. These are my three little problems which I presented before using the crude evaluation of the variance, and now we will solve them with a more precise evaluation based on the sample we have received.

So suppose we have received a sample x1, etc., xn. What do we do to approximate the variance? First we approximate the mean, m, which is the arithmetic average of the sample, and then we approximate the variance as s squared equal to (x1 minus m) squared plus, etc.,
plus (xn minus m) squared, all divided by n. So this is my approximation of the variance, right? Now, we have already discussed that if we call this S with index n, its mean value is not exactly the same as the unknown variance, so it's skewed a little bit, right? Biased, as we say. And S with index n minus 1, which has the same numerator but the denominator n minus 1, is an unbiased evaluation. So we will use this unbiased evaluation of the variance.

Now let's use this evaluation in our three problems. I don't really need this part of the board any more, so I will use the space for the problems.

Problem number one. We have 10,000 experiments: we are manufacturing parts, and the experiment is whether the part is defective or not. Let's say we've got 300 defective parts. I would like to evaluate, with certainty level 0.9545, the probability of a part being defective, based on this particular sample.

First we calculate the arithmetic average, which is 300 divided by 10,000, or 0.03. So this is basically our evaluation of the probability of manufacturing a defective part: 3 hundredths. The question is how good it is. What's the margin of error of this particular evaluation?

Now, what is m? As I was saying before, m is equal to x1 plus etc. plus xn, divided by n. It is a single value of the variable eta, the average of xi1 through xin, where all the xi's are independent and identically distributed, exactly like our unknown one. So I would like to evaluate the quality of approximating my probability with this number m, based on the variance of eta.

So let's calculate the sample variance, and I will put n minus 1 in the denominator. What is it? 300 times our variable took the value 1, so its squared difference from the mean is (1 minus 0.03) squared, and the other 9,700 times it took the value 0, so the squared difference is (0 minus 0.03) squared. We add these up and divide by 9,999, because this is n minus 1. So this is our evaluation of the variance of our variable xi, the
original one. Now, how about the variance of eta? We know it's the variance of xi divided by n, where n is the number of experiments. That's because eta is xi1 plus xi2, etc., plus xin, divided by n, so its variance is 1 over n squared times n variances of xi, which is the variance of xi times 1 over n. What I'm actually interested in is not really the variance of eta but the standard deviation of eta, which is the square root of the variance of xi divided by n.

So we have to do the calculation, and I think I did it: the sample variance is 0.0291. The first term is 0.97 squared times 300, which is something like 280, the second term is small, and together it's approximately 291 divided by 10,000. What I have to do now is divide it by 10,000 and extract the square root. So that's the square root of 0.0291 divided by 10,000, and my calculations show it is 0.0017. So this is sigma of eta.

Now I need the certainty level which corresponds to 2 sigma in the normal distribution. 2 sigma of eta is 0.0034. So what I can say is that with certainty level 0.9545 the unknown probability of my part being defective is between 0.0266 and 0.0334.

Let me remind you of the margin of error when I was using the crude evaluation of the variance, the first way I solved this problem, without the sample variance. There n is 10,000, so sigma is at most 1 over 200 and double sigma is 1 over 100, so in that old evaluation I had the interval from 0.02 to 0.04, which is a wider margin of error. So as you see, with the same level of certainty, my evaluation is really much better if I use the more precise evaluation of the variance. If I use the sample variance, I can say more precisely where my real probability is. A more precise evaluation of the variance leads to a more precise evaluation of p. The old interval is wider, the new interval is narrower, and that's definitely what we want. So with the
same certainty level, more precise calculations give a more precise evaluation of the unknown probability p, all right?

Now, the second problem is basically the same as this one. What if I would like my margin of error to be half the previous one? The previous one was 0.0034; I would like 0.0017, so I would like the interval from 0.0283 to 0.0317. I'm adding 17 ten-thousandths to and subtracting 17 ten-thousandths from the mean value, 0.03, so I'm making my interval twice as narrow as before. The question is what certainty level I have to attribute to this evaluation.

Before, the margin was 2 sigma, 0.0034, right? That gave me the certainty level 0.9545. Now I'm looking for half of 0.0034, which is 0.0017, so I'm narrowing the interval from 2 sigma around the mean to 1 sigma around the mean. And as you remember, if I make the interval narrower, my certainty level obviously should be less. For 1 sigma of a normal distribution it is approximately 0.6827, so this is the certainty level for this particular margin of error.

In the first problem I was given the number of experiments and the level of certainty, and I determined the margin of error. In the second problem I don't have the level of certainty; I define my margin of error and then derive the certainty level, and it's smaller, obviously.

Now, the third problem is to determine the number of experiments if I have fixed both the level of certainty and the margin of error. So let's do this. As before, experiments produce certain results. I would like, with the 0.9545 level of certainty, to determine how many experiments I should conduct to get a margin of error equal to one thousandth, 0.001. Before, if you remember, my margin of error was 0.0034, which is more than three times greater than this. So I would like to have a very narrow interval, but I would still like to have this level of certainty, which necessitates an increase in the number of experiments.

First let's do the very crude calculation of how many experiments I need, without using
the sample variance. I think this is the material I covered in the previous lecture, when I first introduced these problems. If I would like this margin of error, delta equal to 0.001, and this certainty level, 0.9545, then delta should be 2 sigma, so my sigma is supposed to be half of delta, which means 0.0005. In the crude evaluation sigma is 1 over 2 square roots of n, so that is what is supposed to be 0.0005, right? So what's n? Well, the square root of n is 1,000, so n is equal to 1 million.

All right, so I need a million experiments to achieve this margin of error with this certainty level. Well, it's a lot. Nobody is doing a million experiments, I bet. Now, why is it such a big number? Primarily because we were using a very crude evaluation of our variance. So what I'm suggesting is to use the sample variance, which is a more precise evaluation, and try to reduce this number of experiments.

Now, what's the problem? The problem is we don't have experiments. We have not started our experimentation, so we don't have data x1, etc., xn to calculate the sample variance. So it looks like I cannot calculate the sample variance, and the crude way is the only way I have. Well, that's not exactly true, because what we can do is start experimenting, stop at a certain reasonable level, and then we will have some data. We evaluate the sample variance based on the data we have obtained, and then, if necessary, we continue experimenting up to the number which is needed to get a more precise evaluation.

So in this case I'm suggesting the following. Let's conduct just 100 experiments and see what our data is. Based on this data we will calculate the sample variance, and that will be the starting point for our calculations. So let's say we have 100 experiments and 4 defective parts. Based on these I calculate my sample variance, very simply. The variance of xi is approximately the sample variance S squared with divisor 99, right? I divide by 99. Which is equal to what? 4 times I get 1, so that's 4 times (1 minus 0.04) squared; by the way,
4 divided by 100 is 0.04, so that's my m; plus 96 times I've got 0, so that's 96 times (0 minus 0.04) squared; and we have to divide it all by 99. So that's my sample variance of the random variable xi, which I have calculated: 0.0388.

Now what should I do? I need sigma of eta, where eta is the sum of xi1, xi2, etc., divided by 100. So I divide 0.0388 by 100 and take the square root, which is 0.0197. So this is the standard deviation of eta for my 100 experiments.

Now, which standard deviation do I need? I need sigma equal to one half of delta, which is 0.0005, right? With this level of certainty my margin of error should be 2 sigma. Obviously 0.0197 is significantly higher than 0.0005, so with 100 experiments we have not achieved the precision we need; the margin of error would be about 0.04.

So what do we do? Let's think about it this way. The sigma we need is 5 ten-thousandths and the one we have is almost 200 ten-thousandths, so the ratio is about 40. I have to reduce my standard deviation by a factor of 40. Now, we know that the variance of eta is 1 over n times the variance of xi, so the standard deviation of eta is 1 over the square root of n times the standard deviation of xi. This is very important: it means that the standard deviation of the empirical average, which is eta, is proportional to 1 over the square root of n. So if I would like to reduce it 40 times, I have to conduct 40 squared times as many experiments, that is, 1,600 times as many experiments as I have already conducted, because the standard deviation is inversely proportional to the square root of the number of experiments.

So if I have conducted 100 experiments and got this standard deviation, then to achieve one which is 40 times less I need 100 times 1,600 experiments, which is 160,000. Now look at this 160,000. If you remember, the crude evaluation of the variance, based on just the number of experiments, gave a million. So
160,000 is much less than a million. It's still a lot, but let's think about it: with 160,000 experiments we will have a very precise evaluation, within one thousandth. Whether the value is 0.04, as in the 100 experiments, or 0.03, as we were talking about before, this is something we are considering as the approximate value. Whatever results we get from 160,000 experiments, we add them up, divide by 160,000, and get some evaluation, 0.04 or 0.03 or something like this. And what we are saying is that around this particular value the margin of error is one thousandth.

For instance, suppose we've got 4,800 defective parts out of 160,000. I took this because it gives a round number: it is equal to 0.03, right? Yeah, 3 hundredths. So now I can say that with this level of certainty my probability lies around 0.03 and the margin of error is 0.001, which means from 0.029 to 0.031. That is really a relatively precise evaluation. If we say that the probability of something is around 0.03 plus or minus one thousandth, it's a very good evaluation. But to achieve it you need that many experiments, you see.

Now, what precision do you think would be acceptable for you? If you are talking about defective parts, for instance, and you have 0.03, approximately 3 out of 100, which precision would be really informative for you? If I say that the probability of getting a defective part is 0.03 based on my experiments, plus or minus 0.01, which is from 0.02 to 0.04, is it a good evaluation?
I don't think so. 10,000 experiments gave me exactly this with the crude evaluation. Even with the more precise evaluation, using the sample variance, the margin was 0.0034, so the interval is from 0.0266 to 0.0334. That's still a very large distance between the ends.

Suppose I evaluate two different manufacturing facilities, and one gives this interval and the other, with the same precision, gives a slightly different value, something a little above 0.03 with its own interval; I'm not sure of the exact numbers. Now, which one is better? Well, I have no idea which one is better, because these intervals overlap. Since they overlap, I cannot distinguish one facility from the other with the level of certainty 0.9545.

But suppose my evaluation is narrower, and the situation is this: one interval is here and the other is next to it. Now these intervals do not overlap, because the maximum of the first is 0.031 and the minimum of the second is 0.031, so the intervals are from 0.029 to 0.031 and from 0.031 to 0.033. The first is one manufacturing facility and the second is another. All the probabilities in the first interval are smaller than all the probabilities in the second, well, except the common end, but that is just one point. So I can definitely say that the second facility is worse than the first, because for the first one the probability of making a defective part is always less.

That's why it's very important to have this precision, and precision requires experimentation. And as you saw, in this relatively practical case (3 out of 100 is normal, I would say, for a defective part, right?)
it requires 160 thousand experiments to have a reasonable level of precision.

Now let me go back to some practical situations. We have all these statistical evaluations in the pharmaceutical industry and in some other fields, and there we are not dealing with 160 thousand experiments; we are dealing with far fewer. That means that the level of certainty is significantly less than 0.95 or so. And if that's true, we have to draw our conclusions based on these certainty levels and margins of error, and it's not easy. That's why all these statistical evaluations of, let's say, new drugs on the market are not precise. They work in some cases; in other cases we do not have a sufficient amount of data to be certain about concrete results.

So I would suggest that you read again the calculations which I am presenting in the notes for this lecture on UNISOR.com. Other than that, this is probably the end of the statistical analysis of Bernoulli distributions, and we will go on to some other things. Thanks very much, and good luck.
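
For readers who want to check the arithmetic of the three problems, here is a minimal Python sketch. It assumes the normal approximation used throughout the lecture, and the function names are illustrative choices that do not come from the lecture notes. It computes the unbiased sample variance for 0/1 data, the standard deviation of the empirical average, the crude upper bound 1 over 2 square roots of n, the certainty level for a k sigma interval, and the number of experiments suggested by a pilot sample.

```python
import math


def sample_variance(successes: int, n: int) -> float:
    """Unbiased sample variance (divisor n - 1) of n outcomes that are 1 or 0."""
    m = successes / n  # empirical frequency, the estimate of p
    # `successes` outcomes equal 1, the remaining n - successes equal 0.
    total = successes * (1 - m) ** 2 + (n - successes) * (0 - m) ** 2
    return total / (n - 1)


def sigma_of_average(successes: int, n: int) -> float:
    """Estimated standard deviation of eta, the average of n experiments."""
    return math.sqrt(sample_variance(successes, n) / n)


def crude_sigma(n: int) -> float:
    """Upper bound on sigma(eta) that follows from p * (1 - p) <= 1/4."""
    return 1 / (2 * math.sqrt(n))


def certainty(k: float) -> float:
    """P(|Z| < k) for a standard normal Z (normal approximation assumed)."""
    return math.erf(k / math.sqrt(2))


def required_experiments(successes: int, n: int, margin: float, k: float = 2.0) -> int:
    """Experiments needed so that k * sigma(eta) <= margin, from a pilot sample."""
    target_sigma = margin / k
    return math.ceil(sample_variance(successes, n) / target_sigma ** 2)


if __name__ == "__main__":
    # Problem 1: 300 defective parts out of 10,000, margin of error at 2 sigma.
    m = 300 / 10_000
    sigma = sigma_of_average(300, 10_000)
    print(f"sample-variance interval: {m - 2 * sigma:.4f} to {m + 2 * sigma:.4f}")
    print(f"crude 2 sigma margin:     {2 * crude_sigma(10_000):.4f}")

    # Problem 2: halving the margin means going from 2 sigma to 1 sigma.
    print(f"certainty at 2 sigma: {certainty(2):.4f}")
    print(f"certainty at 1 sigma: {certainty(1):.4f}")

    # Problem 3: pilot sample of 100 parts with 4 defective, target margin 0.001.
    print(f"experiments needed: {required_experiments(4, 100, 0.001)}")
```

Without rounding the ratio of standard deviations to 40, the pilot sample gives roughly 155,000 experiments for the third problem; the 160,000 in the lecture comes from that rounding, and both are far below the crude estimate of one million.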