Hi, I'm Zor. Welcome to Unisor Education. We continue talking about Bernoulli statistics. This lecture is part of the Advanced Mathematics course for high school students, presented on Unisor.com; that site contains notes and exams for registered students, so you can take it from there. It's actually better to watch from the website, because together with the notes each lecture is basically like a textbook, so you can read it before or after the lecture.

What I would like to do today is very briefly summarize what we have learned about Bernoulli statistics, criticize one particular aspect of the approach we have taken, and then improve that aspect by introducing the concept of sample variance, which we did not use in the previous lectures when I explained how to determine the margin of error and the certainty level of your estimate.

So, briefly, here is how it was done. Consider a random variable ξ distributed according to the Bernoulli distribution: it equals 1 with probability p and 0 with probability 1 − p. We would like to know what this p actually is, but we don't. As an example, we used a manufacturing facility: it manufactures certain parts, and a certain percentage of the parts is defective, a small one hopefully, and we would like to find out the probability that a part manufactured by this particular facility is defective. For instance, we have two facilities and would like to invest money in one of them, so we would like to know which one is better; the probability of manufacturing a defective part makes sense to know in order to invest in the proper company. So this p is unknown. What do we do to find it out? We cannot find it out exactly, but we can estimate it.
Now, the way to do it: we perform n experiments and get n results x1, ..., xn, each of them 0 or 1. What we know about ξ is that its expectation is E[ξ] = p and its variance is Var(ξ) = p(1 − p); I have derived this many times before, and the derivation is actually trivial. We also know that probability is defined as the limit of the frequency of occurrence of an event: if we perform a certain number of experiments with this variable and that number is very, very large, in particular as it tends to infinity, then the frequency of occurrence of the event gets closer and closer to p. So the frequency is a very good estimate of the probability, as long as the number of experiments is large enough.

Now, since the expectation equals the probability, to estimate p let me form the sample mean of the experiments: m = (x1 + ... + xn)/n. Since each result is 1 when the event occurs and 0 when it does not, this sample mean is exactly the frequency of occurrence. As n goes to infinity, this frequency gets closer and closer to p, and that is why m is a very good estimate of our unknown probability. Great; now our next problem is to prove it. How can I prove it? In a very simple way. What is our value m? Consider the new random variable η = (ξ1 + ... + ξn)/n, where every ξi is distributed exactly the same way as ξ and they are all independent: independent and identically distributed as ξ. So η is a new random variable. What's interesting about η is that our sample mean m is actually a single value of this particular variable.
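The estimate-by-frequency idea can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture; the function name estimate_p and the particular numbers are my own:

```python
import random

def estimate_p(p_true, n, seed=0):
    """Simulate n Bernoulli(p_true) trials and return the sample mean m,
    i.e. the frequency of 1s, which serves as the estimate of p."""
    rng = random.Random(seed)
    results = [1 if rng.random() < p_true else 0 for _ in range(n)]
    return sum(results) / n

# The frequency approaches the true p as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, estimate_p(0.3, n))
```

Running this shows the sample mean settling near the true p = 0.3 as n increases, exactly the limiting-frequency behavior described above.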
Now, what I would like to know is the following: what is the relation between η and ξ? Here is the interesting point. For the expectation of η, the factor 1/n can be taken outside, and the mathematical expectation of a sum is the sum of expectations; there are n of them, all exactly the same as E[ξ]. So E[η] = n·p/n = p. For the variance of η, the factor can again be taken out of the variance, but squared, because, if you remember, variance is the expectation of the squared deviation from the mean. So it is 1/n² times the variance of the sum, and the variance of a sum is the sum of variances for independent variables, so Var(η) = n·p(1 − p)/n² = p(1 − p)/n.

That is very interesting, because look at the difference between η and ξ: the expectation is exactly the same, but the variance is n times smaller. What does that mean? If n is really large, the deviation from the mean will be very, very small. Since the variance is small, all the values of this random variable, and m is one of those values, will be very close to the mean value. I am not talking about 100% probability; maybe there are some oddballs, but with a very high degree of probability our single value m will be within a small neighborhood of the mean of η, which is actually p. So m will be close to p, and using this variance we can evaluate how close it is. However, there is one unfortunate circumstance: we cannot compute this variance, because we don't know p.
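The two facts just derived, E[η] = p and Var(η) = p(1 − p)/n, can be checked by simulation. A minimal sketch; the function name sample_mean_stats and the parameter choices are mine:

```python
import random

def sample_mean_stats(p, n, trials, seed=1):
    """Draw `trials` independent sample means of n Bernoulli(p) variables
    and return their empirical mean and variance."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        ones = sum(1 for _ in range(n) if rng.random() < p)
        means.append(ones / n)
    mu = sum(means) / trials
    var = sum((x - mu) ** 2 for x in means) / trials
    return mu, var

mu, var = sample_mean_stats(p=0.2, n=50, trials=20_000)
print(mu, var)               # empirical E[eta] and Var(eta)
print(0.2, 0.2 * 0.8 / 50)   # theory: p = 0.2, p(1-p)/n = 0.0032
```

The empirical mean of the sample means sits at p, while their spread shrinks by the factor n, which is exactly why a single observed m lands close to p.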
However, what we did before, and that's very important, is to note that p(1 − p)/n ≤ 1/(4n), because the function p(1 − p) on the segment from 0 to 1 has a shape whose maximum is 1/4, attained at p = 1/2. It is a parabola, the graph of y = x(1 − x): at 0 and at 1 it equals 0, its horns point downward, and the maximum is in the middle, at 1/2, where the value is 1/4. So although I do not know p(1 − p), and therefore do not know the variance of η exactly, I can always say that it is no greater than 1/(4n). Which is already good, right? As n increases, my variance still gets smaller and smaller, and I can now quantify how small it is.

Granted, it is not a very tight bound. With p close to 1/2 it is a good bound, but if p is very small, or very large, close to 1, this bound is not really that good. But no matter what p is, it is a genuine bound from above, which means that if we use it as our measure of closeness we get, so to speak, a more certain result: it is more certain that our value will be close to the mean, with a higher degree of probability, than if we had used the exact variance. So this is an approximation, and it is exactly the approximation I am going to criticize and try to improve. Let's put it this way. And then there was another assumption, which quite frankly I cannot do anything about: based on the distribution of η I cannot easily evaluate how close m is to the mean value of η, because this is a binomial distribution and it is not so easy to calculate, quite frankly.
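The parabola argument is easy to verify numerically; this short check (mine, not from the lecture) confirms that p(1 − p) peaks at 1/4 when p = 1/2, so Var(η) never exceeds 1/(4n):

```python
# p(1 - p) is a downward parabola on [0, 1]: zero at the endpoints,
# maximal at p = 1/2 where it equals 1/4.
grid = [i / 1000 for i in range(1001)]
peak = max(p * (1 - p) for p in grid)
best_p = max(grid, key=lambda p: p * (1 - p))
print(peak, best_p)  # 0.25 at p = 0.5

# Hence for any p, Var(eta) = p(1 - p)/n is at most 1/(4n).
n = 100
assert all(p * (1 - p) / n <= 0.25 / n for p in grid)
```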
However, there is the central limit theorem, which says that a sum of a large number of random variables behaves like a normally distributed random variable, and the greater the number of components, the closer the distribution of the sum is to a normal distribution with the same mean and the same variance. So, without going into an exact discussion of how close the distribution of η is to normal, I will just follow the usual practice and use the apparatus of the normal distribution to analyze it. Strictly speaking, that closeness must be investigated as well, and it would affect the certainty level of our conclusions; I am not going into this, it is outside the scope of this course, so I will just assume that I can treat η as a normal random variable with this mean and this variance (or rather, with variance no greater than 1/(4n)) and analyze how close any particular value is to the mean of this random variable using normal-distribution tools.

I will use the rules of sigma. Remember that for a normally distributed random variable the frequency is graphed as a bell curve around the mean, and sigma is the square root of the variance; in our case, thanks to the estimate from above, σ ≤ 1/(2√n). There is the rule of one sigma, the rule of two sigma, and the rule of three sigma, marking the intervals from minus one, two, or three sigma to plus one, two, or three sigma around the mean. The rule of one sigma says that the probability of a value of the random variable falling within one sigma of the mean is 0.6827.
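The sigma-rule probabilities quoted from memory above can be recomputed from the normal distribution itself, using the standard identity P(|X − μ| ≤ kσ) = erf(k/√2). A quick check with the Python standard library:

```python
import math

# For a normal random variable, P(|X - mu| <= k*sigma) = erf(k / sqrt(2)).
sigma_rules = {k: math.erf(k / math.sqrt(2)) for k in (1, 2, 3)}
for k, prob in sigma_rules.items():
    print(f"{k} sigma: {prob:.4f}")  # 0.6827, 0.9545, 0.9973
```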
Now, the two-sigma interval has probability 0.9545, and finally the three-sigma interval, into which my values will fall almost certainly, has probability 0.9973. So that is one sigma, two sigma, and three sigma. What I am going to do is this: although I don't know sigma exactly, I do have a bound on it from above, and if I use the bound I simply widen my interval, so keeping the same certainty level I am definitely in the right. I can say that the probability of |m − E[η]| being within, let's say, two sigma is at least 0.9545. It would be equality if it were the real sigma, but since my sigma bound is actually greater, the probability is greater, because I am expanding the interval.

So I can quantify everything, except that this is really not such a good approximation when p is close to 0 or close to 1. Now let's try to make it a little better. That is my purpose for today: everything I said so far was a repetition of the previous lectures, and now I am going to suggest a way to improve this expression for the variance of η, to make it smaller, closer to the real variance. This is especially important for small p, or for p close to 1, because in those cases the bound, which is a true bound for p from 0 to 1, is not really a good one. Let's just take a look: for instance, p = 0.01. Then p(1 − p) = 0.01 × 0.99 = 0.0099. And what is the bound? It is 0.25. You see the difference? The bound is obviously greater, but it is so much greater that it basically ruins the usefulness of our estimate.
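The p = 0.01 example can be made concrete with a small calculation. The sample size n = 10,000 here is my own choice, just to show the margins side by side:

```python
import math

# How much the conservative bound 1/4 overstates the variance factor
# p(1-p) when p is small.
p, n = 0.01, 10_000
actual_factor = p * (1 - p)            # 0.0099
bound_factor = 0.25                    # worst case over all p
print(bound_factor / actual_factor)    # ~25x in variance

# Corresponding two-sigma margins of error for the sample mean:
margin_bound = 2 * math.sqrt(bound_factor / n)   # conservative: 0.01
margin_true = 2 * math.sqrt(actual_factor / n)   # ~0.002 if p(1-p) were known
print(margin_bound, margin_true)
```

In variance the bound is off by a factor of about 25; in the width of the two-sigma interval that translates into a factor of about 5.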
I mean, we could estimate much more precisely if we knew the variance were in the right range. The difference is huge: I have increased my variance by a factor of 25, and that's just too much, right? So let's try to improve it. The way we improve it is the following. Remember what we did to estimate p, or equivalently the mean value of ξ, which is the same thing? We used the arithmetic average of the sample, the sample mean. Can we use something like that to estimate the variance? Absolutely. To estimate the mean we took the sample values and took their arithmetic average. Now, what is the variance? It is the average squared deviation from the mean; remember the definition: the variance is the mathematical expectation of the squared deviation of our random variable from its mean. So we are averaging the squared deviation from the average.

Great. We obviously don't know the random variable itself; we know just our sample. But let's use the sample: we have already estimated the mean with m, so let's replace the true mean with that particular value and form ((x1 − m)² + ... + (xn − m)²)/n. We can call it s_n², and we can call it an estimate of the variance. Seems reasonable, right? So instead of just saying that the variance factor is at most 1/4, I use my sampling to estimate the variance of ξ more precisely. It seems to make sense. Now, what should we do to prove that this makes sense, and what is good and what is bad about it? First, let me talk about what's bad about estimating the variance this way: it is not precise, it is an estimate, which means it introduces a certain level of uncertainty. You see how many uncertainties we have introduced already: before, we assumed that our binomially
distributed random variable η behaves almost like a normally distributed one. We don't know how close the distribution of this binomial variable is to the normal one, so we introduced an element of uncertainty and did not quantify it, which is bad in some way; but the quantification is difficult, it goes outside our scope, and we took it for granted that it is not such a great disturbance to our certainty level. Now here we are introducing one more uncertainty. We are not replacing something with an upper bound, as we did before; that introduced no uncertainty, it only made our interval wider than it had to be. Now we have substituted an estimate for the variance, and an estimate is not precise: it may be greater, it may be smaller, and that's the problem. It introduces a certain level of uncertainty which is difficult, though not completely impossible, to quantify.

There are two very important requirements for s_n² as an approximation of the variance. Requirement number one: an estimate should be unbiased. In what sense unbiased? We did discuss this before. Our sample mean m is a single value of η, and the values of η lie somewhere around its mean value; how close is described by the variance. We would like that mean value to be exactly the quantity our estimate is supposed to approximate, that is, what we are trying to evaluate. The mean value of η, as we said before, is exactly p, and the variance of η, as you remember, is p(1 − p)/n. So a single value of the random variable η is a good estimate in the sense that the mean value around which our single value is distributed
is exactly the same as what we are trying to estimate. Now, what about s_n²? It is a single value of a different random variable; let's call it ζ. How can I express ζ? It is ζ = (1/n)·[(ξ1 − (ξ1 + ... + ξn)/n)² + ... + (ξn − (ξ1 + ... + ξn)/n)²]. This random variable, expressed in terms of the ξi, is the random variable of which s_n² is a single value, because m is a single value of (ξ1 + ... + ξn)/n and each xi is a single value of ξi, where ξ1, ..., ξn are of course identically distributed and independent, taking the values 1 and 0 with probabilities p and 1 − p. It would be great if the mathematical expectation of ζ were the variance p(1 − p): if I can prove that, then at least I can say that s_n² is an unbiased estimate of my variance. Being unbiased is very good. The second requirement is how close the values of ζ are to that mean; to answer it I would have to evaluate the variance of ζ. So the expectation of ζ should equal p(1 − p), and the variance of ζ should become very, very small as n increases to infinity.

Let's check. For the rest of this lecture I would like to prove that the mathematical expectation of ζ is p(1 − p). Actually, that's not exactly what I am going to prove: I will show that it is not exactly p(1 − p), and then I will correct the random variable, a very small touch to the formula, so that its expectation becomes exactly equal to my variance. Alright, let's examine E[ζ]. First of all, the factor 1/n goes out. Then I have n components added together, and the mathematical expectation of a sum is the sum of mathematical expectations; and the expectation of each one of them
is exactly the same, because ξ1, ..., ξn are all identically distributed and independent. So I have n equal expectations; that n cancels the 1/n outside, and I am left with the expectation of just the first term: E[(ξ1 − (ξ1 + ... + ξn)/n)²]. How can I evaluate this expectation? Let's expand the square: (a − b)² = a² − 2ab + b², with a = ξ1 and b = (ξ1 + ... + ξn)/n. So we get E[ξ1²] − 2·E[ξ1·(ξ1 + ... + ξn)/n] + E[((ξ1 + ... + ξn)/n)²].

Now, what is E[ξ1²]? Since ξ1 equals 1 or 0, ξ1² and ξ1 are actually the same thing: 1² = 1 with probability p and 0² = 0 with probability 1 − p. So E[ξ1²] = p. That was easy. Next, the middle term. Multiplying out, ξ1·(ξ1 + ... + ξn) = ξ1ξ1 + ξ1ξ2 + ... + ξ1ξn, all divided by n, with the factor 2/n taken out. The expectation of a sum is the sum of expectations. The first product is ξ1², and we know its expectation: p. All the others are expectations of products of two independent random variables, and the expectation of a product of independent variables is the product of their expectations: E[ξ1]·E[ξ2] = p·p = p². How many times does p² occur? It occurs n
minus 1 times; that's what it is, so the middle term is (2/n)·(p + (n − 1)p²) with a minus sign. Now the last term: obviously 1/n² goes out, and what is inside? Here is how we can do it. Think about what (ξ1 + ... + ξn)² is: it is (ξ1 + ... + ξn)·(ξ1 + ... + ξn). We have n components here and n components there, all multiplied with each other; how many products result? Obviously n², because each component of the first sum pairs with each component of the second. So we have n² different terms. Some of them are squares, like ξ1·ξ1 = ξ1² or ξ2·ξ2 = ξ2²; there are n of them, and the expectation of each, like E[ξ1²] or E[ξ2²], is p, so they give n·p. The remaining n² − n terms, like ξ1·ξ2 or ξ7·ξ17, are products of independent variables, so the expectation of each is the product of expectations, p·p = p². So the last term is (n·p + (n² − n)·p²)/n².

All that remains is some algebraic manipulation, which I have done in the notes for this lecture; I will just give you the result: E[ζ] = ((n − 1)/n)·p(1 − p). It is a little disappointing, right? Remember, we are estimating the variance of ξ, which is p(1 − p). As n increases to infinity, the factor (n − 1)/n gets greater and greater, approaching 1; but for finite n it is not 1, it is something like 99/100 or 999/1000. So E[ζ] is not exactly p(1 − p), which means that our estimate, the sample variance as we called it, is not really an unbiased estimate. Let me write it again: our estimate was s_n² = (1/n)·Σ(xi − m)², i from 1 to n, the average squared deviation from the sample mean. So this is not an unbiased estimate. Can we do better than that? Well,
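The bias just derived, E[ζ] = ((n − 1)/n)·p(1 − p), is visible in simulation. A sketch of my own, not from the lecture notes, with deliberately tiny n = 5 so the bias is large:

```python
import random

def mean_naive_sample_var(p, n, trials, seed=2):
    """Average the 'divide by n' sample variance over many simulated samples;
    this approximates the expectation E[zeta]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [1 if rng.random() < p else 0 for _ in range(n)]
        m = sum(xs) / n
        total += sum((x - m) ** 2 for x in xs) / n
    return total / trials

p, n = 0.3, 5
print(mean_naive_sample_var(p, n, trials=100_000))  # near 0.168, not 0.21
print((n - 1) / n * p * (1 - p))                    # (n-1)/n * p(1-p) = 0.168
print(p * (1 - p))                                  # true variance = 0.21
```

The simulated average lands on ((n − 1)/n)·p(1 − p), systematically below the true variance, which is exactly the bias the derivation predicts.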
there is a very simple solution. Let's consider s_{n−1}² = (1/(n − 1))·Σ(xi − m)², i from 1 to n. What is the difference between them? The difference is a factor: if I multiply s_n² by n/(n − 1), I get s_{n−1}², because the sum is exactly the same, all n terms, but I divide by n − 1 instead of n. Now, if I multiply by this factor, all my calculations stay basically the same, except that the factor comes out from the very beginning and multiplies the result: the expectation of (1/(n − 1))·Σ(ξi − (ξ1 + ... + ξn)/n)² is (n/(n − 1))·((n − 1)/n)·p(1 − p), which is exactly p(1 − p). So by dividing by n − 1 instead of n we make our estimate unbiased: considering s_{n−1}² to be a single value of this corrected random variable, its expectation is exactly the quantity we would like to estimate, the variance of ξ.

So what is the bottom line of what I have been discussing so far? Yes, we can estimate our variance more precisely: instead of the crude bound (I call it crude), we can make a more precise estimate using s_{n−1}². Don't forget the division by n − 1: it is a sum of n terms divided by n − 1, so it is not exactly an arithmetic average according to the classical definition. Using it, we get an unbiased estimate of the variance. Is it good or bad as an estimate? That is actually a big question, because I know that its expectation is exactly the variance, but I do not know the variance of the estimator itself. It is possible to calculate, but it is too complicated for this particular course and I am not going to do it. However, I will definitely tell you that the variance of this variable goes down to 0 as n goes to infinity: if you calculate it, there is an n in the denominator. Which means that we can definitely improve our estimate of the
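The correction argument is pure arithmetic with the factor n/(n − 1), so it can be checked exactly with rational numbers. The function name expected_corrected_var is my own illustration:

```python
from fractions import Fraction

# E[s_n^2] = (n-1)/n * p(1-p); multiplying by n/(n-1) removes the bias,
# so the expectation of s_{n-1}^2 is exactly p(1-p) for every n >= 2.
def expected_corrected_var(p, n):
    biased = Fraction(n - 1, n) * p * (1 - p)   # expectation of the /n version
    return Fraction(n, n - 1) * biased          # expectation of the /(n-1) version

p = Fraction(3, 10)
for n in (2, 5, 100):
    print(n, expected_corrected_var(p, n))      # always 21/100 = p(1-p)
```

This is the same division by n − 1 (Bessel's correction) that libraries expose, e.g. as a `ddof=1` style option, versus the naive division by n.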
variance by using s_{n−1}², and for relatively large n the variance of this estimator, relative to its mean expectation value, is really very small, so we have the right to do it. Let's put it this way: instead of a bound from above, we now have an unbiased estimate of the exact value. However, it does not come with one hundred percent certainty; it introduces a certain degree of uncertainty into the whole system of reasoning, which makes the result, let's put it this way, more precise but less certain, and different people prefer different things. But it is definitely a valid approach if your n is significantly large, which means the spread of the estimator around its mean value is not really a big one.

Well, that's it for today. I do suggest you read my notes for this lecture; they contain somewhat more elaborate calculations for those who would like to follow them, and they will probably give you yet another view and another step in conquering this particular topic. Anyway, I would like you to remember that everything we did was in pursuit of a certain estimate, and this estimate must be, number one, unbiased, and number two, the corresponding random variable, of which our estimate is just a single value, should have a small variance. That is what is important: then all the values, including our sample value, are relatively close to the mean value, and the fact that the mean value is exactly what we are trying to estimate has already been established. So the mean value of our new random variable is exactly what we need, in this case the variance, but the variance of the estimator should be small enough, which means our sample should be big enough. In the next lecture I will consider the same problems I did before, but where before I used the crude bound for the variance, I will now use this estimate instead, and I will explain how
significantly it changes the calculations of, let's say, the number of experiments needed to achieve a certain certainty and a certain margin of error. That will be the next lecture. For today, that's it. Thank you very much, and good luck.