Hi, I'm Zor. Welcome to Unizor education. Let's continue talking about a particular statistical problem. The problem was explained in the previous lecture, and this lecture is about its solution.

The task at hand is this: we are dealing with a Bernoulli random variable which is equal to 1 with probability p and to 0 with probability 1 - p. We don't know the probability p, and we would like to evaluate it somehow. For instance, what's the probability of picking a spade from a full deck of 52 cards? We know that there are four different suits, so the probability should be one quarter, right? But suppose somebody doesn't know this and decides to make an experiment to evaluate this probability empirically. Let's say he makes the experiment 100 times, and 22 times out of 100 he picks a spade. His reasonable assumption is that, since probability is the limit of the frequency of occurrence of the event, and the observed frequency is 22 out of 100, the probability is approximately 0.22. Well, it's not wrong, absolutely not. It's the right approach. But I would say it's incomplete, naive in a certain way, and it definitely needs some mathematical foundation. That's exactly what this lecture is about.

So the solution is known: you make experiments, average the results, and you are done. The question is what exactly this means, and how to evaluate quantitatively the quality of our evaluation. All right, so how can we do it? First of all, let's make a theory out of these practical experiments. What does it mean that we conduct experiments with our random variable? Well, we get n results, obviously. But think about it this way: if you conduct a different set of n experiments, you will have different results. The 1s and 0s will be in different places. What does it mean?
Well, it means that the average which you have calculated from these n results is actually a random variable. The result of each individual experiment is also a random variable, distributed exactly like our original random variable xi. And all these experiments, I presume, are independent of each other and identically distributed. So I will use the letters xi_1, xi_2, ..., xi_n for the results of the n experiments, and my average is eta = (xi_1 + xi_2 + ... + xi_n)/n.

Now, again, I'm using notation that resembles random variables rather than constants. Yes, I understand that in one series of n experiments we get concrete values, kind of constants. But somebody else would do exactly the same and would get different numbers. What we would like to do is to put a mathematical foundation under the statement that, regardless of the particular results, the average will be close to p. That's why I have to consider all the different values these averages can take, and that's why I approach eta as a random variable, which obviously has a certain distribution. So this is the random variable we are dealing with, and we would like to use it as an approximation of the probability p.

Now, since xi_1, xi_2, etc. are each equal to either 1 or 0, their sum is the number of times the value 1 occurred. That's the number of times the event occurred, and that's why the ratio eta is exactly the frequency. And that's why I assume that it is an approximation of p. The definition of probability says that, as I increase the number n, this frequency is supposed to get closer and closer to the probability. But the word "closer" is not really well-defined so far. I have to define it mathematically. Let me just state upfront: what are the possible values of this sum? As I indicated in the previous lecture, the sum in the numerator can take any value from 0 to n: it is 0 when all of them are 0, and n when all of them are 1.
So if I divide it by n, I will have different numbers from 0 to 1: 0, 1/n, 2/n, and so on. All the values from 0 to 1 with a step of 1/n are possible. So what does it mean that this value is close to some constant? I know, without even thinking, that p is from 0 to 1, because it's a probability. So how can I say that a random variable which can take any one of these values from 0 to 1 is an approximation of p, if I know basically nothing about p except that it's from 0 to 1? This is something I would like to address more mathematically. What does this closeness actually mean?

Before doing that, let's think about what exactly this particular random variable is. Its distribution is called the binomial distribution. I have explained all the details about the binomial distribution in the theory of probability part of this course. So if you forgot what it is, I suggest you stop right now and go to the chapter of this course on the binomial distribution.

Now, although our variable does take all these different values, it takes them with different probabilities, and that's extremely important. Here is what I mean. In how many different ways can this sum be equal to 0? Obviously, only if all of the components are equal to 0. What's the probability of this? It's the probability of the first being equal to 0, times the probability of the second being equal to 0, and so on, so it's (1 - p)^n. If p is not 0 and not 1 but somewhere in between, like one quarter for instance, then 1 - p is less than 1, and raised to the power n it gets smaller and smaller as n increases. Exactly the same happens at the other end.
The probability of the sum being equal to n, that is, of the average being equal to 1, is the probability of every component being equal to 1, which is p^n. And again, if p is between 0 and 1, this decreases as n goes to infinity. It gets smaller and smaller.

So what I mean is the following. Let me draw the graph of the distribution of probabilities here. This is 0 and this is 1, this is 1/2, this is 1/4, this is 3/4, and so on, with a step of 1/n. The graph, which represents the probability for my variable to take any particular value, will be very small here and very small here, at the two ends, when n is really large. Further in, it will be greater. For instance, how many cases are there when the numerator equals 1? Well, the first component can be 1 and the rest 0, which has probability p(1 - p)^(n - 1). But I can also have the second one as 1 and the rest 0, or the third one as 1 and the rest 0. So I have n different variations, n different cases when the sum in the numerator is equal to 1, and the probability is n·p(1 - p)^(n - 1). For large n this is definitely greater than (1 - p)^n. So I will have more here, more here, and so on, and the graph will look approximately like this. The values in the interior have greater probability than the extremes because there are more combinations of 1s and 0s which deliver each particular number; the peak sits near p, which is around 1/2 only when p itself is near 1/2.

So, as I was saying, yes, the range of values of this variable eta is from 0 to 1, but it takes different values with different probabilities. Now, let's examine two main characteristics of this variable eta: its mathematical expectation and its variance. Start with the mathematical expectation. Eta is a sum, and the mathematical expectation of a sum is the sum of the mathematical expectations.
The factor 1/n can be taken out, so E(eta) is 1/n times the sum of the expectations E(xi_i), and each of them is the mathematical expectation of xi, which is p. So E(eta) = (1/n)·n·p = p. Why is the mathematical expectation of xi equal to p? It takes the value 1 with probability p and the value 0 with probability 1 - p, so E(xi) = 1·p + 0·(1 - p) = p.

Now, the variance of xi. Variance is the expectation of the squared deviation of the value from its expectation: Var(xi) = E[(xi - E(xi))^2]. We know the expectation is p. So xi can take the value 1, in which case the squared deviation is (1 - p)^2, with probability p; or it can take the value 0, in which case the squared deviation is (-p)^2 = p^2, with probability 1 - p. So Var(xi) = (1 - p)^2·p + p^2·(1 - p). Take p(1 - p) out: p(1 - p)·[(1 - p) + p] = p(1 - p). Okay, so the variance of xi is equal to p(1 - p).

Now, the variance of eta. The variance of a sum of independent random variables is equal to the sum of their variances. But a constant factor comes out squared, because the variance is the expectation of something squared, right? So it's 1/n^2 times the sum of the variances, which is n·p(1 - p), and therefore Var(eta) = p(1 - p)/n.

We have now established two very encouraging facts. Encouraging fact number one is E(eta) = p. What does it mean? The average value of our random variable is exactly what we want to evaluate. So whenever we use the value of eta as an estimate of p, we know that on average it is concentrated exactly where we need it. Encouraging fact number two concerns the variance, which describes how much the variable deviates on average from the center, from the mean. As you see, we have n in the denominator, which means that as n increases, the variance decreases. So eta will be concentrated tighter and tighter around its mathematical expectation. That's very important.
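As a quick numerical cross-check of the two formulas E(eta) = p and Var(eta) = p(1 - p)/n, here is a minimal Python sketch. It is only an illustration, not part of the lecture: the function names `average_distribution` and `mean_and_variance` are made up for this example, and p = 0.25 is assumed to be known purely so that the formulas can be checked.

```python
from math import comb


def average_distribution(n, p):
    """Exact distribution of eta, the average of n independent Bernoulli(p) trials.

    Returns a dict mapping each possible value k/n to its binomial probability.
    """
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def mean_and_variance(dist):
    """Expectation and variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * w for x, w in dist.items())
    var = sum((x - mean) ** 2 * w for x, w in dist.items())
    return mean, var


# Illustrative assumption: p is known here only so the formulas can be checked.
p = 0.25
for n in (4, 20, 100):
    mean, var = mean_and_variance(average_distribution(n, p))
    # Columns: n, computed mean, computed variance, formula p(1 - p)/n
    print(n, round(mean, 6), round(var, 6), round(p * (1 - p) / n, 6))
```

The computed variance should agree with p(1 - p)/n for each n and shrink as n grows, which is the tightening around p described above.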
So, for n equal to, let's say, 2, I will have one half and one half, right? For n equal to 3 it will be something like this. For n equal to 4 it will be something like this. So this is a graph of frequencies. The more experiments I conduct, the more its shape resembles something like a bell curve. Just keep that in mind.

Anyway, these are the two encouraging facts. Although I do not know p, I can definitely say that the mathematical expectation of my random variable, the evaluation of p, is equal to p, which is a very good sign. And its variance decreases as n increases. That's also very important.

Now, here is the problem. To judge the quality of this evaluation I use the variance, and I say: when the variance is small, the values are concentrated more tightly around p, and that's a better evaluation; if the variance is greater, then the deviation from the mathematical expectation is greater, and the evaluation has lesser quality. Great, but I don't know p. So how can I say whether the variance is big or small if I don't know anything about p?

Well, in this case what helps is an analysis of the formula. Look at p(1 - p). What is this? Let's consider the graph of the function y = x(1 - x), where x is from 0 to 1. This is a parabola, because it's a polynomial of the second degree. The coefficient at x^2 is minus 1, so the branches of the parabola are directed downwards. Where are the roots, the values of x which bring y to 0? Obviously they are 0 and 1, the two solutions of the equation x(1 - x) = 0. So on this segment the graph is 0 at x = 0 and 0 at x = 1, and it reaches its maximum in the middle, at x = 1/2. What is this maximum? It is 1/2 times (1 - 1/2), which is 1/4. So I can definitely say that Var(eta) = p(1 - p)/n is not greater than 1/(4n).
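The bound just derived can also be checked numerically. The following Python sketch is only an illustration under stated assumptions: the helper names `variance_of_average` and `variance_upper_bound`, the grid of 1001 candidate values of p, and n = 100 are all choices made for this example.

```python
def variance_of_average(p, n):
    """Var(eta) = p(1 - p)/n for the average of n independent Bernoulli(p) trials."""
    return p * (1 - p) / n


def variance_upper_bound(n):
    """The bound 1/(4n), which does not depend on the unknown p."""
    return 1 / (4 * n)


n = 100  # illustrative number of experiments
grid = [i / 1000 for i in range(1001)]  # candidate values of p from 0 to 1

# The value of p on the grid at which the variance is largest.
worst_p = max(grid, key=lambda p: variance_of_average(p, n))
print("largest variance at p =", worst_p)
print("variance there:", variance_of_average(worst_p, n))
print("bound 1/(4n):", variance_upper_bound(n))
```

The largest variance on the grid occurs at p = 1/2, where it coincides with the bound; for every other p the true variance is strictly smaller.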
Okay, so I don't know the exact value of p, but I have estimated the variance from above. Why is this helpful? Well, I still have n in the denominator, which is very good, and now I can say: I don't know my variance, but it's definitely not greater than 1/(4n). And that's sufficient to understand how tightly the distribution of eta is concentrated around its mathematical expectation. Here is how it can be approached.

Let me wipe the board. I can say that the mathematical expectation of eta is p, and the variance of eta (not of xi) is not greater than 1/(4n). So I know these two things. Now, is that sufficient to tell anybody? I say: my evaluation of the probability p is calculated using the formula eta = (xi_1 + ... + xi_n)/n, where xi_1, ..., xi_n are the results of your experiments. You have conducted the experiments, these are the results, you do the averaging. And I'm saying: this is the evaluation; the evaluation is not biased, because its mathematical expectation is exactly p, which is good; and I can also say that its variance is not greater than 1/(4n).

Well, this second statement is not really very informative for a normal person. Here is what would be more informative, and this is the type of statement that the whole of statistics is about. I understand that my variance is not greater than 1/(4n). However, it's not the variance I want to know. I want to know a different thing: that my value of p is within a certain distance delta from the empirically obtained value of the random variable eta, that is, |p - eta| is not greater than delta. If I know that this is true, then this delta, which is the measure of error, the margin of error, gives me the boundaries for p: eta - delta is not greater than p, and p is not greater than eta + delta. These two statements are equivalent, because both mean that the absolute value of the difference is not greater than delta. So I would like to know this delta.
So, getting an empirical result of, let's say, 0.22, if I can say that my real probability is within, let's say, two hundredths of 0.22, then it means it's from 0.20 to 0.24. That gives me the range. A range is much more informative than a variance. But how can I get the range? Is there any delta which gives me such an inequality in absolute terms? Well, no. Because, again, eta can be anything from 0 to 1, with different probabilities, so the difference between p and eta can be up to 1, and an absolute bound of that kind is of no value to me.

However, knowing that eta takes different values with different probabilities, and that the values closer to p are taken with greater probabilities, I can say that this inequality holds not in absolute terms but in probabilistic terms. I look at the probability of the event |p - eta| <= delta, and I want this probability to be not less than some level of certainty which I consider acceptable. For instance, I can say that with a certainty level of 95%, that is 0.95, the inequality is true with delta equal to 0.02. This is much more informative than a statement about the variance.

So my purpose in this statistical analysis, and basically the purpose of the entire statistics, is to come up with inequalities like this: I am evaluating a value based on my empirical data with this precision and with this level of certainty. How can I do it in this particular case? Well, in theory, with computers, it is possible directly: eta has a binomial distribution, we know it as a function of n, the number of experiments, and although the formulas are rather complex, they can be handled by a computer. But people usually don't do this. They use something which I also addressed in the course on the theory of probabilities: it's called the central limit theorem.
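The remark that a computer can handle the exact binomial formulas can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the lecture's method: the function name `prob_within` is made up, and p = 0.25 is plugged in as if it were known, only to show how the certainty level depends on delta and n.

```python
from math import comb


def prob_within(n, p, delta):
    """Exact P(|eta - p| <= delta), where eta is the average of n Bernoulli(p) trials.

    Sums the binomial probabilities of all k with |k/n - p| <= delta.
    A tiny tolerance guards against floating-point error at the boundary.
    """
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) <= delta + 1e-12:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


# Illustrative values only: in the real problem p is unknown.
for delta in (0.02, 0.05, 0.10):
    print("n = 100, delta =", delta, "certainty =", round(prob_within(100, 0.25, delta), 4))
```

A wider margin delta gives a higher certainty, and for a fixed delta the certainty grows with n. The difficulty in practice is that this computation needs p, which is exactly the unknown; that is one reason to work with a bound that does not depend on p.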
Now, the central limit theorem states that the sum of random variables, under relatively liberal conditions, behaves like a normally distributed variable as the number of terms increases. In our case the terms are all independent and identically distributed, and that's definitely a sufficient condition. How close the distribution is to a normal distribution depends on the number n, and I'm not addressing this issue right now. I'm just using the theorem, assuming that n is sufficiently large, so that approximating this binomial distribution with a normal distribution is justified. When exactly it is justified is a different story. I'm just assuming that n is sufficiently large for our purposes. In that case I can replace the binomial distribution of eta with a normal distribution which has exactly the same expectation, which is p, and exactly the same standard deviation sigma, where sigma^2 is equal to Var(eta). So sigma is not greater than 1/(2·sqrt(n)), right?
The standard deviation squared, which is the variance, is not greater than 1/(4n), so sigma itself is not greater than 1/(2·sqrt(n)). What I'm doing next is using the apparatus which has already been developed for normal distributions to obtain an inequality of the kind we want. Here is how.

For any normal distribution, and again, I addressed this in my theory of probabilities lectures, we know the following. The probability of the random variable, in this case eta, being within sigma of its mathematical expectation is approximately 0.6827. Look at the bell curve. This is my mathematical expectation, and one sigma is from here to here on each side, so the area over this interval is about 0.6827. That is the probability of our random variable being this close to its mathematical expectation. If I take two sigma, a wider interval, then obviously the probability is greater: the probability of my random variable coming within this distance from its expectation is 0.9545. And finally, if I take a very broad margin of error, three sigma, the probability is almost 1 (about 0.9973), so the tails are very improbable.

So now everything depends on how big my delta is supposed to be and how certain I would like to be that eta is within this distance delta from its mathematical expectation, which in this case is the same as the probability of xi being equal to 1, the one I wanted to evaluate from the very beginning. Now I can basically solve all the problems. If n is given, then from n I can write the bound on sigma, so knowing n means knowing sigma. If delta is given as a multiple of sigma, then the level of certainty is given. So let's say I would like delta to be equal to 2 sigma, which means the certainty level is 0.9545; then I know that the inequality does hold with this certainty. And knowing the value of delta, the precision which I would like to achieve, I can calculate my sigma, and
from sigma I calculate n. So basically, from delta and the level of certainty I can calculate n. Now, if I would like greater precision with the same n, say delta equal to one sigma instead of two, so that the range around my estimate is narrower, then either I have to be satisfied with a lesser probability, or I have to increase the number n to get a smaller sigma.

In any case, my point is this. Knowing delta and the level of certainty, you can come up with n. Knowing the level of certainty and n, you can come up with delta. And knowing n and delta, for instance when you have already done the experiments and can't change their number, and you would like a certain precision, you have to be satisfied with whatever certainty level that gives you. So we have three main parameters of our statistical analysis: the number of observations n; the precision delta, which is the allowed deviation from the mean; and the level of certainty. They are all connected, and knowing two of them I can find the third.

Just as an example, say you have attempted something 100 times: you are picking spades from the deck of cards. With n equal to 100, your standard deviation is not greater than 1/20. Now, say you have an empirical frequency of spades of 0.22. Two sigma is 1/10, and the two-sigma certainty level is 0.9545. So you can say that the probability that your unknown probability of a spade differs from the empirical number 0.22 by no more than one tenth is 0.9545. Actually it is a little better than that, because, if you remember, 1/(4n) is not the real variance; it is an estimate from above. The real variance involves p
times 1 minus p, which is unknown. So you have at least this level of certainty, and probably even better, which means that, with this certainty, your probability of getting a spade is from 0.12 to 0.32. Or, again, given the number of experiments and the level of certainty, you can find the precision, for instance. Any combination is possible in this particular case.

Well, that was the solution to the statistical problem which I introduced in the last lecture. I would like you to read the notes for this lecture again; they are on Unizor.com. Maybe something is explained there a little differently, and in any case it's a very useful exercise in statistical analysis. Well, that's it for today. Thank you very much, and good luck.
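As a postscript, the card example and the connection between the three parameters (number of experiments n, margin delta, level of certainty) can be retraced in a few lines of Python. This is a minimal sketch under the lecture's assumptions: the normal approximation is taken for granted, sigma is replaced by its upper bound 1/(2·sqrt(n)), and the function names are invented for this illustration.

```python
from math import ceil, erf, sqrt


def sigma_bound(n):
    """Upper bound 1/(2*sqrt(n)) on the standard deviation of the average eta."""
    return 1 / (2 * sqrt(n))


def certainty(z):
    """P(|Z| <= z) for a standard normal Z: the certainty level of a z-sigma margin."""
    return erf(z / sqrt(2))


def margin(n, z):
    """Margin of error delta = z * sigma, using the upper bound on sigma."""
    return z * sigma_bound(n)


def required_n(delta, z):
    """Smallest n for which the z-sigma margin (with the bounded sigma) is at most delta."""
    return ceil((z / (2 * delta)) ** 2)


# The example from the lecture: 100 draws, 22 spades, two-sigma margin.
n, frequency, z = 100, 0.22, 2
delta = margin(n, z)
print("certainty of a", z, "sigma margin:", round(certainty(z), 4))
print("range for p:", round(frequency - delta, 4), "to", round(frequency + delta, 4))

# The other direction: how many draws a margin of 0.02 would need at two sigma.
print("n needed for delta = 0.02:", required_n(0.02, z))
```

Because sigma is replaced by its upper bound, the stated certainty is conservative: the true certainty for the actual p is at least as high. Narrowing the margin from 0.1 to 0.02 at the same certainty level requires a much larger number of experiments, since n grows as 1/delta^2.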