Hi, I'm Zor. Welcome to Unisor Education. We continue the course of advanced mathematics for high school students, presented on Unisor.com. I suggest you watch this lecture, and every other lecture of this course, from that website rather than from any alternative one, because every lecture there comes with very detailed notes, and those who sign in to the site, which is free, can take the exams as many times as they want until they reach perfection. All right, so today's lecture is about statistical distribution, and the particular task we will be dealing with is this: you would like to say something about a random variable when you don't know in advance which values it is supposed to take. There are no predetermined, predefined values, and obviously you don't know the probabilities either. We still consider discrete values, and we have a certain amount of statistical data, observations, that we will rely on to analyze this random variable and make certain judgments about its statistical distribution. In the previous lecture we did have predefined values that our discrete random variable takes; now nothing is predefined. We just assume it takes certain discrete values in a certain range, but we don't know exactly which ones. I call this particular task "task B"; task A was the one with predefined values. I will use a very concrete example, and based on this example I will explain the methodology that I suggest applying in this case. Okay, so what's the example? We need some real-life statistical data, and what I decided to do is take data about the temperature in New York City over 140-something years. The raw data are on one of the government websites, available to anybody, and they are listed in my notes to this lecture.
So, I took the raw data, and they constitute a table where across the top you have the months, January, February, and so on through December, and then an annual column, and down the side you have the years: 1870, '71, '72, and so on, up to 2015. So, 145 years. At each crossing I have the average monthly temperature in New York City, in Central Park, for that particular month and that particular year. That's my raw data. I would like to analyze the random variable represented by these statistics, and in particular I would like to use this analysis to make a judgment about whether there is some kind of warming of the climate in New York City over these years or not. Before going into any calculations, I have to assume certain things, which might not necessarily be true, but I need to assume something in order to apply a mathematical methodology, right? So, what I assume is the following: during one particular decade, say from 1870 to 1879, or from 1900 to 1909, or from 2000 to 2009, my random variable has exactly the same distribution. There is no change within a decade in the statistical characteristics of the average temperature in January, or in February, and so on. These are different random variables for different months, or for the annual figure, but I assume that the ten values within one decade, say from 1900 to 1909 or from 2000 to 2009, all represent independent random experiments with the same random variable. So, there is no time-related change in the distribution of the average January temperature, or the average December temperature, or the average annual temperature: I assume that the climate does not change during a decade.
Then I will try to evaluate certain probabilistic characteristics of the variable observed during one decade and during another, and I will be able to compare them and see whether a shift can be observed. Let's talk about January only, because February, December, and the annual figure are all treated in exactly the same way; the calculations I have made for all of them are in a spreadsheet, which is also listed in the notes, and you can download it from my website. So, let's talk about one particular random variable: the average January temperature in two decades, from 1900 to 1909 and from 2000 to 2009. I assume the first decade is represented by ten different values of average January temperature, and that all of them are values of the same random variable; the experiments are independent and identically distributed. Then I take the ten values from 2000 to 2009, which likewise give me a statistical distribution of ten values. I will make a judgment about the probabilistic characteristics of the random variable describing the January temperature in each decade, and then compare those characteristics to find out whether there is any kind of shift. Okay, that's the idea. So, what am I supposed to do first? Based on these ten values, I can estimate certain probabilistic characteristics. Call the values C1900 up to C1909. What can I do with these ten independent, identically distributed variables? Well, obviously I can take their average, and similarly I can take the average of the other decade's values. What does this average represent? It represents the average for a decade, right, for ten years.
Now, note that the average monthly temperature in January 1900 is itself an average: January has 31 days, so it is an average over 31 days, and I assume, though I don't know this, that every day there is more than one measurement. Since everything is being averaged, I think it's safe to assume that each monthly value is normally distributed, or almost normally distributed, because it is itself an average of many random variables, and we know the central limit theorem is at work in this case. I also assume there are no modifications of the probabilistic characteristics within a decade. So, all of the values C1900 through C1909 have a certain expectation, which I will mark with an asterisk for the 1900s decade, and a certain standard deviation; same thing with C2000 through C2009, which have their own mathematical expectation for the 2000s decade and their own standard deviation. Now, what can we say about the two decade averages? Their mathematical expectation is exactly the same as that of the individual values, because the expectation of a sum is the sum of the expectations, and then we divide by 10; so the average of the 1900s values has expectation mu1, equal to the original one, and the average of the 2000s values likewise has expectation mu2. As for the standard deviation, that is not the same, because we know that variances add together for independent variables, and sigma equals the square root of the variance. So, the variance of the first decade's average, call it variance 1, is equal to one tenth of the variance of each individual value, the 1900s variance divided by 10. Why? Because the factor of one tenth comes out of the variance as its square, one hundredth, while the variance of the sum of ten independent variables is ten times the individual variance; ten divided by one hundred is why it comes out divided by 10.
So, obviously, if we average a certain number of independent, identically distributed random variables, the mathematical expectation of the average is exactly the same as that of each of them, but the variance is smaller by that factor. Same thing with the second decade: variance 2 equals the 2000s variance divided by 10. Good. Now, how can we compare the probabilistic characteristics of the random variable describing the average January temperature in the 1900s with those of the average January temperature in the 2000s? Well, if we knew the mathematical expectations and the variances, we could make a judgment directly: if the expectations are different, it means there is a shift in the temperature, and if the variances are different, it means there is a shift in the spread around the mean value, right? So, we can observe two different types of change: a shift of the temperature up or down, and a change in the spread around the average. The problem is, we don't know these numbers. All we know are statistical results. I don't have the random variables themselves; I have their experimental values. Instead of the true sum of random variables, I have the sum of my samples: X1, the average of the 1900s samples, and X2, the average of the samples from 2000 to 2009. So, we don't know mu1 and mu2, the real expectations, but we do know X1 and X2, and those are the best approximations we have. Now, the question is: these averages are obviously different, but are they significantly different, enough to conclude that the real mathematical expectations are different as well? Well, here is how we can approach this.
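The variance-reduction claim above is easy to check numerically. Here is a short Monte Carlo sketch in Python; the mean (31.0) and standard deviation (3.0) are invented illustration values, not the actual Central Park statistics.

```python
import random
import statistics

# Monte Carlo sanity check: averaging 10 i.i.d. normal variables keeps
# the expectation but divides the variance by 10.  The mean (31.0) and
# standard deviation (3.0) are made-up numbers for illustration only.
random.seed(0)
mu, sigma, n_years = 31.0, 3.0, 10

# 100,000 simulated "decades", each the average of 10 yearly values
decade_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n_years))
    for _ in range(100_000)
]

# Expectation of the decade average stays ~mu; variance drops to ~sigma^2/10.
print(statistics.fmean(decade_means))      # close to 31.0
print(statistics.pvariance(decade_means))  # close to 3.0**2 / 10 = 0.9
```

The printed variance comes out near 0.9 rather than 9, exactly the factor-of-ten reduction derived above.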
We don't know mu1 and mu2, the mathematical expectations of the random variables representing the January temperature in the 1900s and the 2000s, but we do know one experimental value of each average, one sample. We don't know the variances either, but we can replace them with the sample variances. The sample variance S1 squared would naively be one tenth of (X1900 minus X1) squared plus ... plus (X1909 minus X1) squared, that is, the average squared deviation from the sample average. Actually, you know that in real life we prefer to divide by 9 instead of 10 here, so that this estimate of the variance has the same mathematical expectation as the true variance; remember, the formula contains n minus 1 in the denominator. Same thing for the second decade: the sample variance S2 squared equals one ninth of (X2000 minus X2) squared plus ... plus (X2009 minus X2) squared. Knowing these, we don't really have any better choice than to take them as the best available approximations to our variances, just as X1 and X2 are the best available approximations to our expectations. Now, how can I evaluate whether there is a difference between the mathematical expectations? Well, here is a very simple way. What we really have to determine is whether mu1 minus mu2 is greater than zero, less than zero, or equal to zero; that is what answers the question of whether there is any movement in the temperature. And what is mu1 minus mu2? It is the mathematical expectation of X1 minus X2, right? So, we have to compare that with zero. What do we know about it? The only thing we know is an approximation: the difference between the average 1900s temperature and the average 2000s temperature. We have to ask how different this is from zero, and merely not being equal to zero doesn't mean much, because maybe it's close to zero, and again we are dealing with statistical distributions.
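A minimal sketch of the decade statistics in Python: the ten January values below are invented for illustration (the real ones are in the spreadsheet), and `statistics.variance` uses exactly the n minus 1 denominator described above.

```python
import statistics

# One decade of average January temperatures (degrees F) -- invented
# numbers for illustration; the real values are in the spreadsheet.
january_1900s = [30.9, 28.4, 32.1, 29.7, 26.8, 31.5, 33.0, 27.9, 30.2, 29.5]

x_bar = statistics.fmean(january_1900s)        # the decade average X1

# statistics.variance divides by n - 1, giving the unbiased estimate
# discussed above.
s1_squared = statistics.variance(january_1900s)

# The same thing written out by hand: squared deviations over n - 1.
by_hand = sum((x - x_bar) ** 2 for x in january_1900s) / (len(january_1900s) - 1)

print(x_bar, s1_squared)
```

The by-hand sum matches `statistics.variance` up to floating-point rounding, which is the point: the library call is just the n minus 1 formula.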
So, the question is whether X1 minus X2 not being zero is sufficient to conclude that mu1 minus mu2 is not zero. To help resolve this question, we need the variance, right? What is X1 minus X2? Well, it is a single sample value of the random variable X1 minus X2, which is normal: X1 is normal, X2 is normal, and the difference of normals is normal. It has mathematical expectation mu1 minus mu2, and what is its variance? X1 and X2 are again independent, and for independent variables the variance of the difference is the sum of the variances. Notice the plus sign here, not minus: whether you add or subtract two independent random variables, their variances add. Variance measures the spread around the average value, right? So, whether you are subtracting or adding, the spread of one combines with the spread of the other, and the difference behaves the same way as the sum in this respect. Now, remember that the variance of each decade average is one tenth of the variance of the individual yearly values, and we estimate those by S1 squared and S2 squared. So, the variance of X1 minus X2 is approximately S1 squared over 10 plus S2 squared over 10. So, we have a random variable X1 minus X2; we have one single sample of it, which is an approximation to its mathematical expectation; and we approximately know its variance, not exactly. How close is the sample to the expectation? If we draw the distribution of X1 minus X2, centered at the real mu1 minus mu2, our observed X1 minus X2 falls somewhere on it. And since we approximately know the variance, we can always mark off, for instance, two sigma.
We can calculate two sigma from this variance: take the square root and multiply by 2, right? That gives the error bound at 95 percent certainty. So, we can say that with 95 percent certainty, X1 minus X2 falls within two sigma of the real mu1 minus mu2; the probability that mu1 minus mu2 differs from X1 minus X2 by less than two sigma is supposed to be 95 percent, 0.95. That's what we can say for a normal distribution. And we do know an approximation of the sigma, because we know the sample variances. So, we can make a judgment: with a certainty level of 0.95, mu1 minus mu2 lies between X1 minus X2 minus two sigma and X1 minus X2 plus two sigma. That's what we can get from our real data. Here X1 minus X2 is the average January temperature of the 1900s minus the average for the 2000s, or vice versa, it doesn't matter; the sample variances are computed from the ten sample values in each decade, and from them we calculate the sigma of the difference, which we then multiply by two. Knowing both, we get an interval estimate for mu1 minus mu2. If the lower end is negative and the upper end is positive, we cannot actually say anything about the difference in temperature. If both ends are positive, we can say there is a positive movement from one decade to the other, or correspondingly negative if both ends are negative. And that's the bottom line of all these calculations: that's how we can judge whether there is a positive or negative movement of the temperature from, say, the 1900s decade to the 2000s decade. All these calculations I have made in the spreadsheet linked in the notes to this lecture, and I'm sure you're very curious to know the bottom line. It's quite interesting, actually.
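The whole comparison procedure can be sketched in a few lines of Python. Both decades of January averages below are invented illustration data, not the Central Park numbers, and the variance of each decade average is taken as the sample variance divided by 10, as derived above.

```python
import statistics
from math import sqrt

# Two invented decades of average January temperatures (degrees F);
# the real numbers are in the spreadsheet linked in the notes.
dec_1900s = [30.9, 28.4, 32.1, 29.7, 26.8, 31.5, 33.0, 27.9, 30.2, 29.5]
dec_2000s = [33.1, 30.5, 29.8, 34.2, 31.7, 35.0, 32.4, 30.9, 33.8, 31.6]

n = len(dec_1900s)                       # 10 years per decade
x1 = statistics.fmean(dec_1900s)
x2 = statistics.fmean(dec_2000s)

# Each decade average has variance ~ S^2 / n, and the variances of
# independent variables add, even when we take a difference.
sigma = sqrt(statistics.variance(dec_1900s) / n
             + statistics.variance(dec_2000s) / n)

diff = x2 - x1
low, high = diff - 2 * sigma, diff + 2 * sigma   # ~95% interval for mu2 - mu1

if low > 0:
    print(f"warming with ~95% certainty: ({low:.2f}, {high:.2f})")
elif high < 0:
    print(f"cooling with ~95% certainty: ({low:.2f}, {high:.2f})")
else:
    print("cannot conclude at the 95% level")
```

With these made-up numbers both ends of the interval come out positive, so the sketch lands in the "warming" branch; with a smaller difference between the decades it would fall into the inconclusive branch, exactly as described above.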
First of all, the average itself increases from the 1900s to the 2000s in, I think, every month. In some months, like January, the increase in the average from 1900 to 2000 is not very significant: if I make this calculation for January, the lower end of the interval is negative and the upper end positive, which means I cannot make a judgment at the 95% certainty level. I probably can make it at a lower level of certainty, something like 70%, if that is interesting; anything at the 50% level nobody really cares about, but at a level of around 70% I can say that, yes, an increase in January can be observed. Now, in some other months, like February, the increase between the average temperatures is sufficient: the difference of the averages is greater in absolute value than two sigma, and that's why both ends of the interval have the same sign. If the 1900s average is subtracted from the 2000s average, both ends are positive, or, subtracting the other way around, both are negative, and either way it means there is a warming. So, certain months, I think about half of them, something like five out of twelve, show an increase sufficient, so to speak, to make a judgment with 95% certainty, while all the months show some increase from the 1900s to the 2000s; the others are increasing, but not by as much, let's put it this way. And by the way, the annualized temperature increases from the 1900s to the 2000s sufficiently to make a judgment with 95% certainty that there is an increase in temperature. That is undeniable. So, all the months are increasing, and some of them, about four or five, I don't remember exactly, are increasing with a certainty level of at least 95%.
The others are increasing, but not to that level of certainty, which makes me believe that the climate is actually changing towards warming. And I actually calculated the slope of this warming: over 100 years there is an increase of about one degree Celsius, approximately. Obviously the temperature jumps all over the place, but if you write the formulas to calculate the slope, the slope comes out positive, about one degree Celsius per century in some months, and more in others; I think there is a month with two point something degrees. Now, I'm not making any judgment, obviously, about what's causing this; that is completely outside the mathematical analysis being done here. All I'm saying is that there is a mathematical confirmation of a warming between 1900 and 2000 in New York. Again, I make no judgment about other spots on the surface of the earth, or the oceans, and so on; only about the statistics I have, which is the New York average monthly temperature from 1870 to 2015. And the mathematical judgment is this: we cannot say with 95% certainty that the January temperature is increasing, but for February and March, for instance, we can; and all the months show an increase in temperature. Now, as for comparing S1 and S2 to find out whether the temperature became more extreme, I cannot say anything definitive, because different months compare differently; all the numbers are in the spreadsheet, which I actually urge you to take a look at. So, the conclusion: yes, there is an observed upward movement of the temperature, at an approximate speed of about 1 degree Celsius per 100 years. That says nothing at all about the future; I'm not making any future predictions. All I'm saying is that there was this movement up to the end of the data. So, no predictions about the future.
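The slope calculation can be sketched as an ordinary least-squares fit of annual mean temperature against year. The series below is synthetic, built from a one-degree-per-century trend plus a small deterministic wobble, standing in for the real 1870-2015 annual averages.

```python
from statistics import fmean

# Least-squares slope of annual mean temperature versus year.
# Synthetic data: a trend of 0.01 degrees per year (one degree per
# century) plus a deterministic wobble, NOT the Central Park series.
years = list(range(1870, 2016))
temps = [53.0 + 0.01 * (y - 1870) + (7 * y % 10 - 4.5) * 0.2 for y in years]

y_bar = fmean(years)
t_bar = fmean(temps)

# Classic OLS slope: covariance of (year, temp) over variance of year.
slope = (sum((y - y_bar) * (t - t_bar) for y, t in zip(years, temps))
         / sum((y - y_bar) ** 2 for y in years))

print(f"trend: about {slope * 100:.2f} degrees per century")
```

The fitted slope recovers roughly the one-degree-per-century trend that was built into the synthetic series, despite the year-to-year wobble, which is the same effect the spreadsheet calculation exploits on the real data.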
And by the way, my personal opinion is that whoever says they can predict the future 100 years forward, I just don't think they're right. And secondly, I'm not making any judgment about what is causing this particular warming. I know there are some very heated political discussions about whether humans really played an active role in the warming of the climate. I don't know. So, whatever I said is strictly math, and I urge you to take it as such. I would also be very interested if someone did similar calculations based on, let's say, the temperature of the ocean, or the temperature in the Antarctic, or something like that. If you send them to me, I will gladly put them on my website, with a reference to your calculations and your authorship. So, basically, that's it for today. I'm very much interested in your comments about this lecture and about the conclusions I have made. That's basically it. Thank you very much and good luck.