Hi, I'm Zor. Welcome to Unizor Education. I would like to continue with certain statistical problems. We have already discussed a couple of them in the previous lectures. This is one more problem, by the way also related to the climate change problem. It is about the level of precipitation: I wanted to compare the level of precipitation in New York a century ago and now. I would recommend you watch this lecture from Unizor.com. What's very important is that in the notes, which always supplement every lecture, I put some references to real data and a couple of spreadsheets which I have created myself based on these data. So, I will not go into concrete numbers during this lecture; I would rather concentrate on methodology.
So, what's given and what do we have to obtain? What's given is some data obtained throughout the last century. I took the data from 1906 to 2015, and the data basically contain the level of precipitation in New York during the months of January, February, etc., December, and annualized, for each year starting from 1906 all the way to 2015. So, at each crossing of a particular year and a particular month I have the level of precipitation. I believe it's in inches; it doesn't really matter, it's some number. So, I can follow the level of precipitation during a particular month, let's say January, throughout the whole century, or the annual data, which is basically the sum of these 12 months. So, that's given.
Now, what do I have to obtain? I have to make some kind of statement about whether the level of precipitation is changing, maybe upwards, maybe downwards, and make it with a certain certainty, supported by statistical calculations, usually 95% certainty. Okay, so that's the task. By the way, I call it test D.
That's when our random variable has a continuous distribution. The level of precipitation seems to have a continuous distribution, and we don't really know the boundaries. It can be from zero to basically any big number. Well, not any, hopefully; no floods. All right. So, again, I put some calculations into spreadsheets and put the references in the Unizor.com notes for this particular lecture, and here I would like to concentrate on how I approach this problem.
Now, there are a few assumptions which I have to make. I cannot say that I assume nothing about the behavior of this random variable during this time period, because if I don't, then all the distributions are different, and I can't say anything about any kind of law which governs the changes in the precipitation level. So, I have to assume certain periods of stationarity, and my assumption was that I would divide these hundred-odd years into decades, 10 years each, like from 1906 to 1915, from 1916 to 1925, etc. So, my assumption is that during a decade I am dealing with one particular random variable which has a certain normal distribution, and I have my 10 numbers, 10 precipitation levels, let's say from 1906 to 1915 or from 2006 to 2015. During these 10 years, the random variables which describe the level of precipitation in a given month are normally distributed, independent of each other, and identically distributed.
Now let's talk about January, because every other month and the annualized numbers are treated exactly the same. So, I can talk about a random variable, let's say this one, ξ1906, which describes the January precipitation level in 1906. Then I can talk about ξ1907, etc., up to ξ1915, and continue with 1916, etc. But I'm talking about this decade only.
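The split into decades can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_into_decades` and the dictionary layout (year mapped to one month's value) are hypothetical, and the numbers are made-up placeholders, not the real New York data from the lecture notes.

```python
def split_into_decades(values_by_year, first_year=1906, last_year=2015, span=10):
    """Group yearly values into consecutive blocks of `span` years.

    Keys of the result are (start_year, end_year) pairs.
    """
    decades = {}
    for start in range(first_year, last_year + 1, span):
        end = start + span - 1
        decades[(start, end)] = [values_by_year[y] for y in range(start, end + 1)]
    return decades


# Made-up placeholder values for one month (NOT real precipitation data).
january = {year: 3.0 + 0.01 * (year - 1906) for year in range(1906, 2016)}

decades = split_into_decades(january)
first_decade = decades[(1906, 1915)]  # the 10 values treated as one sample
last_decade = decades[(2006, 2015)]   # the 10 values a century later
```

Each list of 10 values is then treated as 10 observations of one random variable, which is exactly the stationarity assumption made for a decade.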
So, I presume that these 10 different random variables are independent: the level of precipitation in 1906 is independent of that in 1907. It's an assumption; it may not be true, but I need to make some assumptions to do some calculations, right? And I also assume that they are identically distributed, which basically means that during a decade the climate is not really changing in any direction. Then I will be able to compare the behavior of these 10 variables with the behavior of those 10 variables, which is 100 years later. So, these 10 variables are identically distributed and independent, and I assume normality of their distribution law, and those are as well.
Now, the next thing I would like to talk about is how I can compare the distribution of these variables, which is the same among themselves, with the distribution of those 10 variables, which is again the same among themselves. If I compare the distribution of this with the distribution of that and see a difference, I mean a meaningful difference of a certain level, then I can say that yes, there was some kind of change. Okay, how can I make a judgment about this?
Well, first of all, I don't know these random variables, right? What I do know is a single value of each: the January value of precipitation in 1906, the January value in 1907, and so on up to the January value in 1915. Same thing here: I know only one single number for each. So, the x's I do know. I know these 10 and I know those 10. I don't know the random variables themselves, which means I don't know their distribution. However, based on the values which these random variables took, I can make a certain judgment about the distribution of each one of them, because we are assuming that they are identically distributed and independent, which means that I can actually talk about one random variable, call it ξ1900-something.
And the same here: since they are all independent and identically distributed, I can talk about one variable which describes each one of those, and it took 10 different values. So, all I know about this random variable is its 10 values. Instead of one value per random variable, considering they are all identically distributed and independent, I can consider these as 10 values of one particular random variable. Same thing here: these 10 values are the values which this random variable took in 10 experiments. Why did I do this? For a very simple reason: since I now have a multitude of values, I can make a certain judgment about the distribution of this random variable. This random variable and this one and this one are all identically distributed, so I have 10 values of what I can treat as one random variable.
Now, how can I compare two different sets of values which describe two different random variables and make some kind of statement about whether the distribution has changed or not? Well, a normal distribution is always defined by two main parameters, the mean and the variance. So, I will just compare the means and the variances of these two variables using the samples which I have, and then I will be able to make a certain judgment, right?
So, what is the best evaluation of the mean of this variable? Let's call it x1, which is equal to x1906 plus x1907 plus, etc., plus x1915, divided by 10. The average of the 10 different values this random variable took is the best evaluation of its mean value. That's obvious, right? Now, how can that be probabilistically justified? Very simple. If we consider the random variable ξ1, which is equal to ξ1906 plus, etc., plus ξ1915, divided by 10, then x1 is one particular value of this random variable, right?
And obviously, the mathematical expectation of this average is exactly the same as the mathematical expectation of each of these guys, because mathematical expectation is an additive function: there are 10 of them, and the sum is divided by 10. So, by doing this calculation, as I have probably explained many times, I'm getting an estimate of the mean value of this variable. Similarly, I can talk about x2, which is x2006 plus, etc., plus x2015, divided by 10. And this is the best evaluation of the mathematical expectation of ξ2, which is equal to their average. So, I have the mathematical expectation of each of those guys, which is obviously the same, and I have an evaluation of the mathematical expectation of each of those guys: it's x1 and x2.
Well, now I can compare this average with that average of my empirical data, of my sample, and at least start forming some kind of opinion about whether it's changing or not. Obviously, these are two different numbers. The question is whether they are sufficiently different to constitute a real change of the climate, in this case a real change of the precipitation. How can we do that? Let's just think about it. The best way to approach this is to say that x2 minus x1 is an estimate of the mathematical expectation of the difference between them. So, instead of comparing them to each other, I will compare x2 minus x1 with zero. That's much easier, because now we don't have two different random variables; we have only one random variable, ξ2 minus ξ1. Now, what is this random variable ξ2 minus ξ1? Well, I can make a hypothesis that the mathematical expectation of this one is equal to zero and compare this value with zero. If it's significantly different from zero, then there is a change. Now, what is significant? The word significant should be justified quantitatively, right?
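Before quantifying "significant", the estimates so far amount to two averages and their difference. Here is a minimal sketch, assuming each decade is given as a plain list of 10 numbers; the function name `mean_shift` is hypothetical and the sample values are made up, not taken from the lecture's spreadsheets.

```python
from statistics import mean


def mean_shift(first_decade, last_decade):
    """Return (x1, x2, x2 - x1): the two decade averages and their difference."""
    x1 = mean(first_decade)  # estimate of the mean for the early decade
    x2 = mean(last_decade)   # estimate of the mean for the late decade
    return x1, x2, x2 - x1


# Made-up January values in inches (NOT real data), 10 per decade.
early = [3.1, 2.8, 3.6, 2.9, 3.3, 3.0, 2.7, 3.4, 3.2, 3.0]
late = [3.5, 3.2, 3.9, 3.1, 3.6, 3.4, 3.0, 3.8, 3.3, 3.7]

x1, x2, diff = mean_shift(early, late)
```

The number `diff` is the value to be compared with zero; whether it is far enough from zero depends on the standard deviation of the difference.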
Well, that's actually very easy in this case. If ξ1 is a normal random variable with mean mu1 and variance sigma1 squared, or standard deviation sigma1, and ξ2 is a normal variable with mean mu2 and standard deviation sigma2, then ξ2 minus ξ1 is a normal variable with what mean? Well, its mean is obviously mu2 minus mu1. And what is its standard deviation? Now, this is much more interesting. Standard deviation is basically the square root of variance. And we know that the variance of the sum of two random variables is equal to the sum of their variances in case they are independent, as these are. This one is the average of the first decade, 100 years ago; that one is the average precipitation in January of the latest decade. They are independent, and we assume that both are normally distributed. Again, it's an assumption; nothing I can do about that. So, I know that variance is an additive function for independent random variables. The only thing is that this is a difference. Does it matter that this is subtraction rather than addition? No, because variances are always added together: the variance of minus ξ1 is the same as the variance of ξ1. So, the variance is equal to sigma1 squared plus sigma2 squared, and the square root of that is the standard deviation of this random variable.
Now, I have an evaluation of the mean: it's x2 minus x1, right? That's easy. But I don't know the mean itself. I know that x2 minus x1 is an estimate of this difference, and I would like to know whether this difference is zero or not, whether this particular normally distributed random variable has a mean equal to zero. How can I determine it? Knowing the standard deviation. Look at it this way. If I have a normal distribution centered at zero, I know that the values fall into this area with 95% certainty, if this is minus 2 sigma and plus 2 sigma, right?
From zero. So, if I know sigma, I can always say that with 95% probability my number x2 minus x1, which is a particular value of the random variable ξ2 minus ξ1, should be within this area, right? So, I will check this value. If I'm hitting something here, inside, then I can say that with 95% certainty I cannot say that these guys are different. But if I hit something here, or here, outside, with my x2 minus x1, then it is outside of the 95% area for a mean equal to zero, which means the mean is not equal to zero, with 95% certainty, right?
So, the only thing which prevents me from making this type of judgment is this value, sigma. Well, as usual, since my random variable took certain values, I will use the sample, whatever I have: I will use a sample variance. What can I do as a sample variance? First of all, let's think about what sigma1 and sigma2 are. They are standard deviations of, correspondingly, ξ1, which is the sum of the ξ's divided by 10, and ξ2, which is also a sum of ξ's divided by 10. This one is from 1906 up to 1915, and this one is from 2006 up to 2015. So, I can calculate sigma1 separately, obviously not sigma1 exactly, but its sample estimate, and sigma2 for ξ2.
Now, all these ξ1906, etc., ξ1915 are basically the same as, let's say, ξ19-star, because they are all identically distributed and independent, and I know their values. So I can calculate a sample variance of this guy, which is exactly the same as the sample variance of each of these guys, right? First of all, I calculate x1, which is equal to x1906 plus, etc., plus x1915, divided by 10. That's my estimate of their mean. Then I take the difference of each value from x1, square it, and sum the squares.
Then I have to divide, not by 10, but by 10 minus 1, if you remember, for this variance estimate to be unbiased. We already talked about this, so it's by 9, actually. So, I've got that, and this is the sample variance, sigma19-star squared, of each of ξ1906, etc. Now, how can I get sigma1? Well, sigma1 squared is equal to sigma19-star squared divided by 10, right? Because, again, variance is an additive function, and there are 10 of them, so the sum has 10 times the variance. The factor one tenth goes out of the variance squared, so it's one hundredth. So it's 10 divided by 100: the variance of the average is the variance of each one of them divided by the number of participants. So, I can calculate this.
Now, absolutely similarly, I can calculate sigma2000-star squared, which is very similar to this one, but instead of 1906 I will have 2006, 100 years later. And from sigma2000-star squared I can derive sigma2 squared, which is equal to sigma2000-star squared divided by 10. So, that's how I'm getting sigma1 squared and sigma2 squared separately, and then I put them into this formula, the square root of their sum, and that's my sigma. Obviously it's not the exact value, but an evaluation, the best possible with whatever data I have.
All right, so, knowing sigma, I can check whether my mu2 minus mu1 is equal to 0. And that's exactly what this graph was about. I can say the following: my mu2 minus mu1 is in between x2 minus x1 minus 2 sigma and x2 minus x1 plus 2 sigma, with 95% certainty. Now, all I have to do is to check whether 0 is in this interval, because both ends can be calculated. If the lower one is negative and the upper one is positive, then 0 is in between, and I cannot say that x2 minus x1 is significantly different from 0. However, if 0 is not in between, for example if both of them are negative, then 0 is somewhere there, to the right, right?
Which means that this x2 minus x1 is significantly different from 0, and the shift is negative: the precipitation is decreasing. And if both ends are positive, it means that 0 is somewhere here, on the other side, and we can say that there is certainly an upward change in the precipitation during these 100 years.
So, all I have to do is repeat this calculation for each month, comparing 1906 to 1915 with 2006 to 2015, and do the same thing for the annualized numbers, which are the sum of all the months, and then compare. And I did. The results are in the spreadsheet, which is linked in the notes.
So, let me give the summary. Two months, June and December, have a significant change upwards, which means I can say with 95% certainty that precipitation in June in the 21st century is greater than precipitation in June at the beginning of the 20th century, 1906 to 1915. The same goes for December and, by the way, for the annualized numbers as well. And it's primarily June which contributes a lot to the increase in the annualized numbers. As for all other months, I think one of them has a negative x2 minus x1. Most of them have a positive x2 minus x1, but not positive enough to get above the two sigma level. So, you can say that there is a certain observed upward movement of the precipitation during the century. However, in each month except June and December, it's insignificant, and actually in one or two months it's even going down, by the way. In June, in December, and annualized, it is significant.
I mean, it can be stated with greater than 95% probability that there is a movement upwards, towards more precipitation. I think the amount of precipitation has increased by something on the level of 3 or 4% in a century. I don't know whether that is a big change or not. I'm not making any judgment about what will be in the future, and, again, I don't make any judgment about what contributed to the change in precipitation. Human activity, not human activity, no idea. But there is a certain shift of annualized precipitation: yes, I can say with 95% certainty that there is a slight shift upwards in precipitation for the annualized period as well as for June and December. About the other months I cannot say anything. Now, how is it related to temperature, for instance? That was the subject of one of the previous lectures. Well, maybe somehow, I don't know how, and again, I'm not making any judgment. I'm only about the mathematics of the thing.
So, this is the methodology which you can use if you are faced with the task of determining whether there is or there is not a movement, a shift in certain parameters. I suggest you go to the website and go very attentively through the text which accompanies this lecture. You can use your own spreadsheets, or you can download my spreadsheets, which are linked on my website, and see the numbers for yourself. And I would appreciate all comments about this. Okay, that's it for today. Thank you very much and good luck.
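For those who prefer code to spreadsheets, the whole two-sigma check described in this lecture can be sketched in Python as follows. This is an illustration under assumptions, not the spreadsheet itself: the function name `two_sigma_shift`, its parameter `k`, and the returned labels are hypothetical, and the data below are made up. It relies on `statistics.variance`, which divides by n minus 1, matching the unbiased sample variance used above.

```python
from math import sqrt
from statistics import mean, variance


def two_sigma_shift(first_decade, last_decade, k=2.0):
    """Compare two decade samples using the k-sigma rule from the lecture.

    Returns (diff, sigma, (low, high), verdict), where verdict is
    "up", "down" or "not significant".
    """
    x1, x2 = mean(first_decade), mean(last_decade)
    # Variance of an average = sample variance of one value / number of values.
    var_mean1 = variance(first_decade) / len(first_decade)
    var_mean2 = variance(last_decade) / len(last_decade)
    # Variances of independent variables add, even for a difference.
    sigma = sqrt(var_mean1 + var_mean2)
    diff = x2 - x1
    low, high = diff - k * sigma, diff + k * sigma
    if low > 0:
        verdict = "up"               # both ends positive: zero is to the left
    elif high < 0:
        verdict = "down"             # both ends negative: zero is to the right
    else:
        verdict = "not significant"  # zero lies inside the interval
    return diff, sigma, (low, high), verdict


# Made-up values in inches (NOT real data), 10 per decade.
early = [3.1, 2.8, 3.6, 2.9, 3.3, 3.0, 2.7, 3.4, 3.2, 3.0]
late = [3.5, 3.2, 3.9, 3.1, 3.6, 3.4, 3.0, 3.8, 3.3, 3.7]
diff, sigma, interval, verdict = two_sigma_shift(early, late)
```

Running the same function for each of the 12 monthly series and for the annualized series reproduces the structure of the comparison. The factor 2 is the rounded normal-distribution value used in the lecture; with only 10 values per sample one might choose a somewhat larger `k`, which is why it is left as a parameter.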