Hi, I'm Zor. Welcome to a new lecture of Unizor education. I would like to spend a little more time on linear regression in the very simple case of one independent and one dependent variable. This is a continuation of the previous lecture, where we derived the formula for the slope of this linear dependency, which I will remind you of right now. This lecture, like all the others, is part of the advanced math course for teenagers and high school students presented on Unizor.com. I recommend that you watch this lecture and all the other lectures from the website, because every lecture has very detailed notes, registered students can also take exams, and enrolling is part of the functionality as well, so the whole educational process can be facilitated using this website. All right, back to linear regression. Let me remind you what was done in the previous lecture, which is the foundation for a very simple exercise with data that I am going to present as a problem, something you have to prove. Preferably, prove it by yourself first, and then listen to this lecture. I do encourage you to do that: if you haven't tried it yet, go to the notes for this particular lecture on Unizor.com, where it is all explained, and try to solve the problem on your own, using the lecture that came before. Let me remind you what it is. Consider a situation where there is a dependent random variable y whose values, we assume, are almost linearly dependent on an independent variable x: y = a·x + b. This is the linear dependency, but I said "almost" because not every value of x, substituted into this formula with certain fixed a and b, will result in the observed value of y.
There are some other factors that affect the value of y, and all of them are summarized in a random variable added to this expression: y = a·x + b + ε. You can consider ε an error, the small additional factors that force y to deviate from the exact dependency on x. Here ε is presumed to be a normally distributed random variable with mathematical expectation zero and some variance. Minimizing this variance by choosing proper a and b is the problem we solved in that lecture before. Let's forget about b for a second, it is just a constant, and talk about a, the slope of this linear dependency, which is the most important part. So, what calculation methodology did I suggest for the a that minimizes the error, the deviation of y from this line? Here is what was suggested during that lecture. Suppose we have statistical data for x and for y: x1, x2, ..., xn and y1, y2, ..., yn. First, I suggested calculating the average of both: u is the average of the x's, v is the average of the y's. Then I suggested centralizing our data, which means considering X1 = x1 - u, X2 = x2 - u, and so on, with uppercase X. Why is uppercase X better than lowercase x? Well, u is a constant that we calculate once: we know the data, so we compute the average and subtract it from each of those values, and we consider X1, ..., Xn. The advantage is that the average of these is equal to zero, of course. Why? Because the sum of x1, ..., xn equals n·u, and we subtract the constant u exactly n times, so the difference is exactly zero. Similarly with y, I will use uppercase Y1, which is lowercase y1 minus v.
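As a quick illustrative sketch of this centralizing step (the sample data below is made up for illustration, it is not from the lecture), here is how it looks in Python:

```python
# Centralizing a data set: subtract the average from each value,
# so that the average of the new values becomes zero.
# The sample data is made up purely for illustration.

xs = [2.0, 4.0, 5.0, 7.0, 12.0]

u = sum(xs) / len(xs)        # average of the original data (u = 6.0 here)
Xs = [x - u for x in xs]     # centralized data: X_i = x_i - u

# sum(X_i) = sum(x_i) - n*u = n*u - n*u = 0, so the average is zero
print(sum(Xs) / len(Xs))
```

The same centering applies to the y's, with v in place of u.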
Uppercase Y2 is y2 - v, and so on, up to Yn = yn - v. Same story with the uppercase Y's: their average is equal to zero. Now, using these new variables, uppercase X and Y, instead of lowercase, I came up with a very convenient formula for the a that minimizes the variance: a = (Σ Xi·Yi) / (Σ Xi²), with summation by i from 1 to n. The numerator is X1·Y1 + X2·Y2 + ... + Xn·Yn, and the denominator is the sum of the squares of the Xi. Now, let's talk about the problem, the one I would like to present in this particular lecture. The problem is related to the following fact. Sometimes, in textbooks or on websites, the final formula is expressed not in terms of uppercase X and Y, but in terms of lowercase x and y, the original data, and it obviously looks different. Let me tell you how it looks in most textbooks. They suggest the following: a = (average of x·y minus average of x times average of y) divided by (average of x² minus the square of the average of x), where a bar on top means averaging. The average of x is the same thing we were doing before, which we designated by the letter u. The average of y, (y1 + ... + yn)/n, is what we designated by v. The average of x² is (x1² + x2² + ... + xn²)/n, all of them squared. And the average of x·y is (x1·y1 + x2·y2 + ... + xn·yn)/n. So, these are the averages. Sometimes, if you want, you can use a function called average; that's what I am doing in my notes, writing average(x) for (x1 + ... + xn)/n. All these bars on top can be a little confusing, and the function looks a little better on the web, which is why I am using it there, but it means exactly the same thing. Here, let's use the bar on top to signify the averaging. So, my question is: are these two formulas the same?
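Before proving it algebraically, we can check numerically that the two formulas agree. A small Python sketch (the data is made up for illustration):

```python
# Compute the regression slope 'a' two ways and compare:
# (1) with centralized data, (2) with the textbook formula.
# The data below is made up purely for illustration.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]
n = len(xs)

u = sum(xs) / n              # average of x
v = sum(ys) / n              # average of y

# (1) a = sum(Xi*Yi) / sum(Xi^2) with Xi = xi - u, Yi = yi - v
Xs = [x - u for x in xs]
Ys = [y - v for y in ys]
a_centered = sum(X * Y for X, Y in zip(Xs, Ys)) / sum(X * X for X in Xs)

# (2) a = (avg(x*y) - avg(x)*avg(y)) / (avg(x^2) - avg(x)^2)
avg_xy = sum(x * y for x, y in zip(xs, ys)) / n
avg_x2 = sum(x * x for x in xs) / n
a_textbook = (avg_xy - u * v) / (avg_x2 - u * u)

print(a_centered, a_textbook)   # the two values coincide
```

Any other data set gives the same agreement, which is exactly what the algebraic proof below establishes in general.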
Well, again, this is just a very easy exercise in data manipulation, and basically all I want to do is prove that the formula in uppercase X and Y is the same as the textbook one. The lowercase letters are the original data, and the uppercase letters are the centrally shifted original data. So, let's substitute what we know: Xi expressed using u, and Yi expressed using v. That gives the following: a = Σ (xi - u)·(yi - v) / Σ (xi - u)². That is our definition of uppercase X and Y: the original values minus their averages, u for the x's and v for the y's. This is just the replacement of uppercase X and Y with their original meaning, as it was explained in the previous lecture and at the beginning of this one. Now, how can this be converted into the textbook form? Let's open the parentheses. Each particular term in the numerator's sum is a combination of four terms: xi·yi - u·yi - v·xi + u·v, and the summation is by i from 1 to n. In the denominator I will also open the parentheses of the square: xi² - 2u·xi + u². Now, the sigma of a sum is the sum of the sigmas, so I put a sigma into each one of them. The numerator becomes Σ xi·yi, then minus u·Σ yi, because the constant multiplier u can be taken out of the sigma, then minus v·Σ xi, and finally plus Σ u·v. Note, by the way, that the last term is not just u·v but Σ u·v, because the sigma applies to each one of the four terms. The denominator I also convert into three sigmas: Σ xi² - 2u·Σ xi + Σ u².
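This step of distributing the sigma over the four expanded terms can itself be checked numerically with a short Python sketch (again, the data is made up):

```python
# Check that sum((xi - u)*(yi - v)) equals the four separate sigmas:
# sum(xi*yi) - u*sum(yi) - v*sum(xi) + n*u*v.
# The data is made up purely for illustration.

xs = [1.0, 4.0, 2.0, 8.0]
ys = [5.0, 7.0, 1.0, 3.0]
n = len(xs)
u = sum(xs) / n
v = sum(ys) / n

direct = sum((x - u) * (y - v) for x, y in zip(xs, ys))

term1 = sum(x * y for x, y in zip(xs, ys))  # sigma xi*yi
term2 = -u * sum(ys)                        # -u * sigma yi
term3 = -v * sum(xs)                        # -v * sigma xi
term4 = n * u * v                           # sigma of the constant u*v
print(direct - (term1 + term2 + term3 + term4))  # zero up to rounding
```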
Now, Σ yi is n times v, so minus u·Σ yi is -n·u·v. Similarly, Σ xi is n times u, so minus v·Σ xi is also -n·u·v. And Σ u·v is the constant u·v summed n times, which gives +n·u·v. That is my numerator: Σ xi·yi - n·u·v - n·u·v + n·u·v. In the denominator I have Σ xi² - 2u·Σ xi + Σ u²; here 2u·Σ xi is 2u times n·u, that is 2n·u², and Σ u² is, again, a constant summed n times, so it is n·u². The denominator is therefore Σ xi² - 2n·u² + n·u², which is Σ xi² - n·u². Great. Now, Σ xi·yi I can replace with n times the average of x·y. And look at the remaining terms of the numerator: -n·u·v - n·u·v + n·u·v combine into a single -n·u·v, where u and v are the average of x and the average of y. In the denominator, Σ xi² is n times the average of x², so the denominator becomes n·(average of x²) - n·u², and u² is the square of the average of x. As a result, all I have to do now is cancel the common factor n in the numerator and the denominator, and I get exactly the textbook formula. So, this was just a very simple exercise in dealing with data. Regardless of its simplicity, you really have to be very careful: as I was doing it, I myself made a couple of small mistakes, forgetting to put a sigma in front of the constant terms, and then realized and corrected them. What did I use here? That the sigma of a sum is equal to the sum of the sigmas, because one is summation by index and the other is summation between terms.
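The two simplified identities obtained here, one for the numerator and one for the denominator, can also be verified in Python (made-up data once more):

```python
# Verify the two identities from the derivation:
#   sum((xi - u)*(yi - v)) = sum(xi*yi) - n*u*v
#   sum((xi - u)**2)       = sum(xi**2) - n*u**2
# The data is made up purely for illustration.

xs = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
ys = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0]
n = len(xs)
u = sum(xs) / n
v = sum(ys) / n

num_lhs = sum((x - u) * (y - v) for x, y in zip(xs, ys))
num_rhs = sum(x * y for x, y in zip(xs, ys)) - n * u * v

den_lhs = sum((x - u) ** 2 for x in xs)
den_rhs = sum(x * x for x in xs) - n * u * u

print(num_lhs - num_rhs, den_lhs - den_rhs)  # both zero up to rounding
```

Dividing both identities by n turns them into exactly the numerator and denominator of the textbook formula.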
So, that's a kind of obvious thing, and the same applies at the bottom. You should not forget that a constant under the sigma also gets summarized, because there is a sigma in front of everything, so it becomes n times that constant; that is exactly where I made my small mistake. But anyway, my point is that our initial formula, let me just remind you, was a = (Σ Xi·Yi) / (Σ Xi²), using uppercase, centralized X and Y. This formula looks a little simpler than the textbook one: it is easier to deal with centralized random variables, which is why I decided to centralize first and then got the simple formula. But for those who saw a formula of the textbook type in textbooks or anywhere else, it is exactly the same thing. This particular little exercise was just to prove that the two formulas are exactly the same. Well, that's basically it for today. I just wanted to join forces with the textbook information you might obtain somewhere else and prove that it is exactly the same. And by the way, I do recommend, if you didn't do it before, to try the same thing by yourself and prove that one formula is exactly the same as the other. The uppercase X and Y are centralized, which means that from each data value we have subtracted the average; the other formula is not centralized, and that is why it has a slightly more complex expression. Okay, that's it. Thanks very much and good luck.