Welcome to this seminar series on understanding the fundamentals of linear models. I've already covered this on the whiteboard; I'll link that right up here, or it'll be in the description down below. As you can see on the screen, we're going to discuss simple linear regression, one-way analysis of variance, one-way analysis of covariance, and binary logistic regression. Instead of explaining all of these on the whiteboard, I'm going to use a computer language: I'm using a Jupyter notebook inside of Visual Studio Code, and the kernel I'm running is a Python kernel. I'll assume you understand a little bit of Python, and even if you don't, you'll pick up some code here. But this is really about understanding what happens inside these models, how we use them, and what the results mean, so that you can interpret at least your own models, or those that you see in the literature.

In this first video, then, we're going to discuss simple linear regression. Right off the bat, the "simple" refers to the fact that we only have one independent variable. As we see in equation 1, we have an independent variable, and from it we try to estimate a value for a dependent variable. That is what modeling, or at least the fundamentals of modeling, is all about. We use models in two main ways. The first is to understand the relationship between your independent variable or variables and your dependent variable; we can express all sorts of values to understand that relationship. The second is to build a model that predicts or estimates a value of the dependent variable: given any input values for our independent variable or variables, we can calculate an estimated dependent variable. We would use that, for instance, in machine learning, or in cases where actually going out and getting the value of the dependent variable is expensive, in monetary terms, in time, or in human resources. If we have a set of data available to us, we build the model so that in future we only need the easier-to-collect independent variables to calculate a value for the dependent variable.

Now, this table we're going to see all the time. It shows the four fundamental types of linear models that I want to talk to you about, and the data type the independent variable must take versus the data type of the dependent variable. For simple linear regression, the independent variable is an interval-type, numerical variable, and "simple" means we only have one. The dependent variable is also an interval-scale, numerical variable. That means if you have a data set with all your observations, there will be pairs of values: one value for the independent variable and one value for the dependent variable.

So here's our first bit of Python. We're going to import some packages. Remember, Python is a base language (we're using version 3.9.10, as you can see at the top), and we extend its capabilities by importing packages. Those packages are written by people all over the world, and that is the beauty of open-source software: so many people contribute to it. And of course these people deserve to be supported as much as possible, whether in academia or in financial terms, because their contribution to our scientific world is absolutely enormous.

We're going to import the pandas package, which helps us create a data frame to store all our data. We're going to import NumPy, the numerical package. We're going to import the stats module from SciPy, the scientific Python package. And we're also going to import the Patsy package, which helps us create design matrices; you'll see what that's all about. I'll click run, and that executes the cell, importing all those packages. If you're familiar with Python, you'll notice I'm not using namespace abbreviations: we would usually say import pandas as pd so that we can use the abbreviation, but here we'll go for the full package names.

Then, for all my plotting needs, I'm going to import the Plotly package, specifically the express module and the graph_objects module, and also the io module so that I can set io.templates.default to plotly_dark, because I'm using a dark theme and I want my plots to look nice on it. The heavy lifting for our models is done by the statsmodels package, from which we import just two functions: from the formula.api module, the ols function, and from the stats.anova module, the anova_lm function. Those are the two statsmodels functions we'll use in the majority of our work on linear models in this seminar series.
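The cell itself isn't reproduced in the transcript, but based on that description the import cell would look roughly like this sketch:

```python
# Data handling and numerics
import pandas
import numpy
from scipy import stats
from patsy import dmatrices  # builds design matrices from a formula

# Plotting: full module names rather than the usual abbreviations
from plotly import express, graph_objects, io
io.templates.default = 'plotly_dark'  # dark template to match the notebook theme

# Model fitting: ordinary least squares and the ANOVA table function
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
```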
So what is simple linear regression all about? It starts with a research question, and I want to state that question very clearly with respect to the variables I have. In simple linear regression, I'm building this equation: some intercept, plus a slope times my independent variable, gives me an estimated dependent variable value. What does this mean in words? It asks: can I use my independent variable as a predictor for my dependent variable? Stating this in terms of hypothesis testing, the null hypothesis is that my independent variable is not a predictor of my dependent variable; the alternative hypothesis is that it is a predictor. When we get to writing this in symbolic notation, I'll show you the symbols we can use in the null and alternative hypotheses.

First of all, a linear model with a single variable must make you think of a straight line from algebra, and indeed it is one. You'll remember from school c plus mx equals y, or you might have seen it as y equals mx plus c. The c is the intercept, such that when x is 0, y just equals c; and m is the slope, the rise over the run. With those two parameters, c and m, you can give me any x and I'll give you a y. That's a straight line.

So let's visualize that with the little equation y equals minus 3 plus 2x. In the code, I first use the arange function from NumPy to generate values from negative one to five in steps of 0.01; that gives me a bunch of x-axis values. Then I create a figure object, which I assign to the variable fig; it's a graph_objects.Figure object. I can then add traces to it with fig.add_trace, and what I add is a scatter trace, also from the graph_objects module. The x values are all the x-axis values, and y is minus 3 plus 2 times the x values, so for each x we calculate a y value. That creates a nice little line for us. And as we expect, it's the straight line minus 3 plus 2x: it has an intercept at x equals 0, where the y value is negative 3, and then a slope, rise over run.
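Reconstructed from that description, the plotting cell would look something like this (the name x_vals is my own; the transcript doesn't say what the variable is called):

```python
# x-axis values from -1 to 5 in steps of 0.01
x_vals = numpy.arange(-1, 5, 0.01)

fig = graph_objects.Figure()
# y = -3 + 2x, calculated element-wise for every x value
fig.add_trace(graph_objects.Scatter(x=x_vals, y=-3 + 2 * x_vals, mode='lines'))
fig.show()
```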
Now, when it comes to statistics, we use different notation. Instead of c plus mx equals y, we have beta symbols: beta sub 0, which plays the role of c, and beta sub 1, which plays the role of m, the slope, and we multiply that by x. You'll see the x written in bold because it is a vector, the vector of all my independent variable values. So if I have these two values, the intercept and the slope, this sum gives me a whole long vector of estimates for my dependent variable.

You'll also notice little hat symbols, and I want to draw your attention to equation 4: beta sub 0, without the hat, plus beta sub 1 times x, plus some epsilon, equals y. The x, the epsilon, and the y are all column vectors: a list of independent variable values, a list of error terms, and a list of dependent variable values. This is what it looks like out there in the population as a whole. In the population there are definite values for beta sub 0 and beta sub 1, and each individual has an independent variable value, the x sub i, plus their own specific error term, which gives their real dependent variable value. But we don't know those population values. Written out in words: the population intercept, plus the population slope times a specific individual's independent variable value, plus that individual's error value, equals that individual's dependent variable value. That's the population.

We estimate those parameters because we only have a sample from the population. So we put the little hat symbols on to show that these are estimates, based on our sample, of the two population parameters. With those two estimates, we calculate an estimate of the dependent variable value. And because we're only dealing with a sample, there will be a difference between our sample's actual dependent variable values and what our model says they are. That difference, y minus y hat, gives us an error vector, all the errors our model makes, which we also call the residuals.

So let me just show you what we build. Suppose we have this vector of independent variable values, 6, 8, 7, 6, and on the right-hand side the values our model should produce for the dependent variable. If we have beta sub 0 hat and beta sub 1 hat, we multiply beta sub 0 hat by a constant column of ones, because, if you remember a little linear algebra, we can't add two vectors that don't have the same dimensions. The x vector here has dimension 4, so we need a column of four constants: multiplying the scalar beta sub 0 hat by each element of the ones column just gives beta sub 0 hat, four times over. And beta sub 1 hat is a scalar multiplying each element of the x vector. Adding those together should give us the dependent variable values. That's what we're attempting.
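Just to make those shapes concrete, here is a small illustration using the example vector 6, 8, 7, 6 and made-up values for the two parameters (the betas here are purely hypothetical):

```python
beta_0, beta_1 = 2.0, 10.0     # hypothetical intercept and slope
x = numpy.array([6, 8, 7, 6])  # the example independent values

# Design matrix: a column of ones (for the intercept) next to the x values
X = numpy.column_stack([numpy.ones(4), x])

# X @ [beta_0, beta_1] is beta_0 * 1 + beta_1 * x, element by element
y_hat = X @ numpy.array([beta_0, beta_1])
print(y_hat)  # [62. 82. 72. 62.]
```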
So how do we find those two estimates, beta sub 0 hat and beta sub 1 hat? There are different techniques: ordinary least squares, or gradient descent. They calculate the best two values, because I'm going to get estimates and I want them to be as close as possible to the actual values.

Now let's generate some data, because this is all just words, words, words; let's see it in action. I'm going to create some random values, and I'm going to create them in a specific way. You can always import data from a spreadsheet file or a comma-separated values file, but what I like to do is generate my own data, because then I have control over the generation of those random values, which means I can better understand my analysis.

I'm going to seed the pseudo-random number generator here so that every time I run this code cell, I get the same pseudo-random numbers; and if you run this cell, you'll get exactly the same values. I'm seeding it with the integer 42. Then I create two arrays of values: the first I assign to the computer variable independent, and the second to dependent. Remember, the single equals sign means: assign what is on the right of the symbol to the name on the left, and on the left are the very descriptive variable names independent and dependent.

For the independent variable, I use the numpy round function, which takes whatever the values are and rounds them, here to one decimal place. And what are those values? I call stats.norm.rvs: the rvs function generates random values, and norm means I want them drawn from a normal distribution. I set the loc argument to 100, that's a mean of 100; the scale argument to 10, that's a standard deviation of 10; and the size to 20. So I want 20 random values from a normal distribution with a mean of 100 and a standard deviation of 10, and I pass those 20 values, which come back as a NumPy array, to the round function, because I only want one decimal place.

For the dependent variable, I take all 20 independent values and add a very specific bit of random noise to each, and that noise also comes from a normal distribution, this time with a mean of zero and a standard deviation of 10. I want 20 of those, so that I can add 20 values to 20 values element-wise, and again I round to a single decimal place. And that's it: we now have values for an independent variable and a dependent variable.
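A sketch of that data-generation cell. Seeding NumPy's global generator is my assumption about how the notebook seeds with 42, but it does reproduce the first rows quoted below (105.0 with 119.7, then 98.6 with 96.3):

```python
numpy.random.seed(42)  # seed the pseudo-random number generator (assumed)

# 20 values from a normal distribution with mean (loc) 100 and standard
# deviation (scale) 10, rounded to one decimal place
independent = numpy.round(stats.norm.rvs(loc=100, scale=10, size=20), 1)

# Add element-wise noise from a normal distribution with mean 0 and
# standard deviation 10, again rounded to one decimal place
dependent = numpy.round(independent + stats.norm.rvs(loc=0, scale=10, size=20), 1)
```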
All I do next is add those two arrays as columns to a data frame, so that it looks very much like a spreadsheet. You see I have a computer variable df, and I'm using the DataFrame function from the pandas library. I pass it a dictionary, so I have key-value pairs: the two keys become my column headers, and the values are the 20 random values we just generated. Then I call the head method, df.head() with open-close parentheses, so that you can see the first five rows. Remember, Python is zero-indexed, it starts counting at zero, and you see an index column on the left-hand side. So my first observation has an independent variable value of 105.0, and, with the bit of random noise we added, a dependent variable value of 119.7. Once again, I took control over how I generated these values, and that helps me better understand what's going on. I'm only seeing the first five observations; remember, I have 20 observations, a sample size of 20, and each observation has a value for the independent variable and a value for the dependent variable. You can imagine any two interval-type, continuous numerical variables from the field you work in.

And this is what I'm trying to show: there's my column of constants again. We're always going to have this column of constants, and it's a column vector. Behind all of this, the easy way to do these calculations is with linear algebra. In case you've never seen it: that column of ones is a column vector, and I'm multiplying a scalar by it, so whatever beta sub 0 hat is, I multiply it by one, by one, by one, giving the same value every time; plus beta sub 1 hat times each of my independent variable values, which you see there: 105.0, 98.6, and so on. So if I take beta sub 0 hat plus beta sub 1 hat times 105.0, I want to get as close as possible to 119.7, and you see that 119.7 is my first dependent variable value. For the second observation, the second subject in my study, beta sub 0 hat times one plus beta sub 1 hat times 98.6 must get me as close as possible to 96.3. "As close as possible" means that those two parameters, beta sub 0 hat and beta sub 1 hat, must take on values such that doing this multiplication and addition on the left-hand side gets me as close as possible to the actual values on the right. So we use column vectors and scalar-times-column-vector multiplication.

Let's express all of this visually so you can see what I'm talking about. I'm using the express module here, specifically the scatter function. I state my data frame, put the independent column on the x-axis and the dependent column on the y-axis, and set the title to "Scatter plot of values for each of the 20 observations". I'm also adding a trend line using ordinary least squares.
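The data frame and scatter plot cells, roughly as described:

```python
# Attach the two arrays as named columns; the dictionary keys become
# the column headers
df = pandas.DataFrame({'independent': independent, 'dependent': dependent})
df.head()  # in its own cell: show the first five rows (the index starts at 0)

# Scatter plot of the 20 observations, with an ordinary least squares
# trend line added by plotly
fig = express.scatter(
    df, x='independent', y='dependent',
    title='Scatter plot of values for each of the 20 observations',
    trendline='ols')
fig.show()
```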
As I said, ordinary least squares is one of the techniques for finding the best values for beta sub 0 and beta sub 1, and Plotly actually adds that trend line for me. If I hover over the plot, it's very nice: it shows me the actual model, the intercept and the slope, and it even gives me the coefficient of determination. We're going to look at all of those.

So there's my model. It says: for any input value of my independent variable, I can plug that into my straight-line model and it gives me a value on this line. But you can well imagine there's a little bit of a difference between what the model estimates or predicts and what the actual value is. For each of my 20 observations, you see the actual dependent variable value given an independent variable value. This plot really gives us a beautiful illustration of what linear regression is all about: we see our model; we see that for any given independent variable value we calculate a value that falls on the line; and we see the differences. That line is the best-fit line, and it has only two parameters, an intercept and a slope. If I find the best values for those, through a technique such as ordinary least squares or gradient descent, I get the smallest errors, or residuals as we'll call them: the differences between the actual dependent variable values and what the model calculates or estimates.

Let's see how we set that up in Python; it's extremely simple. We use the ols function from statsmodels (ols stands for ordinary least squares) and pass it a little formula. The formula has a tilde symbol: on the left-hand side of the tilde is the dependent variable, and on the right-hand side I can list, separated by plus symbols, all my independent variables; in this instance I have only one. These are the column names in my data frame, because I also pass data=df, and remember, df is the computer variable that holds our data frame object. So, inside single quotation marks: dependent, tilde, independent; that is, the dependent variable given the independent variable. Then I call the fit method on that, .fit(), which fits the model to the data and works out, using ordinary least squares, the best values for beta sub 0 and beta sub 1, and from there we get a bunch of other results.

Let's run this. By the way, I've assigned the result to the computer variable linear_model, and now that I have this object containing my model, I can call the summary method on it, linear_model.summary(). That gives us this beautiful summary table, and this is the table we have to understand. We see a variety of things: R squared at the top, our coefficient of determination; an F statistic and a p value for that F statistic, and for a given alpha value of 0.05 the value we see there is very significant. Most notably, there's the little table at the bottom: an intercept row and an independent row, a coefficient for each, standard errors for each of those two, a t statistic for each, a p value for each t statistic, and then 95% confidence interval bounds around each coefficient.
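The model-fitting cell as described:

```python
# Dependent variable on the left of the tilde, independent variable(s),
# separated by plus signs, on the right; names refer to columns of df
linear_model = ols('dependent ~ independent', data=df).fit()
linear_model.summary()  # the summary table discussed below
```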
Now we've got to understand what all of these things mean, and I want to use this table to introduce again our null and alternative hypotheses. What we're interested in is the independent row we see in the table: it has a coefficient of 0.84, and that is our beta sub 1 hat. The intercept coefficient, 12.76, is our beta sub 0 hat. Remember, our research question, stated as a null hypothesis, is that the independent variable is not a predictor of the dependent variable, and we express that through this coefficient, beta sub 1; that's the one we're interested in. So our null hypothesis is that beta sub 1 equals 0, and our alternative hypothesis is that beta sub 1 is not equal to 0. (If I had more independent variables, there would be a beta sub 2 and a beta sub 3, and the null hypothesis would state that they are all equal to 0.) In this instance, given the standard error, we see a t statistic of about 3.59, which gives a p value of 0.002. For a given alpha value of 0.05, that p value is smaller, so we can reject the null hypothesis that the 0.84 we see is really 0, and accept the alternative hypothesis that it is not 0. In other words, in our instance, the independent variable is a significant predictor of our dependent variable. It really is as simple as that.

So let's have a look at where these calculations come from, starting with the coefficients, which are the estimates of the parameters, beta sub 0 hat and beta sub 1 hat; that's the coef column in the table. I'm going to show you how to do it with linear algebra, so you can follow along; if you can't remember your linear algebra, just listen along, and if you've had it before and need a little review, this is it.

First of all, to do this by hand, I've got to create two design matrices, and the dmatrices function comes from the Patsy package. It produces two things for me: a column vector of y values, that is, my dependent variable values, and a matrix whose first column is all ones, my constant column, and whose second column is all the independent variable values. I can write the same little formula we had before, with the data coming from the df data frame. If I run that, I have a vector of values and a matrix. To work with them by hand in Python code, I convert both to NumPy arrays; at the moment they are DesignMatrix objects from Patsy, so I call numpy.array, pass y to it, and assign the result back to y, overwriting what was in y, and I do the same for X. Now that they are NumPy arrays, I can use some linear algebra. Let me insert a cell and show you what these things look like. Let's take y and use indexing, just the first five values, so we can see the column vector. You can see it's a column vector, or at least a NumPy array representing a column vector, of my dependent variable values: 119.7, 96.3, you'll remember them.
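A sketch of those design-matrix cells:

```python
# Build the design matrices from the same formula: y is a column vector
# of dependent values, X has a column of ones and a column of x values
y, X = dmatrices('dependent ~ independent', data=df)

# Convert the Patsy DesignMatrix objects to plain NumPy arrays
y = numpy.array(y)
X = numpy.array(X)

y[:5]  # first five rows of the column vector
```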
Let's add another cell and have a look at the matrix X, again just the first five rows. If we run this cell, we see it's a matrix with two columns, and there will be 20 rows because we have 20 observations. The first column is a column of all ones, and the second column is my column of independent variable values; you'll remember the 105.0 and the 98.6, et cetera.

So this is what we're doing: we set up our problem as a linear algebra problem. We're saying: this design matrix X, times the column vector of my parameters beta sub 0 and beta sub 1, has got to equal the estimated values y hat. Here's a way to think about it, just as a little review of your linear algebra. If I have m observations, that is, m rows, then each column of my matrix X is a vector living in 20-dimensional space (m is 20 here). But I only have two of these column vectors: the ones vector, and the vector 105.0, 98.6, and so on. There is no way I can span 20-dimensional space with two column vectors; I span only a subspace of it. My vector of dependent variable values is, in general, not in the column space of X, the subspace I'm spanning. So what I want are values beta sub 0 and beta sub 1 that get me as close as possible to that dependent variable vector that is not in my subspace, and it's the orthogonal projection onto the subspace spanned by those two columns that gives me the desired, best values for beta sub 0 and beta sub 1.

The way we do this with linear algebra: I left-multiply both sides by the transpose of my design matrix ("left" meaning it goes first). If I take X transpose times X, then, given that there is no linear dependence between the two columns, I get a nice square matrix of which I can take the inverse. And if I multiply a matrix by its inverse, I get the identity matrix, which drops away, and right there I have an equation for the column vector of parameters: beta hat equals the inverse of X transpose X, times X transpose, times y.

We can do all of that in code. First I take the transpose of my matrix X, multiply that by X itself, take the inverse, and then do the whole multiplication, so that in the end I'm left with beta hat, which holds two values. There you see them: 12.76 and 0.84. And the linear model we created with the ols function has a couple of attributes, one of which is the params attribute, and that gives me the two estimated parameters: look at that, 12.76 and 0.84 again. Ordinary least squares, using this linear algebra technique I've shown you, is how we find those parameters. It minimizes the sum of squared errors: the differences between my predicted values and my actual values, each squared, then summed; that is what I want to minimize, and that's exactly what we get here.
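The calculation itself, written in one line rather than the step-by-step build-up described above:

```python
# beta_hat = (X^T X)^{-1} X^T y  -- the orthogonal projection solution
beta_hat = numpy.linalg.inv(X.transpose() @ X) @ X.transpose() @ y
beta_hat             # approximately [[12.76], [0.84]]

linear_model.params  # statsmodels stores the same two estimates
```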
So now I actually have these estimated values. If I take my research question, beta sub 0 hat plus beta sub 1 hat times my independent variable values, that gives me the fitted values, and let's look at the first five of them; they differ slightly from the actual dependent variable values. Conveniently, our model also has a fittedvalues attribute, and if I look at its first five values, they're exactly the same as the fitted values we just calculated. So, taking beta sub 0 hat plus beta sub 1 hat times each independent variable value gives the column vector of estimated values for my dependent variable (I'm showing only the first five), and that same vector is stored in the fittedvalues attribute of the model.

The errors we've been talking about are also termed the residuals, and, as I said, they are the differences between the actual dependent variable values and the values my fitted model produces. If I subtract the two, I get the errors: the actual values minus the fitted values. Once again, they're stored in the model, as linear_model.resid, and if I look at the first five, that's exactly what we have: the difference between the actual value and the fitted value.

Now, just to bring the point home, I'm going to create a new column in my data frame, call it fitted value, and add all the fitted values to it, which means I can show you the final result, just what we had before when we added the trend line to our figure. What I'm doing here is creating a graph_objects figure and adding traces to it: the independent and dependent values as markers, with the name "actual value"; then the model itself, for which I create a range of x-axis values running from the minimum of the independent variable to the maximum, with a step size of one, pass my little formula to it, and display it as a line; and then the fitted values as well. You can have a look at that code, but this is the result: the markers are the actual values, and for every actual value there's a corresponding value on my model line, with a difference between them. We square all those differences, sum them, and minimize that sum through ordinary least squares, which gives the model I've drawn here. All my fitted values fall on the model line, of course, because that's where they came from, and for each pair you see the difference between the fitted value and the actual value. So now you know where the coefficients come from and exactly what they mean, and we've stated them in terms of a null hypothesis, that beta sub 1 equals 0, and an alternative, that it does not; and in this case we saw that we could reject the null hypothesis.
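A sketch of the fitted-values and residuals cells (the column name fitted_value is my rendering of the transcript's "fitted value"):

```python
# Fitted values by hand: beta_0_hat + beta_1_hat * x
fitted = beta_hat[0, 0] + beta_hat[1, 0] * df['independent']
fitted[:5]
linear_model.fittedvalues[:5]  # the same values, stored on the model

# Residuals: actual dependent values minus fitted values
errors = df['dependent'] - linear_model.fittedvalues
linear_model.resid[:5]         # again the same values

# Keep the fitted values as a new column for the final plot
df['fitted_value'] = linear_model.fittedvalues
```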
Let's go back for one second to the printout of the model summary on the screen. If we look at the independent row, there's our coefficient, and for all of these models there are specific equations for the standard error. If I take my coefficient and divide it by its standard error, I get a statistic, and in this instance that statistic follows a t-distribution. Given the degrees of freedom, we have a specific sampling distribution for that t-distribution, and for 3.587 that gives us a very small p value, 0.002. With that p value, for an alpha value of 0.05, we can reject the null hypothesis that this coefficient is equal to 0 and accept the alternative hypothesis that it is not 0. And, by the way, we then use that standard error and the critical t value for our degrees-of-freedom parameter to calculate the bounds of a confidence interval, at a given confidence level, around the coefficient.

So those are the coefficients; now let's look at the rest of the values in the table. We're going to look at the model statistics using the anova_lm function from statsmodels, passing our model as the argument. In the results we see independent at the top and Residual at the bottom; degrees of freedom for each; sum_sq, the sums of squares, which we'll have a look at; the mean squares; an F statistic; and a probability for that F statistic. And have a look at this: if I take a sum of squares and divide it by its degrees of freedom, I get the mean square (divide the residual sum of squares by 18, for instance, and you get its mean square), and if I divide the one mean square by the other, I get the F value; and from that F value, given the two degrees-of-freedom parameters, I get a p value. So let's see how all of that fits together.

In a model there are typically three sums of squares, and if we go back to the little graph, we can discuss all three. The first is the sum of squares due to the error, or the residual: the differences between what my model suggests the dependent variable is, that estimate, and the actual value. Some of those differences are positive and some are negative, so I square them, making them all positive, and sum them: that is my sum of squares due to the error. But hidden in here is also the sum of squares due to the regression: the difference between what my model suggests the dependent variable is for any given independent variable value and the mean of the dependent variable. There is a baseline model, a very simple model, that says: irrespective of my independent variable value, I always predict the mean as the estimated dependent variable value. It's the difference between what my model suggests and that mean of the dependent variable, squared and summed over all observations, that gives me the sum of squares due to the regression.

So, back down to where we were: here is the SSE, the sum of squares due to the error. I'm taking numpy.sum over the squared differences between the fitted values and the actual values (you can do the subtraction in either order, because we square). Let's run that: I get a value of 1737.739, and look, that's exactly what's in the table for the Residual row, 1737.7. We use the word "error" for the residual row so that we don't confuse it with the regression. So the sum of squares due to the error is 1737.7, and now you know what it is: I'm just summing over all those squared differences.
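The ANOVA table call and the by-hand sum of squares due to the error:

```python
anova_lm(linear_model)  # degrees of freedom, sum_sq, mean_sq, F, PR(>F)

# SSE: squared differences between actual and fitted values, summed
sse = numpy.sum((df['dependent'] - linear_model.fittedvalues)**2)
sse  # about 1737.7, matching the Residual row of the table
```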
Now we just have to think about the degrees of freedom, which we see is 18. The way to think about it: our total sample size was 20 individuals; we subtract one because we have one independent variable, and we lose another degree of freedom because we also have the intercept. You can also think of it as 20 minus 2, where 2 is just the number of parameters in our model; so 18 degrees of freedom.

For the sum of squares due to the regression, remember, that's our fitted values minus the mean: for each observation we take the fitted value, what our model estimates, minus the mean of the dependent variable, the very simple baseline model; we square all those differences, sum them, and get a sum of squares due to the regression of 1242. If we look at the independent row of the table, there it is: 1242. That value involves the coefficient we estimated. Then we have the total sum of squares, which is the actual values minus the mean, so the actual values versus that very simple mean model; and by the way, that's also just the sum of squares due to the regression plus the sum of squares due to the error. Add those two and you get the total sum of squares, 2979. That value will help us calculate the coefficient of determination, R squared, but before we get there, let's look at how the F statistic was computed.

It is the ratio of the mean square due to the regression to the mean square due to the error. By the way, when we get to analysis of covariance, we'll try to make that sum of squares due to the error smaller; the smaller the denominator, the bigger the F value, and the more likely I am to find a statistically significant F value. So we take the SSR and divide it by its degrees of freedom, which is the first of the two parameters required for an F distribution, and we take the SSE and divide it by its degrees of freedom, which I showed you how to work out. If we do that in a code cell and assign it to a computer variable, we get 12.867, and if we go back to our little table, there's the F value: 12.867. You see, it's just that ratio. Dividing the numerator and the denominator, the regression and error sums of squares, each by its own degrees of freedom gives the mean squares we see in the table, and the ratio of those gives our F value.

Remember, the F distribution requires those two parameters. I use stats.f.cdf, the cumulative distribution function, and pass it my F statistic, 12.867, and the two degrees of freedom, the one in the numerator and the 18 in the denominator. That gives me the area under the sampling distribution curve up to that value; but the area under the whole curve is 1, so I've got to subtract from 1. If we run that code cell, we get 0.0021, and that's exactly the p value that we saw in our ANOVA table.
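The remaining by-hand calculations, plus the coefficient of determination that comes up next:

```python
mean_dependent = df['dependent'].mean()  # the baseline "mean model"

# SSR: fitted values versus the mean model
ssr = numpy.sum((linear_model.fittedvalues - mean_dependent)**2)  # ~1242

# SST: actual values versus the mean model (also SSR + SSE)
sst = numpy.sum((df['dependent'] - mean_dependent)**2)            # ~2979

# F statistic: mean square regression over mean square error
f_statistic = (ssr / 1) / (sse / 18)                              # ~12.867

# p value: area beyond the statistic under the F(1, 18) distribution
p_value = 1 - stats.f.cdf(f_statistic, 1, 18)                     # ~0.0021

# Coefficient of determination, discussed next
r_squared = ssr / sst                                             # ~0.4168
```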
The last thing I'll add here, just for fun, is the coefficient of determination that we saw. The equation for it is very simple: it's the ratio of the sum of squares due to the regression to the total sum of squares. If we run that, we get the R squared value we saw before, and the way we interpret it is to say that our model, in other words our independent variable, explains 41.68 percent of the variance in the dependent variable. That's how we interpret R squared.

So that's it for simple linear regression. I hope you have a good understanding of it now. You saw the Python code and what those values mean (and wasn't it easy to do in Python?), how to interpret all of those results, and how to express them in terms of our research question, in terms of the parameter estimates, and in terms of goodness of fit: how well the model does as far as the R squared value is concerned, and the overall F statistic for our model. We're going to build on this when we talk about analysis of variance, because it is exactly the same as this; the only real difference is how we make that best-fit line, because in analysis of variance the independent variable is not a continuous numerical variable but a categorical one. We'll build on exactly what we had here, and once you watch that video, it will cement what you've understood here and expand your knowledge to other data types for our variables.