So, in this tutorial we're going to talk about the basics of linear modeling. We're going to see tables like this; we won't do any of the calculations, but we've got to understand what is going on in them. What I want to do is introduce what I feel are the four fundamental types of linear modeling. If you understand those four — and each one builds on top of the other — then you can understand the rest of linear modeling. So which four are we going to discuss? The very first one, very simply, is just linear regression. You can see all the results of linear regression on that side, and we're going to have a look at those. From there we take a small step and discuss analysis of variance, ANOVA, and you're going to see how those two are intimately related: ANOVA just builds on linear regression. From there we build a little bit more and go to analysis of covariance, ANCOVA. And then lastly, the one that stands out a little bit — it's a little bit different, but you'll see it's not a big leap from the others — and that is logistic regression. As I said, I feel those are the four fundamental types. There's no official list of "the fundamental types," but if you understand the fundamentals of these four, and you see how each one follows from the last, you will understand most of linear modeling. So let's clean the board and start with linear regression. What we can see there at the top are typical results for something like linear regression; you're going to see tables like this, depending on the software that you use. You're going to see a coefficient there and, depending on how many variables you have, a coefficient for the intercept and for one or more variables, the standard error, the t statistic for that coefficient and standard error, and the p value. You'll see an F statistic for the model as a whole, a p value for that model, and some goodness-of-fit metric such as the R squared. And then you'll get another little table showing the degrees of freedom, the sum of squares, the mean sum of squares, the F statistic, and the p value for that. So these are typically the things that you're going to see. But what are we really after when we do modeling like this? We are going to say that we have an independent variable. Now, depending on the textbook or lecture, these things go by so many synonyms it's not even funny: a predictor variable, an independent variable, and so on. And given that, we want to build a model so that we can calculate the value of a dependent variable. In the simplest form we're just going to have a single independent variable, but we can have more; when we get to ANCOVA, we'll see we definitely need at least two. And then we're going to have a single dependent variable. That's all we're after. One thing about these models that I think we should really discuss is the fact that, behind the scenes, there's just some mathematics going on. Those numbers don't really understand what you're doing: you plug in some numbers, a calculation runs according to some equations, and out pops the result.
And if we think about research on humans and human disease, for instance, or many other biological questions for that matter, something very intricate happens at a genetic level, at a biomolecular level. Something very, very complex is happening, and what we usually collect is data that are surrogates of that very complex thing. So we can go overboard, choose all these surrogate variables, build a nice model, and get close to it. But sometimes you just have to stand back and ask yourself: is this making sense? Am I chasing the numbers? That's something I always caution against. Always stand back, look at your model, and ask yourself: does this make sense, or am I just chasing numbers? That's always a very important thing to do. But we are going to use independent variables to try and calculate a dependent variable, which actually brings us to another important point: why would we do this? So let's ask this big question: why are we doing this? Well, I think there are many reasons, but there are two main ways in which we can use this. The first one is just to understand. We want to understand something about the relationship between the independent variable or variables and the dependent variable — how things fit together. We're not saying, from a linear model alone, that these things really are the cause of that. As I said, with human disease there are genetic, biomolecular, very complex things happening, and we collect values that are broad surrogates for all of those intricate things. But we want to understand that relationship. The other thing we're trying to do is just to predict. We're not primarily trying to understand — although it will give us some understanding, of course — we just want to predict. It might very well be that it's easy to collect these independent variables, while collecting the dependent variable is expensive — expensive in money, time, human resources, whatever the case might be. That one might be very difficult to collect. So we take samples such that we can build this model, and from then on we only have to collect the independent variables; our model helps us calculate the dependent one, and we save that expense, whatever it might be. So those are two fundamental reasons, I think, for you to remember why we do modeling. Now let's have a look at the types of data we're dealing with. We're still with this very simple linear regression. We're going to have some subjects in our study, and we can call that ID: subject number 1, 2, 3. We can call these our subjects, or we can call them our observations. And what we're doing in linear regression is that we have a numerical variable here, and we're trying to predict another numerical variable. So for each observation we're going to have two values. Now, as I said, we can have more than one independent variable, but for what we're discussing now we're going to keep things simple and stick to one, and these are all numerical variables. So our independent variable is numerical, and our dependent variable is numerical.
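Just to make that concrete, here is a minimal sketch in R — the language we'll lean on for snippets throughout — of the kind of data layout being described, with completely made-up numbers: one row per observation, one numerical independent variable and one numerical dependent variable.

```r
# Made-up data: one row per observation (subject), with a numerical
# independent variable x and a numerical dependent variable y.
set.seed(1)
dat <- data.frame(
  id = 1:30,
  x  = runif(30, min = 10, max = 50)   # independent variable
)
dat$y <- 12 + 0.8 * dat$x + rnorm(30, sd = 5)  # dependent variable, with noise
head(dat)
```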
And in the simple case — we stick to the simple case because it's easy to draw on a two-dimensional surface — we have our independent variable value along here, independent, and our dependent variable here, dependent, such that this value plots somewhere here and that value plots somewhere there, and where those two meet, of course, is the point for that observation. What we're trying to do in the end is to build a model. Now, this model is going to be easy: in these two dimensions, our model is going to be a straight line. So let's make this straight line somewhere there; that is going to be our straight-line model. And that means, given any value, once we've built our model and we collect new data for the independent variable, we can plug it into our model: it goes up to the line, we use the equation for the line, and we read off what the prediction would be. Of course, our model is never going to be perfect. We might very well find that for this independent variable value our model predicts this value, and there's a bit of difference between what our model predicts and what the actual value is. So let's just put in a few more points — here are all the other values — but right here, for this one, this is what we have: a difference between what our model predicts and what reality really is. And that's going to bring us to all these errors we're going to talk about, because yes, our model does have some error in it. What we do is always keep in mind some baseline model, and what we're going to use for that is the mean. Let's just think about that for a little bit, and let's be careful here: it's the mean of our dependent variable, y — let's write it as y bar, the dependent variable's mean. That's always going to be the baseline against which we measure ourselves. Because think about it: our simplest possible model would be to calculate the mean of the dependent variable and, irrespective of what the input value is, always just predict that mean. That's a fair thing to do. So we keep that idea in mind, because now we can express the errors that a model makes. What colour shall we use? Let's go for blue. There are three sums of squares really happening here. The first one is the sum of squares total — the total sum of squares. That's the first thing we really have to get, because you'll see those values in a table. We're going to stick with this one point here as our example; it makes things easier. The total sum of squares goes from this mean all the way up to this value of ours — remember, all the way up here, that value there. So it's the difference between a value and the mean. And what you're going to see is that you take each one of those, the y sub i — you've seen summation notation before — every value minus the mean. But we square all of those, remember, because we're going to get negatives and positives: a point below the mean might give us a negative, one above gives us a positive. So we square them all and we sum them. That's the total sum of squares. And the total sum of squares is an important thing, because we're going to use it in our coefficient of determination, the R squared, as a kind of metric of how well a model does. And we'll explain that.
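In symbols, the definition just described — every dependent value minus the mean of the dependent variable, squared and summed — is:

```latex
SS_{\text{total}} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2
```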
And then we get the sum of squares due to the regression. Now, that R is unfortunate, inasmuch as — remember — this difference between what a model says and what the real value was is going to be called an error, or a residual. You're going to see "residual" everywhere. But that's not what this R is: it's the sum of squares due to the regression. And what you can see there is that little hat we put on y. So let's be very clear: this is y, and let's make this x — x-axis, y-axis. The hat you see up there refers to the thing we are predicting. I put it in purple there, so let's stick to purple: that is y sub i with a little hat on — our estimated value, our predicted value, our calculated value based on our model. So the sum of squares due to the regression is the difference between that prediction and the mean. And this is our regression line; that's what we call it, a regression line. So the sum of squares due to the regression is our model versus the baseline model. As I said, the baseline model is always the easiest one: all our predictions can just be the mean of the dependent variable. So if we put it out there — let's put it in green, because green is what I always use for definitions, and these are the mathematical definitions — this is the sum of squares due to the regression: the difference between what our regression line predicts a value to be and the mean. And out here, of course, we have the sum of squares total. And that leaves us with this little one, the sum of squares due to the error. We could say "due to the residual," because that means the same thing, but we've already used the R for regression. So it's the sum of squares due to the error, and it's a very important one — it's the one that's going to make the difference for us when it comes to our p values. It's the difference between what our model suggests a value is going to be and what reality really is. And as I said, we take that difference and square it for each observation — for each one there is another little error — and we sum over all of them, and that gives us the sum of squares due to the error. Now you know what that is when you see a table like that. You're going to see the sums of squares and, depending on the software you use, it's all going to look a bit different, but this is what it means: these little differences. There's the total; there's the sum of squares due to the regression — between what our model calculates and this baseline mean model; and there's the sum of squares due to the error, the one we're really after. So let's talk about that little one, because how do we get this line? Where does this line come from? Let's make some space and discuss where this line comes from, because for each of these four fundamental types there's going to be a different way to construct the model. Let's put the picture back, a little bit smaller so we have a little bit more space — and there is our model, up there again. How do we get to this?
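Putting the three definitions side by side — with y hat sub i the model's prediction and y bar the mean of the dependent variable — and noting that the first splits exactly into the other two:

```latex
SS_{\text{regression}} = \sum_{i=1}^{n} \left( \hat{y}_i - \bar{y} \right)^2,
\qquad
SS_{\text{error}} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
SS_{\text{total}} = SS_{\text{regression}} + SS_{\text{error}}
```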
So in the case of a simple linear regression, it's going to be just a straight line. Now, you might remember from school that a straight line has an equation — and because it's a definition, we'll put it in green: y equals c plus mx. Of course, in school you might have seen it as y equals mx plus c; it doesn't matter. It says: any value I get for x, I plug in there. The m is the slope, rise over run — so for this line it's delta y over delta x, the change in y over the change in x. You might remember that that gives us the slope: if the line goes up more steeply, the slope is more; if it's flatter, the slope is less. And the c is the y-intercept — remember, where this line crosses our y-axis — because if I plug x equals zero in there, the mx term falls away and I just get y equals c. So what are we after here in linear regression? Well, this line of ours is not going to give us y; it's going to give us y hat. It's not giving us the actual values — the actual values will be dotted around the line — and it equals beta sub zero, which is nothing other than c, just a different symbol for exactly the same thing, plus beta sub one times x. So the m becomes beta sub one, the c becomes beta sub zero, and what we have to find is those two values. What we do have is all the observed values — there they are, all the independent variable values. We don't have y hat; we're trying to calculate y hat. But remember, we're going to have all these little squared errors, because the actual values are dotted around the line. So in the end, the actual value y is going to be the intercept, plus the slope times x — we only have a single x here — plus some error, because we make this little error every time. And you might have heard that we try to construct this line so that these errors are the smallest possible: the smallest possible errors give us the best line. And how can we change the line? Well, we can change the slope and we can change the intercept — those two things completely define where this line is. And every time we'll make some error, so if we add a subscript i to the error — for every specific x value there's its own little error — I get: the intercept, plus the slope times the independent variable, plus a little error, which might be minus for one point and plus for another, and that gives us every y. If we take the error term away, we have only the line: a value on the line for any independent variable value, and that gives us our prediction on that line. That's all we're after. And then it's very easy to expand on that: we'll have beta sub zero, plus beta sub one x sub one, plus beta sub two x sub two, plus et cetera. So I can have more independent variables; I'll just have to calculate more of these coefficients. And what we see here is exactly what we see there in the table: the coefficients are those two things that we see here.
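Written out, the model and the least-squares idea just described look like this: the fitted line, the actual values scattered around it, and the criterion that picks the intercept and slope minimizing the squared errors:

```latex
\hat{y}_i = \beta_0 + \beta_1 x_i,
\qquad
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad
(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0,\, \beta_1} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2
```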
So in this example that I've got on the board for you, the intercept — that's beta sub zero, so let's put it up there; that is nothing other than beta sub zero — and this beta sub one is the coefficient of my independent variable there. And of course, when you bring in the data table there's a specific name for that variable — say height or weight or whatever it might be — and that's what will be written in the tables as you print them out, but this is what those coefficients mean. So if we have those two values, let's do a calculation. We're going to say that for any specific value — let's keep it generic — the prediction is going to equal 12.7618 plus the slope, 0.8432, times this value of x sub i. There's my beta sub zero and there's my beta sub one, straight from those coefficients: my intercept and my slope for this best line. And no matter which of these x values I plug in there, it's going to give me a new value here: for that specific independent variable value, wherever it is, it goes up to the line and gives me this calculated, predicted, estimated value of the dependent variable. And as I said, if we choose those two values the best that we possibly can, that means the line — that slope and that intercept — is so good that these little errors, if I sum over the squares of all of them (we square them because subtracting in one order gives a positive value and in the other a negative, so we square so that we always get positives), those errors will be minimized. You've heard the term least squares: we use the least squares method in linear regression to calculate this intercept and slope. So you now clearly understand where those coefficients come from, and of course, if we had more variables, it would be plus beta two, plus beta three, et cetera, one for each of the independent variables. That is really what we are after, and that's what gives us those values. Now that you have a good understanding of where these come from, I've got to give you a little bit of insight into how we calculate them. If you've had a little bit of linear algebra, you'll understand what's going on here; if you haven't, don't worry about it. This tutorial is not about understanding how to calculate these things; it's about understanding what they mean. So if you want to skip this part, by all means do; otherwise, have a look, because it's quite interesting. What we do in linear algebra is start from the idea that we want to solve for these two values — and if I had more, of course, there would be more — and we write something like this: X, which is a matrix, times beta, which is a vector, equals y, which is also a vector. These are column vectors. So if we expand beta here, it's a column vector: beta sub zero and beta sub one, for us, at the moment. And y is also a column vector, and that holds our dependent variable values — all of them, there. And X is going to be a matrix.
So X is going to be this matrix such that the first column is always ones, and the second column — we only have two columns here — holds the numerical independent variable values that we have. So that is what we have there. We might call this a design matrix; you might see that term. We have our vector of unknowns — the column vector we're trying to solve for — and on the right-hand side we have our dependent variable written as a single column vector. Writing it that way allows us to use linear algebra, bar certain assumptions that we have to make — and we're not going to discuss all the assumptions here; when we come to analysis of covariance they become more important. This is not about the assumptions; we're just trying to understand what is happening. So let's take this out and keep that one; we know what's going on here. What you do in this instance is aim for an invertible matrix. What we can do is left-multiply by the transpose of X, so I get X transpose X beta. On the left-hand side I did a left multiplication, so I'd better do a left multiplication on the right-hand side as well. And now I have a square matrix: if you understand a little bit of linear algebra, you'll know that X transpose X is a square matrix. There are some assumptions on this X such that we can take the inverse of it — it must not be a singular matrix. So I take X transpose X, which is now a square matrix, and if those assumptions are met, I can take its inverse. I still have X transpose X there and my beta, and now I just have to do the same on the other side as well: X transpose X inverse, X transpose, and my column vector of dependent variables. On the left, the inverse of a matrix times the matrix itself gives me the identity matrix, and the identity matrix times a column vector just leaves me with the column vector. So I'm left with: beta equals X transpose X inverse, times X transpose, times my column vector of dependent variable values. If I can solve that little equation, I get those two values, beta sub zero and beta sub one. It's a lovely piece of linear algebra. As I say, there's some deeper stuff going on here, but the calculation, once you have this, is very easy. I can take these x values, as long as I put the ones in the first column — and remember, if I now plug the x values back in, I'm not going to get the actual y values; I'm going to get my y hats, all of those estimated values, because I've shown you that we just plug those in and get an estimated value for our dependent variable. So if you're not into linear algebra, don't worry about it. All I'm suggesting is that all four of these fundamental linear models we're going to look at have some form of prediction line, they're all going to be different, and there's a way to calculate them. It's not important how we do it; it's important that you understand what these things mean. Now that we know what they mean, let's step on to the rest of the table you see at the top. The next thing you're going to see is the standard error of these coefficients. Now, if there's one little thing about standard errors you can remember, it's that the standard error of the mean is something like the standard deviation divided by the square root of the sample size.
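Here is a short R sketch of that normal-equations calculation, run on the made-up `dat` data frame from earlier; `lm()` does something equivalent (and numerically more careful) internally, so the two answers should agree:

```r
# Solve beta = (X'X)^{-1} X'y by hand
X    <- cbind(1, dat$x)                  # design matrix: a column of ones, then x
yvec <- dat$y                            # column vector of dependent values
beta <- solve(t(X) %*% X) %*% t(X) %*% yvec
beta                                     # beta_0 (intercept) and beta_1 (slope)

# The same fit with R's built-in least squares
coef(lm(y ~ x, data = dat))
```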
So that's the easiest form; as we move through these four types of linear models, the equation for these standard errors becomes more and more involved. You can look them up if you're interested; just remember that they all grow from this fundamental idea. So there is an equation for the standard error of each coefficient. Why we use it, though, is that we can take the coefficient and divide it by its standard error — the coefficient divided by the standard error — and if I do that, I get a t statistic. And hopefully, at this point in your learning of statistics, you understand what a t statistic is: given the parameter of the t distribution, which is the degrees of freedom, I can express a p value for it. The same happens here. There's an equation for the standard error of beta one; we plug into it — or the software does this for us — take the coefficient, divide by the standard error, and that gives a t statistic which, for its degrees of freedom, gives us a p value. And remember, we're not really interested in the intercept; we're interested in this independent variable. We can say that is a significant result if, for an alpha value of, say, 0.05, the p value is smaller than 0.05. Then we reject our null hypothesis and accept the alternative hypothesis that this beta one is significantly different from 0. Remember, our null hypothesis in this instance is always that beta sub one equals 0 — and if we had more, beta sub two, beta sub three, beta sub four, then for an omnibus test the null hypothesis is that they all equal 0. We reject that null hypothesis and we say that the coefficient is significantly different from 0. Then you'll also see this kind of thing: an F statistic and a p value. So let's just discover where those come from. We have learned what the total sum of squares is; what the sum of squares due to the regression is — the difference between our model, the red line, and the baseline model, which is the mean; and what the sum of squares due to the error is — the difference between the actual dependent variable value and what our model calculates. All of that plugs into the F ratio, the F statistic, and there you see the equation for it. Let's run through it very quickly: there's a numerator and a denominator, and each of those is itself a sum of squares over its degrees of freedom. So there's something fundamental going on here as well. A bigger F makes our p value smaller, and the way to get a bigger F is to make the denominator smaller — so we want the sum of squares due to the error to be smaller. And as we go on to something like analysis of covariance, where we have more independent variables, there's a way to take some of that error away and almost move it up into the model, and we'll certainly discuss that. Just keep in mind that there is some way to do that. And then the last thing to remember: we have our R squared measure of how well our model does, and that's the sum of squares due to the regression divided by the total.
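To summarize the pieces of the table discussed so far — for a model with p independent variables fitted on n observations, and including the adjusted R squared that comes up in a moment:

```latex
t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)},
\qquad
F = \frac{SS_{\text{regression}} / p}{SS_{\text{error}} / (n - p - 1)},
\qquad
R^2 = \frac{SS_{\text{regression}}}{SS_{\text{total}}},
\qquad
R^2_{\text{adj}} = 1 - \left( 1 - R^2 \right) \frac{n - 1}{n - p - 1}
```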
So it's some fraction of the total, and in our instance, for our model here, it was 0.417, and how we interpret that is to say that our model explains 41.7% of the variance in the dependent variable. Of course we want that to go up. One way to do it is just to throw in more and more independent variables, but that's just wrong — it's chasing numbers. So you'll also see something like an adjusted R squared; there's an equation for it, shown in the block above, and it helps correct for just adding more and more parameters, more and more independent variables. The fundamental point is that R squared expresses the fact that our model explains a fraction of the total variance in the dependent variable, and at the moment it's only explaining 41.7% of that variance. So that is the fundamental basis for the rest of what we're going to try and achieve — first analysis of variance, then analysis of covariance, and then logistic regression — and it's all going to build on exactly what we've discussed here. So now we're going to go on to analysis of variance, ANOVA. You might have heard of analysis of variance before, perhaps after learning about the t-test for independent samples, where we have a numerical variable and a categorical variable with two unique values such that we can split the data in two. For those two groups we compare the means of the numerical variable, and for that we can use Student's t-test. But what if we have three or more groups? So now we have a categorical variable with values, for instance, A, B, C, C, B, B, A, as our samples go one, two, three, four, five, six, seven, et cetera, and we can divide our dataset into all the A's, all the B's, and all the C's. We might call that a treatment, or whatever the case might be — all these different synonyms for the same thing. And now, for a numerical variable — so here we'll have some numerical variable — we can compare those three means and see whether there is a difference between them. That's exactly what we use ANOVA for; no problem there. But what I want to show you is that this is nothing other than building on the model we had before for linear regression, because this categorical variable is going to be our independent variable — so let's label it independent — and we're going to try and predict a dependent variable. Think about that for a moment: before, it was numerical predicting numerical; now we have a categorical one, and there can only be three different input values — I can only plug in A, B, or C — and I'm going to use that to give me a numerical variable. You can well imagine there are only going to be three possibilities for that calculation. It's not going to be the straight line where, for any independent variable value, I have a continuous output; this is going to be slightly different, because there are only three possibilities there. And this is an example of what we might see in a table for such a model.
So, we have some idea now of what these coefficients are, and here we see an intercept, but we only see B and C — there's no A, even though in our example there is an A, a B, and a C. And there's another complication: we can't do numerical calculations with symbols, with nominal categorical values, so we'd better change those into numbers. But I can't change them into numbers like 1, 2, 3 — why did you choose 1, 2, 3 and not 10, 20, and 30? So we've got to come up with some other plan, and as you can see there, A is not even in the table. So how does all this work? It works in this way. Once again, let's have our dependent variable here and our independent variable there. Now there's a little bit of a problem, inasmuch as I only have A, B, and C, and above each I might have some values, and here some values, and here some values. How do you draw a straight line through that? And if I put the A, B, and C closer to each other, the line would be different. So that's not how we do the red line — let's call it the red line, because we're always going to have this red line. That's not how the red line works when it comes to analysis of variance. The red line here is a little line there — let's just do that, it's just for illustrative purposes — a little line there, a little line there, and a little line there. That's our red line, because, as I said, I've only got three different input values, A, B, or C, so I can only have three possible outputs, and those are they: our model is going to predict the mean of this numerical variable for group A, the mean for B, and the mean for C. That's always going to be its prediction. So instead of the one red line we had before, this is our new red line; it's only ever going to be those three little lines. But how do we do this numerically? We've still got this problem. So let's take our three groups A, B, and C and do what is sometimes called dummy variables — or, in machine learning for instance, one-hot encoding, "one hot" meaning just one of them is going to be a one and all the others zero. Imagine this first observation was an A: we put a one under A and a zero and a zero under B and C. That would be one way to go. Our second observation was a B, so it'll be zero, one, zero — only one of them is hot. So instead of a categorical variable that says A, B, and C, I now have three dummy variables, A, B, and C, in my spreadsheet. The third observation was a C, so that's zero, zero, one; the next one was a C, zero, zero, one. And that's exactly what we do: we have these dummy variables in our spreadsheet, still with the numerical variable and its values alongside, and instead of just that one categorical column we now have three dummy variables, this one-hot encoding. But you see the A is missing there, because it really is redundant. Think about it — and you've got to be careful here which one you take away, because that is going to be your base case, the one you decide you're measuring the others against; you can make it A or B or C. For simplicity's sake here we take A away. Because think about it: if B is zero and C is zero, what else could A be?
It must be a one. And if B is one and C is zero, A has got to be a zero. Given those two, A has to take on the value it takes on; it has no choice in the matter. So A is our base case: if we have zero, zero, we know it's an A; if we have one, zero, we know it's a B; if we have zero, one, we know it's a C. We don't need anything else — that A is now redundant. So decide on which one you want as the base, and that's why we only have the B and C, because this is what our spreadsheet is going to look like: these two dummy variables and our numerical variable. We're going to use these two to predict that. So let's have a look at what the equation is going to look like, based on those coefficients — here they go, let's put them in: that's beta sub zero, that's beta sub one, and that's beta sub two, and if I had A, B, C, and D, there would be a beta sub three for the D. All we're going to say is that our model predicts the value y hat, which is beta sub zero — in this instance, one hundred and six — plus beta sub one times whatever B is, and B can only be zero or one, plus beta sub two times C, and C can only be zero or one. So now we have numbers; we can do numerical calculations. If we plug in an A — A is not here, it's redundant — both dummies will be zero. So let's do one: if this is a case of A, our prediction is going to be beta sub zero, plus — let's put it in — beta sub one times zero, plus beta sub two times zero. Those two terms fall away, and my prediction for an input of category A is just beta sub zero. If it is a B, the estimated value is beta sub zero, plus beta sub one times one — now this becomes a one — plus beta sub two times zero. These are values; no problem. And as far as C is concerned, it's beta sub zero, plus beta sub one times zero — this one becomes a zero — plus beta sub two times one. It's as simple as that, and as you can see, we're always going to get just three possible outputs, given that we have those specific numbers. And from here, there's going to be some complicated equation to work out the standard error for beta sub zero and the standard errors for the others; we take each coefficient divided by its standard error, and that gives us a t statistic for each, and based on the degrees of freedom I can work out a p value for each of those. And remember what we're after here: our null hypothesis says, in this instance, that beta sub one equals beta sub two equals zero, and we ask whether they are significantly different from zero. But you can see we're still dealing with a linear equation — it's still just linear regression; we were just clever in constructing these dummy variables so that we can still have our linear equation. We just have a different way of going about creating our best-fit lines, and there they are. So let's take this away, because I think you've got it now. It's quite an easy thing to understand, and quite beautiful, I think. Instead of me just telling you that we use it to compare these three means to each other, you can now see what lies behind it: it's something we've built straight from what we had with linear regression. That's absolutely fantastic.
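Here's a small R sketch of exactly this, with made-up data: `model.matrix()` shows the dummy coding R builds behind the scenes (A as the base case), and the fitted coefficients are the mean of A plus the B and C offsets from it.

```r
# Made-up data: a three-level categorical predictor and a numerical outcome
set.seed(2)
d <- data.frame(
  grp = factor(rep(c("A", "B", "C"), each = 10)),
  val = c(rnorm(10, mean = 100), rnorm(10, mean = 110), rnorm(10, mean = 120))
)

head(model.matrix(~ grp, data = d))  # intercept plus dummies for B and C; A is the base case

fit <- lm(val ~ grp, data = d)
coef(fit)                   # (Intercept) = mean of A; grpB, grpC = offsets from A
tapply(d$val, d$grp, mean)  # the three group means, for comparison
```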
Now we just have to think about how we look at our model: how do we calculate this F statistic, which you can see right there, and how do we get a p value, which we can see right there? Depending on how your software produces these tables, this is what you're going to see. So let's talk a little bit again about the sum of squares due to the regression — our regression looks slightly different now; it's not a single straight red line — the sum of squares due to the error, which is very important because it still sits here at the bottom, and then, for our R squared value, our coefficient of determination, remember that the total sum of squares is just the sum of squares due to the regression plus the sum of squares due to the error. So let's have a look at how these are done here. For the sum of squares due to the regression, the equation says I've got to do this for each observation in my sample, and that's why we're multiplying each time. What does this first term say? Let's get all the A's together and take the mean of all the A's — that's the little mean we have there — and subtract from it the overall mean, because I can bunch them all together and get the overall mean of that whole dependent variable. So for each of those observations, it's the group mean minus the overall mean: it's the red line minus this overall mean for all of it, the sum of squares due to the regression, just as we had for linear regression — if you think about it, there it was the red line minus the baseline value, which was always the mean. But I've got to do that for each observation in group A, and that's why I just multiply by how many there are in group A. Then I do the same for B — the difference between our regression's prediction and the baseline model, the overall mean — for the one, two, three, four, five, however many I have in B, and the same thing for C. The sum of squares due to the regression: nothing other than what we've seen before. Isn't that fantastic? Then the sum of squares due to the error, and that is my real value minus my prediction — and I have three different predictions here, not just the one line. So it's each value minus its own group mean, for each group; we square each of those and sum over all of them, and to that we add the same thing for the next group, and the same for the next group. And that's why you'll typically see in ANOVA that we speak of the ratio between the sum of squares between and the sum of squares within. And how we interpret it, of course, is just to say that there is a statistically significant difference between the means of these three groups — that's how we use it. But now you see the maths behind it: is it not linear regression in some form? Isn't it beautiful? And there we go: those are the degrees of freedom, and the idea is the same here. In this instance it's k minus one — k, the number of groups, was three here, and minus one gives us the two we have there — and the sample size minus three gives us the 27. So apparently we had 30 samples, 30 observations, 30 subjects in our study here, because 30 minus 3 gives us the 27.
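In symbols — with k groups of sizes n sub j, group means y bar sub j, and overall mean y bar — the quantities just walked through are:

```latex
SS_{\text{between}} = \sum_{j=1}^{k} n_j \left( \bar{y}_j - \bar{y} \right)^2,
\qquad
SS_{\text{within}} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( y_{ij} - \bar{y}_j \right)^2,
\qquad
F = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (n - k)}
```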
So I still have degrees of freedom, and those degrees of freedom mean I can construct the F distribution from those two parameters, work out where our F value falls, and, knowing that, work out a p value. Isn't that absolutely fantastic? We still have the R squared there, as before. And this is the picture to remember: again I have this red line — there's always going to be this red line — and that is where we get our sum of squares due to the regression, which is always my model versus the baseline model, the overall mean; you can think about it that way. And the sum of squares due to the error: that is the actual value minus its prediction — each actual value minus its own group's prediction. Absolutely fantastic. One thing I should bring in here: we're getting this data as surrogates of what's happening in a very complicated system, and while I do all of this, remember that we are trying to predict some dependent variable value and we're always going to be slightly off. That's why we put this hat on: it's an estimate. And remember that I have my design matrix and this column vector of coefficients, and those coefficients are also estimates. All of these are estimates from our sample, and we're trying to infer from that to a population. So these are always estimates; strictly speaking, in case you come across it, each of them should actually have a hat on as well. The last thing I want to show you, again with the linear algebra, in case you're interested in that: how would this go now? Again, we're going to have nothing other than — let's do blue — y equals X times beta. That y is a vector, our vector of actual dependent variable values, and our beta, our column vector, is going to be beta sub 0, beta sub 1, and beta sub 2, in this instance. There we go, it's right there; that's what we're going after. And what is this X, our matrix, going to look like now? Well, it's going to be all 1s in the first column, because we multiply those by beta sub 0, and then we're going to have the B and the C dummy columns: 0, 0 for an A; 1, 0 for a B; 0, 1 for a C; et cetera. That's what this matrix is going to look like. And again, in the end, the vector we're after is X transpose X inverse, times X transpose, times our column vector of dependent variable values. And if you do that — bar some assumptions and things that have to be met — we get this beta, and those are the best three values, the best fit there. So can you see how wonderful it is, how these things all fit together? Now we get to the exciting one: analysis of covariance. As you can immediately see, things look a lot more involved, but it really is quite amazing what's going on here, and we're just going to build on what we've already done. But let's give ANCOVA its proper name: analysis of covariance. Why would I use analysis of covariance? I'm still building a model; I'm still going to have my independent variables — let's label them independent variables — and this time we've got to have at least two.
And I still have my dependent variable, so I'm still trying to do the same thing. What I might use this for is a study where I can randomize subjects. In this instance, we're going to have a categorical variable by which we randomize them, and you might also call it the treatment — we'll even use the variable name treatment. The treatment is going to have different levels; those are the unique elements in your categorical variable, just as we had A, B, C before. We'll have these levels of our treatment — or classes, as we call them in machine learning, et cetera. But I'm also going to have a numerical variable which I need to control for. We need to control for it because we had no control over it when we chose our subjects. Imagine we've got a group of patients who come in and we can randomly assign them to one of, say, three levels, but they all arrive with some numerical variable that is important to the dependent variable we're trying to predict, and we have no control over it. So mathematically, in statistics, we've got to control for it. We'll usually say something like the following. Here is our example: we have some new drug on the market, and we form three levels for our treatment — three unique values for our categorical variable — such that someone might receive a placebo, someone might receive a low dose of our test drug, and someone might receive a high dose. So we have those three groups in our randomized trial. But the patients come in each with their own age, and we have no control over that age; we're not randomizing by age. Somehow we have to control for it. So we're going to say the outcome depends on which group they are in, corrected for the numerical variable, and in this case we call that a covariate. A covariate — let's put it up there. We're going to control for the covariate. And all this time, we still have a numerical variable on the outcome side: in this instance, the example is major blood loss from catastrophic penetrating abdominal trauma with vascular injury, so the outcome is an estimated blood loss. The way it works — close your ears if you're squeamish — is that we have suction devices, from which you can read off a volume of blood, and the swabs that go into the abdomen can be weighed; from that we can estimate the blood loss. So that is the estimated blood loss, our numerical dependent variable, and age is our covariate. We want to control for the covariate. What we want to say in the end is: imagine they all had the same age — now, what is the effect of the main effect, the group they were in, on the outcome, on our dependent variable? We control so that, in effect, they all had the same age: we control for this covariate. That's the language that you're going to see. And what that's actually going to do for us is adjust these values such that we can compare these three groups to each other. So that's why you see "analysis of covariance": there's a sort of analysis of variance in there, with a nominal categorical variable as my independent variable.
And I have a continuous numerical variable as my dependent variable. But now I have a covariate, and there's the C that we add to the ANOVA. You can already sense the question: is it really going to be different, or are we just combining what we had before? And of course we are combining what we had before: we have linear regression and we have ANOVA, and somehow we are combining them. This time, though, we are going to talk about the different assumptions we have to make for the use of this test, because for ANCOVA we really have to meet these assumptions; it's important here to really talk about them. I've listed the five that we're going to talk about. This really is a linear model, so we need these linear relationships, and the first assumption is indeed linearity. So let's take this away — I think you understand now: we're going to have this main effect and we're going to control for the covariate — and let's see how this all works. Imagine I have my numerical dependent variable, so let me say that's our dependent variable, and here is our independent variable — in this instance, our covariate. That's what we mean here: covariate (apologies for that lovely handwriting there). And let's use two colours here, because this is now going to be slightly different. What we might have is the following: for someone who's in the placebo group we might see something like this; for someone in the low-dose group we might see something like this; and for someone who received a high dose of the drug we might see something like this. What we mean by linearity is that we need a linear relationship between the covariate and the dependent variable. If we were to fit a plain linear regression within each group, we'd have these model lines — like the red lines we had initially. We want that linear relationship between our covariate and our dependent variable. And you can well imagine this might look different. The lines might all be squashed together, on top of each other: that means this covariate does drive what the dependent variable can be — as the covariate goes up, the outcome goes up — but the groups don't separate. Or the lines might split apart completely: that means there is a group difference. Imagine there were just two groups, a placebo and an active drug: the lines might look like this, which means that from lower values of our covariate towards higher values, a difference develops. Or the two lines can cross each other. You can imagine how logically this picture helps you see whether there's going to be a difference between these three groups. But for now, with this first assumption, concentrate on the linear relationship between the covariate and the dependent variable. We really need that as our first assumption — graph it out, and this is really what you want to see. Now, the second assumption is homogeneity of the regression slopes. What that means is that we really don't want an interaction between our covariate and our main effect there, our group. We don't want an interaction there: we want these three lines to have the same slope.
I've shown you that they might fan apart or they might cross; to a great extent, that's not what we want here. We don't want that interaction effect: we want to see similar slopes — remember, the slope of each of those lines. So between those two variables we really want this homogeneity of the regression slopes, and for that we're going to look at an interaction. That's the result you see up here: homogeneity of our regression slopes. What you might do, if you use a computer language such as R for instance, is say something like this — let's just put it here: estimated blood loss, that is our dependent variable, given age, plus group, plus something like age times group. That last term really is just a multiplication of the two, and it gives us an interaction term. What you'll usually see then in your table of results is my main effect, the group; then age; and then group-by-age, the interaction between those two. And bar anything else that goes on, we look at that interaction term and we want its p value to not be below our chosen alpha level, for instance 0.05. If it is above our alpha level, we fail to reject the null hypothesis of no interaction, and we can proceed as if there is homogeneity of these regression slopes: there is no interaction between our main effect and our covariate, and hence we've met this second assumption. Visually, you really want those slopes to be the same. The next one we really have to talk about is normality of residuals. What we do there is run our actual analysis of covariance — let's take this away here — and, given that, remember the model gives us predicted values for each observation: this is y, and the model gives us that y hat for each of those. The difference between the estimated or predicted value and the actual value is the error, or the residual. So I'm going to look at those residuals. Once again, in Python or R or something like that, I can literally bring a new column into my data set holding all those residuals. Those are the values we're talking about for normality of residuals, and there are tests for whether those residuals come from a normally distributed population; the one you'll see very often is the Shapiro-Wilk test. So we take all those differences — not squared, just the differences — which is a whole new set of numerical values, and we do a Shapiro-Wilk test on it to test for normality. The null hypothesis is that they come from a normal distribution, so we want the p value for this test to be more than 0.05, if that is our alpha value. The next assumption is homogeneity of variance. Once again it concerns the residuals, the errors — that list of values we have after we've done our analysis of covariance — and we're going to have three groups of them: all the placebos together, all the lows together, all the highs together. We then do something such as Levene's test, for instance, which asks about this homogeneity of variance: is the variance equal across those three sets of residuals or errors?
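Here is a sketch of how these checks might look in R, on a made-up data frame `trial` with columns `ebl` (estimated blood loss), `age`, and `group` — the column names and the data are assumptions for illustration, not the actual data from the board. `leveneTest()` comes from the `car` package, and the standardized-residual screen for outliers, discussed next, is included at the end:

```r
# Made-up randomized-trial data: three groups plus an age covariate
set.seed(3)
trial <- data.frame(
  group = factor(rep(c("placebo", "low", "high"), each = 10)),
  age   = round(runif(30, min = 20, max = 60))
)
shift <- c(placebo = 0, low = -200, high = -400)   # made-up group effects
trial$ebl <- 2000 - 15 * trial$age + shift[as.character(trial$group)] +
  rnorm(30, sd = 100)

# 1. Linearity: usually checked with a scatterplot of ebl against age per group

# 2. Homogeneity of regression slopes: look at the age:group interaction term
summary(aov(ebl ~ age + group + age:group, data = trial))

# 3. Normality of residuals: Shapiro-Wilk on the residuals of the ANCOVA fit
fit <- lm(ebl ~ age + group, data = trial)
shapiro.test(residuals(fit))

# 4. Homogeneity of variance: Levene's test across the three groups of residuals
car::leveneTest(residuals(fit) ~ trial$group)

# 5. Outliers: standardized residuals should stay within about +/- 3
sum(abs(rstandard(fit)) > 3)
```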
So that's very important, and then the last assumption that matters for us is outliers. Again, we take that set of residuals, and what we do is standardize them in some way, because we want to know how many standard deviations away from their mean they are. We take the absolute value of all of them, and we just don't want them to be more than three standard deviations away — the residuals, the whole list of differences between predicted and actual values. What you can also usually do in your software is look at a box-and-whisker plot: suspected outliers will sit out here, and we don't want too many of them — I can't tell you exactly how many is too many — or, as I said, you can just look at the standardized residuals and check that they're not more than three standard deviations away from the mean of those residuals. So those are the five assumptions; please keep them in mind, we really need them. What we want to do is see whether there's a difference according to how we randomized people. The main effect is the group we assigned them to — whether they got a placebo, a low dose, or a high dose of the drug — and we want to see if that influences their dependent variable, correcting for a covariate, a continuous numerical variable. And if we correct for those, what we really don't want to see — as I said, the lines could be coincident on each other, that's not a problem — is this fanning apart, or this crossing effect, and the way we test for that is just to create a model with the interaction between those two and look at the p value we get for the interaction term. So it's very important this time to pay attention to these. Having done all of that, let's have a look at actually doing our analysis of covariance. Let's take all of this out and just leave those as a reminder of what we're doing here. First of all, you'll remember that in the beginning I said we're trying to make the denominator smaller, to make the F statistic bigger. So let's see whether that really happens when we build something more complex, because I could just as well have done an ANOVA here: just my independent variable being this categorical variable, and my dependent variable, my numerical variable, there. So let's just do that: we build an ANOVA, and again, in something like R, you use a little formula like that — I'm saying the dependent variable given the independent variable — and we might see something like this. This is just an example from the data set I've worked on here. We've seen this before: I get degrees of freedom for my main effect, my group there, and then my residuals, and I see a sum of squares — this sum of squares due to the error; unfortunately it says "residuals" there, but remember, that's the sum of squares due to the error — and we can see it's quite big at the moment, 9.7 million. And then you'll see the rest of it. But now let's bring in the ANCOVA: I'm saying my estimated blood loss given age and my group.
Having done all of that, let's now actually do our analysis of covariance. Let's take all of this out and just leave those as a reminder of what we're doing here. First of all, you'll remember that in the beginning I said we're trying to make this one, the denominator, smaller, to make that one bigger. So let's see if that really is so when we build something more complex, because I could just as well have done an ANOVA here: just my independent variable being this categorical variable, and my dependent variable, my numerical variable, there. So let's do that. We build an ANOVA, and again, in something like R, you'll use a little formula like that: the dependent variable given the independent variable. And we might see something like this; this is just an example from the data set I've worked on here, and we've seen it before. I get degrees of freedom for my main effect, my group, and then there are my residuals. I see a sum of squares, and I see the sum of squares due to the error; unfortunately it says residual there, but remember, that's the sum of squares due to the error. We can see it's quite big at the moment, 9.7 million, and then you'll see the rest of it. But now let's think about bringing in the ANCOVA: I'm saying my estimated blood loss given age and my group. I'm going to correct for this age, make it as if everyone had the same age, and then look at whether the group they fall in really determines what the estimated blood loss is going to be. And this is the result that you're going to get. So I've put my covariate in; before, that was the ANOVA, just without my covariate. Now look at what happens to the residuals: I've gone from 9.7 million down to 717,000, just over 700,000. This sum of squares due to the error has shrunk, because I've moved some of it out. If I total all of these, and I total all of those, I get exactly the same value; my total sum of squares is still exactly the same. But what I've done is make this part less: some of it is now borne by that covariate of mine, and it goes away from my denominator. I've made this smaller, hence my F value goes up and my p-value goes down. So by bringing in this more complex model, I'm taking away some of those squared values. Remember what I said right at the beginning, though: never, ever chase numbers. This has got to make sense in the setup of what you're trying to do. Always stand back and ask: does this make sense? Write your research proposal first, say what you're going to do, and then do it; don't change it afterwards to chase numbers. But you can see here a beautiful demonstration of how we've taken away from the sum of squared errors, now that you know what the sum of squared errors is. And from this we can find all of these: remember, the mean square is just the sum of squares divided by its degrees of freedom, the F statistic comes from those, and from that we get our p-values.
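As a rough sketch, that side-by-side comparison might look like this in statsmodels, under the same hypothetical names as before:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv('surgery.csv')  # hypothetical file, as before

anova_model = smf.ols('ebl ~ group', data=df).fit()         # ANOVA: no covariate
ancova_model = smf.ols('ebl ~ age + group', data=df).fit()  # ANCOVA: age first

# Sequential ANOVA tables, like the ones on the board.
print(sm.stats.anova_lm(anova_model))
print(sm.stats.anova_lm(ancova_model))
# The Residual sum_sq row shrinks once age absorbs part of the error,
# while the grand total of the sum_sq column stays the same.
```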
And then, in the end, we're going to get this kind of table. Again, we have the intercept and our coefficients, low and high; placebo is gone, as you can see. Because, once again, remember what we do: we have these dummy variables. Instead of the main-effect column, we have a column called placebo, a column called low, and a column called high. So this first person was in the high group: 0, 0, 1. Then we still have age, and we have estimated blood loss in our data set. The second one was also in high, the third one was in low, and this one was in placebo, so that's a 1, 0, 0. Once again, we're going to have our base case to compare against, and this time it makes sense to choose placebo as that base case; we compare the low dose and the high dose to it. So the placebo column becomes redundant: we know that if it was a placebo, those other two would both be 0, so no problem there, we can take it away. Now let's look at what those coefficients tell us. They say that the estimated blood loss, the estimate, or y hat, is going to be, let's put these in. In this instance, you've just got to note the order in which these come; in R, for instance, you have to put the covariate first when you write this little formula. You've got to know a bit about your software to set these things up. But as it has been done here, this is beta sub 0, this is beta sub 1, that is beta sub 2, and that is beta sub 3. So, reading it off, the estimated blood loss is minus 942.2, plus 697.85 if this person was in low, plus 888 if they were in high, plus 147.1 times the age. In this first instance, that is my actual blood loss, so let's work out the estimate: we start with the minus 942.2; the low dummy is 0, so that term falls away; the high dummy is 1, so that term stays; and this age was 28, times 147.1. That gives us the first estimated blood loss. If the subject was in the placebo group, both dummies would be 0, so both of those terms would fall away: just the intercept plus 147.1 times the age of that person. And that is how we correct for this covariate. We correct for the age, and now we can look at whether the group influences the estimated blood loss, our dependent variable. Very simply, that is analysis of covariance. Now, one thing we didn't speak about with ANOVA, and it applies to ANCOVA too, I don't think I put it on the board: if we have a significant value there for our model, we can go on and do post-hoc analysis, but only if we have significant values. Then we can compare placebo to low, placebo to high, and low to high, those pairwise comparisons. You've always got to do that with a specific test: we're talking Bonferroni, we're talking Tukey's HSD, et cetera. But those post-hoc analyses are not what this is all about; I want to show you how beautifully this all fits together with what we've just done. By the way, how is it going to look? Remember, we want to solve for beta. We're going to have X, we're going to have beta, and in this instance our beta holds those four values. And there is y, our actual values. Again, it's nothing other than this: X transpose X, inverse, times X transpose, times our actual y values, and that gives us beta. This vector beta is a column vector holding beta sub 0, beta sub 1, beta sub 2, and beta sub 3. And what is this X going to look like? Well, the way this one was set up, the first column is all 1s, and then we have the low, the high, and the age columns. In this first instance, the subject was in high, so it's a 0, a 1, and the age was 28, et cetera, et cetera. That's what this matrix X looks like. We take its transpose, so rows become columns and columns become rows; multiply by the original; take the inverse of that; multiply by the transpose again; multiply by that column vector of y values; and out come those betas. Just in case you're interested: of course, there are all sorts of deeper things going on there. What if you have a singular matrix that you can't take the inverse of, et cetera? That's where some of the assumptions come from; they come from this underlying linear algebra. So, once again, that's just the method for calculating those. This one's slightly different, so please bear those assumptions in mind; you've really got to meet them before using ANCOVA. But it is a beautiful way, and, sneakily, we pass in this idea that we are taking away some of the error: we are correcting for this age, which we had no control over as we selected the subjects for our study.
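Just to make the matrix recipe concrete, here is a tiny sketch with made-up, purely illustrative rows, not the study's data:

```python
import numpy as np

# Columns: intercept, low, high, age. Rows are made-up subjects; the first is
# a high-group patient aged 28, the last two are placebo (both dummies 0).
X = np.array([
    [1, 0, 1, 28],
    [1, 0, 1, 42],
    [1, 1, 0, 35],
    [1, 0, 0, 51],
    [1, 0, 0, 47],
], dtype=float)
y = np.array([520.0, 610.0, 480.0, 390.0, 415.0])  # made-up blood losses

beta = np.linalg.inv(X.T @ X) @ X.T @ y   # beta_0, beta_1, beta_2, beta_3
print(beta)
# Real software solves this more carefully (e.g. np.linalg.lstsq), since an
# explicit inverse fails on the singular matrices mentioned above.
```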
Well done, you've made it to the last of what I would call these four fundamental linear models, and we're going to talk about logistic regression. There we go. We still have our independent variables, and we still have our dependent variable. So we take some independent variables, remember, step back, does this make sense, all that good stuff, and we try to predict a dependent variable. Remember, it's either to understand this relationship or to build an actual prediction model. You'll often see a logistic regression model used as a first step in a classification problem when it comes to machine learning, but here we're using it in the context of statistics. So what is going on here, and why is this different? It is different because our dependent variable is no longer numerical. We had a numerical one for linear regression, for ANOVA, and for ANCOVA; now it is nominal categorical. And we're going to stick to the special case: we'll say that it is dichotomous, or binary. So there are only two possible outcomes: yes and no, type 1 and type 2, A and B, whatever the case might be. The number of unique levels for our factor is two; there are only two of them. So we've got to predict yes or no in this instance. For our independent variables, we'll have three different examples here: a nominal categorical variable with several levels as our independent variable; a continuous numerical variable as our independent variable; and a combination of the two. All of that can go into this idea of predicting what the outcome might be. Now, we've got an immediate problem. Let's put our dependent variable there; remember, it can only take one of two values, and let's make those values 0 and 1, the only values it can take. And it's 0 and 1 because we do this one-hot encoding; we'll talk a little bit about that. And let's look at the continuous numerical variable here; let's call it IBL, and I'll tell you what that is shortly. Given this, I'm going to have values here, and here, and here, and I'm going to have values there, there, there. That's all I can have. There's no way for me to draw a straight line there, because I want my output to be either 0 or 1. So let's just talk about why this is 0 and 1. Remember, we create our dummy variables: we'll have a no column and a yes column. If it's yes, it will be 0, 1; if it's no, it will be 1, 0. Again, one of these is redundant: if this one is 1, that one has to be 0, and if this one is 0, that one has to be 1. And we get to choose which one is the 1. Which is the outcome that we're interested in predicting, which is the one here that is of interest to us? We make that the 1 and the other one the 0; we could just as well be after the no's, or the B's, or the A's, whatever the case might be for this binary dependent variable of ours. As you can see, we only need that one column. We're still doing dummy variables, we're still doing one-hot encoding, and that's why we have 0 and 1 there. So that should make absolute sense.
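In pandas, that single 0/1 column might be created like this; a minimal sketch, with the relook column and its values purely illustrative:

```python
import pandas as pd

# Hypothetical relook column holding the raw yes/no answers.
df = pd.DataFrame({'relook': ['no', 'yes', 'yes', 'no', 'yes']})

# Keep a single dummy column: 1 marks the class we care about predicting.
df['relook_yes'] = (df['relook'] == 'yes').astype(int)
print(df)
```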
Now, how can we draw a model here? The problem is that if we make a prediction, our famous red line, we've got to put our famous red line in here, it has to be constrained. And what we're going to build is this kind of thing, and what it turns into for us is a probability. The outcome is going to be on this line, and we can choose where the cut-off is, but for now let's make it 0.5, so that, given any input value here, if the output is above 0.5 our model says yes, and if it's below, it says no, whatever you encoded the 1 and the 0 to mean. So we have this idea of a probability of the outcome; in this instance, it's always the probability of the 1 outcome. If IBL was down here, we'd say the probability of a 1, or a yes, is very low; if it was up here, we'd say the probability of a 1, whatever that 1 means, is very high. That's what we're after: this probability. Now, we can't draw a straight line, because a straight line would eventually dip below 0 and, as it climbs, go above 1, and we can't have a probability less than 0 or more than 1; it's constrained to the interval from 0 to 1. So this is the equation we're after: the probability. And how do we link probability, and I used the key word there, link, how do we link a probability on this side, between 0 and 1, to my independent variables? It was easy when the dependent variable was continuous numerical, because I could just say that y hat is a continuous numerical variable, it can be anything, as we had before with ANOVA and ANCOVA, and it equals beta sub 0 plus beta sub 1 x sub 1, plus et cetera. But I don't have that now; I don't have free rein. So how do I connect these independent variables of mine to a probability? You heard me, I used the term link: there are link functions that link the independent variables to a probability, as opposed to a continuous numerical variable. That link is what we're after, and there are lots of different link functions, depending on the situation. We're going to talk about the common one for this specific case, and that is the logit function; there are different ways to pronounce it, let's stick with logit. So that is going to be our link function, and you'll see that there's probit and others as well. The first thing we really have to talk about, and I'm sure you've seen this before, we'll just review it quickly, is the odds of something happening. That is the probability of it happening over the probability of it not happening, with probability constrained from 0 to 1; not happening is 1 minus it happening. So if you flip a fair coin, the probability of heads is 0.5. See there, I chose heads, heads or tails, and this time I'm going with heads. So it's 0.5 divided by 0.5, since 1 minus 0.5 is 0.5, and the odds are 1 to 1: getting a heads over not getting a heads. That is the odds we're talking about, and in the end, that is built from this probability we're trying to model here.
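As a tiny helper, purely illustrative:

```python
def odds(p: float) -> float:
    """Odds of an event: probability it happens over probability it doesn't."""
    return p / (1 - p)

print(odds(0.5))  # fair coin: 0.5 / 0.5 = 1, odds of 1 to 1
print(odds(0.7))  # the unfair coin we'll meet later: 0.7 / 0.3, about 2.33
```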
And what we're going to do with that odds is take the natural log, ln. Now, the natural log: remember, if we talk about log base 10, that asks, 10 to the power of what gives me this number? So for 100, the answer is 2, because 10 to the power of 2 gives me 100. For the natural log, the base is Euler's number, e, 2.7-something, one of the beauties of mathematics. So the natural log of this quantity is what we're after; that is our link function. Here is what it does; let's make some space. This link function, this logit function, says that the natural log of the odds, ln of p over 1 minus p, where p is the probability we don't know, and it follows a binomial, or the special case of that, the Bernoulli distribution, because we only have those two possible outcomes, equals beta sub 0 plus beta sub 1 x sub 1, plus more terms if we had other variables. That is the link between our dependent variable and our independent variables, because we don't have y hat here anymore. So I'm going to find these values, beta sub 0, beta sub 1, just as we've done before; you've seen me use the linear algebra for those coefficients, and it gets a bit more complicated here, but by the same sort of principles there are calculations that work out beta sub 0, beta sub 1, et cetera, and I can plug them in. But if I do so, I don't get the probability; I get this function of it. So how do we get back to the probability that we really want? That's quite easy: I exponentiate both sides, which means I take Euler's number, e, to the power of this side and to the power of all of that side. And e to the power of the natural log of something just gives back that something. So the left side becomes p over 1 minus p, and that equals e to the power of, let's call all of this alpha, just to keep things tidy, and please, that's not the alpha from before, our significance level. Which makes this very simple algebra: p over 1 minus p equals e to the alpha; multiplying both sides by 1 minus p, p equals e to the alpha minus p times e to the alpha; bringing that last term across, p plus p times e to the alpha equals e to the alpha; taking p out as a common factor, p times 1 plus e to the alpha equals e to the alpha; and so p equals e to the alpha divided by 1 plus e to the alpha. A little bit of algebra, and remember, this alpha is just all of this. So, having got my coefficients, I can plug them in: my estimated probability is e to the power of beta sub 0 plus beta sub 1 x sub 1 plus all the rest, divided by 1 plus e to the power of beta sub 0 plus beta sub 1 x sub 1 plus all the rest. That gives me the probability, and it falls somewhere on that red line, our famous red line. So this is the link between our dependent and independent variables, because we've got to constrain ourselves to an estimated probability on that interval from 0 to 1. I hope that makes absolute sense.
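Those two directions of the link can be written as two small functions; a minimal sketch:

```python
import numpy as np

def logit(p):
    """The link: ln(p / (1 - p)) maps a probability in (0, 1) to the real line."""
    return np.log(p / (1 - p))

def inv_logit(alpha):
    """Its inverse: e^alpha / (1 + e^alpha) maps back onto (0, 1)."""
    return np.exp(alpha) / (1 + np.exp(alpha))

alpha = 1.25               # stands in for beta_0 + beta_1 * x_1 + ...
p = inv_logit(alpha)
print(p)                   # a valid probability, strictly between 0 and 1
print(logit(p))            # recovers alpha, confirming the algebra above
```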
So there we go; we can take all of these away. Now, the way the coefficients are actually found, I'm just going to put that phrase out there: maximum likelihood estimation. That's the method that's used; we're not going into all of that at the moment, we just want to understand what is going on. So let's examine this small section of a study; I've just chosen a couple of variables for illustrative purposes. Of course, this is not the final model; this is not what we're after. In this instance, we had ischemic bowel: a part of the small bowel becomes devoid of blood supply and actually dies, so we have to do emergency surgery and remove that piece of bowel. So this was the length of bowel that was dead, that was necrotic, the ischemic bowel length that had to be removed, in centimeters. And we had the seniority of the principal surgeon: it was either a senior resident, an attending, or an acute care surgery specialist. Those were the three levels of our treatment variable. We could collect that information and the ischemic bowel length, and then whether they needed a relook, what we call a relook laparotomy. Patients have the index operation, it's an emergency, they get their surgery from one of these three levels of seniority of surgeon, and some of them will require a second or even further surgeries. There are different ways to set that up, on-demand versus planned, et cetera, but the question is: did they need a second look, yes or no? So we're trying to use these two to predict that, and this is logistic regression, because we want the probability of that; that is what we're after. We've got three examples here of the tables you might see, and we've got to understand what's going on in them. In this first one, our only predictor is continuous numerical: we have a continuous numerical variable as our independent variable, and we have this relook, a nominal categorical variable, as our dependent variable. So what do we get in these tables? Depending on the software, it might say constant, it might say intercept, usually constant, and then the coefficient for our continuous numerical variable. Once again, that says the following: the probability of a relook, let's do it in blue, is e to the power of minus 5.71 plus 0.0519 times the ischemic bowel length, divided by 1 plus e to the power of minus 5.71 plus 0.0519 times the ischemic bowel length. That is what it means. We plug that in, we land somewhere on that red line, and we can draw the line in the sand somewhere. If we draw it at 0.5: higher than 0.5, our model is predicting that that patient will have a relook; below, it's not. And then we get into all sorts of other things, where to draw that line, the area under the receiver operating characteristic curve, et cetera, which is not what this tutorial is about. So that is how we would use those coefficients.
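Fitting that first model and using its coefficients might look like this; a minimal sketch, where the file and column names (relook as 0/1, ibl in centimeters) are my stand-ins, and the quoted coefficients come from the table above:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('relook.csv')  # hypothetical file with relook (0/1) and ibl (cm)

model = smf.logit('relook ~ ibl', data=df).fit()
print(model.summary())  # a constant/intercept row and an ibl coefficient

# Plugging the quoted coefficients into the link function by hand:
def p_relook(ibl_cm, b0=-5.71, b1=0.0519):
    a = b0 + b1 * ibl_cm
    return np.exp(a) / (1 + np.exp(a))

print(p_relook(130))  # estimated probability of a relook at 130 cm
```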
In the second model, we only have the nominal categorical variable there; again, one of its levels is chosen as the baseline, and you can see it has gone from the table. You know now about the dummy variables: choose one of them as the base and we can get rid of that column, so we only see these. And in this last one, we have the nominal categorical and the numerical variable in there together. So in the first instance we have a beta sub 0 and a beta sub 1 for our coefficient values; in the second, a beta sub 0, a beta sub 1, and a beta sub 2; and here we have a beta sub 0, beta sub 1, beta sub 2, and beta sub 3. And you know now that this just extends the equation; no problem there. So, we've looked at the odds; remember, the odds is just the probability of something happening over the probability that it doesn't. But now we've got to discuss the odds ratio, because that is what's hidden inside these coefficients. Let's say we have an unfair coin with a probability of heads of 0.7, and a fair coin with a probability of heads of 0.5. We can look at the odds for each of these, and then the odds ratio: 0.7 over 1 minus 0.7, divided by 0.5 over 1 minus 0.5, which works out to about 2.33 over 1. It's an odds over an odds, and it gives us the ratio of the odds of getting heads with the unfair coin to the odds with the fair coin. That's the odds ratio. Now let's see how that corresponds to what we have here. We have coefficient, coefficient, coefficient, and here a whole bunch of coefficients, and we've seen how they fit into that link function of ours to work out the probability. Now, though, we do the following: take Euler's number, the base e, and exponentiate that 0.0519 we have there. That gives us an odds ratio. So I grab my calculator, the trusty old HP Prime on my phone, and enter e to the power of 0.0519, and we see an odds ratio of 1.05. What does that mean with respect to this ischemic bowel length? The units in which we measure ischemic bowel length are centimeters, so one unit is a one-centimeter increase. What this odds ratio tells us: comparing an ischemic bowel length of 130 to one of 129, every one-unit increase multiplies my odds of being in my number-one class by 1.05. And it's any one-unit increase, 134 to 135, it doesn't matter; for a numerical variable, it's that one-unit increase, in whatever unit it was measured. A one-unit increase multiplies the odds, not the probability, the odds of being in my number-one class by 1.05. And how do we express that 1.05 as a percentage? We look at whether it is more or less than 1. It's more than 1, so we take the odds ratio, 1.05, subtract 1.00, which gives 0.05, and multiply by 100, which gives 5%. So we'd say that every single one-unit increase in ischemic bowel length increases our odds of being in the number-one class, in this instance a relook, by 5%.
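In code, that exponentiation is one line; a minimal sketch, refitting the same hypothetical model:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('relook.csv')  # hypothetical file, as before
model = smf.logit('relook ~ ibl', data=df).fit()

odds_ratios = np.exp(model.params)   # exponentiate each coefficient
print(odds_ratios)                   # e.g. e^0.0519 is about 1.05 for ibl

# Percentage change in the odds per one-unit (one-centimeter) increase:
print((odds_ratios - 1) * 100)       # about +5% for the ibl coefficient
```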
What we've not put up here yet is that on the right-hand side of all these tables you'll also find the 95% confidence intervals around our coefficients, and those confidence intervals become quite important. Imagine we have an odds ratio of exactly 1: subtract 1 from it and there's nothing left, so it doesn't change our odds at all. So if our 95% confidence interval goes from something like 0.73 to 1.84, it straddles 1. That means, in our instance, for this coefficient, that we cannot say it changes the odds of getting into the one class for every one-unit increase: the 1.0 sits somewhere between those two ends. It also means this p-value is not going to be significant, because what we're saying is that within the 95% confidence interval we could have either a decrease or an increase; it's somewhere between the two. Because if, for instance, we take e to the power of minus 0.2231, let's do that on the trusty HP Prime, we get 0.800. That's less than 1. And what do we mean by less than 1? Well, since it's less than 1, we take 1.00, subtract the 0.800, and that leaves 0.200. That 0.200 is our decrease in the odds of getting into the number-one class: a 20% decrease in the odds. And once again, if our 95% confidence interval for this was something like 0.3 to 0.9, both ends are below 1; in other words, across that whole 95% confidence interval we get a decrease in the odds of being in that class for every one-unit increase. Don't worry about the specific numbers; I just used a value here so we could get to something below 1. So: if the coefficient is negative, the odds ratio is below 1 and it decreases our odds; if it's positive, the odds ratio is above 1 and it increases our odds. If the interval straddles 1, though, remember, you'll always see the intervals there at the end, we're not going to get a significant p-value, within reason. Now, let's put the phone down and have a look at the second model. Here I have a nominal categorical variable, and we see we have those two levels, but not that third one. What does this mean? Once again, we look at e to the power of the coefficient, and we can reuse that example, minus 0.2231; remember, that gave us 0.8. That's less than 1, so we get a decrease, a decrease in the odds of having to go back to theater for a second look. And that is increasing or decreasing the odds over the base class. And here, what was our base class? It was the senior resident. Now we have a more senior person, and with a more senior person we'd think the likelihood of having to go back to theater would be less; that's certainly what we're seeing in this example, at least, given that this is less than 1. And because we see here that we actually had a p-value of 0.76, which wasn't significant, we'd probably have seen, had we had the whole table, that our 95% confidence interval was something like 0.4 to 1.7, so 1 is captured within it: within that 95% confidence interval, it might decrease the odds or it might increase the odds.
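Checking that on the odds-ratio scale is again short; a minimal sketch with the same hypothetical model:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('relook.csv')  # hypothetical file, as before
model = smf.logit('relook ~ ibl', data=df).fit()

ci = np.exp(model.conf_int())   # 95% intervals, moved onto the odds-ratio scale
ci.columns = ['2.5%', '97.5%']
print(ci)

# An interval that straddles 1 cannot rule out "no change in the odds",
# which lines up with a non-significant p-value for that coefficient.
print((ci['2.5%'] < 1) & (ci['97.5%'] > 1))
```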
The important thing we have here, though, is the odds ratio: take e to the power of this coefficient, or that one, and it gives me the odds ratio, how the odds change compared to the base class. That's why it's also very important to choose your base class: this is always going to be over the base class. For this one, it's per one-unit increase; and when we get to this one, it's the odds ratio over the base class, and still per one-unit increase where a numerical variable is involved. That is what these give me. So there's something else hidden within these coefficients: not only can we use them in our equation, with our link function, to calculate a probability, but we can also read off these odds ratios. And remember: for a numerical variable, it's for every one-unit increase, and for a nominal categorical independent variable, it's over the base class, the one we've chosen. So you can see there are two important things you have to decide. Which of your outcomes is your 1, and which is your 0? Because the probability estimate we work out is the probability of being a 1. And the other thing to think about, when you have these nominal categorical independent variables, is which level is your base class, because your odds ratio is always whether your odds of being in the number-one class are higher or lower over and above that base class. So that's it from me, really: the four fundamental linear models. If you understand those four, I think you can do the rest, and there's so much more you can build from here. Some parts, please watch again so that you clearly understand what is going on as far as these results are concerned. And as you can see, we started with linear regression and just built up, and all of these things tie in really beautifully with each other. So thank you very much for watching this tutorial. Please subscribe, and leave a comment if you've got any questions. All of these results come from some data, and that's all inside some notebooks that I have, written in Python as Jupyter notebooks. If you want those, please look at the link below for my Patreon account; if you are a member there, you can have access to these Python files as Jupyter notebooks, so you can run all this analysis for yourself and see how we get to these results and what they mean. Thank you very much for watching.