Welcome to a very windy but very warm late-summer Saturday afternoon here in Stellenbosch. This is the third video lecture on what I call the four fundamental linear model types, and today we're going to talk about the analysis of covariance. If you haven't watched the first two videos, please do so first, because we build directly on what came before: linear regression, then analysis of variance, and now we combine those two in the analysis of covariance. Look out for more information on our website, or look in the description below; if you're watching on a computer there should be a card right up here that you can click. Please watch those first.

So here we are in a Jupyter notebook. Once again I'm making use of Visual Studio Code, and you can see I create separate environments for each of my projects. This one is a data science environment set up with Miniconda, and for your interest I'm running Python 3.9.10.

We're going to learn using code. I heard a very nice quote from a well-known person in the NumPy development community: these days we see the knowledge inside of our world through the lens of code — we extract knowledge and learn to understand things through the lens of code. I thought that was a very powerful statement. We use Python here, firstly, because it makes it very easy to learn what these models mean and what's going on when we talk about them, and secondly to show you how easy it is, in the end, to run a few lines of code and do your own analysis.

As always, we start by importing the packages that we're going to use in this notebook. You'll notice that I do not make use of abbreviations: I import these packages under their full names, and if I have to reference one of the functions inside those packages I type out that full name.
We import pandas for our data import and data wrangling, the stats module from scipy, numerical Python (numpy), and the patsy package. Let's run this code cell and move on. We're also going to do a bit of plotting, and as always I enjoy the interactive plotting of Plotly, so from plotly I import the express module, the graph_objects module and the subplots module — you'll see those in action — and also the io module, so that I can set io.templates.default to the string "plotly_dark" and plot nice dark plots on this dark-themed IDE. Then, from statsmodels.formula.api we import the ols (ordinary least squares) function, and from statsmodels.stats.anova we import the anova_lm function, because it gives us nice tabular results.
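Put together as a code cell, the imports just described might look like this (a minimal sketch using only the packages and functions named above):

```python
# Data import and wrangling, numerical work, statistics, and design matrices
import pandas
import numpy
import patsy
from scipy import stats

# Interactive plotting with Plotly, using the dark template for a dark-themed IDE
from plotly import express, graph_objects, subplots, io
io.templates.default = "plotly_dark"

# Ordinary least squares models and ANOVA-style tables from statsmodels
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
```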
So here we are, back with these four fundamental types. You won't find them described as "fundamental" elsewhere; I just use that term because I think that once you understand these four, it becomes so much easier to understand all the other models. In italics we see that we're busy with the third one: one-way analysis of covariance. Once again, remember that this is modelling: we want to understand the relationship between our independent variables — plural, in this instance — and a dependent variable. Look at the data types: we have two independent variables of very different types, one nominal (categorical) and one continuous numerical (interval type), and our dependent variable is still continuous numerical.

What we're trying to do is investigate the effect of the nominal categorical independent variable on a continuous numerical dependent variable. We've heard that before — that's just analysis of variance — but this time around we are going to correct for a covariate. That is the term that is used, a covariate, and that's where ANCOVA, analysis of covariance, gets its name: we have a covariate, a numerical independent variable. Usually we should try to control for our independent variables in the experimental design, but sometimes it's impossible to control for all of them there, and if we fail to do that we can still control for them, to some extent, statistically. That's what we're going to do here: we control for this covariate, and doing so helps us discover the true relationship between the treatment — we're using healthcare examples, but by "treatment" we just mean the categorical independent variable, whose unique sample space elements we call levels (there are so many terms and synonyms for the same thing; don't get confused between them) — and the continuous numerical dependent variable. Mathematically, correcting for the covariate decreases the sum of squared errors, and remember that is the denominator of our F ratio, so we get a higher F value, and that improves things for us.

Let's have a look at our simulated project. This simulated research project is on blood loss during trauma surgery for major vascular injury in penetrating trauma. Our dependent variable is blood loss in millilitres — a continuous numerical variable, so that fits our little table of data types. Our first independent variable is the categorical one, with three levels, and during the experimental design the researchers could control for it: some participants received a placebo, some received a low dose of a drug investigated for decreasing bleeding, and some received a higher dose of the same drug. The search is always on for ways to decrease active bleeding during these big surgeries for penetrating abdominal trauma with major vascular injury. We suggest, though, that the researchers could not control for age in the design as participants were entered into the study, so we have to control for it statistically. That is our covariate: age, a continuous numerical variable. You can see how this brings things together: with linear regression our independent variable was continuous numerical, with analysis of variance it was nominal categorical, and now we combine the two — the nominal categorical variable was controlled for during the experiment, the continuous numerical variable could not be, so we call it the covariate and we control for it statistically.

Instead of using Python to generate data, this time we already have data in a CSV (comma-separated values) file — a stripped-down spreadsheet file, and always the best format for your data. It's the EBL_data.csv file, and we import it with the read_csv function in the pandas package: pandas.read_csv, passing the file name as a string. Importantly, the CSV file lives in the same folder, or directory, on my computer as this Jupyter notebook, so I don't have to type out a long path to it; I can reference the file directly. Remember the equals sign is assignment, and I assign what gets created — a pandas DataFrame object — to a computer variable named df. Let's do that. It's done. Now we can use a bit of indexing: square brackets, colon, 5 means give me the rows up to, but not including, index 5. Remember Python is zero-indexed, so that gives me rows 0, 1, 2, 3 and 4 — the first five participants. They are all in the placebo group; there are their ages, which we couldn't control for, and there is their blood loss during surgery. Blood loss is measured by suctioning the blood out of the abdomen and by weighing the abdominal swabs that absorbed blood, which gives a fairly accurate measurement of the blood loss.
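A sketch of that import step, assuming EBL_data.csv sits in the same folder as the notebook:

```python
# Read the CSV file that lives alongside this notebook and assign the DataFrame to df
df = pandas.read_csv("EBL_data.csv")

# Rows up to (but not including) index 5: the first five participants
df[:5]
```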
Now, this group variable is categorical, so we use the pandas.Categorical function to make sure pandas doesn't treat it as plain strings but as a categorical variable. The first argument is df.group, so we know which series — this column here, group — we're talking about; we want it to be ordered, and we want the categories to be placebo, low and high. That order is very important, because we are going to use dummy variables and we need a base case, which in our case is going to be placebo. Next time, when we get to logistic regression, I'm tempted to say it's even more important to choose that base case well, but it matters just as much here: the base case is placebo, and we measure the low and high doses against it. Once we've run this, let's call the info method, df.info(), and we see that we have 45 participants and three columns — group, age and blood loss — with group now a categorical variable and whole-number integers for age and blood loss.
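As a sketch, that data-preparation step could look like the cell below; the category labels ("placebo", "low", "high") are assumed to match the spelling used in the data file.

```python
# Mark group as an ordered categorical variable, with placebo as the base (first) level
df["group"] = pandas.Categorical(
    df.group, categories=["placebo", "low", "high"], ordered=True
)

# Confirm: 45 rows, with group now categorical and age and blood loss as integers
df.info()
```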
Let's look at our research question: is treatment level a predictor of estimated blood loss after correcting for age? With our categorical independent variable we have to use dummy variables — we can't do arithmetic with names. Our treatment variable was called group in the data set, so we just call it the treatment here, and a participant could be in the placebo, low-dose or high-dose group. We create dummy variables — placebo, low dose, high dose, the list of all the unique sample space elements of this nominal categorical variable — and if a participant received the placebo they get a one under placebo and zeros for the others. You've seen this before; dummy variables are quite easy to understand, and there is redundancy in them. Since we set placebo as our base case it falls away, so our two dummy variables are only low and high: participants getting the low dose of the experimental drug and those getting the high dose. If a row reads zero, zero there is no choice but for that participant to be in the placebo group; one, zero means the low-dose group; and zero, one means the high-dose group. So we certainly don't need the placebo column — it's redundant.

Let's now state our research question as an equation: estimated blood loss = β̂₀ + β̂₁ × age + β̂₂ × low + β̂₃ × high. The hats remind us that these are not the true values in the population; they are estimates from our sample — the intercept β̂₀, then β̂₁ times age, plus β̂₂ times the low dummy, plus β̂₃ times the high dummy. Think about it for a moment: low can only take the value zero or one, high can only take the value zero or one, and they can't both be one, so the dummy pair is either zero, zero (placebo), one, zero (low dose) or zero, one (high dose). For a placebo participant the prediction is just β̂₀ plus β̂₁ times age; for a low-dose participant we add β̂₂, and for a high-dose participant we add β̂₃. The only value that changes a lot from participant to participant is the continuous numerical one, age, and that is where the idea of correcting for this covariate of ours comes from.

Let's state our hypotheses, and I'm going to state them in terms of this mathematical, statistical approach we've been taking, so that you can clearly see how linear regression, analysis of variance and analysis of covariance fit together. In terms of the coefficients, the null hypothesis states that treatment level is not a predictor of blood loss after correcting for age — that is, the coefficients on the treatment dummies are zero: β̂₂ = β̂₃ = 0 (β̂₁ belongs to age, the covariate we're correcting for). The alternative hypothesis is that at least one of those is not zero. To be a bit more verbose, the null hypothesis states that there is no difference in the mean of the dependent variable between the different treatment levels after correcting for the covariate, or, in our example, that there is no difference in mean blood loss between the three treatment groups after correcting for age. That last statement is definitely what you would see in a research project, but after understanding linear regression and analysis of variance we know it is these coefficients we're after, and stating the research question as an equation shows why: if β̂₂ and β̂₃ are both zero, then after accounting for age the predicted blood loss is the same no matter which group a participant fell into — the treatment adds nothing — and that is our null hypothesis. If one of the groups does matter, its coefficient will be substantially different from zero. I hope you can see how clear and how easy this becomes to understand.

As always, we start with some exploratory analysis. Let's call the value_counts method on df.group. As soon as you select a single column, what gets returned is a pandas Series rather than a DataFrame — the difference being that there's only one column, though every row still has an index — and on that Series we call the value_counts method, which shows the count, the frequency, of each value this nominal categorical variable can take: 15 participants on placebo, 15 on the low dose and 15 on the high dose. Next, let's look at blood loss per treatment group, with some summary statistics on just the blood loss, our continuous numerical dependent variable. We say df.groupby, grouping by the group column to separate out the placebo, low-dose and high-dose participants; once we've done that separation we select the blood-loss column and call the describe method on it. This chaining takes a little getting used to with pandas, but there you see it: grouped by the group column — placebo, low, high — the describe method gives us the count, the mean, the standard deviation, the minimum and maximum and the three quartile values for blood loss in each group. We can see the mean blood loss per group, and what we want to know is whether there is a difference between these three means after correcting for age.
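A sketch of those two exploratory steps; the blood-loss column is assumed to be named EBL here (group and age are the column names described above):

```python
# Frequency of each treatment level (value_counts on a single column returns a Series)
df.group.value_counts()

# Summary statistics of blood loss, calculated separately per treatment group
df.groupby("group")["EBL"].describe()
```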
Let's visualise this data with a box-and-whisker plot. I call the express module's box function: the first argument is my DataFrame, on the y-axis I want blood loss, and I colour by group, which means I get a box-and-whisker marker for the placebo group, the low-dose group and the high-dose group. I give the plot a little title, and I replace the labels that would otherwise appear generically from the column headers in my DataFrame — so wherever it finds the blood-loss column name it shows "Blood loss" and the units instead — and I do that with a Python dictionary, as you can see there. Let's create that beautiful plot. I love these: as you hover over them you get all this information, and remember you can also click the legend entries on the right-hand side to remove some of the groups and bring them back in whatever order you want, which can be very useful for a variety of Plotly plots. As we hover we see the median and the first and third quartiles, and since we don't have any suspected outliers here, the whiskers go out to the minimum and maximum values. You can clearly see that between the placebo group and the low-dose group there doesn't seem to be that big a difference, but the high-dose group seems to have had substantially less blood loss, and we'll have to find out whether the level of treatment was in some way predictive of the estimated blood loss after controlling for age.

Now let's look at that age. We run the describe method on age after grouping by the group column and see means of about 34, 33 and 34, so the ages were quite close; and if we do a box-and-whisker plot of the ages, there isn't, visually at least, a big difference in age between our three groups. Lastly, let's visualise the correlation between our covariate and our dependent variable — age and blood loss — separated into the three treatment groups using the color argument. Have a look at this, because it is a very, very important plot: age in years on the x-axis, blood loss on the y-axis, and we can see a linear relationship whose slope is roughly parallel for each of the three groups. That matters, because this time around — we didn't do it for linear regression or analysis of variance, but here it becomes very important — we have to look at the assumptions for the use of analysis of covariance, and I really want to highlight the ones you should always consider. The first is linearity, which refers to a linear relationship between the covariate and the dependent variable. We can check that visually, and we've already done so by group, but if we look at the whole data set there also seems to be a linear relationship: as age increases, blood loss increases. That is an assumption we really want met.
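The two plots described above might be produced along these lines (a sketch; the title and label strings are illustrative, and the blood-loss column is again assumed to be EBL):

```python
# Box-and-whisker plot of blood loss per treatment group
express.box(
    df,
    y="EBL",
    color="group",
    title="Estimated blood loss per treatment group",
    labels={"EBL": "Blood loss (mL)", "group": "Group"},
).show()

# Covariate against dependent variable, coloured by group, to eyeball linearity
express.scatter(
    df,
    x="age",
    y="EBL",
    color="group",
    labels={"age": "Age (years)", "EBL": "Blood loss (mL)"},
).show()
```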
We want that linear relationship, because we want to correct for age — we want this relationship between age and blood loss. To remind ourselves how easy it is to create a linear regression model: I use the ols function with my little formula, blood loss given age, data from df, call the fit method to fit the data to the model, and call summary on the result. There we see age with a very small p-value, so we reject the null hypothesis and say the coefficient for age is significantly different from zero — and you can see the coefficient is about 145, quite high. We see an R-squared value of 0.552, so this model, which contains only age, explains 55.2% of the variance in the estimated blood loss. I think we can be fairly satisfied that there is a good linear relationship between our covariate and our dependent variable.

The next assumption is homogeneity of the regression slopes. Look back at that scatter plot: each model line is a regression slope, and homogeneity means the slopes are equal — and look at those lines, they are parallel, which is exactly what we want. What you don't want is lines that criss-cross over each other like a multiplication sign, or that diverge or converge — starting far apart and meeting on one side, or starting from the same point and fanning out. That would mean the relationship between the covariate and the dependent variable differs depending on which group a participant is in, and that is what we don't want; graphs like that are easy to interpret, as I've just explained, but they violate the assumption. There is one good way to test for this, and it means building a model already, here among the assumptions, before we actually get to the ANCOVA itself — but that's how it works, and I really want you to think about these assumptions. I call it interaction_model, and I use the ols function with the formula blood loss given age multiplied by group — that's the star symbol — which gives us age as a predictor, group as a predictor, and the interaction between age and group. I then call anova_lm on that model using type 3 sums of squares (we don't have to go into that at the moment). In the table you see age:group — that's the interaction term — and its p-value is 0.9. The null hypothesis, which we will not reject in this instance, states that there is homogeneity of the regression slopes (not homogeneity of variance — ignore that little slip of the tongue), and we can clearly see it: the slopes are homogeneous, parallel to each other, and that is an assumption for the use of analysis of covariance. So the way to check it is not to write age plus group, which is what we will do for the ANCOVA, but age multiplied by group, so that the model includes group, age and the interaction term.
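A sketch of those two assumption checks, with the same assumed column names as before:

```python
# Linearity: simple linear regression of blood loss on the covariate
ols("EBL ~ age", data=df).fit().summary()

# Homogeneity of regression slopes: include an age-by-group interaction term
interaction_model = ols("EBL ~ age * group", data=df).fit()
anova_lm(interaction_model, typ=3)
```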
Since that p-value is larger than our chosen alpha of 0.05, we fail to reject the null hypothesis, and we have homogeneity of the regression slopes.

Next we look at the residuals, and this is where we actually have to build the ANCOVA model so that we can calculate them. Remember, the residual, or error, is the difference between the actual value of the dependent variable and the value estimated by our model — so we are really fitting the ANCOVA right here, before we formally get to it. The formula is simply blood loss given age plus group, the data is the df DataFrame, we call the fit method, and we assign the result to a computer variable called model. Once we've run that we have our model, and I add a new column to my DataFrame — in square brackets, called residuals — using the resid attribute of the model: model.resid gives me all the residuals, and they go into that new column. That means I can look at a little histogram of the residuals, and we'd probably agree there is some form of a bell-shaped curve here, so the residuals look at least roughly normally distributed. It is a bit difficult to judge, because it depends on how wide we make the bins, so we can be more precise and use something like the Shapiro-Wilk test. Its null hypothesis states that these data points come from a population in which the variable is normally distributed. I call stats.shapiro, passing the Series df.residuals — the values in the residuals column — and we see a statistic and a p-value of 0.89, so we do not reject that null hypothesis, and we say we have passed the assumption of normality of the residuals. It is important that we look at the residuals here.

We can also look at homogeneity of variance: I want the residuals of the three groups to have roughly equal variance, and we use Levene's test for that. stats.levene wants the three groups passed separately, so I build three Python lists, resid_placebo, resid_low and resid_high, using a bit of df.loc indexing — rows, comma, columns. So df.group == "placebo" (note the double equals sign) goes down all the rows of the group column and selects only the rows where that comparison returns True; then, after the comma, I ask for the residuals value for those rows, and right at the end I use the to_list method to export the result as a Python list object. That gives me three lists, the residuals of each group, which I pass as three arguments to the levene function, and we see a p-value of 0.40. The null hypothesis of Levene's test is that these variances are equal, so we are fine there too.
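Those residual checks, sketched as code (same assumed column names; the histogram is the quick visual check, Shapiro-Wilk and Levene the formal tests):

```python
# Fit the ANCOVA model: blood loss explained by the covariate (age) plus treatment group
model = ols("EBL ~ age + group", data=df).fit()

# Store the residuals in a new column and inspect their distribution
df["residuals"] = model.resid
express.histogram(df, x="residuals").show()

# Shapiro-Wilk test: null hypothesis of normally distributed residuals
stats.shapiro(df.residuals)

# Levene's test: null hypothesis of equal residual variance across the three groups
resid_placebo = df.loc[df.group == "placebo", "residuals"].to_list()
resid_low = df.loc[df.group == "low", "residuals"].to_list()
resid_high = df.loc[df.group == "high", "residuals"].to_list()
stats.levene(resid_placebo, resid_low, resid_high)
```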
Lastly, we care about outliers. As a rule of thumb, we don't want too many of the standardised residuals to be more than three standard deviations away from the mean. You can see the long expression I have to write: I call model.get_influence(), open and close parentheses, then the resid_studentized_internal attribute; I take the absolute value of those internally studentised residuals (remember some will be more than three standard deviations below the mean and some more than three above) and ask how many are greater than three. Each comparison returns True or False, which is one or zero, so if I sum over all of them I get the number that lie more than three away — and here there are none. We could already see that from the box-and-whisker plots, where there were no suspected outliers in the actual data, and you can plot the residuals as a box-and-whisker plot as well. These, very importantly, are the assumptions I really want you to consider: any time you are thinking of using analysis of covariance, check these assumptions.

So we eventually come to building our model. We have already built the ANCOVA and assigned it to the computer variable model, but let's first build an ordinary one-way analysis of variance, because I want to show you the difference between correcting for that age and not bringing age into it at all. We use the ols function with blood loss given group — just a normal one-way analysis of variance — data equals df, fit the data to the model, and call the anova_lm function on this model, which I've assigned to the variable anova. Have a look: we see our sum of squares due to the regression and our sum of squares due to the error. Let's save these as computer variables so we can compare them to model, our ANCOVA. First the sum of squares due to the regression: remember, that is built from the difference between the predicted value and the mean of the dependent variable, and in this model the prediction in each group is simply that group's mean — our base model. The attribute is called ess (as I say, there are many synonyms for all of these things; in statsmodels the sum of squares due to the regression is called ess), and I assign it to the computer variable ssr_anova so I remember it is the sum of squares due to the regression of the ANOVA model — one, two, three... about 4.5 million. Now the sum of squares due to the error, the difference between the predicted, estimated value from the model and the actual value: unfortunately the attribute is called ssr, but we know we want to use the term SSE, sum of squares due to the error, in this seminar, so I save it as sse_anova, and you can see how big that error is. My sum of squares total for the ANOVA is the sum of these two, the sum of squares due to the regression plus the sum of squares due to the error, such that we can now express the coefficient of determination — the fraction, or percentage, of the variance in the dependent variable that the model explains — as the ratio between the sum of squares due to the regression and the total sum of squares, and we see about 40 percent. There is also an rsquared attribute that gives exactly the same value; we just now know where it comes from and how to calculate it.
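A sketch of the outlier count and the one-way ANOVA bookkeeping described above (same assumed column names; the statsmodels attribute names are as described):

```python
# Count internally studentised residuals more than 3 standard deviations from zero
(numpy.abs(model.get_influence().resid_studentized_internal) > 3).sum()

# Plain one-way ANOVA, ignoring age entirely, for comparison
anova = ols("EBL ~ group", data=df).fit()
anova_lm(anova)

# Sums of squares for the ANOVA model (note the statsmodels attribute names)
ssr_anova = anova.ess             # sum of squares due to the regression
sse_anova = anova.ssr             # sum of squares due to the error
sst_anova = ssr_anova + sse_anova
ssr_anova / sst_anova             # coefficient of determination, same as anova.rsquared
```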
it let's go back to the ANCOVA model though remember we saved it as the variable model so I'm calling the ANOVA underscore lm function from stats models passing the model as an argument so let's have a look at that that looks a bit different from the ANOVA model first of all there's age there we are correcting for the age but we see that the sum of squares due to the error here by the residual remember the error that's a lot smaller we've taken away some of it by correcting for the age so let's save the sum of squares due to the regression of our model I should say the error that's the SSR unfortunately named attribute there that's the sum of square due to the error I'm going to save the sum of squares due to the error of my ANCOVA model remember we've already done it for the ANOVA model so let me save that and you can see that's a lot smaller it went from 9 and a half million almost down to 716 000 so I'm actually printing that to the screen so that you can see those two values 9.7 million versus 716 000 I've taken away a lot from the sum of squares due to the error so my predicted value my estimated values are much closer to the actual values as far as my dependent variable is concerned let's look at the sum of squares total so I'm taking the sum of squares total as the sum of squares due to the regression and the sum of squares due to the error add them to each other this is for the model so this will be sum of squares to the sum of squares total due to ANCOVA let's save that and you'll see that it has not changed between ANOVA and ANCOVA the sum of squares due to the regression is absolutely the same between the two we have two error terms here we now have one for group and we have one for age adding all of this it has to stay the same I've made the error sum of squares much much smaller so you know it has to go to the numerator there but if we look at it the sum of squares total remains absolutely the same between the ANOVA and ANCOVA model nothing happens except me making my sum of squares due to the error much smaller look at what happens to my r squared value now where we had 40 percent before we jump to 95.6 percent so we've got this bit well to some extent our goodness of fit test here shows us a much better model we see a truer reflection see how all these words come back a true relationship between the the dependent variable and the independent variable the independent variable being the treatment or the group as a predictor of the dependent variable let's have a look at the coefficients there's a params attribute that's going to give me those three coefficients and now remember we can fill that in as far as our equation our equation for our research question is concerned our research equation as I like to call it so the estimated blood loss is going to be minus 53.63 that's beta sub zero my intercept plus 147 times the age minus something for the participants in the low group or minus something much bigger for someone in the high group and you can see why that's going to drag down someone who has a zero for low and a one for high that's going to drag the estimated blood loss down quite a bit and you can also see how what we mean by we are correcting for age I mean we can have low zero and high zero so they both zero that is just going to be basically a linear regression model because we just have the beta sub zero and beta sub one there just as we saw before but if it's one zero or zero one we're going to subtract one of these two terms so we are correcting for age extracting 
Remember how these coefficients work: we can calculate a standard error for each of them, and although the equations differ between these different model types, I can just use the bse attribute — model.bse, where model is my ANCOVA model — and there we see the standard error of each coefficient. That means I can calculate a t statistic for each one — remember, the coefficient divided by its standard error — and, given the degrees of freedom as a parameter, we can work out the p-values. We see that β̂₁, β̂₂ and β̂₃ are all significantly different from zero, and that is really what we are after. We also see, of course, the confidence intervals around our coefficients, and we saw in the first two lectures how to calculate those.

Let's have a look at the effect this correction for age has on our fitted values. model.fittedvalues passes all the independent-variable values through the model and returns the estimated blood loss for every participant, and I assign that to a new column in my DataFrame called estimated. Now I plot it — have a look at the Plotly code for that — and you can compare the actual values with the estimated blood loss now that we've corrected for age. You can see a bit of a difference as you hover over the values, and because I've also shown every individual point on the box plots (the "all points" option), you can see that the estimated values differ from the actual ones now that age has been corrected for in our model.

Finally, let's have a look at the linear algebra of all of this, because I want to show you that it is nothing different from what we had before with linear regression and analysis of variance. I create my vector of dependent-variable values and my design matrix using the dmatrices function from patsy, passing that same formula — blood loss given age plus group — with the data coming from the df DataFrame. Look at X, the design matrix: there is that first column of constants, which is what we multiply β̂₀ by; the zero, zero in the dummy columns indicates that the first lot of participants are all in the placebo group; and then we have age. I convert these to numpy arrays so that I can work with them as matrices and vectors. How do we get the vector of estimated β values? It is the equation we saw before: β̂ = (XᵀX)⁻¹Xᵀy. If you understand a little linear algebra, you know that the target vector cannot, in general, be reached within the column space of our matrix X, so we project onto the subspace spanned by those columns, and that projection gives the best possible values for our coefficients. That is how we express it in code — there are obviously much easier ways to do it, but I want you to investigate that code so you can understand how I build the parts together to eventually get my vector of β̂ values. And there they are: the same coefficients we saw before. Look at the params attribute — it is exactly the same as what was calculated using statsmodels.
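Sketched as code, those last steps could look like this; the linear-algebra part is written compactly with the normal equations rather than built up piece by piece as in the video, and the column names are the same assumptions as before:

```python
# Standard error of each coefficient
model.bse

# Fitted (estimated) blood loss for every participant, saved as a new column
df["estimated"] = model.fittedvalues

# The same coefficients via linear algebra: beta-hat = (X'X)^-1 X'y
y, X = patsy.dmatrices("EBL ~ age + group", df)
y = numpy.array(y)
X = numpy.array(X)
beta_hat = numpy.linalg.inv(X.T @ X) @ X.T @ y
beta_hat                           # matches model.params
```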
So I hope you now have a very clear understanding of how we've built from linear regression to analysis of variance and now to analysis of covariance. The terms haven't changed, and the mathematics behind it has barely changed; we interpret it in a different way, but you can see where these words come from: we want to understand the relationship between our dependent variable and our independent variable after correcting for some covariate, and you saw exactly how that worked — how it helped us decrease our sum of squared errors. The next one is going to be a bit different: we're going to look at logistic regression. So, on this Saturday afternoon, as the sun is setting — it's getting towards winter here in the south, and it gets darker quite a bit sooner; we wish we were in the northern hemisphere heading into summer — I'll see you for the last video in this seminar series on what I term these four very important types of linear models.