Welcome back to the videos on multiple linear regression. In this video, we're going to talk about how we can integrate interaction effects into our multiple linear regression model. I've listed here that we can use a star or a colon, and which one you pick depends on how you want to include the interaction effects. A star tells Python to consider the terms separately and as an interaction, whereas a colon says to consider just the interaction. We're going to demonstrate both of those here.

We're going to, once again, go back to our smf.ols model, and we're still working with that NOx RGGI data, so that is our Y variable. In the previous analysis, we included all of these different variables. For simplicity's sake, I'm only going to include NOx base and the state variable. I'll do the colon first, so our first model is just going to look at the interaction between NOx base and state and not consider them as individual variables. The data is still df_merged, and we still need to set hasconst equals True. Even though we're removing the constant, as we've done throughout, we still need to set that in order for it to run correctly. I'm also going to come up here and change this to results3 so that I don't overwrite what I did before.

So let's look at the results3 summary. Here we're at 0.981. Even our best model with all of these variables only got 0.786 last time, and here we are at 0.981. So this is a really good model, probably one of the best models we can get, because we have this interaction. Having the two separate variables didn't help too much, but having them interact leads to all of the terms being significant. So looking at these interactions becomes really important for developing these really good models.
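As a rough sketch of the difference between the colon and the star, here is a small example on made-up data. The column names NOx_base, NOx_RGGI, and state, the three state labels, and the slopes are all stand-ins that I'm assuming for illustration, not the actual df_merged. For simplicity this sketch also keeps the default intercept rather than removing the constant and setting hasconst.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in data: three states, each with its own slope.
rng = np.random.default_rng(0)
slopes = {"CT": 0.5, "MA": 0.3, "NY": 0.8}  # assumed values, illustration only
frames = []
for state, slope in slopes.items():
    x = rng.uniform(1e5, 5e6, size=40)
    y = slope * x + rng.normal(0, 5e4, size=40)
    frames.append(pd.DataFrame({"state": state, "NOx_base": x, "NOx_RGGI": y}))
df = pd.concat(frames, ignore_index=True)

# Colon: only the interaction, so one NOx_base slope per state.
colon_fit = smf.ols("NOx_RGGI ~ NOx_base:state", data=df).fit()

# Star: the individual terms plus the interaction,
# shorthand for NOx_base + state + NOx_base:state.
star_fit = smf.ols("NOx_RGGI ~ NOx_base * state", data=df).fit()

print(colon_fit.params)
print(star_fit.params)
print(colon_fit.rsquared_adj, star_fit.rsquared_adj)
```

The star version reports more coefficients because it adds the individual NOx_base and state terms on top of the interaction terms, which is the same pattern we see when comparing the two summaries.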
Next I'm going to demonstrate what it looks like when you use the asterisk. I've changed the name, and then all I've done is switch the colon to a star here. So we can run this, and we do see a little bit of an improvement, to 0.982. So still an improvement, but nothing massive. Where it really changes is in our results. We now also have all of these variables as individual terms, in addition to the interaction terms over here. Note that you still get the interaction when you use the star, but you also get all of these extra variables. So if you don't think you need all of the individual variables, you can always just use the colon instead. Nonetheless, this is probably one of the best models that we can get.

So we can go back to what we did up above, where we first plotted the data and then did a residual analysis. I'm going to just copy this visualization code and come down here. This stays the same. We still need to get the results, but now we're going to be using results4, so we want to make sure we actually pull out the correct data for our results. So we use results4, and we have NOx base, NOx RGGI, color by state, stat_smooth, and then we've got our special color scale.

If we run this, we can see that the lines now actually match what we would expect. Once we put in that interaction term between NOx base and state, we end up getting these individual best fit lines that look really good, especially when we compare them to what we got above, where we can start to see some differentiation, but the lines are all basically still following the same trend. Essentially, what we have shown is that this nonlinear interaction between NOx base and state is the critical predictor, not necessarily both variables being included. And so from this figure, it looks like we are doing a good job of fitting the data.
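To see why the interaction gives each state its own best fit line, here is a minimal sketch using plain NumPy on made-up data. The two states "A" and "B" and their intercepts and slopes are assumptions for illustration only. It builds the same kind of design matrix the star formula would produce and compares the result to fitting each state separately.

```python
import numpy as np

# Hypothetical data: two states with different intercepts and slopes.
rng = np.random.default_rng(1)
state = np.array(["A", "B"] * 30)
x = rng.uniform(0, 10, size=60)
true_intercept = np.where(state == "A", 2.0, -1.0)
true_slope = np.where(state == "A", 0.5, 1.5)
y = true_intercept + true_slope * x + rng.normal(0, 0.2, size=60)

# Design matrix for x * state:
# intercept, state-B offset, x, and the x-by-state-B interaction.
is_b = (state == "B").astype(float)
X = np.column_stack([np.ones_like(x), is_b, x, x * is_b])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Recover one (intercept, slope) pair per state from the coefficients.
line_a = (beta[0], beta[2])
line_b = (beta[0] + beta[1], beta[2] + beta[3])

# Fit each state on its own for comparison (polyfit returns slope first).
slope_a, int_a = np.polyfit(x[state == "A"], y[state == "A"], 1)
slope_b, int_b = np.polyfit(x[state == "B"], y[state == "B"], 1)

print(line_a, (int_a, slope_a))
print(line_b, (int_b, slope_b))
```

The two sets of lines agree, which is why the plot from the interaction model shows a separate line for every state instead of lines that all follow the same slope.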
But of course we need to check the residuals. Even though our adjusted R-squared is really high and our figure looks very good, we still want to see whether our model is statistically valid. For that we need our residuals analysis. We can overwrite that residuals column and say df_merged NOx RGGI minus df_merged y hat, which we just recalculated up here. And then we can do our ggplot, where our X data is y hat and our Y data is the residuals.

We can see here that it actually looks okay. It looks much better, so we're not seeing any real distinct fanning or wedge shape. The variance does seem to be fairly evenly distributed. There are maybe some outliers that we might want to look into, but by and large, this is a pretty good example of a residual plot that is meeting all the conditions.

We might have some worries about the residuals being so large, ranging from negative 500,000 to 500,000. But given that our y hat goes up to 5 million, that's maybe not as worrisome. When thinking about the magnitude of the error terms, we want to think in relative terms: 500,000 could be a large difference if your data is in the 100 to 200 range, but on the scale of 5 million, perhaps 500,000 is not terrible. Still not ideal, but not terrible. So by using these multiple linear regression models with these interaction terms, we can actually arrive at not only a better fit, but also a more statistically valid model.
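The residual calculation and the idea of judging errors in relative terms can be sketched like this. The fitted values and noise level below are made up, chosen only so the scale resembles what we saw (y hat up to about 5 million), and the split-in-half spread comparison is just one simple assumed way to put a number on fanning, not something from the analysis itself.

```python
import numpy as np

# Hypothetical fitted values and observations with constant-variance noise.
rng = np.random.default_rng(2)
y_hat = rng.uniform(0, 5_000_000, size=200)
y = y_hat + rng.normal(0, 150_000, size=200)

# Residuals are observed minus fitted.
residuals = y - y_hat

# Relative size: largest absolute residual compared to the largest fitted value.
relative_size = np.abs(residuals).max() / y_hat.max()

# The arithmetic from the discussion: 500,000 on a scale of 5 million is 10%.
example_ratio = 500_000 / 5_000_000

# Rough fanning check: compare residual spread for low vs. high fitted values.
order = np.argsort(y_hat)
low_half, high_half = residuals[order[:100]], residuals[order[100:]]
spread_ratio = high_half.std() / low_half.std()

print(relative_size, example_ratio, spread_ratio)
```

A spread ratio close to 1 is what an evenly distributed residual plot would give, while a ratio well above 1 would correspond to the fanning or wedge shape we were worried about.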