Hello. Welcome back. We need to finish up information criteria. All we have left is ensembles, and then we're going to spend the rest of the week, most of today, and all of Thursday talking about interaction effects, which will be a bit of a relaxation, to some extent, but we're going to keep applying everything we've learned before. So, you did some steep gradient ascent in hill climbing last week, learning information theory and everything else, right? And you're mastering that in your homework now. And you'll have a chance to rest a little bit this week before we start another steep pitch in a few weeks. So, let's do ensembles, where I had just gotten to the point where I wanted to issue a warning. It's very common to see people use model selection because they're hunting for the true model in the set. And I want to say that this makes no sense. It really does make no sense at all, because none of the models you are fitting to the data are the true data-generating models for any of the things people in this room study, I will wager, which means social scientists and biologists. I'm sorry, none of the statistical models you're writing down, especially linear regressions, are the actual data-generating models for the phenomena you study. That doesn't mean they're not useful. It just means that if you're hunting for the true model in the set of models you fit to your data, you need to turn around and look the other direction, right? Towards theory, not towards your statistical models. Nevertheless, these sets of models can be very useful. So, you'll see this a lot: biologists in particular like to use whichever information criterion they prefer to try to identify the true model in the set. And this really isn't what information criteria are designed to do. So, think back to last week, where I spent forever on information theory and deviance, right? All of that was about finding a model that makes good out-of-sample predictions, given the training set you have, given the constraints on information that are present and the structure of the model. And information criteria, as I showed you in those simulations, do exactly that. They do a good job, assuming all their assumptions are met, of finding a model that will have the best expected out-of-sample deviance. But that doesn't mean it's the true model. And I kept saying that information criteria may select a simpler model than is quote-unquote correct, because you can't estimate all the parameters in the structurally correct model. And they can also, now I want to say, select models that are more complicated than the true model. However, this is not bad, because in both of those cases they're picking the model that makes the best expected out-of-sample predictions. So you're saying, okay, how can an overly complex model make the best out-of-sample predictions? Because as your sample size goes up, the parameter estimates for the pretender terms converge to zero. And so there's no foul. And AIC or any of the other ICs will say, yeah, this overly complex model is the best one, and, by the way, it has a bunch of beta coefficients that are almost exactly zero and have no impact on prediction. So it doesn't do any harm. This is important because you'll sometimes read that information criteria are inconsistent for model identification, which is correct, because that's not what they're designed to do.
Inconsistent in statistics means that, as the sample size goes to infinity, a procedure doesn't converge on the target. So inconsistent for model identification means: as your sample size goes to infinity, AIC, BIC, WAIC don't pick the data-generating model. But they will pick a model that predicts essentially just as well. So there's no harm in that. It doesn't matter. So that's what I'm trying to say here. And as a consequence of this, what you want to do is, as always, focus on prediction. None of these models that you're fitting to data are the true biological or social data-generating processes, but they can be useful descriptions of them. Every model you fit, even if it's structurally correct, will overfit to the sample to some extent. So we focus on prediction, and we need some way to use all the models in the set to construct predictions. And this is where we turn to the next useful procedural tool, which is constructing prediction ensembles. Often this is called model averaging. There's a big literature on this, and there are lots of different ways of constructing model averages. We're going to proceed with the information-theoretic way of doing it, by using the Akaike weights of the models. And we construct these using the differences in the information criteria, because those are relative distances from the target; that is, they're proportional differences in KL divergence for each model. So here's the way to think about it when we compute predictions. For a single model, I've tried to make you compulsive about using the full posterior. Looking at just the MAP estimate is a formula for overconfidence, so use the full posterior every time. Likewise, when you fit more than one model, it'd be good to use the predictions of all the models and average over them. So we can do that as well. And this guards against overconfidence in model structure. And it allows us to construct predictions that are less overfit than the predictions of any single model in the set. So think about this as the ensemble problem. Here's the procedure. First, you compute the information weight for each model. So you've got your sample. You fit a series of models to the same data. And from this, you can compute your information criterion of choice; I'm going to have you guys use WAIC, as you are in your homework. From that, you can compute the Akaike weight, or information weight, for each model. And those always sum to one. So they give you the proportion of the prediction ensemble you want to come from each particular model. And then you compute the distribution of predictions for each model, just like you've been doing all the way up to this point, right? Sim, link, whatever you want to do, same procedure. And then you mix them in the right proportions. And you could do this manually. In previous iterations of this course, I was cruel and I did make people do it manually. And they thanked me for it because they developed Stockholm syndrome, as I keep saying. But as I'll show you on the next slide, I now have a utility in the rethinking package that automatically does these steps for you. And I know you guys are already fighting with it on your homework, because you're forward-thinking people and you're good at time discounting, right? Actually, that's what's great about grad students on this campus. You guys think like 10 years ahead, right? You're like, I want this kind of job. So you're good at planning backwards. So why do we do this? Well, because these ensembles routinely outperform the predictions of any single model, even the one with the highest weight.
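To make the weight step concrete, here is a minimal sketch in R of how Akaike-style weights can be computed from a set of WAIC values. The numbers are made up for illustration; in practice the rethinking package's compare() reports these weights for you.

# hypothetical WAIC values for four fitted models
waic_vals <- c( m1=487.3 , m2=490.1 , m3=492.8 , m4=499.5 )
# differences from the smallest WAIC in the set
dwaic <- waic_vals - min( waic_vals )
# information weights: rescaled relative plausibilities that sum to one
w <- exp( -0.5*dwaic ) / sum( exp( -0.5*dwaic ) )
round( w , 2 )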
Why? Because every model is overconfident to some extent. So the ensemble often does better. This is really big in lots of areas of prediction where accuracy is important. In climate science, everything's an ensemble, right? Last year, we actually had some climate scientists in the front row; they were grooving on this section. They told me lots of interesting things, so I don't know if we have any more this year. But occasionally physical scientists wander into this class, and I always like it when you guys mix things up. So let's see how to do it. Here's an example with the primate milk data again, to stick with that same data set. When I left you guys last week, we had fit a series of four models to this. They're listed down here on the bottom: 6.11, 6.12, 6.13, and 6.14. 6.11 is the intercept-only model. It has no predictors. 6.12 and 6.13 have one of the predictors each: one has body size as its predictor and the other has neocortex proportion. Am I remembering this? Yeah, I only wrote the book. And then 6.14 has both of them. So let's focus on the top left for a moment. That is just the same sort of code you've always seen, where we generate the predictions for one of the models. And in particular, the model that we're generating posterior predictions for is 6.14. It's the highest-ranked model by far; it has the most weight. And in the graph in the upper right, those predictions are shown by the dashed trend here. There's a dashed line in the middle. That's the posterior prediction for mu as a function of neocortex proportion. And then the dashed interval, that's the best model only, showing you the uncertainty, the confidence region around that average. Right? Make sense? So there's nothing surprising about that code. You've seen it all before. It's the usual business where we construct counterfactual predictions and plot them up. Now down at the bottom, we're going to do the same thing for all of the models at once, in an ensemble. And it's made automagical by this function called ensemble. And what ensemble does is exactly the procedure on the previous slide. It just calls link and sim. That's really all it does, literally, inside of it. Go ahead and type ensemble with nothing else at your R prompt and you'll get the raw code for it. And you'll see that it just calls link and sim, with maybe some error checking, I forget. And what it returns is a list with two elements, two matrices. The first one's called link, and that's the output of link. And the second one's called sim, and that's the output of sim. And you can extract them as you need them. So in the second line, we summarize just as before. We apply over the columns of the link element of our milk ensemble to get the mean. And then we get the percentile interval the same way: over the columns, we get percentile intervals. And then we plot them and do the shading on the same graph. Now the solid line in the middle is the ensemble mean prediction. Notice it's almost exactly the same. It's only the slightest little difference. And I use means here so you can see some differences; if you used the median, it would be almost exactly the same. And medians are usually better. I'm using the mean because means are sensitive. But look at the shaded region now, because now the confidence interval uses all the models. And those lower-ranked models, yeah, they're unlikely. But they make extremely pessimistic predictions about the relationship, the predicted relationship, between neocortex proportion and milk energy.
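Here's a compressed sketch of the ensemble code just described, assuming models m6.11 through m6.14 have already been fit and that d.predict holds the counterfactual predictor values; the data frame and variable names here are illustrative, not the book's verbatim code.

library(rethinking)
# weighted ensemble of posterior predictions from all four models
milk.ensemble <- ensemble( m6.11 , m6.12 , m6.13 , m6.14 , data=d.predict )
# summarize the linear predictions (mu) column by column
mu <- apply( milk.ensemble$link , 2 , mean )
mu.PI <- apply( milk.ensemble$link , 2 , PI )
# then draw the mean as a line and shade the interval on the existing plot
lines( d.predict$neocortex , mu )
shade( mu.PI , d.predict$neocortex )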
Those pessimistic predictions have a big effect on the shape of the confidence region, right, the whole thing. And this is a routine thing that happens: your sense of the risk changes a lot, and you see those rare possibilities. So if you're doing something like an extinction analysis, this is priceless, right, because the rare events are what kill populations, right, or whatever it is you want to save. So, if you add more models, will the confidence interval always get bigger, or will it sometimes get smaller? The question was: if you add more models, will it always get bigger or will it sometimes get smaller? I think if it changes at all, it's got to get bigger, right? But what happens to the shape depends upon the whole portfolio of models. All kinds of stuff could happen. If these were complicated polynomials, remember from Tuesday last week with those weird polynomials where I took the prediction regions, you can get weird snake shapes and all kinds of stuff happening. It depends upon the details. But I think if it changes at all, it's got to get bigger in some area. It would make more sense here if you plotted the contours, by doing shaded regions of different percents. That's 95%. But if you did, like, a 50 and an 80 and a 95, you'd see that the topography is actually interesting. And there's this broad flat plain basically out here that includes zero, coming from the low-ranked models. Does that make some sense? Yeah. So it depends upon the case. There was another hand. Oh, sorry. Does the ensemble function take the weights into account automatically? Yes. The question was: does ensemble take the weights into account automatically? Yes. It takes that list of models and computes WAIC for each of them, and it calculates the weights. There's a function in the rethinking package called ICweights, and if you just pass it any information criterion values you like, it returns weights back. So it calls that internally. Does this make sense, what's going on? So you've got that homework problem where you're going to do something with this, and then you'll get the idea. And we'll be using it as we continue through the quarter. But it's important to understand: this is convenient, but mechanistically it's just doing that top part for each of the models and then mixing them together in the right proportions. And the zero? Yeah, yeah. So in this case the thing to notice is the zero slope. The lowest-ranked model has a slope of zero. The lowest-ranked model is the... well, it's not even the lowest-ranked model. The second-ranked model is the null model, I think, in this case. Something like that; I forget the details. But there's some weight assigned to the intercept-only model, which makes a horizontal-line prediction between kilocalories per gram of milk and neocortex percent. And this ensemble is capturing that. That's why that shaded region includes a slope of zero. So there's more uncertainty carrying over with the ensemble? Absolutely. Always. Because it uses all of them. Exactly. And sometimes, depending upon how risk-averse you want to be in your prediction, it'll capture these extremely unlikely but perhaps catastrophic things. Here there's no catastrophe or anything. But again, if you're doing population viability analysis, this is precious stuff. You want to use an ensemble in cases like that. So, a question: one of the models that we're testing here does not use neocortex? Forgetting just the intercept model.
We have a model that has a predictor that's not neocortex, and does not include neocortex. That's right. What I'm not quite understanding is: if that model is a good predictor of kilocalories per gram, then isn't it by definition true that the relationship in that model between neocortex percent and kilocalories per gram is zero? Yeah. Absolutely. So there are two models that predict horizontal lines on this graph in the counterfactual universe, which is what we're drawing out. And I emphasize the counterfactual part, because of course in reality all of these things are correlated, necessarily, by the biology of real primates. You can't actually make an ape that magically disentangles body mass from the other things, right? Unless we got a laboratory and did an unethical experiment. So if you have a model with a predictor other than neocortex, a model that doesn't include neocortex, then somewhere in your ensemble confidence envelope you're going to get a slope of zero, in this dimension. In the other dimension it's not zero. But all the models go into this. So there's a combined weight for two models that are pessimistic about the relationship between neocortex and kilocalories per gram. Okay. The visual field I have in this classroom is really limited. Yeah, Katrina? So say you only have two models: you have a null model and then you have a more complicated model that does a better job of predicting. And then if you put those together, then you'll get probably something like this, right? But then say you have another model that's almost exactly like your model that does a better job of predicting. Then if you put all three of those together, wouldn't that weight the null model less, since the two non-null models are basically the same thing? Not necessarily, no. I think this is a great idea for a homework problem; I should make this one. If you've got two identical models, you can try it this way: just take the same model and fit it twice and give it two different names. Put them in the set. It'll divide the weight between them. But doesn't it compute WAIC independently? Yeah, but they'll have the same deviance, and now you'll have a list with the same deviance replicated twice, or the same AIC or WAIC replicated twice, and that shifts the weights. But your null model isn't going to change, so the proportion that you... That's right. So I'm saying you should do this experiment and see what happens. And you can predict what's going to happen entirely from the estimated out-of-sample deviances. And you'll see it does matter. So you don't want to do that. You don't want to replicate the model multiple times. It'll shift the pie a little bit. But isn't that, like, concerning? Yes. It absolutely is. So, I mean, to take it as a question: it's up to us to decide which models to use, and there's no way the system can automatically discover a situation like this? Because if you put in two predictors that are the same, it will tell you something is wrong, basically. But if you put in two models that are the same, it will just weight them twice as much. And that seems really problematic. Again, you can do it. I don't think we're understanding one another. You do it and you'll see what happens. But this is a great idea for a homework problem. So maybe I'll put it in next week or something like that. This is a great idea, actually. This is a good philosophical issue. And yeah, there's no general fail-safe device in this business. Right?
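If you want to try that experiment, a minimal sketch might look like this; m.best and m.null are hypothetical names for an already-fit best model and the intercept-only model.

# same fitted model under a second name, then compare all three
m.best.copy <- m.best
compare( m.null , m.best , m.best.copy )
# the two copies get nearly identical WAIC and split their weight between them,
# and because the weights are renormalized, the null model's share shifts a little too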
These are golems, and they will wreck things. Absolutely. And now listen, I don't want to terrify you. We can be sensible about this. And in this case, the idea is that theory has nominated the set of predictors we've got here. So we try the obvious main-effect candidate models. And this week we'll also look at the possibility of adding an interaction, and what those correspond to. But you've got to use your domain knowledge to decide; otherwise you're in big trouble. Sensitivity analysis is a great thing. If you don't have domain knowledge, then I always tell people: well, vary the assumptions, see if it makes a difference, and report whatever happens, if that makes sense. So I have no reassuring message other than that. But this is my general message: statistics is no substitute for science. Right? It's a funny thing to say, because of course no one's going to disagree with that statement. But often people act like they want statistics to substitute for the scientific process, which takes generations to operate in many cases. And there's just no guarantee that any statistical procedure can discover the truth. That's just not what they're good at. Yeah. But again, it would be really bad to say that statistics is substituting for science, right? Because you're just exploring things. And the contentious discussion of the scientific community is what makes progress there, when progress happens; it's not always. But I think we agree. I mean, I don't think there's anything wrong there. There's a slide in a little bit, maybe, about this. We'll get there. The curse of Tippecanoe. We'll get there in a second. Okay. So that's ensembles. That's how you do ensembles, and that's all the tricks you need for doing your homework. Yeah, a question about what the weights mean? Every time I try to think of how to define them, I struggle a little. The standard kind of definition is: if we did a bunch of trials with these models, using these estimates, and simulated what happens, the weights correspond to the probability that each model has the best out-of-sample deviance on any particular resampling. And in nice Gaussian contexts and simulation studies, it is like that. As I said before, I'm nervous about that definition, but pragmatically it works pretty well. I think heuristically you can just think of them as a rescaling of the distances, the estimated KL divergence distances, between the models and the target, so that as a model's weight approaches one in the set, it's so much closer to the target, relatively speaking, than the other models that it's pretty much sure to make the better prediction. Does that help? There is squish about this, and statisticians don't agree. And I've said this to you guys sometimes, and some of you come to me in office hours, and those of you who are in my department hear this all the time. But what's fun about the field of statistics, relative to say the field of biology, is that in statistics everybody uses many of the same model types and inference procedures, but they disagree about what they mean. So statisticians are philosophically contentious people, at least until very recently. So they fight all the time about how to interpret a regression coefficient, but they all use regression. It isn't like that in biology, where we fight about the new applications but we don't fight about basic issues, like: does natural selection happen? Yeah, it happens.
So statisticians, it's like you scratch any topic and immediately blood gushes out of it. You go to statistics blogs and it's like this: there will be a post about some new regression method, and crickets, right, no one cares. But there's a post about how we should advise people to interpret a linear regression coefficient, and you get, like, hundreds of contentious replies, and people argue about Bayesian philosophy, and it can go on forever. It's fascinating. Really, as an anthropologist, I study people for a living, right? This is all data for me. This is incredible. And I wouldn't say it's bad. It's just the flip side of most fields: the foundations are secure, but what they mean is totally debated. It's fascinating and fun. And this is not making you feel good, I can tell, but don't worry, you'll get some experience here. Okay, let's talk about New York blizzards. So there was a blizzard in New York, anyone remember? We're out here in California, so we're like, ha, right, those people on that other side of the country where there's water. But no, so they basically shut the city down, and then the blizzard didn't really happen. I mean, it was technically a blizzard, but it was like the roads were never even covered; it was a trivial thing for the northeast. And lots of people then got angry at the governor and the National Weather Service and lots of stuff. And this is an interesting case where you can think about ensembles. So did they make the wrong prediction? New Jersey, too, shut down. And it turns out, if you do the triage on this and follow the paper trail of what happened, almost everybody was basing their forecasts on a model from what's called the ECMWF, the European Centre for Medium-Range Weather Forecasts. And they had this model that made a really extreme prediction of more than 30 inches of snowfall in New York City. Most of the other models that were available made much less catastrophic predictions. And the question is: was that bad or not? I don't think this is an easy thing to decide, actually, for a couple of reasons. I think the criticism comes about because they're saying you called it wrong, meaning it was an inaccurate forecast. And yes, it was. But it was a risk-averse forecast. So it's true: if they had used the ensemble, if you had taken all the available snowfall accumulation predictions and averaged them, you wouldn't have shut down New York City. Absolutely correct. The information available at the time, if you constructed a prediction ensemble from it, would have been more accurate. Ensembles win, absolutely, in weather forecasting, and every meteorologist knows that. Nevertheless, that was not the job of the mayor. The mayor's job was not to call it accurately. The mayor's job was to prevent disaster and stay in office. It was a credible forecast from a very well-respected group of meteorologists, the ECMWF; it calls things really well much of the time. And they said there was a high probability of as much as 30 inches of snowfall, so they shut it down. Welfare considerations aren't the same as accuracy, and this is what makes prediction so frustrating. So the people here who do some applied biology, conservation biology, you know this of course, right, because this is your business. In these analyses you're not interested in the average outcome. That's not your problem. It's the extreme disasters. You want risk-averse predictions. You want to behave towards some estimate other than the posterior mean or median. And that's a different problem. That's all I want to say
about that. Does that make some sense, though? I think the criticism of the National Weather Service on this is a bit overblown, because they always call things in a risk-averse way. When storms are coming to shore, they always exaggerate the potential wind speeds, because they want to stop people from surfing, which is what happens, because, Y chromosomes. It's like, you know, you tell young guys there's going to be big waves and they're like, dude. So they have to exaggerate the potential damage to save lives. It's a different problem than accuracy, and you've just got to keep this straight. For those of us who do basic research, myself included, accuracy is the goal. But if you're doing applied things, or someone will use your science for some applied goal, it's a different problem, actually, and accuracy alone is not what you want. It's a funny business. For those of us who study human evolution, accuracy would be a great goal if we could get there. Okay, but then you have the problem of no one listening to you, right? Yeah, that's usually my problem. Yeah, exactly. So, one of the things people do: you can get drunk on information criteria and start trying every possible model structure you can think of. This sometimes happens, and I just want to warn you off doing this too flippantly. There are times you can justify it, like if you've got new data and you have no idea what's going on and you just want to explore. But this runs afoul of the false-positive problem, right? So you've got a bunch of potential predictors and their interactions, which you'll start learning about today, and some model will fit the sample really well just by accident and look really good. Even if the false-positive rate for signals like that is low, if you try enough models, it will happen. This is a basic problem, and I want to help you memorize it and keep it in mind by thinking about William Henry Harrison, perhaps the United States' worst president ever. Perhaps. Go read his Wikipedia page and see if you agree with me. The great story was that, because of the terrible things he did, he was cursed by the Native Americans, and he died shortly into office. And then for decades afterwards, every US president who was elected in a year ending in zero, the first being William Henry Harrison, died in office. And I think it was Harrison who died just a few weeks into his first term because he caught pneumonia while giving his inaugural address, something like that. And he deserved it. Go read his Wikipedia page. Genocidal maniac; he was a genocidal maniac. Anyway, that takes the laugh off of it, but there you go. Anyway, he was called Old Tippecanoe because of this big battle; he was a general at the Battle of Tippecanoe. Anyway, so Lincoln after him, Garfield, McKinley, Harding, and F.D. Roosevelt, they were elected in a year ending in zero and they died in office. J.F.
Kennedy was the last, assassinated in 1963. Ronald Reagan broke the curse. He was elected in 1980, and despite at least two assassination attempts that we know about, he managed to live through both of his terms, and the curse was broken. Assign any significance to that you like. Anyway, I don't believe in the curse, but this is a great example of how, with enough coincidences in any sufficiently long data set, you find something really compelling like this. Doesn't it seem compelling? All of these presidents elected in a year ending in zero, and they died in office, tragic circumstances. The curse worked. The curse did not work. You can play this game; you can go on the internet and find all kinds of creative games with the letters in presidential names, all kinds of Kabbalah-esque things about the numerology. You might think it's a small sample size, but there are a lot of dimensions to it, and if you dredge through those predictors long enough, you will find something really exciting. And this is the curse of Tippecanoe in statistics: if we dredge through enough model types, we'll find something that fits the sample really well. And of course, with presidential elections, you've got to wait a long time for the sample to replenish. The country hasn't been around that long; there aren't that many presidents. So the take-home message: don't try all possible models. Instead, be thoughtful about what's going on. I know that's often hard, and I grant sometimes there are new data sets and you just want to dredge and see what's there. But you have to be playful with it, not take it as a hypothesis-testing situation. Model averaging helps a ton, because if you have a ton of models, the curse-of-Tippecanoe model will be in there and it will do well, but there will be lots of other models that drown out its weight, and it will get spread down, and it will have a smaller effect on prediction. Model averaging is a way to guard against this kind of thing. And if we pre-register our analyses, then it's hard to get away with these things. Okay, does this make sense? I want to get to interactions today, and I'm almost there. So, last thing I want to say, because I feel guilty that I'm encouraging you guys to always use simple models: sometimes the complex model is the right model to use. In particular, all this information criteria stuff, you don't always have to do it. If you've got a particular model nominated by theory, then that's the one you fit to your data, that's the one you talk about, and that's fine, because that's what people want to know about. If your community wants to know about the impact of country music on suicide, then that's what you estimate, and then you report that estimate, and then you talk about it: well, this is probably overfit to the sample, so we might worry about that, but under these estimates, the plausible range, you can only explain this many suicides, or something like that. So it's just to say: theory is a good thing to use. Okay. All right, let's shift to interactions. So, sort of the most basic concept in statistical modeling that lets us get somewhere with inference is a phenomenon called conditioning. So let me introduce, or reintroduce, conditioning to you in a different way, using two true stories about data. You know what these things are if you've read ahead, but those are manatees in schematic form, from a particular manatee sanctuary in northern Florida. Manatees: the only natural predator of the manatee is the speedboat, and sometimes a slow boat, but a propeller boat. And these
scar marks that you see on these bowling-pin-like figures: almost all living manatees in Florida have some propeller scars. So you can go and audit them; actually, there are studies of this, because the Florida Wildlife Commission checks these things out, and you can ID individuals from their propeller scar patterns. And so this has led to public campaigns to put cages on propellers. For example, those of you who work in the Delta might have seen these propeller guards on boats. That protects the manatees to some extent. But it turns out it doesn't help the manatees very much to guard the propellers, because propellers aren't what kill manatees. Think about this for a second. Here's the selection effect: we're only talking about living manatees. The dead ones have fewer propeller scars, because what mainly kills a manatee is the keel of the boat, and you can put the propeller in a cage and it doesn't matter; the keel still hits them. Conditional on being alive, propellers hurt manatees, but they don't kill them very often. They're painful, don't get me wrong, it's a horrible thing, we should cage the propellers. But the major threat is the keel of the boat. The major threat is that there are boats; that's the major threat. So you'd make the wrong inference looking at living manatees. The sample is conditional on surviving, which means a keel probably didn't hit you, and so you don't see keel wounds. If you do autopsies of dead manatees at the bottom of ponds, you see the opposite pattern. Does that make some sense? All right, that was depressing, I'm sorry. Manatees are adorable. The bottom is a famous story from statistics; I say a little bit about it in a little history box. There are two bombers with bullet holes in them, simulated bullet holes; I drew those myself with a very careful formula. So there was this common problem the Royal Air Force had with these bombers. They came back from missions dropping bombs and pamphlets onto Germany, and during the war the Royal Air Force was severely limited in raw materials. They were asked: where should we put additional armor? We can't armor the whole plane, because we don't have enough armor, and the planes wouldn't be able to get off the ground if we did, because they'd be too heavy. So where should we put the armor? And so the first idea is: well, let's look where the bullet holes are, and let's put armor there. But this is exactly the opposite of what you want to do. Conditional on having returned from a mission, the bullet holes are in places a plane can survive being hit. So notice, for example, none of these planes have bullet holes in the cockpit. If there was a bullet hole in the cockpit, the plane didn't come back. This is a famous story in statistics, and there are some interesting papers on it that I cite in the book, if you're interested. But these are both examples of where the sample we have available is conditional on one of the things we want to make an inference about. These are poisonous situations for inference, but we can make progress; we just have to be careful about conditionality. This week is interaction effects, which is one way of getting more conditionality into the models. An interaction is a way to make the influence of a predictor variable conditional on the values of one or more other predictor variables. And often nature is like that. In some ways, everything in statistical inference is conditional on something. Everything's always conditional on the data we have at hand, like in the manatee and the bomber case. That's why I want you to think about these models as golems, little constructs, nothing necessarily
true about them. And also conditional on our priors; I like to think of the priors as the information state that the machine begins with when it makes its inference. Inference is conditional on that. It may be only weakly conditional on it, but it's still conditional on it. And interactions are a way of constructing linear models that ask questions about how the influence of a predictor on an outcome is conditional on the values of other variables. I think most of you drink coffee and/or tea, because you all need caffeine in your lives. Adding sugar to coffee by itself doesn't really sweeten it very much, and stirring your coffee by itself doesn't sweeten it either, but doing both sweetens it a lot. I'll say that again: adding sugar by itself to your coffee doesn't actually sweeten it very much, and you know why, because you get this sludge, and you mainly don't get that until the very end, and then it's delicious, isn't it? And stirring does nothing for the sweetness, probably; I don't know, it could be a placebo effect. But if you do both, then the sweetness goes up a lot. That's an interaction effect. If you had a dummy variable for added sugar and a dummy variable for stirred the coffee, the main effects, the things we've been examining so far, are nothing. But when both happen together, there's a big change in the sweetness of the coffee. So we're going to start with these kinds of effects as we go. Oh yeah, I should have advanced the slide; there it is. The influence of sugar in coffee depends upon stirring. The influence of a gene on a phenotype depends upon the environment. The influence of skin color on cancer depends upon latitude, something I'm acutely aware of in my family; for those of you who can't see me on the screen, I'm of Irish descent, roughly speaking. And then we're going to master these things in the context of ordinary linear models with Gaussian likelihoods, where it's easy, where inference is pretty benign. And then when we get to generalized linear models in a couple of weeks, everything necessarily will interact, even if you don't explicitly put in an interaction effect, because there will be what we call ceiling and floor effects on the outcome. You don't have to understand that right now; this is just a promise for when we get there. And multi-level models: part of their power comes from the fact that they are essentially massive interaction engines, where you interact every parameter with the identity of the case it comes from, the entity in the data it comes from, and you can estimate tens of thousands of parameters that way and make quite powerful inferences about the variation in response in the population. And that's incredibly useful, especially in cases where the average effect is not of interest. Think about pharmaceuticals: the mean effect is not of interest; the variation, and what predicts that variation, is what's of interest. So you're going to need to understand interactions to get multi-level models as well. That's why I'm going to spend the rest of this week on it. It seems like a pedestrian thing; usually there's like three pages on it in a textbook. I gave it a whole chapter, which seems pretty dumbed down compared to, well, the chapter that came right before it, which was information theory. But I think this is actually subtle, and lots of professional scientists are looking at interaction effects and, I think, making mistakes as a consequence. It's not their fault, right? They're victims of a curriculum. So let me introduce a new data example that we'll stick with for the rest of today. Here's a data set where we're looking at the correlation
between something called terrain ruggedness across countries and the log GDP in the year 2000. This is real GDP, adjusted and all that, so think of this as a crude measure of the economic performance of a country and some feature of its geography. The terrain ruggedness index, and I give you a little bit more information in an endnote in the book, is something geographers figured out that has to do with the energetics of movement; it's about transportation expense, is the idea. But the idea is it's big in countries like Nepal, very rugged, there's no flat land in Nepal. I shouldn't say that; there's probably a little bit of flat land somewhere. And Kyrgyzstan. Lebanon is surprisingly rugged. And Switzerland, famously so: all mountains, little cute villages in between, everybody's stolen gold. Sorry, I had to say it. And then there are lots of countries which are pretty much flat, on the other end. Globally, here I'm showing you, if you just fit a linear regression predicting log GDP in the year 2000 by terrain ruggedness, in general, countries with rugged terrain have poor-performing economies. And this has been known about for a long time. Transportation is more expensive, and that hurts markets, is the idea. And it's not so far-fetched; it makes a lot of sense, and countries go out of their way to deal with these issues, blasting holes through mountains. We're going to explore whether the impact of ruggedness depends upon continent. So let's split the sample. I'm fitting two linear regressions on two separate samples now. I've taken all the African countries out, put them in one data set, and run a linear regression on it, which I'm showing you here on the left. Same variables: log real GDP in the year 2000, and I should say this is per person, so it's not just that some countries are bigger than others, against terrain ruggedness. The slope is positive in Africa. There's a lot of uncertainty, because there are fewer countries, but the posterior median is definitely positive; the 95% interval barely hits zero there. The rest of the world is on the right, and it's the same relationship as before, still quite reliably negative, which would make sense. So the point is: the relationship between the outcome and the predictor, ruggedness, depends upon something else, and in this case it's a binary variable, whether you're in Africa or not, you being a country. You are Nigeria, congratulations. What we want to do, though, is do this in the same model. What we've done here is very illicit, in a sense. We've constructed two samples, so we've given up a good amount of statistical information. We get no estimates of the reliability of the split, and we'd like to be able to do that. And we'd like to be able, although I forgot to put it on this slide, to do a model comparison between the model with the interaction, where we take continent into account, and one without it. And yet we've now got three data sets, one with just Africa, one with the countries outside Africa, and then the whole world, so model comparison across them doesn't work. And also there are parameters, like sigma in the linear regression, which want to use the whole sample. Even if you have interaction effects, when you split the sample you don't get to use all the data to estimate those common parameters. So we don't get to pool, is what I say here, pool information in that way. And when we get to multi-level models, we'll actually be able to do pooling across the continents; you can wait to understand that until we get there. But first, let's start with what doesn't work, just to train your intuitions up to it.
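For reference, here's a rough sketch of the split-sample fits just described, in the style of the book's code. The column names (rgdppc_2000, rugged, cont_africa) come from the rugged data set in the rethinking package; the priors here are placeholders.

library(rethinking)
data(rugged)
d <- rugged[ complete.cases(rugged$rgdppc_2000) , ]   # drop countries with no GDP data
d$log_gdp <- log( d$rgdppc_2000 )
d.A1 <- d[ d$cont_africa==1 , ]   # African countries only
d.A0 <- d[ d$cont_africa==0 , ]   # everyone else
m.A1 <- map(
    alist(
        log_gdp ~ dnorm( mu , sigma ) ,
        mu <- a + bR*rugged ,
        a ~ dnorm( 8 , 100 ) ,
        bR ~ dnorm( 0 , 1 ) ,
        sigma ~ dunif( 0 , 10 )
    ) , data=d.A1 )
# and then the same alist again with data=d.A0 for the rest of the world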
Putting in a dummy variable for Africa definitely doesn't work, because all this does is allow Africa to have a different intercept for the regression line. So that's what I'm showing you here. Here's the linear regression, where little r sub i is the ruggedness for country i, and capital A sub i is an indicator variable for whether the country is in Africa or not. All it can do is change the intercept: it adds a constant to the value of mu, or not; it moves the line up and down. And you can see that in the predictions here. The blue trend is the prediction for Africa and the black one is for the non-African countries, all plotted up here. You'll see they have the same slope, because the model requires it: the only thing that affects the slope of this line is the coefficient in front of ruggedness, and only the height of the line can change. So all this regression tells you is that on average African countries have lower GDP. But we knew that; colonialism, we have explanations for what's going on there. We're interested in the ruggedness effect, which we're getting nothing out of from this model. So let's add a real interaction. Think verbally about what you want to do. We're asking a question, and we want to write down a model that answers the question. There's a parameter in this model, beta sub R, which is the model's answer to how the outcome depends upon ruggedness. Now we want to make that parameter complicated, and here's what we're going to do. It's turtles all the way down. We're going to take that parameter and we're going to make it a model. You're like, what? Okay, yes, my favorite thing to do is take parameters and replace them with models; it's like the Xzibit thing. And you guys flatter me, you understand my jokes, or at least you're pretending. So let's just rename it for a second, gamma sub i. And gamma's not going to be a parameter in the posterior distribution; it's just going to be a label for another linear model that I'm going to add in here, where we redefine the influence of ruggedness, the association between ruggedness and the outcome. It has an intercept, beta sub R, which is the coefficient we had before, and then it has an adjustment: when the country is in Africa, it gets an adjustment. So essentially this gives you two different slopes. It gives you a slope beta sub R when the country is not in Africa, and it gives you a slope beta sub R plus beta sub AR when you are in Africa. And we'll step through this multiple times, so make sure you get it. So when we inspect this thing, there's a linear effect of Africa on the slope. And Africa here is discrete, it's zero or one, but if it were continuous, this works the same way: you're constructing a linear model of the coefficient. And this is what a classical interaction effect is. It's a linear model inside a linear model. That's how it works. Does this make some sense? You're willing to keep trucking along, at least for the moment? There's a conventional presentation of this, let me show you. Here, take gamma and substitute it into the mu line and expand, and you get the conventional representation of an interaction model of this kind. Now we have three terms. There are the two so-called main effects that we've been working with up to this point, and then there's a new term where you have the product of two predictor variables. And this just comes from the algebra: taking gamma here, putting it in there, and then expanding, and you'll get this. I leave that algebra to you at home. In the models, when you use map, you can actually put in as many linear models as you like. Don't
call me on that, actually; there's probably a practical limit at which it will explode. But it really just cascades from the bottom up and keeps substituting symbols to build the linear model all the way up into the likelihood, so in theory you could have as many as you've got memory for. So we just write it like this, and it reflects the two linear models we had. Before we start inspecting the posterior distribution from this model, let's do the WAIC comparison. Model 7.3 is the model that doesn't include continent. Model 7.4 is the model that includes continent only as a main effect, no interaction. And then model 7.5 has the interaction between ruggedness and continent. And what I want you to see is that it's pretty much a slam dunk for the interaction model, which you'd expect from before: when we looked at the split, that was a big difference; the slope changes direction almost completely, it basically flips sign in and out of Africa. So let's march forward just looking at model 7.5, so we can keep the discussion easy. But if you want to go back through and do this with ensemble, that's a good exercise; make sure you understand it. It won't make much difference: one of the other models is reasonably close, but the interaction model still does a much better job. So there's probably some overfitting in the slope in Africa, and you have a homework problem at the end of this week where you look at that. That would be equivalent, right, because the third one's got like zero weight? You can put all three in and it's harmless, because the bottom one's essentially zero; it'll get like one sample. So let me show you. I'm just plotting the predictions, the results from the interaction model. The full code to do this is in the book; there are no surprises. It's just a matter of plotting up the raw data for each continent, and then you compute counterfactuals across the horizontal axis, setting the Africa variable to one or zero. On the left it's one, on the right it's zero, and you get the difference because it turns one of the parameters on and off. Does this make sense? And that's how you can figure out how these interaction effects work. But interpreting interaction models from the coefficients is even harder than interpreting models without them; I mean, of course it's harder, but it's basically impossible. I really want to warn you off the idea that you can understand interaction effects by looking at coefficients, and I'm going to spend some time explaining why I think that. Now, if you get really pro at this, you can do it, but it's hard, and as I said, I see published mistakes all the time. These tables are marginal posterior distributions, just summaries of their moments, and they don't show you anything about the covariance between parameters, and that's one of the major hazards that's going on here. The other hazard is that all the parameters change their meaning when you add interaction effects; we're going to spend a little bit of time talking about that as well. This is all meant so that you don't try to interpret these tables, because I'm afraid that's what goes on. We'll have another example on Thursday of trying to understand these parameter estimates, and I'm going to show you you can do a little bit. But again, I want you to plot stuff instead, do counterfactual experiments, and then your readers will understand, because your readers will never understand your model as well as you do. So if you're having trouble with it yourself, just dumping the table on the page so your reviewers kind of go along with it, even if it gets published, it'll go into the pit of lost science. But if you make plots, people can actually understand it and use it.
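For reference, here's a sketch of the interaction model and the WAIC comparison from this example, roughly following the book's models 7.3 through 7.5; the priors are placeholders from an older version of the code, m7.3 and m7.4 are assumed to be fit already, and d is the ruggedness data with log_gdp added, as in the earlier sketch.

# model 7.5: the slope on ruggedness is itself a linear model in cont_africa
m7.5 <- map(
    alist(
        log_gdp ~ dnorm( mu , sigma ) ,
        mu <- a + gamma*rugged + bA*cont_africa ,
        gamma <- bR + bAR*cont_africa ,        # a linear model inside a linear model
        a ~ dnorm( 8 , 100 ) ,
        bR ~ dnorm( 0 , 1 ) ,
        bA ~ dnorm( 0 , 1 ) ,
        bAR ~ dnorm( 0 , 1 ) ,
        sigma ~ dunif( 0 , 10 )
    ) , data=d )
# compare against the no-continent model (7.3) and the dummy-variable model (7.4)
compare( m7.3 , m7.4 , m7.5 )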
So here's the summary. I replicate the model up there for you, just for reference, so you can see where the parameters go: a is alpha, bR is beta R, bA is beta A over there, and bAR is the so-called interaction coefficient, which expresses the conditionality of the relationship between ruggedness and the outcome on continent. So let's ask: where's gamma? Because it tells us the slope of the line connecting ruggedness and the outcome, log GDP. So if you want to compute it, and that's what you essentially do when you make those prediction plots, you know the model, so you can plug it in. Take the expression for gamma in the case of a country in Africa: the slope, the relationship between ruggedness and log GDP per person in the year 2000, will be the sum of both of those parameters, because the value of the predictor, the indicator, is 1. So bR is about minus 0.2; I'm going to do some rounding, because, you know, we're not launching space shuttles here, your O-rings are not going to crack if we do a little bit of rounding. And eventually Challenger jokes will no longer make sense to anybody; it's definitely not too soon, that's what I like about them. A lot of you were not alive for that. And this was obviously from some time when I used different priors: bAR is about 0.35, and you get an estimate of about 0.2 for the slope in Africa. So positive, just about 0.2. And if we do outside Africa, it'll just be the bR term, so it's just about negative 0.2, which is the flip on the other side of zero. Do you see how to construct this? Now, this is just MAP estimates. Let's think about how to do this properly. We actually want the distribution of gamma in and out of Africa; that's what we want to make inferences about, and the difference between them. I'll say that again. What we actually want to make inferences about is the posterior distribution of gamma in and out of Africa, and the difference in the posterior distributions of gamma in and out of Africa. So this might be confusing; if you're paying attention, it will be, because I said gamma's not in the posterior. But gamma is a function of parameters, and parameters have distributions. And anything that's a function of parameters also has a distribution. So that's why gamma has a distribution, a posterior distribution. We can calculate it exactly as you've been doing things so far. We extract samples from the posterior, and we use them as a substitute for doing integral calculus. This problem is all Gaussian, so the integral calculus actually proceeds elegantly and beautifully, and it's actually a joy to do. But some of you haven't had a course in integral calculus, and yet you're really superb scientists, and I'd like you to be able to do this. So as always, we're going to do it with the trick way of doing integral calculus, just using samples. And remember, later on we're only going to have samples, once we use Markov chains, so you need to learn how to do this anyway, even if you're an integral calculus wizard. So, extract samples. We just use the formula for gamma, substituting it into our code. The first one, gamma dot Africa, is gamma for an African nation. We take the posterior samples for bR, and we add to them the posterior samples for bAR times 1, because it's an African nation. And it takes all the corresponding samples, row by row, and adds them up. We get as many values of gamma as we have samples from the posterior distribution.
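In code, the calculation just described looks roughly like this, assuming the interaction model is stored as m7.5 as in the sketch above:

post <- extract.samples( m7.5 )
# gamma for an African nation: bR plus the adjustment, sample by sample
gamma.Africa <- post$bR + post$bAR*1
# gamma outside Africa: the adjustment term drops out
gamma.notAfrica <- post$bR + post$bAR*0
dens( gamma.Africa )      # distribution centered around 0.2
dens( gamma.notAfrica )   # distribution centered around -0.2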
So we get a distribution, and that distribution is shown by the blue distribution here in the graph on the bottom: gamma in Africa, centered on 0.2, just as we computed before, but now we get the whole distribution. You can see there is some probability below zero. The model says, yeah, there's like, I don't know, a 3% chance that the slope is negative here, slightly negative. Does that make sense so far? For not-Africa, same idea, but now we put a zero in there, same function. Essentially, as you can tell, that just gets rid of the adjustment term, and it's just the posterior distribution of bR. But that is the definition of gamma not in Africa, centered on minus 0.2, with a little bit of probability above zero as well. Now, here's the trick. You might be tempted, and I've seen many unfortunate accidents in print do this, to look at the overlap between these distributions as an indication of how different they are. You cannot do that. That is wrong, logically wrong. If you want to know the difference between two parameters, you must construct their contrast. Their contrast is the distribution of their difference. These two things are correlated, and you can't see the correlation on this graph, because these are marginal distributions, right? There's a two-dimensional space here that's not being shown very well. So how do we do that? We just calculate the difference, and the difference is easy; I show you the code in the book. You take gamma Africa and you subtract gamma not Africa, and you store that in a new symbol. That's the distribution of the difference. It's that easy. That's what a contrast means. It's just the distribution of the difference. And the correlation structure is preserved in the samples, always, as long as you keep the ordering, right? The correlation structure is preserved. That's why I like this way of doing the calculations, right? It stands in for the integral calculus, too. And then you can see the difference, even though both of those distributions overlap zero. So the model isn't completely sure that Africa has a positive slope, and it isn't completely sure that the rest of the world has a negative slope, although it's pretty sure. But it's almost perfectly confident that the difference is positive, that Africa has a bigger slope. So even if Africa has a negative slope, the model still thinks it's going to be a bigger slope than the rest of the world's. Does that make sense? And that's the inference you're interested in. Yeah? Some of you are liking this, I don't know, trying to read your faces. I don't know. Tess is bored, I can tell. I was thinking about wood decks. The difference, the contrast? Yeah. Yeah, you just take, and the code is in the book, I apologize for not having it up here, you just take gamma Africa and subtract gamma not Africa, the symbols, and store that in a new symbol, just call it diff, and plot that. And that diff will hold the vector of differences, and that's the distribution I just showed. And that preserves all the covariance among them. Now you are thinking about wood decks, I can tell by the smile on your face. Is that just the distribution of bAR, the coefficient of the interaction? No, it's not, because there's a correlation between these two gammas, right? So this is just like a profile view of gamma Africa, and here's a profile view of gamma not Africa, but these two things are correlated with one another.
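Here's the contrast calculation just described, again as a sketch in the same style:

# distribution of the difference, computed sample by sample so the
# correlation between the two gammas is preserved
diff <- gamma.Africa - gamma.notAfrica
dens( diff )
# proportion of the posterior in which the difference is negative
sum( diff < 0 ) / length( diff )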
What the difference always being positive tells you is: as gamma Africa gets smaller in the posterior, so does gamma not Africa. They have a positive covariance, and that keeps the difference positive. Almost always, except for, like, I calculated it in the book, it's like 0.038% or something like that, or 3.8%, I forget which. Some really tiny amount, as you can see there. And it's because these two things are correlated. Yeah, so if you take this value here, you have to take a value down here, is the idea. Which parameter? Yeah, what's the meaning of bAR? Great question. I think we're going to get to that on the next slide, if not a little bit later. The quick answer is, when you ask what the interaction coefficient means: well, it's the adjustment to gamma when the predictor changes. It doesn't have a clear, cognitively interpretable meaning on the outcome that people can easily wrestle with. It has a clear algebraic definition, but it's really hard to wrestle with verbally. And that's why I discourage you from trying to interpret it on its own: you can't inspect it alone, because the relationship between the predictor and the outcome now depends upon more than one parameter, and you have to process them together. It is really tricky. And covariances like this can actually lead you astray. You can get situations like in the left-and-right-leg example, where there's a really strong covariance in those two parameters, and it looks like the model thinks there's no relationship between leg length and height, but actually the model predicts that leg length and height are strongly related. You can get that with interaction effects, too. If we just look at the marginal distribution of the interaction parameter itself, it'll overlap zero a lot. But it covaries with some other parameter, like the main effect, and so their combination may be reliably above zero, right, because of that covariance. It's super frustrating, and it's, you know, golems, man. I don't know what to tell you. It's just how it goes. Is this starting to make some sense, though? So, tables of coefficients: train wreck, nearly always. Unless you're really pro at this, you can easily make mistakes. And the evidence of that is that famous scientists have published mistakes in the interpretation of their models through this. And I'm not going to call anyone out, because those people outrank me. But when they retire, come back and ask me. So, okay. I've got a little bit of time here to do some useful stuff. So the other thing about interactions that makes interpretation a little bit tricky, although I want to argue that we're going to turn this into a bonus, is that they're bi-directional, logically bi-directional. At least linear interactions are. So if the effect of ruggedness depends upon continent, then it is necessarily true that the effect of continent also depends upon ruggedness. I'll say this again. If the effect of ruggedness depends upon continent, then it is necessarily also true that the effect of continent depends upon ruggedness. Even though, when you say them in natural human language, they sound like different claims, in the algebra of the model they're exactly the same logical relationship. And this is a very elusive and weird thing. But here's the bonus. The first thing to be careful of is to realize that you're asking both of these questions simultaneously when you define the model.
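Concretely, both readings can be turned into counterfactual predictions with link; here's a minimal sketch, where the sequence of ruggedness values and the fixed values are arbitrary choices for illustration:

# direction 1: vary ruggedness, holding continent fixed in or out of Africa
rugged.seq <- seq( from=0 , to=8 , length.out=30 )
mu.Africa    <- link( m7.5 , data=data.frame( rugged=rugged.seq , cont_africa=1 ) )
mu.notAfrica <- link( m7.5 , data=data.frame( rugged=rugged.seq , cont_africa=0 ) )
# direction 2: vary continent (0 or 1), holding ruggedness fixed, e.g. at a flat value
mu.flat <- link( m7.5 , data=data.frame( rugged=0 , cont_africa=0:1 ) )
# summarize each with apply( ... , 2 , mean ) and apply( ... , 2 , PI ) and plot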
The bonus is that you can plot it both ways, and that often teaches us different things we didn't realize before. It's this amazing thing about math: it's all tautology, and yet you can learn from it, right? That's why I do it for a living. People pay me money to do tautologies, because it was all there in the assumptions, but we couldn't see it until we processed it. That's what plotting in both directions is going to do. We're going to do that with this week's interaction examples, and in your homework, when you fit interactions, I encourage you to plot them both ways and get some comfort with it. It often teaches you different things.

So let me show you briefly why they're the same claim. I've taken gamma in this linear model and just replaced it with its formula, plugged it in there in parentheses. And now we can factor. First I expand it to the conventional form of an interaction. Then we refactor it so that we pull A sub i out of all the terms it appears in. And now the same kind of gamma reappears, but with a different intercept inside it (bA instead of bR), multiplying the Africa indicator. Where the gamma up top asked how the effect of ruggedness depends upon continent, this one asks how the effect of continent depends upon ruggedness. Same exact algebraic relationship; the math doesn't see any difference, and you get the same estimates no matter which way you write it. That's why the middle line is usually the way we enter these models: it sort of admits the bi-directionality.

In the book I use the tale of Buridan's ass, and it's going to help you understand this. It's an old logical paradox in which a donkey stands equidistant between two piles of hay, I think it was in the classic story, and starves to death because it can't decide which one to eat. It's a silly logical puzzle, the kind of thing leisured philosophers thought about long ago. But these models are logically like that: there's no way, inside the model, to decide which of the two interpretations is correct, because they're logically the same, and the model can't make up its mind. But they're going to feel different to you in interpretation, and you may learn different things from them depending on how you visualize them.

So let's plot it the other way now. Let's plot the effect of being in Africa as if it depended on ruggedness. What that means is we fix ruggedness at a different value in each panel and vary continent on the horizontal axis, still plotting GDP on the vertical. These are just counterfactual predictions from the model. The only trick I've used to make this a little more appealing is to fade out countries whose ruggedness is far from the value fixed in each panel, using transparency on the raw data points, so the points shown in dark blue have ruggedness near the value at the top. In the first panel, ruggedness is near its minimum, near zero, and what we see is that for countries that are pretty flat, the model says that if we put them in Africa, their GDP goes down. That's what it says. Now let's move ruggedness up a little, to about the median value of about three. The effect is still negative, but not by much. There are fewer countries out here, by the way; most of the world is pretty flat, by this index at least.
But it's gotten better: ruggedness isn't nearly as bad now. If you're a country of roughly median ruggedness, it doesn't hurt you very much to be in Africa. What we're getting from this is a different story, but it's actually the same story. And then finally, for countries that are really rugged, there's actually a mild advantage to being in Africa. So this is the story told as the effect of being in Africa depends upon ruggedness. It's a weird thought experiment: if I could take Switzerland and drop it in Africa, would it be better off? The answer here is yes, it would. Of course you can't do that; it's a strange counterfactual, and that's the domain knowledge that argues against this interpretation. Usually the feasible intervention is that we can level hills, we can change the ruggedness of a country, but we can't move a country to Africa. That's why this direction feels a little weird and the other one makes a little more sense. Does that make some sense? But still, it may help you understand the general phenomenon and what the model says.

Okay. I have 15 minutes; I can still do some good. So here's looking at it both ways. At the top, we're looking at the interaction from the direction where the effect of changing continent depends upon ruggedness: for flat countries, putting them in Africa hurts GDP; for really rugged countries, it either has no effect or even helps a little. On the bottom is the classical direction, where the effect of ruggedness depends upon continent. In Africa, ruggedness helps: African countries that are rugged have, on average, better GDP. You'll notice the Seychelles is exerting a lot of leverage on that regression line, and you'll have a homework problem where you take the Seychelles out so you can play around with this and get a feel for it. That's the kind of thing you probably want to do in analyses like this. For non-African nations, there's a strong negative relationship that doesn't really depend upon outliers; you can delete all the high-ruggedness countries and it's still quite negative.

I think you do learn something different from each of these. In the bottom view, it's not clear that the intercepts differ between the two panels, because the vertical axes begin and end at different values; the default plotting settings hide the fact that African countries are, on average, still worse off than non-African countries. You see that very well in the top view, because even at extreme ruggedness, African countries don't outperform the rest of the world, they just break even. The top view captures the intercept better. So there's value in both. Plot, plot, plot. There's far more to get out of this plotting than you could ever get from the table of coefficients, which are just marginal posterior distributions; they elide all the interesting covariances that are going on. Questions about this? No? Over the next days some will arise; maybe we'll keep going. Let me introduce the next example, which we'll work through when you come back on Thursday. We're going to work through the greenhouse data set on tulips.
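If you want to poke at it before Thursday, the data set ships with the rethinking package, to my knowledge as data(tulips). A minimal sketch of loading and inspecting it; the column names below are the ones in the package's version of the data:

    library(rethinking)
    data(tulips)
    d <- tulips
    str(d)                                        # blooms (outcome), water and shade (treatments, levels 1-3), bed
    pairs(d[, c("blooms", "water", "shade")])     # pairs plot like the one described on the next slide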
The evolutionary botanists in the room will perhaps be thrilled by this, or maybe you hate tulips. But hey, they're worth money, so people want to figure out how to grow them, and it's a nice experimental data set, the cleanest data set we look at in the course. Let me just introduce it and the basic problem. This is a nice example because even if you're not an evolutionary botanist, you'll understand the biology; everybody knows something about plants, right? Maybe? No. These data are 27 replicate observations of blooms across three levels of water and three levels of shade treatment in a greenhouse. High levels of shade mean not much light is reaching the plant; high levels of water mean a lot of water is added, but not so much that it kills the plant. The outcome we're measuring is, I think, the area of the blooms. So there are just three variables we're interested in: blooms is the outcome, and it covaries with water and shade. If you squint at the top row of this pairs plot, you can kind of see it. There's a general positive relationship with water, and that makes sense: plants need water, everybody knows that, even my five-year-old knows that. And they're hurt by shade: average bloom area goes down as shade goes up. But there's a lot more going on in here, because the effect of water depends on shade and the effect of shade depends on water. Light and water are both necessary for photosynthesis. Remember, was this the Krebs cycle? Something like that? It's been a long time since I did real biology. So we want to look at an effect we already understand in the context of an interaction model, so you can begin to see what's going on.

I've got 10 minutes; let me see how far into this I can get at a comfortable pace. First let me state the no-interaction and interaction models, both on this slide. The no-interaction model is the first one. In this case water and shade have independent effects, and we just add them into the linear model. Capital B is the blooms variable, and the linear equation for mu is an intercept, plus a coefficient times the water level, which is 1, 2, or 3 in this data set, so the levels are metric in the sense that 3 is three times as much as 1, plus a coefficient times the shade level. Shade is also 1, 2, 3, with higher values meaning less light reaching the plant. The interaction model, the second one on the slide, instead supposes that water and shade have interdependent effects, and we want our little regression machine to tell us the extent to which each predictor's impact on the outcome depends upon the value of the other. I'll say that again: what the interaction model asks is, to what extent does the impact of either of these predictor variables on the outcome depend upon the value of the other? This is the Buridan's ass problem again, in that it's both interactions at once. So it asks: to what extent does the association between water and bloom size depend upon shade, and to what extent does the impact of shade on bloom size depend upon water? We'll plot both directions, although I don't think we'll get there today. Okay, I think I have just enough time to get into this. So the first thing to look at is what the interaction does to the estimates.
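Here is a minimal sketch of the two fits, using map() from the rethinking package. The model names and priors are placeholders, not necessarily the ones on the slide, and flat priors like these can make the default optimizer struggle, which is why the optimizer options are set explicitly; adjust to taste.

    library(rethinking)
    data(tulips)
    d <- tulips

    # no-interaction model: water and shade have independent, additive effects
    m.main <- map(
        alist(
            blooms ~ dnorm(mu, sigma),
            mu <- a + bW*water + bS*shade,
            a ~ dnorm(0, 100),
            bW ~ dnorm(0, 100),
            bS ~ dnorm(0, 100),
            sigma ~ dunif(0, 100)
        ),
        data = d,
        method = "Nelder-Mead", control = list(maxit = 1e4))

    # interaction model: the effect of each predictor depends on the other
    m.inter <- map(
        alist(
            blooms ~ dnorm(mu, sigma),
            mu <- a + bW*water + bS*shade + bWS*water*shade,
            a ~ dnorm(0, 100),
            bW ~ dnorm(0, 100),
            bS ~ dnorm(0, 100),
            bWS ~ dnorm(0, 100),
            sigma ~ dunif(0, 100)
        ),
        data = d,
        method = "Nelder-Mead", control = list(maxit = 1e4))

    compare(m.main, m.inter)    # WAIC comparison discussed below
    coeftab(m.main, m.inter)    # side-by-side coefficients, to see how the estimates shift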
When you compare the interaction model to the main-effects model, the parameter estimates often change in wild and seemingly inexplicable ways. That's what I want to show you here, and it's another reason not to try to interpret these things from the table. I routinely receive panicked emails from colleagues about this: did I fit it wrong? No, you're just reading tables, that's the problem. Again, it's not the scientists' fault; they're victims of the curriculum.

So let's do the model comparison first, at the top here, just to show you that the interaction model is a lot better. This is a case where we get pretty wide separation. Look at the WAIC plot: here's the difference between the models with its standard error, one standard error on either side. That's pretty good evidence the interaction model is going to make better predictions; it's picking up something. And of course it is. You know the biology here: if there's no light, water doesn't matter. For a tulip grown in the dark, it doesn't matter how much water you give it, and for a tulip without water, it doesn't matter how much light you give it. You know this. So the interaction is real, and WAIC picks that up.

Now let's look at the coefficient table and see how hard it is to interpret what's going on. First let me show you the non-interaction model. These we can interpret; you've gotten good at these by now. Here's the intercept, and its value isn't worth dwelling on, because it could be anywhere. Here's the coefficient for water: it's positive, so water helps; as water increases, mean blooms go up. Absolutely true. The effect of shade is negative: as you increase shade, blooms get smaller. That makes sense as well. And then you've got sigma, the residual standard deviation. This model isn't lying to you; it's picking up the major trends, it's just ignoring the interaction.

Now look at the interaction model, which makes substantially better predictions, both in sample and out of sample. Okay, the intercept has changed sign, but we never really interpret that anyway; I've tried to train you to ignore it, because unless you center your predictors you can't interpret it at all. Still, notice it's changed completely, swung around a lot. Huge change. You might worry about that a little. There's been a doubling of the effect of water. What, really? Is water twice as important as in the other model? No, spoiler: that is not what has happened. The effect of shade has flipped sign; now shade helps? No, that's not right either. And now we've got the interaction coefficient, and it's negative. What does that mean? How does it help us understand the combined impact of these two predictors on the outcome?

What's happening is that the meaning of these parameters, the main effects bW and bS, is not the same in the two models. They have the same labels, but they have different meanings. I'm going to leave that hanging in the air, and when you come back on Thursday I'll reveal how the meaning has changed, so that you don't stumble into the same hazards as my senior colleagues. Thank you, everyone. I'll see you on Thursday.