It's also a short chapter on the skills from before. So in this institute, there's a lot of fancy gene sequencing work that goes on that those of you in the audience probably need no introduction to. Very close to this classroom, there are big fancy machines you can feed samples of your spit into and they will sequence your genome. But biologists have been, in a sense, sequencing DNA or genotyping people for a long time, even before such machines were around. And there is a long history of very nice and efficient tests, if you will, of people's DNA types. An interesting category of these are tests for color blindness. So most mammals have terrible color vision, terrible vision in general. Mammals really don't go in for eyes. They're kind of bad at it. Most mammals only see intensities of blue, you might say, and black and white. But primates have really fantastic high-acuity color vision, most of them, nearly all of them. Humans are no exception. And what comes along with that are various very interesting forms of color blindness. Most of them are X-linked, so they're expressed mostly in males. Red-green color blindness is the most common, and there are lots of cool tests, like the one I'm showing you on the slide here, that can assess with very, very small error what someone's genotype is at a particular locus on the X chromosome. If you're male, that is. If you're female, you need other tests. But if you're male, since you only have one X chromosome, this is a test of your genotype. So I ask you, can you see the animals? You don't have to report your color blindness to the crowd, but you will know. And I have lots of family members with X-linked color blindness. Some of you probably do as well. Like I said, it's extremely common. It's nothing to be embarrassed about. When you make your scientific graphs, however, you should keep it in mind. Do not have a red line and a green line and ask your audience to tell the difference between them.
That would be cruel. It's very mean. Anyway, so there's a cow here and a bear and a bunny rabbit and something which I think is a squirrel. And then there's something else that I don't know what that is. Just a weird shape. It's an alien or something. It's a wolf? Okay, well. Anyway, figure out what the animals are later. It seems that most of you have good full color vision and you can tell the difference. Actually, this particular chart can detect three different kinds of color blindness, given the different patterns of colors on it. Tests like this do exist, in the sense that there are two categories and we'd like to sort things, in this case individuals, into those categories, and there's a simple cue that we can use to do it. Most of the stuff we do in statistics is not like this. We have procedures in statistics called tests, but they're nothing like color blindness tests. They don't work like this. They don't necessarily sort things into bins. At best they can exclude the possibility of one category among many, but most of them don't even do that, since they're purely probabilistic. That's what we talked about in previous weeks, and it's not news to any of you. But there's another contrast of statistical methods with what we usually call tests, like color blindness tests, which is that the meaning of the cues is very subtle and conditional upon other features of the context. That's what I want to talk about this week. So let me give you a couple of examples. This is another mammal. It does not have color vision. It's like most good mammals. It's a dugong. It's related to elephants, and it's a slow-moving aquatic herbivore, and the only natural predator these days of the dugong, most of them live in the waters around Florida now, is the speedboat.
The speedboat accounts for a substantial proportion of the mortality of the dugong, and what you're seeing here on this individual are scars that remain: healed-over wounds from the propeller of a speedboat as it passed over the animal. So these animals graze in shallow waters, and speedboats also graze in shallow waters, if you will. And so injuries like this are fairly routine. So campaigns have gotten underway over the last decades in Florida to equip speedboats with cages around the propellers to protect the dugongs. The idea being that this would drastically reduce the mortality of dugongs. Unfortunately it has done nothing to reduce the mortality of dugongs, and the reason is that dugongs don't die from propeller wounds. They die from keel wounds. They die from the boat hitting them. The lucky dugongs are the ones who only get hit by the propeller. The dugongs you don't see in nature are the ones who've been removed, and they do not have propeller wounds. They have keel wounds, which are massive gashes on their backs that have killed them instantly. Those are the ones that end up in autopsy. Yeah, welcome to morning lecture. So I think there's an interesting structure to this example which is important, and I'm gonna bring it out. But let me give you another example with a similar structure which is maybe a little bit more heartening and less morbid. So this is a World War II bomber, a British bomber, the Armstrong Whitworth Whitley. These things were manufactured at Whitley, which is a little manufacturing suburb in England, and now I think it's where they make Jaguars. I think the same plant makes Jaguars, you know, the broken sports car. Sorry to Jaguar owners in the audience, but I'm gonna get angry emails over that one. I should watch myself. Anyway, it's the Jaguar plant now. They used to make bombers, and this one served through World War II.
It started out dropping pamphlets on Germany, telling Germans in fancy script that they should overthrow their government, and then later it dropped a lot of bombs. And in fact, bombs were dropped right here on Leipzig by these bombers. So one of the major predators of the Whitley is of course anti-aircraft fire. It didn't have natural predators of other sorts, and the bombers that were coming back would have lots of shrapnel holes in them. And so a major problem that was put to British statisticians at the time was: from the patterns of injuries, if you will, on the Whitley, which parts of the bomber should we up-armor? And you can't up-armor the whole thing, because then it can't fly. And then of course, also there was scarcity. So even if a totally up-armored plane could have gotten off the ground, they couldn't afford it. They were rationing everything in World War II. So this is a fairly famous story. A very famous statistician, in statistics at least, Abraham Wald, studied the data from these bombers and figured out that what you should do is up-armor the parts of the planes that are not damaged. And the reason is exactly like the case with the dugong. The lucky bombers are the ones you see coming back home. The ones that didn't come back home got hit in the places that mattered. So those are the places you up-armor, not the places that have holes in them on the returning, surviving planes. It's just like the propeller injuries, right? We're misled because we're failing to notice that our sample is conditional on something. It's a sub-sample of the total population, and this changes the meaning, and actually inverts the meaning, of the cue in the sub-sample. So here's figure 7.1; chapter 7 opens with these two examples. I wanna use these examples to draw your attention to the importance of conditioning in statistics.
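The selection logic in the bomber story can be sketched in a quick simulation. This is a made-up toy, not Wald's actual analysis: planes are hit uniformly at random in a vital zone (engine) or a non-vital zone (fuselage), hits to the vital zone are usually fatal, and we then count damage only among survivors. The zone names and fatality probabilities are invented for illustration.

```python
import random

random.seed(1)

def simulate_survivors(n_planes=10000, p_fatal_engine=0.9, p_fatal_fuselage=0.1):
    """Count damage locations among surviving planes only."""
    survivors = {"engine": 0, "fuselage": 0}
    for _ in range(n_planes):
        # Hits are uniform across the two zones...
        zone = random.choice(["engine", "fuselage"])
        # ...but survival is strongly conditional on where the hit landed.
        fatal = random.random() < (p_fatal_engine if zone == "engine" else p_fatal_fuselage)
        if not fatal:
            survivors[zone] += 1
    return survivors

counts = simulate_survivors()
# Hits were uniform, yet among survivors fuselage damage dominates.
# Reading the survivor sample naively inverts the meaning of the cue:
# the least-damaged zone on returning planes is the most dangerous one.
```

The point is that the observed sample is conditional on survival, which is exactly the structure of the propeller-wound example.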
The association between two variables is often conditional upon some other variable, for important causal reasons. Like in this case: these artistic renderings of dugongs on the top row, they look like bowling pins to those of you who've done any bowling, show the scar patterns. These are the survivors. And so the fact that we see more propeller injuries among survivors actually tells us that the propeller is the least dangerous part of the boat, right? But you need to think about the selection process, how you've gotten these data, to realize that. And the same goes for the bombers at the bottom. Notice that none of these planes have bullet holes in the engines. Because planes that got, well, not bullet holes, shrapnel holes in the engines didn't come back. So we have to think carefully about the conditionality of the associations. So this chapter, chapter 7, is about things that we call interaction effects, which are a device for including more sophisticated conditioning in our models. And that conditioning is meant to capture causal hypotheses that produce paradoxical relationships like the case of the propeller injuries and the patterns of damage on bombers. Now of course everything in statistics is conditional. Our inferences are conditional on the data, they're conditional on the model, they're conditional on the information state we have going into the exercise. And interactions are yet another way to include yet more conditioning. And they're often essential. And from this point on in the course, we'll be learning more devices that help us get more kinds of conditioning into things. So first, a few simple plain-language examples of interaction effects, to give you an idea of what I mean by conditioning. These will be simple examples. What we mean by a simple interaction, at least today, is that the influence of some predictor is conditional on other predictors.
Or better, we'd say the association of a predictor with an outcome is conditional on other predictors. So for example, the influence of sugar in coffee depends upon stirring. You can add sugar to your coffee, but if you don't stir it, it doesn't make the coffee that much sweeter. The coffee gets a lot sweeter when you stir it. Yep, you buy that. Okay, if not, you can try the experiment later today. When I taught undergrad stats, we would do this experiment, so you're allowed to give undergrads coffee and sugar. So I think you're allowed to do it. The influence of genotype on phenotype depends upon environment. Biologists will understand this, right? It's one of the central dogmas of biology. The influence of skin color on cancer depends upon latitude. When we get to generalized linear models, starting next week, actually, in those models, once we get outside of simple linear Gaussian models, everything actually interacts to some degree. The reason is that there are ceiling and floor effects on the outcome scale, and that forces all variables to interact. If you don't understand that right now, I sympathize; we'll spend a lot of time on that central point later, and it makes interpretation of generalized linear models a lot harder. And multi-level models are really massive interaction engines that allow you to interact variables with one another in complicated ways, but still control overfitting. So this is where we're moving. Interaction as a concept is central to understanding multi-level models. Okay, let me give you a data example. There are gonna be exactly two worked data examples today to help you understand interactions. The first is a discrete-variable interaction, to get our training wheels on and understand what's going on. And then the second case will have continuous interactions. I think those two are enough to get you started. So these data sets are, of course, in the R package.
This rugged data set is data on most, not all, but most countries of the world, various economic indicators about these countries in the 1990s and early 2000s. And the explanatory variable we're gonna be interested in is the ruggedness of the terrain in the country. So it turns out there are lots of economists, in fact among the most interesting of economists, who are interested in the ways that physical geography affects economies. Because humans, obviously, being animals, live in the real physical world, and we get our resources in the physical world. And the costs of transport depend upon things like how rugged the terrain is. So the classic economic theory, going back hundreds of years, is that rugged terrain is bad. And so governments have gone in for the idea of flattening things and making the landscape really dull, right? Push all the hills into the valleys. That's a classic idea. And so Europe, over the last 1,000 years, has gotten a lot flatter. It's still not perfectly flat. This part of Germany is very flat. Flatter than it used to be, for sure. But I wanna show you in this example that there's some evidence that sometimes there are economic advantages to being rugged. So this work comes from an economist at Harvard, Nathan Nunn, who's kind of an economic historian of a sort, very interdisciplinary, and has studied a lot about African economies. And one of the things that he highlights in the paper I get this example from is that the relationship between terrain ruggedness and GDP per capita, which is a common measure of economic performance, is reversed inside Africa and outside Africa. You can think of GDP as the heat in the economy. It's kind of a raw measure of economic activity. Doesn't mean the activity's good, it just means it's like heat, right? Stuff is moving.
So on the left here, well, what I've done is I've taken the total sample and I've split it into two sub-samples, as two data sets, and I've run linear regressions on them separately. This is something you should not do, and today we're gonna figure out why. You should keep your sample together. But this is to show you what we're after, which is how to capture this statistically in a single model. On the left is the regression on just the countries in Africa, ruggedness on the horizontal: as ruggedness increases, GDP tends to go up. The countries that are most rugged have the most vibrant economies by this measure. On the right is the rest of the world, and it goes the other direction, right? So there are outliers. I apologize to those in the room; when you look at these slides online, they'll be very clear. But I think Switzerland's up here. It's a very good economy and it's super rugged, but it's a real outlier, it really stands out. Switzerland is wonderful and strange, right? But most countries that are rugged suffer: the classic economic hypothesis holds, that it's hard to move things around and that's bad for economies, right? So what's going on here? Is this some spurious result? We can't really answer that, but what we can focus on is how to recover this in a single model. We don't wanna split the data like this; instead we'd like to have some terms in the model which allow this switch in the relationship inside the model, depending upon whether the country's African or not. So let's figure out how to do that. So I assert, and I hope to be able to prove to you, convince you, that splitting the data is a very bad idea, for a number of reasons. First, you don't get any estimates about the split itself, right? Of the statistical adequacy of the split. It's just something you impose on the sample, like you were a statistical god who descends upon it and declares that these things are now separate worlds.
What you'd like to be able to do is compare models where the split is there to models where it's not. And you can't do that if you just fit separate models to each half of the data. Secondly, there are parameters that wanna use all the data, irrespective of whether it's an African country or not, and you're forbidding those parameters from learning from the full sample when you split it. We talk about this as pooling, which is language we'll use when we get to multi-level models. So what about the simplest thing, which you already know how to do? Africanness, being in Africa, is a categorical variable, right? I mean, you can be on the edge of Africa, but we won't treat it as continuous; we'll treat it as discrete for now. And you know how to do categorical variables. So let's try that. What happens if we put this data set back together and put in a categorical variable for Africa? Is that enough? Of course it's not enough. The dummy variable doesn't work. So here's the model with the dummy variable. Simple multiple regression. The continuous variable, capital R sub i, is the ruggedness of country i, and beta sub r is the coefficient for ruggedness. Yeah. And then A sub i is an indicator variable, zero or one, a so-called dummy variable, for whether the country's in Africa or not, and beta sub a is the coefficient for that. And so this does, in essence, estimate two lines, and I plot them on the right against all the data, but you'll notice they have the same slope. And that slope is given by beta sub r, the coefficient in the model. This makes sense. This is kind of a refresher for you guys. You guys are all pros in multiple regression now. You love it. You dream about it. Yes. Yeah, there's some wincing eyes in the audience there. That's the reaction I was hoping for. So this doesn't work. All it does is change, if you will, the elevation of the line.
It changes the intercept, because the Africa coefficient just changes, in a sense, alpha over there. You get two alphas now. There's alpha for African countries, and there's alpha for non-African countries, and those are different levels, but the slopes are the same in this model. So this model's still missing something really important about the structure of the data. What do we need to do? Well, think again, in natural language, about our goal. Our goal is to make the slope, the relationship between ruggedness and the outcome variable, which is economic performance here, conditional on whether the country's in Africa or not. So we want this coefficient, beta sub r, to depend upon the predictor A sub i. So let's just make it a model. Let's put a model in the model. And you like models, right? So we'll put models inside your models. So let's just take beta sub r and redefine it as a function of A sub i. So let's replace beta sub r with another symbol, gamma. I like Greek letters; eventually you run out, but we haven't gotten to the squiggle yet, right? So gamma sub i now will be the slope, the relationship between ruggedness and economic performance, for country i, and we're going to make it its own equation, its own linear model. And now it's got two parameters inside of it and one predictor variable. First we have the old direct effect of ruggedness; it's the same parameter, right? It has a different meaning now, we'll get to that later, but it has the same name. And then there's another coefficient, beta sub ar, right? The dependency of the ruggedness effect on Africanness, if you will, multiplied by the indicator variable. You with me? Okay, we're going to spend the next few slides unpacking this bit. This is a linear interaction effect. You make a slope dependent upon a predictor variable by making it a linear model of its own. You just put linear models inside your linear models, and this is what linear interaction effects are.
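Written out, the construction just described looks like this. This is a reconstruction in the lecture's own symbols (R for ruggedness, A for the Africa indicator), so treat the exact notation as approximate:

```latex
\begin{aligned}
\mu_i &= \alpha + \gamma_i R_i + \beta_A A_i \\
\gamma_i &= \beta_R + \beta_{AR} A_i \\
\mu_i &= \alpha + (\beta_R + \beta_{AR} A_i)\, R_i + \beta_A A_i \\
      &= \alpha + \beta_R R_i + \beta_{AR} A_i R_i + \beta_A A_i
\end{aligned}
```

The last line is the familiar product-term form: substituting the linear model for the slope back into the linear model for the mean produces the usual multiplied predictors.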
Those of you who've had a course in regression before may not recognize it like this, but these are exactly like what you're used to. You're used to the form where you're multiplying predictor variables, right? You get products of two predictor variables. That arises just from substituting gamma back into the equation for mu, which you can see here. So all I've done on the right-hand side here is substitute gamma in and expand. And now you see we get three terms. There's a term, beta r, sorry, suddenly my r is lowercase, I don't know why. That's ruggedness. What we call the main effect of ruggedness. Then there's the interaction effect between ruggedness and being in Africa, and then finally the so-called main effect of the country being in Africa. But it comes from making the slope a linear model that depends upon Africa. And then you expand it out, and it's like that. Yeah, does it make sense? Okay, so it's linear models all the way down in this thing. You don't have to use linear models, but this is the tradition. Okay, so inside map, you can specify this exactly as I had it on previous slides. You can just make up a name, gamma, in the equation for mu, and then write gamma as a separate linear model. And map will happily do exactly as your code says: it will calculate the value of gamma for each case, and then it will substitute it into the definition of mu, and then it will find the posterior distribution from that. You could also write it out in the long form with the product of the two things; that works too. But I think conceptually this is a nicer way to do it. You with me? Yeah, okay, so let's see what happens now. We'll do some plotting, but first let's do the model comparison exercise. You guys need some practice in thinking about WAIC and cross-validation metrics. So we can compare three models here, thinking about them. One would be, sorry, I didn't show you 7.3, but it's in the book.
That's the model that ignores Africa completely. It just has ruggedness. Yeah, it does badly. Continent matters. 7.4 is the first model we fit. That's the model that has an intercept for Africa, but no interaction effect, and 7.5 is the interaction model. So what you're seeing is that the vast majority of the weight is on the interaction model, because the switch in slope is substantial. There's a lot of evidence that the slope outside of Africa is different, maybe even in the opposite direction, but certainly different, and that's how it stacks up here. Interpreting interactions is hard, and what I'm going to encourage you guys to do is resist the urge to interpret the parameters. This is a trap I think people get into, interpreting parameters in general. In simple linear regressions or multiple regressions with no interaction effects, it's a benign fact of the universe that you can directly inspect the values of regression coefficients and understand what's going on in the model, but that's because nothing's interacting. As soon as you have any interaction effects, or the outcome is not linear because it's a generalized linear model, that is no longer true. Now the rate of change in the outcome as a predictor changes depends upon a bunch of stuff. It depends upon more than one parameter, and so now it's hard. So I want to teach you how to deal with that sort of problem and discourage you from peering into tables like this, tables of coefficients, and trying to figure out what the interaction effect means. It's really hard to do, and there are lots of mistakes in the published literature that arise from the fact that people try to interpret the coefficient directly. The problem is, as I'll show you in the next slides, that the slope in Africa depends upon two parameters. Not just one, and so you can't look at just the interaction effect to figure out what's going on.
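As an aside on the model comparison from a moment ago: the weights in such a comparison come from a simple transformation of the WAIC differences. The WAIC values below are invented placeholders, not the fitted values for models 7.3, 7.4, and 7.5; only the weighting arithmetic is the point.

```python
import math

# Hypothetical WAIC values for three models (smaller is better).
waic = {"m7.3": 540.0, "m7.4": 476.0, "m7.5": 470.0}

# Akaike-style weights: exp(-0.5 * dWAIC), normalized to sum to one.
best = min(waic.values())
raw = {m: math.exp(-0.5 * (w - best)) for m, w in waic.items()}
total = sum(raw.values())
weights = {m: r / total for m, r in raw.items()}
# With a gap this large, the interaction model (smallest WAIC) takes
# nearly all of the weight, as described in the lecture.
```

The weight is not a probability that a model is true; it is a relative expectation about out-of-sample predictive performance.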
You need to know more, and you need to know the average values of predictors, even, to figure out what's going on. It's a big mess. Let me give you a metaphor. There are a whole bunch of fantastic historical analog computers that exist in museums and are fun to look at. So before digital computers, this is how computing was done. Well, actually, originally it was people, basements full of people doing addition. That was a very common sort of thing to do. And then there were analog computers. Here's a great one. This is the first working model of a tide prediction engine, made by William Thomson, also known as Lord Kelvin of the temperature scale, that Lord Kelvin, a famous Scottish inventor. And he made a series of these tide prediction engines, which were literally analog computers, and they were developed to predict the tides, and they're very good at that job. When you look at this thing, though, it's got a bunch of exposed gears and cranks, and then at the bottom there's some tape that the prediction gets drawn on. Sorry, I can't show you the whole thing. Looking at the internal states of this thing, the gears, is not what you're supposed to do. If you wanna predict the tide, you look at the output part of this machine. There's all this other fantastic stuff that might distract you, that's moving around, but that's not the business end, as it were, of the machine. And that's not the part that helps you interpret what's going on. You don't predict the tides by looking at the individual gears and components of this thing. You look at the output part. And if you were trying to figure out how the machine worked, you would move the gears, and then you'd still look at the output part, and that would help you understand what the gears were doing if you had to reverse engineer this. Regressions are like tide prediction engines in this regard. The parameters are a distraction.
They're little gears inside the machine that produce behavior at the business end of the model, which is predictions. But what you wanna understand is the predictions. So you can change predictors, and then observe changes in predictions, and that's what you wanna use to understand how the model works, and to communicate to your peers how things work. So when you see tables of coefficients, you should think about Lord Kelvin's tide prediction engine and think that those are the ugly gears inside the thing. Yeah, that's nice. And you could print out the state of those gears in a table, but that is the worst way to communicate how the model behaves. It's as if Lord Kelvin were to publish the tides for his colleagues but would not publish the output of the machine, the actual predictions, just the positions of the gears that combine to make the predictions. That's what I think we're doing in the scientific literature, which is a terrible disservice to one another. So let's do better. What we wanna do is push predictions out of these models so that we can understand what the interaction effect means and not get tricked by our natural bad human intuitions about how machines work. So to give you an idea of what I mean about how the slope depends upon multiple parameters now: the slope is gamma inside the model. Notice gamma doesn't appear in this table, because it's not a parameter that was estimated; it's a function of parameters, right? So what we need is the posterior distribution of gamma. Now we can get that, because gamma is perfectly determined by things that are in the posterior distribution, these two parameters. So we can compute it. So there are two cases. The first case would be a country that's in Africa. In that case, gamma is beta sub r plus beta sub ar times one, which at the MAP values (there's a posterior distribution here, but just thinking about the MAP values) is going to be minus 0.2 plus 0.39 from the table up there.
Yeah, I'm rounding. Which is about 0.2. It's positive. Outside Africa, it's exactly the opposite: same magnitude, opposite sign. And that's because it's just beta sub r, right? Now both of these things have posterior distributions, and we can plot them, right? So we can extract the samples from this fit model. We compute gamma for Africa: it's post bR plus post bAR times one. And gamma for not-in-Africa is post bR plus post bAR times zero. This gives us two different posterior distributions for the slopes: the slopes for countries in Africa and outside Africa. This embodies the interaction effect, the switch in slope, right? And I plot those in the graph at the bottom: gamma not-Africa centered around minus 0.2, and gamma Africa centered around 0.2, but less certain. Wider uncertainty. Does this make sense? Yeah, this is the kind of thing you wanna do: compute the total slope, not inspect beta AR, or beta RA, however I say it. That's only a hint about what's going on. What you want is the total slope, and to understand it. Cautionary note, before we move past this and start plotting: the difference between these slopes is yet another distribution. So the degree of overlap on this plot at the bottom, of gamma not-Africa and gamma Africa, is not an indication of their difference. This, again, is a central fact about statistical inference. I think psychologists are well aware of this as, like, a contrast effect. You have to compute the contrast. If you wanna know the difference between two treatments, you compute the contrast. That's the distribution of the difference. The overlap of the distributions of the effects in each treatment is not the thing you care about, right? Does that make sense? So you can compute that, as I say up here; you just subtract one from the other. And that does all the pairwise subtractions.
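The slope and contrast computations just described can be sketched as follows. This is a minimal stand-in, not the rethinking package workflow: the posterior samples are faked as independent normals centered on the MAP values quoted in the lecture (bR around minus 0.20, bAR around 0.39), with invented standard deviations; real samples would come from the fitted model.

```python
import random

random.seed(7)

# Fake posterior samples standing in for extract.samples() output.
N = 10000
post_bR = [random.gauss(-0.20, 0.08) for _ in range(N)]
post_bAR = [random.gauss(0.39, 0.13) for _ in range(N)]

# gamma = bR + bAR * A, computed sample by sample, for A = 1 and A = 0.
gamma_africa = [br + bar * 1 for br, bar in zip(post_bR, post_bAR)]
gamma_not = [br + bar * 0 for br, bar in zip(post_bR, post_bAR)]

# The contrast is its own distribution: subtract sample by sample.
# Overlap of the two marginal distributions above is NOT the summary
# you want; the distribution of the difference is.
diff = [ga - gn for ga, gn in zip(gamma_africa, gamma_not)]
frac_below_zero = sum(d < 0 for d in diff) / N
# frac_below_zero is tiny: almost no posterior probability that the
# African slope is at or below the non-African slope.
```

Note that even though the two marginal slope distributions can overlap, the pairwise differences can still sit almost entirely on one side of zero.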
Now you have a posterior distribution of the difference, and you can plot that out, and it hardly overlaps zero at all, right? These slopes, even though their marginal distributions overlap, their difference is almost entirely above zero, right? Which means there's almost no chance, according to this model and these data, that the slopes are the same. Right? Make sense? Okay, what you wanna do is plot, right? So if you plot these interactions, we get the thing that we had on the first slide. Going all the way back. You're like, thank you, Richard. Yes, that was a nice tour. But now it's credible, because you can compare the interaction model with the full dataset to the model without the interaction, and you have a strong argument that the interaction effect explains a lot more of what's going on in the data, and exactly what it explains for which countries, right? You can try interactions with other continents, and I encourage you to do that in your free time. You will find that the other continents do not benefit from ruggedness. Countries in other continents suffer from ruggedness almost uniformly. A general feature of a lot of data is this conditionality: the association between an outcome and a predictor depends upon one of the other predictors as well. This is the discrete example, where the dependency is induced by an indicator of whether the country's in Africa or not. But of course it's plausible that there's something else going on here. In Nunn's paper, he argues that being in Africa is not actually the exposure of interest, but rather a history of exposure to European colonialism, and in particular the slave trade, and that countries that were rugged were better defended against Europe, basically, and that that benefit has carried forward into the present for them. It's a really interesting argument about the possible depth of historical effects on economies.
So in that case, there would be some continuous measure of actual exposure to European colonialism that is affecting this and might give us a better impression of what's actually going on. So for the sake of excitement, let's switch and not burrow down deeper into that data. I'll give you all the variables from the data set in the package. Let's think about another example, where we're gonna explore a continuous interaction in a simple case, actually, where it's, well, it's continuous but it's discrete, so it's easy to think about. It's used as a teaching example for exactly that reason. This is greenhouse data. It's experimental, which makes it all perfectly balanced and easy to think with, so it's a good teaching example, but of course your data will never look like this. Don't hold yourself to the standard of this. You're superheroes; you will have messy data sets, you will conquer them. But this is nice clean greenhouse data. What we have here is my favorite flower, the tulip: 27 replicates of blooms across three levels of water and shade. Tulips are big money, some of you may know, right? And they used to be traded heavily in the world. And so figuring out how to grow them at scale is an extremely important thing, and the Dutch have perfected the science of this, probably. So on the right here, in the pairs plot, I try to show you an overview of the data set. We've got three variables we're gonna work with. The outcome variable is blooms. This is the area of the blooms. The goal is to have big blooms, because you can sell them for more money. Bulbs, lineages of bulbs, that produce big blooms can fetch a higher price. And we have two treatments in a factorial design, water and shade. Three levels of water and three levels of shade. Water is good for plants. Shade, well, it depends whether it's good for plants or not, but in general plants like light. They need light and they need water, and they need both.
So, the interaction effect is baked into the biology here. It's one of the reasons I chose this example too. It's probably not mysterious to most of you that plants need both of these things for photosynthesis to work, right? Photosynthesis doesn't work without light and it doesn't work without water. So, we need both. That's where the interaction effect is going to come from. So, let's model these data. First consider the no-interaction model on the top, which you can think of as the hypothesis that water and shade have independent effects. We're gonna model bloom areas with a normal distribution, where the mean has some intercept and then an effect for water level. For each level of watering, there will be a constant increment of the area of blooms. And then, same for shade level: there will be some increment, or in this case decrement, that arises from shade level. Yeah? And the interaction model, I'll do it here in the traditional form where we have the product term at the end. The model looks the same until we get to the end, and then there's this interaction term. And I've chosen this case because the value of that coefficient ends up being incredibly confusing. And you just need to plot. Just plot. Don't even try to interpret the table. Just plot. That's my message, just plot. So let's plot. First, at the top, I'll show you the model comparison between the two models on the previous slide. The interaction model does a lot better here. Unsurprising, because biology, yeah? Photosynthesis, it works. And then we look at the coefficients for the two models at the bottom. There's this helper function in rethinking called coeftab, which just makes tables of the coefficients for different models. And I want to draw your attention to how confusing this is, and that this is not the way you should interpret what's going on in a model. Because these are, again, the gears inside Lord Kelvin's tide engine. 
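The two linear predictors just described — independent effects versus the extra product term — look like this in a minimal sketch. This is Python rather than the course's R, and the coefficient values are invented just to show the structure; the real values come from the fitted posterior.

```python
# Sketch of the two means, with made-up coefficients for illustration.

def mu_no_interaction(water, shade, a, bW, bS):
    """No-interaction model: water and shade act independently."""
    return a + bW * water + bS * shade

def mu_interaction(water, shade, a, bW, bS, bWS):
    """Interaction model: the effect of each predictor depends on the other."""
    return a + bW * water + bS * shade + bWS * water * shade

# With centered predictors (-1, 0, 1), the slope on water at shade s
# is bW + bWS * s in the interaction model, but always bW in the other.
print(mu_no_interaction(0, 0, a=130, bW=75, bS=-40))
print(mu_interaction(1, 1, a=130, bW=75, bS=-40, bWS=-50))
```

The only structural difference is the product term, but as the table discussion below shows, that one term changes the meaning of every other coefficient.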
And they do weird things, but they produce perfectly sensible behavior at the end. So they combine in strange ways that are hard to figure out just by glancing at the table. So the intercept completely switches sign. Remember, what's the intercept? Well, this would be the area of the blooms when both predictors are zero. Both predictors are never zero in these data, so the intercept is uninterpretable, right? The area of a bloom for a flower that got no water and no shade, that is, maximum light and no water, something like that. It's nonsensical. It doesn't make sense. It's not a prediction. But it's one of the gears inside the tide prediction engine, right? So its value can't be interpreted. The main effect for water is positive in the main-effects model. That makes a lot of sense. Blooms get bigger when you add water. We expected that. That's good. But it doubles in size in the interaction model. What does that mean? You have to hesitate for a second, because remember, the relationship between water and the outcome depends upon two parameters now. It depends upon this parameter, the main effect, and the interaction parameter that we're gonna get to at the bottom, yeah? So the slope isn't this value anymore. So even though it's called bW in both models, it means different things, right? What does bW mean in the interaction model? It means: if shade were zero, that would be the slope. But shade's never zero. That's never the slope. Yeah, with me? Good times, right? So this is all meant to just terrify you, right? The tables of coefficients are not adequate to understand how models work. The tide prediction engine. Think of the tide prediction engine. So same story for shade, but now it actually reverses sign. What, is shade helpful now? No, it's still not helpful, because it's combining with the other one. And then we've got sigma. Sigma goes down because the model fits better. That's the only reason. 
And then the interaction effect, it's negative. What does that mean? Yeah, question? [Student, roughly: so the interaction is a parameter in the model. But in the example before, you said gamma was not estimated as a parameter?] Yes. Gamma is something we have to compute for this model, too. Gamma is the main effect plus the interaction times the other predictor. Yeah, does that make sense? It's not a parameter in the posterior distribution, but beta WS is. And in the previous model, beta AR was a parameter inside the posterior distribution. Yeah, makes sense? Okay. All right, let's move on to plotting. Since there are only three levels of these predictor variables in this model, it's convenient to plot them in a classic artistic form called the triptych. A triptych is a pleasing way to display photographs and other things. There's one of my favorite triptychs from American history: one of the conspirators in the assassination of Abraham Lincoln, looking extremely smug in a series of three photographs that were taken shortly before he was hanged. And yes, welcome to Friday morning. I'm naturally morbid. Can't help it. It's my Scottishness. But triptychs are really good for picturing interactions, because you can plot an extreme low value of one of the predictors, a middle value, and then a high value, and look at the change and get an interpretive feel for the interaction. There's nothing constraining you to only three panels, but I assert that three is the minimum. Four is good too, but three is the minimum. Two's not enough if you want to get a good sense of what's going on in the interaction effect. So think about the assassination of Abraham Lincoln when you think about interaction effects, I guess. So we're gonna make a triptych for this model. First, let's think about the no-interaction model, just so we can get a sense of what the triptych looks like. 
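The distinction in that exchange — that gamma is computed from the posterior rather than estimated in it — can be sketched numerically. This is a minimal Python illustration with invented posterior samples; the real samples come from the fitted model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Fake posterior samples for the parameters (illustrative values only).
bW = rng.normal(75, 10, size=5_000)    # "main effect" of water
bWS = rng.normal(-50, 10, size=5_000)  # interaction parameter

# gamma is not in the posterior; we compute it, one value per sample,
# for whatever shade level we care about: gamma = bW + bWS * shade.
for shade in (-1, 0, 1):
    gamma = bW + bWS * shade  # conditional slope of water at this shade
    print(shade, gamma.mean().round(1))
```

Because gamma is computed per sample, it carries the full posterior uncertainty of both parameters, including their correlation.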
In this case, what am I plotting here? In each of these panels I'm plotting shade. This is the centered version of shade, sorry. I've subtracted the mean, so now the values are just minus one, zero and one. It doesn't change the predictions at all. And blooms on the vertical. This is the model without the interaction effect, the first one we looked at, and what we're looking at is: on the far left, the case when water is at its lowest value; in the middle panel, when water is at its central value, the middle value; and then on the right, the highest value of water in the experiment. And you'll notice the slope is the same in all three panels, because the model enforces that. The model can't have a different slope. There's no interaction in the model, so the slope can't change. But what the model does say is that in every case, shade makes the blooms smaller on average. And adding water makes the blooms bigger. You notice that the line is always sloped down, but it gets higher as you go across. Those are the main effects. Does this make sense? But even with models like this, it's nice to plot. I find this so much easier than reading tables of coefficients. You get to see what you're talking about on the outcome scale. And also the confidence bounds. It's nice to have the bow ties, right? Always have the bow ties up there. Now let's look at the interaction model. The interaction is massive, absolutely massive. Why? Biology, that's why. And so now, same meaning: on the left, lowest water value, shade on the horizontal axis. There's basically no effect of shade when water is at its lowest value. I'll let you hypothesize why that's true. We'll come around to the explanation in a second. When water's at its middle value, shade starts to hurt. Higher levels of shade make the blooms smaller. And then at the highest water level, shade is the most damaging, right? So what's going on here? Well, again, I chose this example because it's not mysterious. 
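The numerical content of the no-interaction triptych can be sketched like this, with invented coefficient values standing in for posterior means. Same slope in every panel, whole line shifted up as water increases.

```python
# Predicted bloom means for the no-interaction model across a triptych:
# one panel per water level, shade varying within each panel.
a, bW, bS = 130, 75, -40  # hypothetical posterior means, not real estimates

for water in (-1, 0, 1):  # one panel per water level
    panel = [a + bW * water + bS * shade for shade in (-1, 0, 1)]
    print(f"water={water:+d}: {panel}")
```

Within each panel the drop from left to right is always 2·bS, because without an interaction term there is nothing that can change the shade slope.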
I think most of you have grown plants in your life. Yeah, if not, I encourage it. It's good to create life, right? And at the lowest water level, it doesn't matter if the plant has light or not, because it can't grow. It can't do anything with the light. It doesn't have enough water to do anything, yeah? As you add water, it's got more raw material that it can do photosynthesis with. And so taking light away has a bigger decrement on the size of blooms. And that's where this relationship comes from. That's the nature of the interaction effect. Does this make sense? Yeah. A cool thing about linear interaction effects is that they're symmetrical. There's a section in the chapter that I encourage you to read where I talk about this in some more depth. I'm gonna spend exactly one slide on it here. I apologize for that, but I had to take some stuff out. What symmetrical means is that we can reverse the way we say the interaction effect. So there are two ways to talk about this interaction effect. The first is the way we've done it so far: the effect of shade depends upon water. And that's this slide, right? Where we're changing water and we're saying, what's the effect of shade on the outcome given a water level? Conditional on the water level, what is the slope between shade and the outcome? But it's symmetric: in the same model, the same parameters can be spoken in the opposite direction. It's just this thing about natural language and math. They're not the same thing. So let's reverse it. At the bottom here, I've got the same triptych that was on the previous slide. At the top now, I've got it in the symmetric direction. It's the same model, but I've just flipped what's on the axis and what's varying across the panels. So now the panels are different levels of shade from left to right, low shade to high shade, right? Otherwise known as high light to low light. Yeah, this is here to make you think, right? 
I could have done it high light to low light. That would have been kind, but a certain amount of friction helps in teaching, in my experience, right? You have to keep the audience's attention. That burning sensation is learning, right? So, and then we have water on the horizontal. And it's the same model again, so you'll see an eerie similarity in these things. There's a mirror image going on here, but it feels a little different when you say it. So the bottom is: the effect of shade becomes more negative the more water you add. On the top now, what is going on? Well, the effect of water is closer to zero the more shade you add, right? So water helps more when there's very little shade. Does that make sense? Of course it's the same thing, because the model assumes it's a symmetric interaction. It can't tell the difference between the two. But sometimes when we say it out loud, it feels different to us, our strange primate brains interpreting it differently. But the algebra don't care. The algebra, it's all the same. Does this make sense? So it's often useful when you're plotting these models to plot it both ways. It may help you see something that's mysterious. In this case, it's perfectly symmetric because you've got three levels of both predictor variables. But you're likely to have data that's not perfectly balanced like this. And in those cases, the symmetry can help you learn, because it may just be way more intuitive one way than the other. And in the text, as I said, I do this flipping for the Africa ruggedness example, and I encourage you to read it. I assert that it does buzz your brain a little differently to say that the effect of ruggedness depends upon Africa versus the effect of Africa depends upon ruggedness. Yeah? Feels different now, doesn't it? Exactly. But again, the algebra don't care. Your brain does, though. So it's worth trying it out both ways. OK. I'm doing great on time. This is fantastic. 
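The symmetry claim can be checked directly in code. A small sketch, again with invented coefficients: the conditional slope of shade given water, and of water given shade, are built from the same parameters, and each matches a finite difference of the same mean function.

```python
# Symmetry of the linear interaction, with hypothetical coefficients.
a, bW, bS, bWS = 130, 75, -40, -50

def mu(water, shade):
    return a + bW * water + bS * shade + bWS * water * shade

def shade_slope(water):
    """Slope of shade, conditional on water."""
    return bS + bWS * water

def water_slope(shade):
    """Slope of water, conditional on shade."""
    return bW + bWS * shade

# Both readings use the same parameters; check against finite differences.
assert shade_slope(water=1) == mu(1, 1) - mu(1, 0)
assert water_slope(shade=1) == mu(1, 1) - mu(0, 1)
print(shade_slope(1), water_slope(1))
```

The two phrasings are literally the same product term, read along different axes.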
So interactions are not always linear. These are the simplest kinds of interaction effects, and they're linear. But if you know a lot about the topic, there's no reason to use the linear interaction form. You can have some different functional form. And in this case, it's obviously not really satisfactory to think about these interactions as being linear over the whole space. Why? Because there's a floor effect here, where eventually the plant is dead, and then the blooms don't get any smaller. Right? That's what we call a floor effect in statistics. So for example, suppose all the tulip data is collected under cool temperatures. Under hot temperatures, tulips don't bloom. This is the famous thing about tulips, right? This is why they grow well in the Netherlands, where it's never hot. Yeah? It's not really cold either. It's just kind of gloomy all the time, sorry. Again, I'm going to get angry emails from Dutch colleagues. But no, the Netherlands is wonderful. It is. No angry email, please. But this is the thing: they're winter-blooming flowers. There are lots of winter-blooming flowers in Europe and Asia Minor, and tulips are a very successful group of them. And so if we increase the greenhouse temperature too high, it won't matter how much water or light you give them, they won't bloom. They just won't. That's not a linear interaction, right? It's like an on-off switch. And you need a different kind of model to capture that, to describe it adequately. Does that make sense? And of course we can build those models, and by the end of this course, I think all of you will be able to build that model. That's one of my teaching goals, to help you build functional relationships like that. Okay, does this make sense? Yeah. This is the kind of thing, by the way, if in your science you realize something like this, but you're not sure how to make it into algebra, you should come see me. That's why they pay me: to turn your language into math. 
That's what I do for a living. So there are higher-order interactions, and you can build them by just multiplying more predictors together. There are some hazards here that I wanna warn you about. The homework I want you guys to work with is a case where there are some potential higher-order interactions. So let me give you an idea, if you're so inclined to use them, of what they look like. Suppose we have three predictor variables, and it's possible that they all interact with one another. That is, the effect of each depends not only upon the other two, but upon how the other two interact, right? The combined effect of the other two. That's a higher-order interaction. I'll give you some plain-language examples of this in a second. So how do you build these? Well, this is an example. With three predictors that all interact with one another, you end up with three main-effect terms, yeah? None of which is by itself the slope between any predictor and the outcome, because the slope will depend upon all the parameters in this equation. It's gonna be a mess. You've got three two-way interactions, which are like our water-shade interaction. And you've got this really confusing three-way interaction, which is the extent to which any one of these depends upon the product of the other two. Good times, right? Yeah. So this can be a very useful device, because in principle, conditionality can go quite deep in these situations. Yeah? Higher-order interactions are very hard to think with, though. So sometimes you need them, but they're very hard to interpret. The extent to which the effect of X1 depends upon X2 depends upon the value of X3. I don't know if anybody gets that reference anymore. Sorry, this is just a feature of my age. There will be jokes that no one gets, yeah, anyway. They're also hard to estimate. 
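The three-way structure just listed can be written out as code. Generic names and invented coefficient values, just to show the bookkeeping: three main effects, three two-way products, one three-way product.

```python
# The full three-way interaction linear predictor, written out.
def mu(x1, x2, x3, b):
    return (b["a"]
            + b["b1"] * x1 + b["b2"] * x2 + b["b3"] * x3  # main effects
            + b["b12"] * x1 * x2 + b["b13"] * x1 * x3
            + b["b23"] * x2 * x3                          # two-way terms
            + b["b123"] * x1 * x2 * x3)                   # three-way term

# The slope on x1 now depends on x2, x3, AND their product:
# d(mu)/d(x1) = b1 + b12*x2 + b13*x3 + b123*x2*x3
b = dict(a=0, b1=1.0, b2=0.5, b3=-0.5, b12=0.25, b13=0.0, b23=0.0, b123=-0.75)

def slope_x1(x2, x3):
    return b["b1"] + b["b12"] * x2 + b["b13"] * x3 + b["b123"] * x2 * x3

print(slope_x1(1, 1))  # effect of x1 when x2 = x3 = 1
```

Notice that none of the seven non-intercept coefficients is "the" effect of anything on its own; the conditional slope mixes four of them at once, which is exactly why these models are hard to read from tables.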
You often need a lot of data, and you sometimes have to aggressively regularize in order to get stable estimates of these things. But sometimes you really need them. So I'm not saying don't use them, but you shouldn't use them flippantly. That said, for homework for next week, I really want you to work with a data set where there are credible three-way interactions, to give you an idea of why you sometimes need them. This is data built into the rethinking package called Wines2012. This is the actual data from what's called the Judgment of Princeton, which was held in 2012 in, you guessed it, Princeton, New Jersey. New Jersey now grows wine. You don't think of wine when you think of New Jersey. You may have rich stereotypes about New Jersey, which come from the show Jersey Shore, if nothing else. But you probably don't think of nice wine. However, New Jersey does produce some very good red wines, some of which are pictured on the right. So some of you may know, there was this famous contest where French judges went to California, back when California was just starting its wine-growing industry, and they discovered that the French judges preferred California reds. This is a famous thing, movies were made about it, right? The French judges thought that they were cheated, afterwards, and all kinds of great stuff. But now when you think California, you probably think of good wine. So New Jersey orchestrated the same thing, and they had this Judgment of Princeton in 2012. And I give you the full data set: the raw scores from the blind tasting by the various judges, and some other things. So the outcome variable to explain here is the score, the subjective rating each judge gave to an individual wine, right? Here are the predictor variables that you have. You have the region the grapes were grown in and the wine was produced in. There are two: New Jersey and France. That's doing some lumping, right? France is not homogeneous, and neither is New Jersey, actually. 
Southern New Jersey and Northern New Jersey are different worlds. But that aside, we're gonna lump them into New Jersey and France. There's the nationality of the judge, which you might think influences scores. Yeah: USA and French, where I've lumped together France and Belgium. Again, apologies, that's doing some violence to reality, but we'll lump them together. And then the flight: whether it was red wines being compared or white wines being compared. You probably don't wanna judge those against one another, right? They're different. All white wines are bad, that's my opinion. All right, again, angry emails. So, your predictors are region, nationality of judge, and flight. Let's consider some interactions, to think about why higher-order interactions sometimes make sense. This is a domain where, even if you don't like wine, which is a reasonable thing, you probably have some intuitions. So, there could be an interaction of region and judge. We might call that bias, right? Judges from certain regions prefer wines from those regions. That might be a hypothesis. That's a hypothesis I had about these data, one of the reasons that I curated them. That's a two-way interaction, right? Score depends upon the interaction between region and judge, yeah? That interaction itself may depend upon the flight, right, whether it's red wines or white wines. The bias may be of a different size, depending on whether it's red wines or white wines. And the interaction of judge and flight, we might call that preference, right? Some judges like red wines and don't like white wines. I like red wines, I don't like white wines, for example. Not a wine judge, but for example. We could call that a preference. The preference may depend upon region. 
It may be that New Jersey judges are very biased towards white wines, for example. The interaction of region and flight, we might call comparative advantage. Some types of grapes are better grown in New Jersey, maybe, and some other types are better grown in France, and that'll affect the scores in particular flights. But that comparative advantage will depend upon the judge, what the judge likes, and what they're used to. Yeah, does this make sense? So these are three-way interactions, where some two-way interaction depends upon a third variable, and that's where three-way interactions arise from. Does this make sense? So, for your homework, I would like you to play with these data. This is the loosest homework I've given you so far. But really, I just want to see: this is the point in the course where you have become highly skilled applied statisticians, and I just want to give you some data, and not give you any hints about what's going on in it, but just tell you: explain this outcome variable using these predictors. And then I want to see what you do. So really, I'm not expecting you to use any particular thing, except, well, regression. I'm expecting regressions. There are probably some interaction effects. But I want you to figure out what explains the scores in these data. So, yeah, as I summarize here: answer the question, what predicts the score? And you've got three predictor variables to work with. They haven't made a movie about the Judgment of Princeton yet, but it may be coming, yeah. Okay, next week, I believe next week we're back downstairs. Yeah, I think that's right. I think we're back downstairs next week. So on Wednesday, we'll be downstairs. And we're going to do chapters eight and nine, and start 10. What this means is Markov chains. We're going to do Markov chains. 
And this is necessary to fit generalized linear models, which are not necessarily safe to fit with MAP estimation, because the posterior distribution is routinely not Gaussian, even when all the priors are Gaussian. And then we're going to march onward to multilevel models. So you need Markov chains. So part of your homework is also to install Stan. Go to mc-stan.org. I gave you some instructions up here. You're going to install the RStan interface. Your steps are: step one, get a C++ compiler; step two is a bit hazy; and step three is profit. Sorry, this internet meme is probably a bit old. All my memes are very dank, very, very dank. But this is the important thing to get going, because you're going to need Stan. As soon as you get Stan installed, you're going to be specifying models exactly as you've been doing so far, but now they'll be run by very smooth Markov chains. And we can move beyond this world of perfectly Gaussian hyperspheres into arbitrary shapes. They won't even be spheres anymore. So, by the way, Stan is not an acronym. It's named after Stanislaw Ulam, one of the people who made the most important mathematical contributions to what we now call Markov chain Monte Carlo. Okay, with that, thank you for your indulgence, and I will see you next week.