Good morning, everyone. So I desperately wanted to finish the SEBA's collider from last week, but I don't think I really have time. So please look in the notes and read about the SEBA's collider. It's not about SEBAs hitting one another, which they do. But I think you'll find it a satisfying and mysterious phenomenon in the data. Instead, I need to pick up with some new material and get into how to build more interesting functions with linear models. Let me ease into this, though, by drawing back a bit. There are things in the universe which really are dichotomous. They either happen or they don't. This is binary. This is great. And so you can develop tests for these things that work really nicely. Many of you have already guessed by now what this is. This is a color blindness test for children. It also works on adults, right? So I ask you, and you don't have to answer: how many animals do you see? That's what you ask. And this single chart can test for all the major forms of congenital color blindness, most of which, as the biologists in the room know, are found in males and not in females because of the way sex chromosomes are inherited. This is an amazing thing, right? I don't think I'm color blind, and I see a number of things. There's a squirrel down here, is that a squirrel? A purple squirrel, and a red hare, and a blue something. Fox? Fox, that's a fox. Okay. A purple cow, is that a cow? Yeah. Okay, maybe I am color blind, but there are animals here. There's a giraffe; some things overlap, right? But the nice thing about color blindness is that it really is a dichotomous phenomenon. So you can administer a test like this and figure out very quickly whether someone is color blind. Much better than asking them to name colors, because you can be bad at that and not be color blind. Some of us don't have a very rich color vocabulary, but we can still see colors.
I think most of what we do in statistics is unfortunately not like this. The evidence is not dichotomous. It's continuous, it's probabilistic. The fact that we're doing statistics means we're outside the realm of something like this, where it either is or isn't there. We're talking about some continuous phenomenon and some continuous measure of evidence for that phenomenon or not. And this is why there are no tests in this course, because I don't think we want statistical procedures that do tests. When you do that, you're making a decision way too early in the stream of information that justifies making a decision. Statistical evidence is not dichotomous. The phenomena that we study as scientists are not dichotomous. They're continuous and complicated and multi-dimensional. They're not like color blindness. So this is why I say, in one of my many slogans, you should stop testing and start thinking. This is not a Bayesian thing. There are also Bayesians who test. I don't like that either. All that Bayes factor stuff, that can go, right? So let me try to be more constructive about this. There are a bunch of off-the-shelf tools that have value. But eventually in your research, you have to do something better than off-the-shelf. Off-the-shelf clothes don't fit, right? This is why now, in modern societies, for people with middle incomes, none of our clothes fit properly, because we don't have personal tailors. But if you can afford a personal tailor, or you buy a really nice suit or dress sometime, then you get it adjusted, right? And then it becomes bespoke, which is one of my favorite English words. Some people in my department know this. I like this word. It's a bit old-fashioned now, but at tailors it's still used. You have a bespoke suit. It's adjusted to you. And we can talk about bespoke models and bespoke risk analyses.
And by the time you get to the point where you're going to decide whether some phenomenon exists or not, or how to take action to intervene in the world based upon the evidence you've collected, your analysis needs to be bespoke to the problem at hand. It can't be an off-the-shelf 5% significance threshold. That's not going to cut it. It's ethically irresponsible to do that. So we want bespoke analyses. Be bespoke, not broke. It's a long walk to that point. OK, let me give you an example. We've got a good frost on the ground here in East Germany this morning. In parts of North America, I think they're getting blizzards now. They're just getting pummeled. This happens periodically. And several years ago now, four years ago, in this very month, January of 2015, there was a prediction for a catastrophic blizzard to hit New York City and New Jersey. But it didn't happen. They shut down the city, which they almost never do in New York City. They're like, it's going to rain hellfire, and it's like, OK, well, the L train's got to run. We've got to get to work. But they shut it down. Mayor Bill de Blasio took the unprecedented step of shutting down the city. And then the blizzard didn't come. And everybody was mad, really, really mad. But it was almost certainly the right thing to do. Let me walk you through what a real bespoke decision analysis looks like in this case, one which might make the population unhappy, but at least they're not dead. So why did they shut it down? They relied upon a particular forecast from the European Centre for Medium-Range Weather Forecasts, ECMWF. There are lots of big data groups doing meteorology around the world. ECMWF is only one of them, although it's probably the world leader. They claim they are, and they actually probably are. Their forecast was way more extreme than everybody else's for this particular blizzard in 2015.
ECMWF thought it was going to deliver many times more snowfall and higher winds than everybody else did. So what do you do if you're a responsible public servant in this circumstance? Do you say, oh, that's an outlier forecast, we should ignore it? Or do you say you care about human lives, and you rely upon the extreme forecast? What everybody decided to do there was the latter. And this is the issue: accuracy always matters, but it's not the only thing that matters. When you're trying to make a decision, after you've calibrated for the evidence, you have to take into account the asymmetries of costs and benefits. And that's what meteorologists do when they order a city shut down because of a blizzard, or order an evacuation because of a hurricane, all the stuff that doesn't really happen so much in this part of Europe, right? It's pretty benign here. It's a little chilly and then we complain. But it's not that bad. Eastern North America gets pretty bad. So despite the fact that other models ended up being more accurate, it was almost certainly the right thing to do to respond to the extreme forecast, because even though it was a tail probability that it was going to be that bad, the loss of human life could have been really significant. So it was the right thing to do. This is what I mean by a bespoke analysis: there's evidence, which is what we get out of statistical models when we make predictions. But the accuracy of the prediction doesn't tell you how to make a decision, because you might have to plan for the extreme events, because otherwise you kill a bunch of people, right? Now, most of us in here, I shouldn't say all of us, do basic research that doesn't actually intervene very much in the world. So we don't want to make decisions. We don't want to dichotomize. Is the blizzard bad or not? We just want to reliably transmit all the uncertainty in the data and model combination to our colleagues.
And that's why we're not testing or summarizing. Does that make sense? This has been your first sermon of the week. There will be others. Let's switch to something completely different. I think you'll see the connection unfold. This is a manatee, right? It's related to the elephant. It's a very interesting aquatic mammal. A gentle vegetarian mermaid, right? It swims around and eats plants off the bottom of shallow waters. The only natural predator they really have is the speedboat. And you can see the bite marks of a speedboat on the back of this particular manatee here. This is quite common in manatees in Florida, in the southern United States, where lots of people own speedboats and drive them irresponsibly. And since the waters are shallow, there are often these collisions with manatees. Probably more manatees than not, after a certain age, have these sorts of scars. As a result, the state of Florida has gone through all sorts of movements to try and reduce this damage, including mandating cages around the rotor. The rotor is what you'd call a propeller if it were on a plane; on a boat it's the rotor. You put a cage on it, and then the cage hits the manatee and not the blade, which, I would imagine, is less painful. I wouldn't want to be hit by either, but I imagine it's less painful. It turns out that has not helped at all, for a very interesting reason, statistically, and that is because rotors mainly don't kill manatees, and that's why you see manatees with rotor scars. You know what kills manatees? The keel of the boat is what kills a manatee. The keel is most of the mass, very heavy. And if a manatee gets hit by the keel, it usually dies from internal damage. The reason you see manatees with rotor scars is because, while it certainly hurts, I'm willing to guess, it doesn't kill them. They're the survivors. If the other part of the boat gets you, you're in real trouble, and then you don't show up in the sample. Happy Monday.
There's a famous example in statistics that has the same information structure. It's this example from World War II bombers. This is the cover of a model kit you could make, but this is a famous British bomber from World War II, the Armstrong Whitworth Whitley Mark V. And I think what is now a Jaguar plant in Whitley made these during World War II. These things initially dropped leaflets on Germany, and later bombs, including here. I think these are the British bombers that dropped bombs here on Leipzig. They were real workhorses. They did a tremendous amount; they were highly customizable and serviceable, and really an amazing design. As the war dragged on, and the war did drag on, as those of you who know your history know, metal started to get into short supply. They were recycling everything. And there was this issue of trying to up-armor the bombers, because the anti-aircraft fire was taking out a large number of them. And so there was a statistician named Abraham Wald. There's a box in the chapter to give you some more information about Wald. He was a real statistics superhero and did a lot of really important things in his short life. One of the things he did was, the Royal Air Force asked him to use the damage patterns on the bombers that were available to them to figure out where they should put their scarce armor. So they didn't have enough. There are two things about armoring planes. The first is you don't have enough metal to armor the whole plane. The second is, if you did that, they couldn't fly, right? Because there's a weight limitation. You've got to carry bombs. That's the main thing. Bombs and fuel are very, very heavy. So you can't just cover the whole plane in armor. You have to be selective. So they asked Wald, where should we put the armor? What's the most crucial place? And look, we've got this large sample of bombers for you, and we've mapped out where every piece of shrapnel hit. Go to work.
And the intuition was that you put the armor where the damage was on these planes. And that's exactly the opposite of what Wald recommended. And he had this really nice proof of why that was the best thing to do. And it involves conditioning on a collider. Yeah, you knew it was coming, right? But what's the issue? Well, these are the survivors. They're like the manatees with the rotor scars on them. The interesting thing Wald noticed about all the bombers that made it back is that none of them had bullet holes in the cockpit, or engines torn to shreds. They had damage in the wings. And it turns out wings can be all full of holes and the plane can still fly. It's an amazing thing about airplanes. And so he recommended up-armoring the parts that were least damaged in the planes that came back, because the planes that didn't come back probably received damage in those exact places, right? Just like the manatees. Not just like the manatees, but you know what I'm saying. There's a connection here, yeah? So we're conditioning on a collider. The variable that we conditioned on, or that nature has conditioned on for us, is survival. Your sample are the survivors, and there's selection bias that arises from this. And there are two kinds of damage in these stories. You've got rotor or wing damage, which is actually not that bad. You'd rather be undamaged, but if you had to be damaged, you'd choose that damage, right? And then there's the other kind of damage, which is lethal with very high probability. And once you've conditioned on survival, it opens this path, and when you see one kind of damage, you're actually getting information about the other kind. You see rotor damage, and rotor damage ends up correlated with the lethal damage through this path, by conditioning on the collider. It's a confound. Wald realized this because he drew a diagram. He figured out that there's a causal structure to this.
Now, I don't think the language of conditioning on a collider existed back then. But Wald has this analysis, which has survived him, that goes through this logic, this causal logic. This is a very common sort of issue. When there are multiple things that can affect some status, we have to be really careful about selection bias, about the ways that these variables combine to produce observations. So what we're gonna do today: we're not gonna work on manatees and bombers, but we're gonna focus on conditioning, and the ways in which multiple variables interact with one another to produce conditional outcomes. We need to grow up our models a bit to deal with more complicated ways that different statuses interact in nature. So the background I want you to keep in mind is that, of course, everything we do in statistics is about conditioning. Conditioning means something is dependent on the state of the system. And everything is conditional in our analysis. Our inferences are conditional on the data, on the model, on our state of information. And what we're gonna work on today is the idea that the influence of some variable in your analysis could be conditional on the other variables. What we're gonna learn from it is conditional on the values of the other variables. And we wanna build that in, because nature's like that. It's not all additive effects, where the effect of each variable is unconditional on all the other effects. And I wanna show you the simplest way to deal with this and give you a couple of example analyses to work with. And later in the course, I'm gonna have some fancier examples where things interact in more interesting and natural ways, but we'll stick with unnatural and easy-to-understand stuff today. Okay, so this topic is usually called interaction effects, but really it's just about conditionality: the idea that the influence of some predictor variable is conditional on the other predictors. Let me give you some natural examples that I think you'll understand.
The influence of sugar in your coffee depends upon whether you stir the coffee after you add the sugar. Yeah, I know there are tea drinkers out there. Whatever, you're barbarians, but I imagine the same holds for tea. So what happens if you don't stir? Well, you get a lump of sugar at the bottom of your drink. It doesn't dissolve well, and then it'll be very sweet at the bottom, but it won't be so sweet above that. So if tea behaves like coffee, this is what will happen. Of course, if you wait long enough, entropy means it diffuses, but you're gonna drink your coffee faster than that. If you stir it, the amount of sugar you add to the coffee has a much, much larger effect on the sweetness of the drink. The influence of a gene on your phenotype depends upon the environment, right? In some environments, the gene will have no effect; it'll create no variation among individuals. In other environments, it has a huge effect. The influence of skin color on cancer depends on latitude, right? Someone with my creamy fish-belly skin color at a mid-latitude is in big trouble, right? But in Scotland, I am vibrant, right? I make all the vitamin D I need. And when we get to statistical models, you'll see how these sorts of things necessarily happen in many natural phenomena, that these kinds of interactions always happen. They're not just some statistical complication that statisticians add to annoy you. We do do things like that, but this is not one of them. Whenever you've got a boundary in the outcome space, you're going to get interactions. It has to happen. And so, I think actually next week, we'll get to generalized linear models, and you'll see that in every generalized linear model, which includes count models, if you're modeling something that has been counted, there are necessarily interactions. It has to be true, because there's a boundary. Once you get to zero, it's zero. A thing cannot die twice. Think of it that way, right?
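The sugar-and-stirring idea can be sketched as a toy function. This is a minimal illustration with made-up coefficients (the numbers 0.5, 0.3, 0.2, 0.8 are purely illustrative assumptions, not from any real model): in the additive version, stirring "adds" sweetness even with no sugar, while in the interacting version the effect of sugar is conditional on stirring.

```python
# Toy sketch with made-up numbers: additive vs. interacting sweetness.
# All coefficients here are illustrative assumptions.

def sweetness_additive(sugar, stirred):
    # Independent additive terms: stirring adds sweetness even with no sugar.
    return 0.5 * sugar + 0.3 * stirred

def sweetness_interaction(sugar, stirred):
    # Stirring only matters when there is sugar to dissolve:
    # the effect of sugar is conditional on stirring.
    return 0.5 * sugar * (0.2 + 0.8 * stirred)

# With no sugar, stirring does nothing in the interaction version,
# but the additive version still claims a sweetness boost.
print(sweetness_interaction(0, 1))  # 0.0
print(sweetness_additive(0, 1))     # 0.3
```

With sugar present, stirring multiplies the effect of each spoonful instead of adding a fixed bonus, which is the constraint the coffee example demands.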
Okay, and multi-level models are really just massive interaction engines. The sorts of models you'll see today, when we remake them as multi-level models, we just have to add one or two lines and they're already set up. Multi-level models really are just big interaction models. Okay, let me show you what an interaction looks like in a DAG. You've already seen it. It doesn't look special at all. In a directed acyclic graph, an interaction is just two arrows entering a variable. This might be an interaction; it might not. The reason is that DAGs are totally heuristic. All that this DAG, with sugar and stirred entering sweetness, means is that the sweetness of your coffee is some function of sugar and stirred. It's your job to figure out, with a statistical model, what that function F is. So this is the thing: DAGs are not complete representations. They're not enough to make accurate predictions. They're a tool to help you understand confound risk and figure out a deconfounding strategy, if one exists. But you also need statistical muscle to get a good estimate of the functional relationships among variables before you make an intervention in the world. Otherwise, you could do things really, really wrong. Just adding sugar to people's coffee will not necessarily make their coffee very sweet. Some stirring is good advice as well. So very quickly, what's the difference we're talking about when we start making models with interactions in them? On the left, I've got a hypothetical, and completely ridiculous, but hypothetical non-interacting example with sugar and stirring in your coffee. The sorts of models we've looked at so far could model what's on the left, and that's all they could model, where there are independent additive terms for adding sugar and stirring. When you stir, it makes things sweeter by some incremental amount.
And when you add sugar, every unit of sugar makes it sweeter by some incremental amount. And that's all the linear models we've looked at so far can do. This is ridiculous, right? This is wrong; this is not how it is. Because obviously, if there's no sugar added, stirring does nothing. It just cools your coffee off, right? That's all it does, wasted effort. But if there's sugar in there, it'll make it sweeter. So they interact, necessarily. And so, on the right is my completely fanciful version of this, but it's one that obeys that requirement. We know something about the phenomenon, and that constrains the functional relationships we'd consider. Okay, let's shift into a data example now. So, the example I'm gonna show you is about the economies of African nations. And whenever I start lecturing on Africa, as an African studies graduate student at UCLA, I always wanna show this map. Africa's really big, yo. Really, really big. And this Mercator projection thing that people use really undersells how big it is. It's really, really big, okay? And there's a lot of diversity of economies and environments and histories all over this continent. And trying to understand that variation consumes a lot of interest, because it creates a bunch of quasi-experiments, especially through interactions with the rest of the world, that help us understand economies and institutions. The example I'm gonna show you comes from an economist at Harvard named Nathan Nunn. He's an economic historian of sorts. I guess maybe he'd agree with that description. And he does a bunch of analyses, mainly focusing on Africa, to understand the impact of the past and the colonial experience on contemporary African economies. It's really interesting research. He puts all his data on his website, so that's where this comes from. And the phenomenon here that's really interesting is that there's a feature of the terrain called ruggedness, which is bad everywhere except Africa.
You know what I mean by bad: bad for the economy. Rugged terrain is bad because it makes it hard to move things, right? This is why lots of countries invest in plowing hills into valleys, just making things flat, right? Then it's easy to move stuff. Then you can move your goods around more easily. And so, outside of Africa, shown on the right, we fit a regression line between terrain ruggedness, which is standardized, well, not standardized, normalized, so it's between zero and one, where zero is the minimum ruggedness, perfectly flat. Think of a place that's perfectly flat. I don't know where that would be. And one is actually an African country: Lesotho, the world's most rugged place. I don't know if anybody in here has been there, but it's beautiful and incredibly rugged. And in Europe, you think of Switzerland. Many people here will have been to Switzerland, right? It's kind of rugged. Now, they have tunnels and things, but still, it's pretty rugged. And there's a really strong negative relationship, and you're thinking, oh, but we could drop the outliers and stuff. You can do that. Still, in that cloud, ruggedness is bad. Even a tiny increment of ruggedness is correlated with a decline in GDP. It's an interesting phenomenon. But in Africa, the relationship goes the other direction, and the more rugged countries have better contemporary economies than the less rugged ones. So there's something to understand here about institutions and how economies develop. And by the way, you can drop the outliers, and this is something you might wanna do in homework. Maybe I'll assign this as homework. It doesn't remove this problem, but it does diminish it, it's true. Seychelles is definitely an outlier. It's basically a tourist resort, right? But dropping that does not remove the difference between the continents. There's still a difference. So what's going on? Okay, the first thing is the required sermon on priors.
When I work this example in the chapter, you're gonna have to go through the sermon on the prior. What is the sermon on the prior? We wanna develop priors for these models that constrain the outcomes, pre-data, to the possible outcome space. And I wanna show you how easy it is to do this badly. So, for example, the way I've standardized these variables is: I've taken ruggedness and scaled it between zero and one, because the absolute measurement scale is not something we really understand, but this way the whole world is between zero and one on ruggedness. And then I've taken log GDP and scaled it as a proportion of the average. So one is the average country in the world. If you're above one, you're above the average by that proportion; 1.5 is 50% more. And if you're 0.5, you're half of the average country. Does that make sense? And then you don't have to be an economist to think about the possible range and how big it is, right? Think about doubling an economy. That's a huge effect. That would be an absolutely giant effect. And then the dashed lines, that's the whole world's GDPs, basically. California is at the top. And so if we simulate priors, we can set an intercept prior so it's centered at one. That makes sense. Why centered at one? Because that's the average GDP, yeah. And then we just take a guess, a standard deviation of one. And now, for the slope on ruggedness, the b parameter, we center it on zero, because we wanna let the data tell us whether it's a positive or negative effect. And we have to put some regularizing standard deviation on it; let's just try one. We simulate priors from this, and we get chaos, right? We get impossibly strong relationships. You can't trust your intuitions on this. It's just too hard to figure out if it's reasonable or not. These priors need to be tighter to be realistic. We're not gonna constrain things too much here; we just wanna get into the realm of possibility.
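Here's a minimal prior-predictive sketch of that comparison. This is an assumption-laden illustration, not the chapter's actual code: I'm assuming a mean ruggedness around 0.215 and tighter priors of Normal(1, 0.1) for the intercept and Normal(0, 0.3) for the slope, which I believe roughly match the chapter, but check the text.

```python
import numpy as np

rng = np.random.default_rng(0)
rbar = 0.215  # rough mean ruggedness (assumed value)

# Vague priors: a ~ Normal(1, 1), b ~ Normal(0, 1)
a_wide = rng.normal(1.0, 1.0, 50)
b_wide = rng.normal(0.0, 1.0, 50)

# Tighter priors: a ~ Normal(1, 0.1), b ~ Normal(0, 0.3)
a_tight = rng.normal(1.0, 0.1, 50)
b_tight = rng.normal(0.0, 0.3, 50)

# Prior-predicted proportional log GDP across the ruggedness range [0, 1]
r = np.linspace(0, 1, 30)
mu_wide = a_wide[:, None] + b_wide[:, None] * (r - rbar)
mu_tight = a_tight[:, None] + b_tight[:, None] * (r - rbar)

# Real proportional log GDP only spans a narrow band around 1, so prior
# lines far outside that band describe impossible economies.
print("wide prior range: ", mu_wide.min(), "to", mu_wide.max())
print("tight prior range:", mu_tight.min(), "to", mu_tight.max())
```

Plotting `mu_wide` as lines over `r` shows the chaos: relationships far stronger than any possible economy, which is the point of the sermon.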
And that's what I show you on the right. After you constrain the intercept down to be quite tight, and beta has to be pretty tight, at least you stay within the world's possible economies, right? So, let me walk you through the models that don't work to recover this relationship. The analysis I showed you in the two plots on this slide comes from splitting the data. I take the data set and split it into two data sets, African countries and non-African countries, and I run two linear regressions, one on each, and then I make those plots. This is cheating. Why is it cheating? Because now you have no statistical criterion upon which to evaluate the split, right? You wanna measure the contrast in the slope between African and non-African countries. And to do that, you've gotta estimate both of these lines in the same model. And that's what interaction effects let you do, right? You shouldn't split your data set. You should let the model split it and tell you how credible the split is. That's the way to do this. So, let me show you techniques that don't work, as a way of highlighting what interactions do. The first thing to try is just adding a categorical variable for Africa. This doesn't work, but it does do something for us. So, remember what this is, a categorical variable. I'm gonna create an index variable, which is the continent ID. The number one means Africa; the number two means not Africa. You could do it for each continent too, but I just wanna make this simple for now. And then we adjust our alpha. We have a different alpha for every continent. Well, every continent here means Africa and not Africa. And we run this model. This model structure makes sense, yeah? Now remember, why would I do it this way? Because then I can assign the same prior to the alpha for both continents, well, Africa and all the other continents. And that's nice because I don't want that dummy-variable issue that we had before.
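To make the index-variable structure concrete, here's a sketch on synthetic stand-in data (not Nunn's real data; the intercepts and slope I simulate from are invented). The design matrix has one intercept column per continent and one shared slope column, which is exactly what this model forces.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data: ruggedness in [0, 1] and a continent index,
# where 1 means Africa and 2 means not Africa (invented values).
n = 200
rugged = rng.uniform(0, 1, n)
cid = np.where(rng.random(n) < 0.3, 1, 2)
rbar = rugged.mean()

# Simulate the pattern this model can capture: a lower intercept in
# Africa, but one shared slope everywhere.
true_a = np.where(cid == 1, 0.88, 1.05)
y = true_a - 0.1 * (rugged - rbar) + rng.normal(0, 0.05, n)

# Design matrix for mu = a[cid] + b * (rugged - rbar):
# one intercept column per continent, one shared slope column.
X = np.column_stack([cid == 1, cid == 2, rugged - rbar]).astype(float)
a1, a2, b = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"a[1]={a1:.2f}  a[2]={a2:.2f}  shared b={b:.2f}")
```

This is plain least squares standing in for the Bayesian fit, but the point carries over: indexing the intercept lets the continents differ in level while the single `b` column forces the slopes to agree.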
And then we run this model and we get the graph on the right. The slopes are still the same. Why? Because this model forces them to be the same. It's just one b parameter, and it's the same for all the continents. But the intercepts have changed. And yes, African economies are depressed relative to non-African economies on average. But it doesn't show you the switch. This model forces the relationship between terrain ruggedness and GDP to be the same on every continent. So we gotta do something else. This is what the interaction model looks like. You just add the index variable to the slope. Now we're gonna have a different slope for every continent. That's all there is to it. Congratulations, you've made an interaction model. That's it. Just index everything, right? Make it conditional. If you want the slope to be conditional on continent, just make a different slope for every continent. That's really all there is to it. It's pretty straightforward. What are these symbols? R_i is ruggedness, and R-bar is the mean ruggedness. Remember, I've been centering these predictors so that it makes the intercept easy to think about. And that's the only trick here. And then you have your linear regression with an interaction effect in it. In the code, you just write it exactly this way. You just bracket on continent ID, and also in the prior. Now, we have the same prior slope for each continent, and the same prior intercept for each. The model figures out an intercept and a slope unique to each continent. And here are the marginal posterior distributions for this model, down at the bottom. The index of one means Africa; the index of two means not Africa. So you can see the intercept difference, right? Average GDP is lower. This is how you read a[1]: at the mean ruggedness in the world, an average African country has 90% of the average GDP in the total sample. Does that make sense? At the average ruggedness, right?
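And the interaction version is the same sketch with the slope column indexed too. Again this is synthetic stand-in data, not the real data set; the simulated slopes (positive in Africa, negative outside) are invented to mimic the published pattern.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in data again: 1 = Africa, 2 = not Africa.
n = 200
rugged = rng.uniform(0, 1, n)
cid = np.where(rng.random(n) < 0.3, 1, 2)
rbar = rugged.mean()

# Invented truth mimicking the real pattern: slope flips sign by continent.
true_a = np.where(cid == 1, 0.88, 1.05)
true_b = np.where(cid == 1, 0.13, -0.14)
y = true_a + true_b * (rugged - rbar) + rng.normal(0, 0.05, n)

# mu_i = a[cid_i] + b[cid_i] * (rugged_i - rbar):
# indexing the slope by continent IS the entire interaction.
r = rugged - rbar
X = np.column_stack([cid == 1, cid == 2,
                     (cid == 1) * r, (cid == 2) * r]).astype(float)
a1, a2, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"b[1]={b1:.2f} (Africa), b[2]={b2:.2f} (not Africa)")
```

Both lines come out of one fit, so the contrast `b1 - b2` is something the model itself can tell you about, which is exactly what splitting the data throws away.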
But then how does GDP change as you move away from average ruggedness? Now you need the slope. And that slope is positive for African countries: that's b[1], 0.13, right? With a percentile interval entirely above zero. And b[2] is negative. In fact, these two slope estimates are like mirror images of one another. One's negative and the other's positive, with basically the same magnitude. What's going on here? Of course, you need the plot to really see what's going on. So now these plots look a lot like the previous ones, but they're from the same model. Inside the model, we've now got statistical support for the split between the two. And it looks the same, and I've labeled more countries. You can see what's going on with ruggedness. The slope on the left is less certain. You notice the compatibility bow tie, we can call it that, very scientific term; the compatibility bow tie on that regression line is bigger in Africa. Can you imagine why? The answer is there's less data in Africa. There are fewer countries in Africa than outside of Africa. So we get a more precise estimate for the non-African countries, because we pooled them all together and we get more power that way. But that's all that's going on. Does it make sense? Is this good? Yeah? Feel good? I know you guys are like, yeah, we know this. It's good to have refreshers, though, if you do know this. I think interpreting interactions is really hard, and you nearly always need to plot them to do it. This is a pretty simple example, because we've got two categories and we're just looking at two different slopes. Things can get more complicated very fast, especially if you don't center the variables, which is common in a lot of fields. And if you then try to read a table of coefficients, you can get confused really, really quickly. So you need to plot to understand interaction effects.
The main thing that makes it complicated is that now, whenever you have an interaction, the impact of a change in one predictor depends upon more than one parameter. I'll say it again: whenever you have an interaction, the impact of changing one of the predictors depends upon more than one parameter. So you can't look at a single row in a summary table like this and guess what the effect is of changing things. You need them all. You need the whole posterior, suddenly, and you can't get that from a table. You can't figure out what's going on. And so this is why plotting is so essential. Plotting, and calculating contrasts after the fact. Okay. The last thing I wanna say about the terrain-ruggedness example, before moving on to a more elaborate example, is this weird thing about interactions: they're symmetric. Statistically, there's a symmetry to them. There are two ways to say them, and within the data you can't tell the difference. It's only information outside of the data set itself that will tell you. It's the scientific knowledge you have about what can change. So let me try to walk you through this one for a second, and I'll do the same thing with the second example that we'll get to. The way I've been describing this is that the effect of ruggedness on a nation's GDP depends upon which continent it's on. Yeah, does that make sense? And that's what this interaction means in plain language in this case. Whether ruggedness hurts a country's economy depends upon whether it's in Africa or not. I know some of you are asking, why would this relationship exist? Read Nunn's paper. It probably has to do with slavery, right? The history of slavery. But regardless of why the relationship arises, in terms of the model, the model is agnostic about this. When you look at this prediction equation, there's nothing in it that screams for that interpretation.
You're bringing in your extra knowledge, and what is that extra knowledge? Well, let me try the opposite statement on you, which is also statistically fine: the effect of continent depends upon ruggedness. Right? So the effect of switching a nation's continent depends upon how rugged it is. In the statistical model, this is the same, but in your brain it's not, because you can't switch a nation's continent, right? So you weren't tempted by that explanation. There's causal information in your brain dying to get out, and it has broken this symmetry and chosen an explanation of the interaction. But statistically, they're identical, and sometimes it's really useful to plot the model from this other perspective, even though it's causally impossible, right? We're not gonna pick up Swaziland, right, and move it to Southeast Asia. This is not going to happen. It's an impossible experiment. So we don't think of intervening on the continent variable. We think of intervening on ruggedness, plowing hills into valleys and things like that, right? Making tunnels and so on. But you could plot it this way, and so I give you the code to make this in the text. Now what we're looking at is terrain ruggedness on the bottom, as before, and now the vertical axis is the expected difference in log GDP between an African nation and a non-African nation. This is taking seriously the idea that we could move a nation to another continent, and we're trying to predict the effect of moving it. And then we plot out the result that way, and the y-axis is now the expected change if we move a nation to Africa. And you'll notice that at low ruggedness, moving a nation to Africa is expected to hurt its economy. But at high ruggedness, it should help it. Now of course this is causally ridiculous. You know this is not the right interpretation, but the model does not see them as different things. You see them as different things. Does this make sense? Why wouldn't you think about it this way?
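The "reversed" plot is just a contrast computed from the same model. A minimal Python sketch, using hypothetical intercepts and slopes rather than real posterior output: with a1, b1 for Africa and a2, b2 for everywhere else, the expected difference from "moving" a nation to Africa at ruggedness r is (a1 + b1*r) minus (a2 + b2*r).

```python
# Illustrative numbers only, not posterior estimates from the lecture.
a1, b1 = 0.89, 0.13    # Africa: lower intercept, positive slope
a2, b2 = 1.05, -0.14   # non-Africa: higher intercept, negative slope

def diff_log_gdp(r):
    """Expected change in log GDP from 'moving' a nation to Africa at ruggedness r."""
    return (a1 + b1 * r) - (a2 + b2 * r)

print(diff_log_gdp(0.0) < 0)  # low ruggedness: moving to Africa hurts
print(diff_log_gdp(1.0) > 0)  # high ruggedness: the model says it helps
```

In practice you would evaluate this contrast over every posterior draw to get the uncertainty band, not just at point values as here.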
Yeah. I'll give you another example of this too. I would say that the reason this is confusing is because there's information in your head. You are a causal animal, right? Humans leap to causal inferences constantly, often incorrect ones, but you can't stop yourself from doing it, and you're doing it with this, and you want to interpret the variables in causal ways. The model doesn't know the difference, and you have to impose that. You have to police the models and make the plots in ways that obey that causal logic. Okay, second example. Let me give you an example where both predictors are continuous. Now of course ruggedness is continuous, but continent was categorical. What happens when both predictors are continuous? Well, then fun stuff happens. It basically works the same way, but it's much harder to think about, and you have to get very clever with your plotting now. So let me take you through an experimental example, which makes it easier to think with, but this works the same way in observational studies as well. So the data set is in rethinking; it's called tulips. This is a greenhouse experiment: 27 plants, with three levels of water and shade. Why would data like this exist? Because tulips are big money. Yeah, people like them, they're nice flowers. So there are three variables of interest here. There's also another variable in this data set, which is the experimental block, and that is something you want to control for, but I'm gonna ignore it, with your permission. You definitely want to consider it, though, because there are correlations among plants that are near one another, because of things we haven't measured, like aphids. There are always aphids. It's like a rule of life: there's always an aphid. But with your permission, we're gonna leave out block, and you may want to go back through and add that in for yourself later.
So: water level and shade level. There are three levels of water and three levels of shade, experimentally controlled. And then bloom area. And we like big blooms because they sell for more money. We're gonna consider two different models here. Now, you know how to make a no-interaction model, I think. We're gonna center water level and shade level, which means that the average or middle water level and shade level is gonna be zero. And then we're gonna have a minus-one low level of each and a plus-one high level of each. These are discrete treatment values, but in principle these are continuous variables. You can add a tiny amount of water, you can add a tiny amount of shade. And in the linear regression on the top, of course, you just have two slopes and your two centered predictors there. Makes sense? This is all the stuff you're already sick of from this course, yeah. These geocentric sorts of models. And then an interaction. Now, this is the conventional form of an interaction. I'm gonna explain this on the next slide, so hang on. Most of you probably already know that when you add an interaction between continuous variables, you multiply the two predictors and add a new coefficient on that product. So we add a third term now in this linear model: W, the centered version of water level, and the centered version of shade multiplied together, and then there's this other coefficient, which is their so-called interaction effect. This produces a very confusing model, but if you plot it, I'm gonna show you it can make a lot of sense. The first thing I want you to understand is why this happens. So for years I was confused about why interactions were written like this, and no one would explain it to me. It's just, like, you do it. It's like a ritual. It's like rosary or something, right? You just do it. So why is it like this? We have to ask: how is interaction formed, right? And here's the conventional form on the top.
We want to understand where this comes from. It comes from exactly the same assumption as in the ruggedness model. We just want to make a slope conditional on the other variable. What does that mean? It means we replace one of these slopes with another linear model. So I've taken the slope for water level. To make this easier to read, capital W is gonna be the centered version of water and capital S is gonna be the centered version of shade, so we don't have all these subscripted terms. Now I've replaced the beta coefficient in front of water level with a gamma, and gamma is not a parameter, it's a linear model. Another one; we can have as many as we want. And this linear model now tells us the slope, and the slope now has two parameters in it. One is the ordinary slope there in front, beta-W, and the other, beta-WS, is the interaction strength. And that parameter measures the marginal effect of changing shade on the impact of water. So we're directly assuming in this model that, well, exactly how do we say it in plain language? The effect of water depends upon shade. We do that by making the effect of water on blooms depend upon shade; we make a sub-model. And the sub-model is linear, because we're still in geometric world here. I mean, a geocentric world. Yeah, does this make some sense? And this is where it comes from. This is literally all it is. Linear interactions come from replacing slopes with linear models. If you substitute gamma up into mu and expand, you get the conventional form. You get this multiplied term. Does that make sense? I show you in the book that it doesn't matter which slope you pick; you can pick shade and do the same thing, and you get the same equation. You can do it with both of them simultaneously and you get the same equation. There's a box in the book where I show you how to do this. But this is where it comes from.
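The substitution step can be checked numerically. A sketch in Python (the lecture's own code is R): define mu once with the nested slope, gamma = bW + bWS*S, and once in the conventional expanded form, and confirm they agree everywhere. The coefficient values are arbitrary.

```python
# Arbitrary coefficients for illustration.
a, bW, bS, bWS = 0.3, 0.2, -0.1, -0.15

def mu_nested(W, S):
    """The slope on water is itself a linear model in shade."""
    gamma = bW + bWS * S
    return a + gamma * W + bS * S

def mu_conventional(W, S):
    """The familiar multiplied-term form of the interaction."""
    return a + bW * W + bS * S + bWS * W * S

# Expanding gamma*W = bW*W + bWS*S*W shows the two are the same model.
for W in (-1, 0, 1):
    for S in (-1, 0, 1):
        assert abs(mu_nested(W, S) - mu_conventional(W, S)) < 1e-12
print("identical")
```

The same check works if you nest the shade slope instead, or both at once: expansion always lands on the same multiplied-term equation, which is the point of the box in the book.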
It comes from literally assuming: I want the association of each predictor with the outcome to depend upon the other's value; let me make a linear model of that. So it's like a regression within a regression. Right? Anybody here remember Pimp My Ride? It's like, yo dawg, I heard you like linear models, so I put a linear model in your linear model. Anybody? No? Sorry. I'm old. MTV used to have music on it, in addition to other things. So, okay. So let's fit this. Here's the no-interaction model. I'm gonna fit both of them, because I wanna contrast them for you and show you the different predictions they make. So this is our ordinary linear regression, the so-called main effects model. There's a main effect of water treatment, a main effect of shade, priors that came from doing prior predictive simulations. I'll show you what the priors look like in a second. And then we need to plot these. In fact, I wanna show you the prior predictive simulation to justify these priors first, but we've already got the problem right away of how to visualize this. My preferred way to plot interactions is to use something called a triptych. This comes from art, right? You know a triptych? A triptych is a picture where you've got, well, literally three frames. There are also diptychs, which is two, but a triptych is more pleasing. It's three, and you have related frames that tell a bigger story. And this is my favorite historical triptych: one of the conspirators in the plot to assassinate Abraham Lincoln, photographed before he was put to death, looking incredibly smug and hipster, right? And this is great art, really great photographs. So we're not gonna have as pleasing a demonstration as this, but we're gonna have graphs in triptych form. And what we're gonna do is vary one of the variables as we move across the triptych and see how the relationship within each plot changes. This is a really effective way to see an interaction.
There's nothing binding you to only three, but three's the minimum, I think, because you want some central value and some extreme values. So three's the minimum, but you can do 20 if you think that's necessary; three is sort of the minimum for communication. So here's the triptych of prior predictions for the main effect model. I'm just showing you that the outcomes stay within the legal bloom range, where one is the maximum bloom that's been observed and zero is zero, right? That's no bloom. It's like a hard threshold. We don't want to predict below zero. Don't want to predict negative flowers. And I'm showing you that these priors are inducing almost no regularization, aside from the fact that they're staying mainly within the legal outcome space. They're not very tight at all. In fact, if I were doing this now, I'd make them tighter, but I'll leave that to you to decide. And then what are the black lines? So now that we've got these three frames: on the left, we've got shade at minus one; in the middle, shade equals zero; and on the right, shade equals one. And in each one, we're varying water across the bottom. Yeah, and what I'm trying to show you is that within each graph, those black lines come from the same sample from the prior distribution. So I'm showing you that the slope doesn't change. In any particular sample from the prior, the slope is always the same, regardless of the level of shade. The relationship with water is always the same. This is because there's no interaction in this model. Does it make sense? This will change in a second. What do the posterior predictions look like for this model? Here I'm showing you the posterior predictions plotted as these lines. I think this is like 50 samples from the posterior, plotted against the raw data, which are the blue points in each case. Same arrangement in the triptych: on the left, low shade; on the right, high shade; average in the middle; water varied across the bottom of each.
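The claim about the black lines can be made concrete. A Python sketch with one made-up prior draw (the lecture's code is R with rethinking): in the main-effects model, the slope of mu with respect to water is bW no matter what the shade level is, so a single prior draw produces the same slope in all three triptych panels.

```python
# One fabricated prior draw for the main-effects (no interaction) model.
a, bW, bS = 0.5, 0.25, -0.1

def mu(W, S):
    """Main-effects linear model: no interaction term."""
    return a + bW * W + bS * S

def water_slope(S, eps=1e-6):
    """Numerical slope of mu with respect to water at shade level S."""
    return (mu(eps, S) - mu(0.0, S)) / eps

# Across the three triptych panels (shade = -1, 0, 1), the slope is constant.
slopes = [water_slope(S) for S in (-1, 0, 1)]
print(all(abs(s - slopes[0]) < 1e-9 for s in slopes))
```

Shade shifts the level of each line (through bS), but never its tilt; that is exactly what "no interaction" means geometrically.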
Notice that we're missing the data in each case here. And why are we missing the data? Because the slope is the same in each graph, right? The model knows it needs to adjust the level, but it's doing a pretty bad job of prediction here, because the real slope is not constant across panels. There really is an interaction between water and shade, and we all know why, because we've all grown plants, right? They need both. Water has no effect if you have no light. Light has no effect if you have no water. There's necessarily an interaction. There's lots of old, high-quality work on this. So let's look at the interaction model. Now I've added this multiplication term on the end, water centered times shade centered, and another parameter, the interaction term. We fit this, and we could look at the coefficients, but as I keep saying, you can't look at just bWS, the interaction coefficient, and figure out what the effect is of varying either predictor, because it depends upon multiple parameters now. All these little bits. And so you've gotta push it out through the prediction part. You gotta push things back out through mu. And this is what the link function in rethinking is there to make easier for you: to evaluate how the model behaves on the scale of, well, its behavior. The parameters are not its behavior; they're components of its behavior. So you need to look at its behavior. So here are the priors again, now priors for the interaction model. What I wanna show you is that now the dark lines, which come from the same sample from the prior, do not have the same slope. The slope changes across shade levels now, which is what we want. Or rather, we want that to be possible; the interaction model's prior doesn't assume it's the case. It allows positive interactions, negative interactions, of an actually kind of absurdly large range, as you can see, all over the place. But it allows interactions, whereas the other model does not. And now we look at the posterior predictions.
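"Pushing things back out through mu" is what rethinking's link() automates in R; here is a miniature Python version of the idea. The posterior draws below are fabricated stand-ins (a real posterior would have thousands of draws); each draw is pushed through mu over a grid of water values at a fixed shade level, giving one predicted line per draw.

```python
# Fabricated stand-ins for posterior draws of (a, bW, bS, bWS).
draws = [
    (0.36, 0.21, -0.11, -0.14),
    (0.35, 0.19, -0.12, -0.15),
    (0.37, 0.22, -0.10, -0.13),
]

def mu(params, W, S):
    """Interaction model: a + bW*W + bS*S + bWS*W*S."""
    a, bW, bS, bWS = params
    return a + bW * W + bS * S + bWS * W * S

# One triptych panel: shade fixed at +1, water varied across the bottom.
water_grid = (-1, 0, 1)
lines = [[mu(p, W, 1) for W in water_grid] for p in draws]
print(len(lines), len(lines[0]))  # one predicted line per draw
```

The point is that no single coefficient tells you the effect of water; you only see the model's behavior once every draw has been evaluated on the outcome scale like this.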
Again, the top is the main effect model from before. The bottom is our new model with interactions. And now you see, very pleasingly, the regression lines are tracking the data. The model did its job, because the slope is allowed to vary, and we see the diminishing effect. So very quickly, think of it this way. On the left, shade is low. That means there's a lot of light, yeah? Okay, so that means water has a big effect. As you add water, you get a lot more growth, because there's light there, and so the water can make a big difference, yeah? On the right, there's a lot of shade. There's very little light. Adding water has very little effect, because the plant can't do much with the water when it doesn't have enough light. Does it make sense? Yeah, so I like this data set because it's so dull, right? No, I like flowers, and I like dull data sets that are good for teaching. You understand, you have theories about plants, and everybody has grown a plant, yeah? Nod, yes? Yeah, maybe not. But if you haven't, shame on you; go grow a plant. You have intuitions about these data, and that helps, because when you work on your real data you wanna use your intuitions to guide the modeling. In that light, let's think about the distinction between how flowers really grow and how they grow in this experiment. In this experiment, we've cut all back doors between shade and water, because it's an experiment. We've experimentally set those values, at the micro level in fact, in the greenhouse. This is not the real knowledge we wanna have about how plants grow, say, in your house or your office or in the woods. What's the difference? Well, shade influences the water level, because it reduces evaporation. And so shaded flowers can actually grow better than unshaded ones, because they retain more moisture. This is the back door path from shade through water to blooms that would operate in any European forest where you have wild flowers growing. Yeah?
It's not clear, in any particular environment, whether shade is good or bad for a plant, because it's gonna interact with the water level. But if you're experimenting and manipulating them, then obviously you cut these back door paths and you don't see them. And you have to think about this when you think about an intervention now, right? If you're just gonna cut down the woods and let all the flowers be exposed, and then they all dehydrate, maybe that's not such a good thing. Now, you wouldn't do that here, I know, right? Here in Germany, an army of protesters would show up and picket, lash themselves to the trees, right? But you have to think carefully about these things, and the fact that experiments cut all the back doors means that sometimes they can't even tell you what an intervention will do. Because in reality the back doors exist. Okay. Interactions are not always linear, and this is fun, for very good reasons. So imagine, for example, that all the tulip data I showed you were collected under cool temperatures. In their native range, which is like Iran, right, Georgia, tulips are kind of late winter or early spring flowers. They like cool temperatures. And under hot temperatures, lots of tulips do not actually bloom; it's too hot. And so now there's an interaction with temperature, but it's not linear. Blooms go to zero above some threshold temperature. So you can find that function, but you need to allow the function to be non-linear. You can measure it in a greenhouse just by varying the temperature, right? But you want to expect a non-linear effect. Okay. Last, very important thing to say about interactions. There's no reason to stop at an interaction between two predictors. Well, sanity might be a good reason to stop. But there may be good scientific reasons to keep going into higher order interactions. Third order, for example.
So let me show you an example of a three-way interaction between some abstract variables. We've got some outcome Y and three predictor variables, X1, X2, X3. So this is a big linear model, but let me break it apart. The first line of it: of course we've got alpha, our intercept. Then there are the main effect terms, right? A slope times each predictor. And then we have three two-way interactions, because all of them are possible, right? The effect of the first can depend upon the second. The effect of the first can depend upon the third. The effect of the second can depend upon the third. Three two-way interactions. Hang with me, I'm gonna have a real data example coming up so that we can put words on this. But let's do the abstraction first. And then there's a three-way interaction, in which the interaction between any two depends upon the third. The extent to which, I know this is good stuff, right? The extent to which the first depends upon the second depends upon the third. Yeah, so sanity might stop you already from doing this, but sometimes nature is like this. It really is, and we can come up with examples. I've got an example coming up in a couple slides here. First let me give you the caution. These things are really hard to understand. Really, really hard to understand. If you plot them, maybe you can understand how these models behave, because interactions do go pretty deep in nature. But you need to be cautious, and you need to regularize really hard on higher order interactions. Yeah, be skeptical of them. They tend to be small effects. I don't know, does anybody even remember this movie? Sorry, I have all these old jokes in these slides. Okay, people know this movie. Sorry, I'm from Gen X. I'm Winona Ryder's age, so that gives you an idea of where my jokes come from. But some data sets are just crying out for higher order interactions. You really need to use them, and there's no dodging them.
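The abstract model above can be written out and checked in a few lines of Python (coefficients arbitrary, purely for illustration). The slope on X1 is no longer a single parameter: differentiating mu with respect to X1 gives b1 + b12*X2 + b13*X3 + b123*X2*X3, so the effect of X1 depends on X2, on X3, and on their product.

```python
# Arbitrary illustrative coefficients for the full three-way model.
a = 0.0
b1, b2, b3 = 0.5, 0.3, 0.2          # main effects
b12, b13, b23 = 0.1, -0.1, 0.05     # the three two-way interactions
b123 = 0.08                          # the three-way interaction

def mu(x1, x2, x3):
    return (a + b1*x1 + b2*x2 + b3*x3
            + b12*x1*x2 + b13*x1*x3 + b23*x2*x3
            + b123*x1*x2*x3)

def x1_slope(x2, x3):
    """Effect of a unit change in x1; the model is linear in x1, so a finite
    difference recovers b1 + b12*x2 + b13*x3 + b123*x2*x3 exactly."""
    return mu(1, x2, x3) - mu(0, x2, x3)

assert abs(x1_slope(1, 1) - (b1 + b12 + b13 + b123)) < 1e-12
print(x1_slope(0, 0))  # with x2 = x3 = 0, the slope is just b1: 0.5
```

This is why no single row of a coefficient table can tell you "the effect of X1" once higher order interactions are in the model.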
But you just need to be cautious. So here's the data set. It's in the rethinking package, and I think I'm gonna assign it as the homework at the end of this week, actually, and have it be your first Markov chain homework. I'm gonna teach Markov chains on Friday. It's called Wines2012, and it's from the so-called Judgment of Princeton, which occurred in Princeton, New Jersey in 2012. So some of you will know there was this famous wine judgment that kicked off the California wine industry, when a panel of judges in Paris, in blind tasting, preferred California wines to French wines. Remember this? There was a whole movie made about this. What was the name of that movie? Someone knows; tell me afterwards. Now, New Jersey grows a bunch of wine, and the words "wine" and "New Jersey" are not usually used together, right? But there's lots of good wine grown in New Jersey, and New Jersey was trying to get some marketing going. So in 2012, they arranged a similar judgment, and they did very well in it, and I give you the data here to look at. And same thing: in blind tasting, French judges can't tell the difference between New Jersey wines and French wines. Unblinded, they have a preference, but in blind taste testing, they can't tell the difference. At least for the good wines; I think there are bad ones too. So the outcome variable in these data is the score, the rating each judge gives to each wine. I don't know if people here have done wine tasting, but you get a flight, right? A flight of wines. There are little wines and you sip each one and then you write down a little score and then, well, you spit it out and then you rinse and then you do the next one, so you're not completely trashed at the end of the flight, right?
The predictors we have available that plausibly, causally influence score are: the region the wine is from, whether New Jersey or France; the nationality of the judge, which is American, French, or Belgian. There's one Belgian judge. We should distinguish between these things; they are different countries. Okay. And the flight, whether it's reds or whites. So you've got three variables now, and all of these things could interact in powerful ways to influence the score. You can have judges that prefer, well, I've got this on the next slide. Here we go. So: higher order interactions that probably really exist, well, could exist, and you wanna look for. Consider each of these. An interaction of region and judge, we might call this bias. That is, judges are biased toward wines from the country they grew up in. Why? Because they grew up drinking wines that tasted like that. Yeah? Do you like tannins? Well, maybe you're from a particular place. Yeah? The bias, however, may depend upon the flight, whether it's red or white. Does this make sense? Sorry, those of you who don't drink much wine are like, what is this stuff? But, you know, beer, right? We're in Germany; beer is like this too. There's Pilsner and then there's all the bad beers. I'm gonna get hate mail later. And the interaction of judge and flight is preference, right? You might have a preference for reds or whites, and that will vary by judge. That is, a judge could give higher mean scores to reds than to whites. I'm like that. I don't like white wine, I like red wine. Sorry, it's just true. And so I will rate all white wines, even the good ones, lower than red wines. That's just how it goes. But it's just a preference. And preference may depend upon the region you're from. Yeah? Finally, an interaction of region and flight. We might call this comparative advantage. What does that mean?
That means some places are better at growing different types of wine. Yeah? New Jersey may be really bad at white wines, or vice versa, in which case the average scores could change depending upon the interaction of the flight and the region. But that advantage depends upon the judge, because judges will have their own views about what makes a good red or white wine. Does this make sense? So all this stuff is causally plausible, for very simple reasons, if you understand how tasting works and preferences work and human bias and all those things. So like I said, I think I'll assign this data set to you at the end of this week to play with. It's good fun. And maybe you're a little bit interested in the outcome. I already told you the outcome: the New Jersey wines do fine. But you wanna see what's going on there. There are judge effects, I think, in these data. Before I can assign this data set to you, though, I need to give you an introduction to Markov chains, so we can estimate all the models that'll be coming up in the rest of the course. When you come back on Friday, I'll leave wine behind and we'll pick up with these crazy things called Markov chains, and then we'll loop back to the wine data set at the very end. Thanks for your time, and I'll see you on Friday.
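As a head start on the structure of the homework model (not a solution), here is a Python sketch of the linear predictor with all the named interactions. The coding of region, judge nationality, and flight as 0/1 indicators and the coefficient names below are my assumptions for illustration; check the actual Wines2012 column names in the rethinking package before fitting anything.

```python
def mu(region, judge, flight, b):
    """Full three-way interaction model for a wine score.
    region/judge/flight are hypothetical 0/1 indicators; b maps
    made-up coefficient names to values."""
    return (b["a"]
            + b["R"] * region + b["J"] * judge + b["F"] * flight
            + b["RJ"] * region * judge        # bias
            + b["JF"] * judge * flight        # preference
            + b["RF"] * region * flight       # comparative advantage
            + b["RJF"] * region * judge * flight)  # all of it depends on the rest

# With standardized scores and all coefficients at zero, every prediction
# is the mean score, zero.
b = {k: 0.0 for k in ("a", "R", "J", "F", "RJ", "JF", "RF", "RJF")}
print(mu(1, 1, 1, b))  # 0.0
```

The seven slope terms are exactly the three main effects, three two-way interactions, and one three-way interaction described above; the homework is estimating them with a Markov chain and then interpreting them with plots, not from the table.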