This week, we'll be light on new tools and heavy on applications and concepts. So let's step back for a moment and think about waffles. In North America, there's a very famous chain of waffle restaurants called Waffle House. There's also a Waffle House in Frankfurt; do not go to it, it is a crime against waffles. But Waffle House in North America is excellent, and I highly recommend it if you're visiting. It's mainly located in the southern states, and it is always open. That's one of the things about it that's very reliable: it doesn't matter what time of day, it will be there. Indeed, it is so reliable that the United States Federal Emergency Management Agency has used, internally, something called the Waffle House Index to measure how bad a natural disaster is. This was the creation of this fellow here, Craig Fugate (I think that's how his last name is pronounced), who was director from 2009 to 2017. He established an index within the agency that used whether Waffle Houses were open or not as a measure of how bad the human experience of a storm had been. This is really still used in the United States. The quote from him: "If you get there and the Waffle House is closed, that's really bad. That's when you go to work." So this is the internal scale; I'm not kidding you. This is a Waffle House after a hurricane. That one's closed; they're not serving waffles. Green, the green storm rating, means there's a full menu. Waffle Houses, by the way, have their own electrical generators; this is why they can stay open when the power grid falls over. You can still get your hot waffles after a hurricane. Yellow means there's a limited menu: there's no power, only limited power from the Waffle House generator, and food supplies may be low. And red means the restaurant is closed, which indicates severe damage. This turns out to be a better index than wind speed and rainfall, because it's at the human level; it's what the facilities do. So the index is taken quite seriously. Waffle House is mainly located in the southern parts of the United States, just for historical reasons, which is probably also where there are lots of hurricane impacts, and so the index proves useful. But lots of other things are also associated with the southern United States. The North Americans here will know this; you'll have a rich set of stereotypes to draw upon. I went to high school and college in the southern United States and have a lot of affection for it, so I will make jokes about it, but I do it as an insider. One of the things the South is also known for, in addition to waffles, is divorce. Divorce is a southern institution: the highest rates of divorce in the United States are in the southern states. And it turns out that one of the best predictors of divorce rate by state is the number of Waffle Houses per person. Why do I bring this up? Well, first of all, it's funny. Secondly, it illustrates a common hazard of inference, which is that everything is correlated. So if we use correlations as clues to causation, which, let's face it, is what we do in science quite often, you're likely to make bad inferences. And one of the things we ask statistical methodology to do for us is to guard us against casual causal inferences like this. It's probably not true, I assert, though I cannot prove it, that Waffle House is causing divorce.
Nevertheless, if you put up a scatterplot like this, and then fit a regression through it like I've done, it sounds very sciencey. And although there's a lot of uncertainty there, there's a pretty meaningful correlation between the number of Waffle Houses per person and the divorce rate in each state. What's going on? Plausibly, there's some other set of background variables associated with both of these things. Their correlation arises, we say, spuriously: it's an accident of some other set of variables that's driving them both. And the things about the southern United States that give it a high divorce rate are, the sociological literature says, that people get married earlier, and one of the best predictors of the dissolution of a marriage is the age at which people get married. Fifteen-year-olds tend to make bad marriages; that's what the literature says. And cohabiting outside of marriage is still frowned upon in the South, much more so than in other parts of the U.S. So this is what the literature says about why the divorce rate is higher in the South. And Waffle House just started in the South. The guy who founded it was a southern American; it started in Georgia, if I have that right, and certainly the greatest concentration is in Georgia, which is where I went to college. So it's an example of spurious correlation, and of course there are lots of these. Divorce is a good variable to look at, because it's correlated with damn near everything related to people, since it's driven by demographic forces, and a lot of other things are too. So it ends up having high correlations with tons of stuff. And in a time series, you can find correlations everywhere; time series are great for finding correlations. There's some classic statistical result that, as a time series gets longer, it's bound to exhibit some spurious correlation. There's a great website, which a few of you may have browsed, the Spurious Correlations website, which has a hilarious collection of these. It's one of my favorites. The divorce rate in Maine correlates very nicely, I must say, with the per capita consumption of margarine. Now why? Well, these things are both related to demography, and probably to the age distribution. Correlations like this can arise from all kinds of things that give you no clue about what causes divorce. Divorce is not caused by margarine, nor margarine by divorce; both just track demographic trends. But still, this is a hazardous thing. It's easy to find examples which are obviously spurious. The problem is that lots of things are not obviously spurious, but really are. And that is the terror. As the title slide highlighted, the causal terror of trying to do scientific inference is that it's very difficult to tell from data alone whether a correlation indicates some causal connection or is instead merely spurious. So this week, we're going to take all the tools from last week and tour through this causal terror. The first part will build you up: we'll talk about all the great things that regression can do to guard against the inference of spurious correlations. We're going to carry forward with the Gaussian model from last week, but we're going to add multiple predictor variables. These are often called multivariate linear models. The good thing about these models is that they can help protect you against the most casual forms of spurious association.
They can reveal spurious correlations. Not only that, but there are cases where actual causal relationships exist and you can't detect or see them unless you have more than one predictor variable in the model, because causes can interact and mask one another. I want to show you how to uncover masked associations in such systems. Then we're going to tour the bad. Maybe we'll start that today, but we'll definitely spend Friday all on the bad, and then you can go off to the weekend to digest it. You'll come back next week and I'll build you back up again. But you're welcome: you need to know this. I'm going to focus on cases where adding predictor variables creates spurious correlation. It is not harmless to just add stuff to a model; it can totally distort your inferences. I'll give you some examples of this, where adding variables manufactures spurious associations and hides real ones. I'm very sensitive to this issue because I think there are lots of professional pressures to exaggerate the power of statistical methods, so it's my responsibility to deflate some of this and caution you about the dangers as well. That isn't to say that doing inference without statistics is better; that's also terrible. Statistics doesn't settle the argument, and there are lots of dangers to be aware of, but at least you have a structured analytical system through which you can clearly communicate what you've done. That's an achievement, but it doesn't solve the problem. It solves the communication problem and the calculation problem, but it doesn't solve the inference problem. The universe is hostile to human life; that's my central message. But at least we have tools, so that's the upside.

What is spurious association? Here's the background. Remember that correlation does not imply causation. The word "imply" there has the logical meaning, like in a philosophy course: if you go through the truth tables, it means correlation does not always indicate causation. It's a one-directional arrow of implication. That is, you can have correlation without a causal link between the variables. And it's also true in the other direction: causation doesn't even imply correlation. You can have variables that are causally related, but if you look at their correlation, it could be zero. What's a simple example? I won't spend much time on this, but a simple example would be a machine into which you input positive and negative numbers, two and minus two, four and minus four, and it outputs the square of the input. There will be zero correlation between the inputs and the outputs, even though the input perfectly determines the output. They're definitely causally related, with zero correlation, and it's easy to build plausible machines like that; there's a little simulation of one after this paragraph. In complex systems, and this is what the whole Santa Fe Institute is about, this sort of thing happens frighteningly often: there are highly nonlinear, delayed feedbacks and all sorts of things, and in natural systems causally related variables may exhibit no correlation, unless you look at them in exactly the right way, as revealed by a model of how the system works. Causation implies some kind of association, but it may be complex, it may be nonlinear. Finding it depends upon having some model of the system that you can lean on, so that you know what kind of association to look for. Now, you expect me to say this because I'm a theorist, but models are your friends here.
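Here's a minimal sketch of that squared-input machine, in R (not from the lecture slides, just an illustration): the output is completely caused by the input, yet the linear correlation between them is zero, because the relationship is symmetric around zero.

```r
# Input is symmetric around zero; output is fully determined by input.
x <- c(-4, -2, 2, 4)
y <- x^2
cor(x, y)    # exactly 0: no linear association despite perfect causation

# Same thing with random draws:
x <- runif(1e4, -1, 1)
cor(x, x^2)  # approximately 0
```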
So let's tour through this with the simplest kinds of linear models, applied to divorce. And let's look at data from North America, following on from the Waffle House example; you could do the same thing with the regions of Europe. Let's look at relationships between variables that are plausibly causally related to divorce rates in different regions. The first one to think about, of course, is the marriage rate. We can ask this simple question: does marriage cause divorce? That's an interesting question, right? In the obvious sense, yes: if no one ever gets married, no one can get divorced. But that isn't what we mean. Beyond that, there's variation in marriage rate and variation in divorce rate, and whether they're connected depends upon some subtler model. We're going to look at this today with linear regressions, but I want you to keep in mind that a satisfying analysis of this would start with some demographic idea: what's the mechanism by which this could happen, and what's a better way to look at the data? I want to teach you multiple regression without getting into the weeds of my particular age-structured demographic model of divorce. Some other time, if you want to hear about it.

So let's look at two potential explanatory variables for divorce rate in the states of the United States. Divorce is positively associated with marriage rate across the states; that's what we're looking at on the left. This is a simple bivariate linear regression of the sort you saw last week. On the right, instead, we have a bivariate regression between the median age of marriage in each state and the divorce rate, and there's a strong negative relationship. Let's think about what that means. States in which people tend to get married later have lower divorce rates. This is the result I alluded to before; it's a very strong negative correlation, and it's the thing the sociology literature thinks is the most powerful driving force behind why the southern states have high divorce rates. Again, this is true in Europe, too; you'd find the same kind of relationships there. Age at marriage is a well-studied variable; this isn't surprising to people, though it matters somewhat less now than it used to.

So how do we get both of those things into the same model? And we need to do that, because the first thing you'd ask is which of these is really the driving force. That's a complicated question, but it's a good one and we want to start there. What would happen if we put them both in? What does that mean? What we really want to know, or rather the questions that multivariate linear regressions answer, are of this sort: what is the value of knowing one predictor, once we know the other predictors? Let me flip back to the slide. Once you know a state's marriage rate, is there any additional predictive power in learning its median age of marriage? And vice versa: once you know the state of Alabama's median age of marriage, is there any additional predictive leverage you can get by also learning its marriage rate? That's what multiple regressions do. They do that comparison, in both directions simultaneously. That's what the coefficients are giving you answers to: the marginal, additional value of each predictor once you know all the others. That may not make sense right away, but we'll have lots of examples, and the two bivariate regressions are sketched in code below. It's a very powerful thing, and it's the kind of thing computers are good at and we're not.
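For reference, here's roughly how those two bivariate regressions look in code, following the book's workflow with the rethinking package (this tracks the book's code closely, though the exact priors on the slides may differ):

```r
library(rethinking)
data(WaffleDivorce)
d <- WaffleDivorce

# Standardize the predictors: mean 0, standard deviation 1.
d$MedianAgeMarriage.s <- (d$MedianAgeMarriage - mean(d$MedianAgeMarriage)) /
    sd(d$MedianAgeMarriage)
d$Marriage.s <- (d$Marriage - mean(d$Marriage)) / sd(d$Marriage)

# Divorce rate regressed on median age of marriage (the right-hand plot).
m5.1 <- map(
    alist(
        Divorce ~ dnorm(mu, sigma),
        mu <- a + bA * MedianAgeMarriage.s,
        a ~ dnorm(10, 10),
        bA ~ dnorm(0, 1),
        sigma ~ dunif(0, 10)
    ),
    data = d
)

# Divorce rate regressed on marriage rate (the left-hand plot).
m5.2 <- map(
    alist(
        Divorce ~ dnorm(mu, sigma),
        mu <- a + bR * Marriage.s,
        a ~ dnorm(10, 10),
        bR ~ dnorm(0, 1),
        sigma ~ dunif(0, 10)
    ),
    data = d
)
```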
Computers are bad at things like walking and recognizing birds in photos, things we find trivial. But they're really good at simple numerical comparisons, like computing these coefficients. So we will lean on the robot to do this for us, but then we will have to interpret the output, and we'll spend a lot of time today interpreting.

So we're going to ask two questions; this is our first multivariate regression. What is the value of knowing marriage rate, once we already know median age of marriage? And what is the value of knowing median age of marriage, once we know the marriage rate? Here's what the regression model looks like, on the bottom of the slide. D_i is the divorce rate in state i. We're going to make that normally distributed, because everything in the class is normally distributed right now; later on we can revisit that assumption. What does that mean? All it means is that we have some set of measures with finite variance, and knowing nothing else about them, we assign them a Gaussian distribution. That's a very conservative way to model their error. It looks like the strongest assumption in the model, but it's actually a very weak assumption; we could do better if we knew more about these measures. So we model them with a mean and a standard deviation: mu_i is the expected value for each state i, and there's some constant standard deviation sigma across states. And the mean for each state i is given by this linear equation: some intercept, plus a term for the rate of marriage, where R_i is the rate of marriage in state i, plus a term for the median age of marriage, where A_i is the median age of marriage in state i. And you can just keep adding terms like this. Most of you are familiar with ANOVAs and things like that, so you've seen these equations before. This is why these models are called linear: the equation for the mean is linear.

Okay, so let's figure out what this means. What this equation does is ask the questions above; when you see this kind of model, you want to see it as asking those questions. So let's tour through the pieces. As I said, D_i is the divorce rate, R_i is the marriage rate, A_i is the median age of marriage in each state. Then we have these parameters, these unobservable things that we have to infer; we're going to have posterior distributions for them. These are the so-called slopes, and both of them give you the change in the outcome per unit change of the variable next to them. So if R_i, the marriage rate, increases by one unit in a state, then the model expects D_i to increase by beta_R, and likewise for age of marriage. This is super boring, but these are machines, this is their program, and this is how they do stuff. There's a lot built into this structure that makes the model ask those questions. Do we need priors? Yes, and I've assigned our standard weakly regularizing priors here. I beg your indulgence: next week we'll talk about priors a lot more, and why we don't want super-flat priors but rather things that are a little bit conservative. The threat that looms here is overfitting, and next week is all about overfitting. Endless overfitting.
So one way you want to think about the intercept is, as I said before: you give it a loose prior and let it swing, letting the slopes determine where the intercept lands, unless you have some theory about where the intercept should be. Sometimes you do. Then we put these priors on the coefficients so that they're skeptical of really big effects. This guards us against being tricked by our sample; we'll talk about this a lot next week. And the standard deviation just gets a flat-ish prior over a sensible range. We can worry about overfitting through sigma as well, later in the course.

Translating this into code, on the right: this is the sort of thing you're doing in your homework that you'll turn in on Friday, and that we did last week in lecture. There's a variable called Divorce in the dataset. We give it a normal distribution with mu and sigma. Remember, those symbols mu and sigma are arbitrary; you pick the names, but if you use mu and sigma, people will understand you. That's the only value of the convention. I don't know why Greek letters have survived so well in math, but they have. Then you write your linear equation for mu. These things can start to get quite long, especially if you have long variable names, which I often recommend: long, descriptive variable names. Text is free; your computer will take it all and won't complain. So it's a plus bR times Marriage.s. When you read the text, you'll see the .s comes from standardizing these variables, which is a very important thing to do; you'll get my sermon about the importance of standardizing variables again. I think we did some of this at the end of last week, right? Then bA times MedianAgeMarriage.s. And then the priors, which are just as they were before.

Okay, so we can fit this with map; it's very straightforward now. map executes the formula list. It really is just a wrapper that executes your formula list. You're programming your model, defining the posterior as the product of the likelihood and the priors. That goes into counting up all the ways that each combination of parameter values could produce the observed data, and map does that: it climbs the hill and finds the peak. You can program lots of stuff in there, and it will just as happily execute this model as the previous one. It is more complicated now, because we've got more parameters. How many parameters do we have? One, two, three, four parameters in this model. So it's a four-dimensional posterior distribution. I think this is the first one we've had; well, we had the parabolic model on Friday, didn't we? That had four. So now you can't visualize it; it's some hypersphere, a four-dimensional Gaussian blob. But that's fine for the computer. Again, computers are good at that. Four-dimensional sphere, no problem. It doesn't try to visualize it anyway, so it doesn't get confused; it just represents it as a big array. And it can return the quadratic approximation, which I've shown at the bottom here. What are these? The mean column is the center of this four-dimensional hypersphere, the peak of the four-dimensional Gaussian surface; then there's the standard deviation in each of the dimensions. What's missing here, which tells you the whole shape of the posterior distribution, is the covariance among these dimensions. You can get those out if you want them; I think that's in chapter four.
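Here's the full multivariate model in code, again following the book's workflow (the priors here are the book's; the slides' may differ slightly). It assumes d and the standardized variables from the earlier snippet:

```r
# Divorce rate regressed on BOTH marriage rate and median age of marriage.
m5.3 <- map(
    alist(
        Divorce ~ dnorm(mu, sigma),
        mu <- a + bR * Marriage.s + bA * MedianAgeMarriage.s,
        a ~ dnorm(10, 10),    # loose prior: let the intercept swing
        bR ~ dnorm(0, 1),     # skeptical of really big slopes
        bA ~ dnorm(0, 1),
        sigma ~ dunif(0, 10)  # flat-ish prior over a sensible range
    ),
    data = d
)

precis(m5.3)  # means, standard deviations, 89% intervals
vcov(m5.3)    # the covariance matrix that the table leaves out
```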
You'll have seen that in the book somewhere. So when you extract samples, that covariance information gets used, and the correlation structure is preserved.

Now, about reading these tables: this is going to be a lot of my opinion, and you may have different opinions; I respect differences of opinion on these things. But my opinion is that tables of coefficients really struggle to be useful. They look really scientific; it kind of looks like science, right? You show one to your relatives and they say, oh, you're a scientist now. Tables of coefficients are good for that. But personally, and again, if you get a lot out of them, that's great, you're better than me, I don't get a lot out of them. I have a hard time parsing them. I think what people actually get trained to do is scan tables of coefficients for p-values less than 5%. That's what people are good at; they're very fast at it. But once, like me, you've deleted all the p-values, these things cease to be tremendously useful. I joke, but I think it's true; that is what people do with tables of coefficients: are any of these conventionally significant? There's a lot more information in there, of course, and that information is useful, but it's hard to get it out of a table of numbers.

So there are lots of ways to visualize these. An easy way is to just plot the precis table from my rethinking package, which makes this kind of plot. What do people call these things? Caterpillar plots? I don't think they have official names. Anyway: points and line segments. What is it plotting? The horizontal axis is the value of a parameter. All the parameters in the posterior distribution are on the vertical axis, one per row. Along each row, we plot an open circle for the mean of the quadratic approximation of the posterior distribution, and the interval is the 89% interval from the table up there. It lets you get an idea of the location and precision of each estimate. And zero is highlighted here because, for coefficients like slopes, zero is a natural point of reference: if a slope is below zero, then as the variable increases, the outcome decreases; if it's above zero, then as the predictor increases, the outcome increases. But what you should not do is ask whether the interval overlaps zero, and conclude that if it does, there's no effect. The interval can overlap zero and there can still be a very powerful effect of a predictor. Zero is not special. It isn't like some magic where, if the line touches zero, it annihilates the effect. Why not 0.1? "It overlaps 0.1, so no effect"? There's nothing special about zero. It's a natural reference point to pay attention to, but don't give in to the conventional superstition of thinking that just because an interval touches zero you should ignore that predictor. That's not true.
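Making that plot is a one-liner, assuming the m5.3 fit from above:

```r
# Caterpillar-style plot of the coefficient table: open circles at the
# posterior means, line segments spanning the 89% intervals.
plot(precis(m5.3))
```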
Yeah, Katie? (Student: So how would you interpret something like that, then? You just talked about how if it's below zero, it's negative.) That's a great question. The question was, in case the microphone didn't pick it up, how would I interpret something like this? Let me give you my feel for this particular set of effects.

Let's ignore the intercept for a moment, because the intercept just anchors the line. Well, it's the divorce rate for a state where the marriage rate is zero and the median age of marriage is zero, so in that sense it's meaningless in this model; it's off the graph, an impossible thing, with no interpretable meaning. (Student: Weren't the variables standardized?) Oh yeah, thank you: as someone points out, since I standardized the predictors, the intercept does have a meaning. For a state with both average marriage rate and average median age of marriage, it's the divorce rate. It's a little over nine; nine and a half, something like that.

So let's look at the coefficients. Think about the coefficient on marriage rate for a second. It's slightly negative, and it overlaps zero: a fair amount of the probability mass in the posterior distribution is above zero, but more of it is below zero. So on the whole, the evidence is that it has some small negative effect, small relative to the other effect. That's the way I interpret it; that's the evidence. If it has any effect, it's small. All the probability mass is near zero. It looks like about a one-third chance that it's small and slightly positive, and about a two-thirds chance that it's small and slightly negative. Either way, it's small. But it's not zero. Now, maybe as a manager you want to proceed as if it were zero; you think it's ignorable, and that may be true. But it's easy to construct cases, and we'll look at some later in this course, where the error bars are really wide and the average effect is really far from zero, but the interval nevertheless overlaps zero. It's not on this slide, so it's hard to see, but the MAP estimate could be really positive while the uncertainty is so big that the interval overlaps zero. And what people do with p-values is say, oh, it's not significant, it doesn't matter. But according to the model, there's just as much chance that the effect is bigger than the MAP estimate as that it's smaller. I'll say that again: according to the model, there's just as much chance that the effect is bigger than the MAP estimate as that it's smaller, at least for a Gaussian posterior. So just because zero is a possible value, acting as if the effect is zero is a very poor decision-making process. And I should say, statisticians are unanimous about this; we are routinely horrified by the way people do significance testing. But we lost control a long time ago. Anyway, today I'm playing a statistician; tomorrow I play a scientist. There's a conflict of interest within my soul; I just try to keep the two apart.

Now, the median age of marriage effect: this is a lot more negative, all the probability mass is below zero, and it's certainly not a small effect. In fact, there's almost no overlap between those two intervals, although, same caveat: we shouldn't use the overlap between two intervals to judge their difference. There's a whole separate distribution for the difference; we'll do that later this week, contrasts, which psychologists know very well. And then the standard deviation is not something we're going to interpret a whole lot; it's the amount of scatter around the line, the residual variance. Unless you have predictions about that, you're usually not interested in it.
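Those "one-third versus two-thirds" statements can be computed directly from posterior samples rather than eyeballed; a quick sketch, again assuming the m5.3 fit:

```r
# Draw samples from the quadratic approximation of the posterior.
post <- extract.samples(m5.3, n = 1e4)

# Proportion of posterior probability on each side of zero for bR.
mean(post$bR > 0)  # roughly the "one-third chance it's positive"
mean(post$bR < 0)  # roughly the "two-thirds chance it's negative"
```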
Okay. So let's try to summarize; we'll be very forceful about it. Let's go back to interpreting the golem. One way I often joke about this is to say that you get these tables of coefficients, maybe you plot them up (I recommend plotting them up), and you think of this as an interview with the golem. You fed data to the golem and it produced a posterior distribution. That's its answer: "Oh, thanks. Yeah, this is exactly what I wanted, a four-dimensional Gaussian hypersphere. Thanks." And the golem doesn't get sarcasm, so it doesn't respond very nicely to that. So you have to patiently interview this thing: take it into the interrogation room and go to work. And what do you get out of it? Well, after a few hours of sweat and pounding on the table, you learn that what the golem thinks is that once you know the median age of marriage, there's little additional value in knowing the marriage rate. That's because bR is close to zero; there's a lot of mass on both sides of zero, in fact, and its magnitude relative to bA is small. In contrast, once you know the marriage rate, there's still a lot of value in knowing the median age of marriage. You would really want to know it.

Another way to think about this: if you had to build some kind of, what people call, lexicographic rule, and you could only pick one piece of information for predicting a state's divorce rate, you would pick median age of marriage. That would be your first pick. And then, if you could also learn the marriage rate, there might be a tiny bit of additional predictive leverage, but almost none. Because the uncertainty sits on both sides of zero, you're not even sure whether the effect is an increase or a decrease. There may be some effect there, but it's very hard to tell from these data. But keep in mind: if you didn't have access to the median age of marriage, there's a lot of information in the marriage rate, because these two predictors are correlated with one another, so they contain shared information. If you're denied, by some censorship, access to a state's median age of marriage, there is absolutely informational value in knowing its marriage rate. And that was the graph I showed you at the start, remember, the bivariate scatter. And this is why, when people say, oh, there's a spurious correlation between marriage rate and divorce rate, because once you control for median age of marriage it's not very important, that's an okay summary. But it doesn't mean that learning the marriage rate is of no value. It depends on what other information you have. If prediction is your goal, you take what you can get.

And it's probably true that the median age of marriage is also spurious, in the sense that there are other mediating variables we haven't got in the model. It's not literally the median age of marriage of a state that causes divorce. It's the sands through the hourglass, the days of the people's lives, that cause divorce; those are the mediating variables. These are just Bureau of Census statistics that are associated with one another, pale shadows of the actual events that are causing these things. All right. That's enough of my bad poetry.
So let's spend maybe the rest of the day just plotting the results for this model; this will be more exciting than it sounds. Continuing the metaphor of the interview with the golem, there are lots of different ways to go about interpreting this four-dimensional posterior and what it means. And there's no single right way to do it. It depends upon the structure of the model, the meaning of the data, and what you want to do. It may also depend upon you, the observer, your training, and the particular questions you're interested in. So let me give you some options, and as we go through the course, we will return to these options and use them in different cases. And I want to be clear from the start that I'm not saying you have to do these things; but these are broadly, horoscopically useful sorts of things. Before I know what you're studying and what your model and data are, I can say: well, this is a nice suite of techniques that have recurrent use in the sciences. So I want to offer them to you as part of my horoscopic advice. But in any particular case, if you come by my office and show me your data and model, I can probably give you better advice. At least I would try to; I may fail, but I will try, I promise.

So, four options to show you today. First, I want to walk you through what are called predictor residual plots. These may be familiar to those of you who have had another course in regression. I'm going to show you how to construct these plots, but this is also an attempt to re-explain what just happened with the model, and how the model goes about getting the answers it does. Second, there's a kind of plot I personally use a lot, called counterfactual plots. These are a genre of fiction: they produce impossible graphs that could never exist in your natural system, and by doing so, they help you understand what the model thinks. Because the model doesn't complain when you consider impossible cases; only you will complain. And that helps you understand how the machine functions. You see these in journals a lot too. People don't tend to call them this, but I use the term counterfactual so you keep in mind that the cases you plug into the model and visualize may be impossible. Number three, posterior prediction plots, where we force the model to make predictions, and you can view those predictions in whatever way is useful, depending upon the nature of your system. These are useful for all sorts of things, starting with checking that the machine works: see whether it makes sensible predictions of the data you've already seen. These are sometimes called posterior predictive checks, and I think you can probably always do one of some kind, just to make sure the model worked, that the machine actually functions, because sometimes it doesn't. And number four isn't really a category: you can invent your own. You should feel free, since you are the expert on your system, to find some other way to visualize it. You don't have to go with convention.

Okay. So our first goal is to show the association of each predictor with the outcome, "controlling", in quotes, for the other predictors. We want to visualize how the golem sees it: how these beta coefficients get their meanings inside the model. This is very useful for building intuition about how the model works. So we're going to compute things called residuals, which are, in substance, implied inside the machinery. But I would say: you should never analyze residuals.
And I have to say this, because you see it in journals. I don't think I've seen it in psychology journals very often, but in biology journals it's quite common to analyze residuals, and this is a terrible practice. Why? Because computed residuals lose the uncertainty around them. Residuals are parameters, and everything that is a parameter depends upon the posterior distribution and has uncertainty: there's a posterior distribution of the residuals. People just compute residuals and treat them like data, throwing away all the uncertainty. You don't know the residuals; you have to infer them. They're not data. Sorry, I'm going to get exercised about these things; this is rampant statistical malpractice inside the journals. You'll get this answer from me over and over again: if you treat residuals like data, you're throwing away a whole bunch of uncertainty and you will mess up. It's my job as a grumpy statistical reviewer to say these things a lot, and I do. But as you know, most papers never get a statistical reviewer, and so you see these things a lot. Now that I've done this rant, I should say I don't blame scientists for doing these things. They do it because they were taught to, often by the most successful scientists in their field. There's a vast, impersonal conspiracy scheming against us to make us do bad science. I absolve individuals of guilt; instead we share a collective guilt for the dynamics of our disciplines. We should band together. But no more on that for now.

So, what's the recipe for predictor residual plots? I'm going to walk through it on the slides that follow. First, we regress each predictor on the other predictors, as its own model now. Leave aside the outcome variable for a moment; forget, in this case, divorce. We've got median age of marriage and marriage rate, and we're going to do a regression of each of those on the other. For each of those regressions we compute the predictor residuals, and that helps you visualize them. Then we regress the outcome on the residuals. And yes, you can see I'm about to do the thing I said right above not to do: analyze residuals. But I'm doing it just to make a graph. We're not drawing conclusions from this; the conclusions come from the multivariate regression in the first place, and that's what you should use. Getting the residuals this way is just for visualization. (Student: But people do take the residuals and put them in another model, like you see in Science and Nature?) Yeah, actually, it depends. I think it's easier to do bad stats in high-impact journals than in medium-impact journals. It's very hard to do bad stats in The American Naturalist; it's very easy in Science and Nature. I think there's actually data on that, and you wouldn't be surprised by it. It's the review process, right?

So, okay. Here, we regress marriage rate on median age of marriage. There's nothing fancy about this model; you're getting comfortable with these models by now. This is just a linear regression with marriage rate as the outcome and median age of marriage as the predictor. Nothing surprising? No? That was a question. So, what's a residual? A residual is the distance of each observed outcome from the expectation. What is mu in this model? It's the expected value for each state.
So, we want to know the distance between the expectation for each state, according to the posterior distribution, and the observed value. The outcome here means the outcome in this little model we just fit, which in this case is marriage rate. So, here's the code. We compute mu for each state, and there's one line of code to do it in R. We take the coefficients out of this model, m5.4: the intercept a, plus the coefficient b times the median age of marriage in each state. And since median age of marriage is a vector, mu ends up being a vector of the same length. So now you've got a mu for each state. Yeah, you guys love R, right? Katie's got it. Thank you, Katie. The same goes for computing the residuals (sorry, I left this on the next slide): we take the marriage rate in each state and subtract mu from it. Marriage rate is a vector, mu is a vector of the same length, so the difference is a vector of the same length, and those are the residuals. You get a residual for each state.

Let's visualize what those things are. Here is the regression between marriage rate and median age of marriage. Each of those points is a state, and the line is the MAP regression line, the maximum a posteriori regression line. In this case it's almost identical to the maximum likelihood regression line; if you just did OLS regression, you'd get almost exactly this. What are the residuals? Well, if we draw a vertical line from the regression line to each data point, the length of that line is the residual for each state. That's all it is; that's all the code on the previous slide does. So, let's zoom in on a particular region. (So, yeah, the zoom worked. It took me about 15 minutes to figure that out. Sorry.) We zoom in on this region of the regression line. Each of those blue points is a state, and the vertical line from the regression line to each point is its residual, a prediction error. An average prediction error, really, because these residuals actually have a posterior distribution; we've just plotted the MAP line, but the posterior is full of lines, and for each of those lines there's a different residual implied. So there's actually a distribution of residuals for each state. We're just looking at the expectation in order to draw the graph. The uncertainty was taken into account; that's what the posterior distribution does, it was already in there. We're just drawing pictures now, so focusing on the average is fine, because this isn't the analysis.

So let's consider a couple of cases in the inset here. Consider states below the regression line. For these states, the actual observed marriage rate is less than the expectation: these states have a low rate of marriage for their median age of marriage. It's a weird thing to say, but think about it. What are individuals in these states doing, on average? Typical individuals in these states are getting married more slowly; think of it as a rate, like a speed, compared with typical states having that same median age of marriage. So it's implicitly a comparison to the whole set, because that's where the regression line comes from. For states above the regression line, it's the opposite: they're getting married fast, relative to the expectation.
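In code, close to the book's version, the little model and its residuals look like this (again assuming d from earlier):

```r
# Regress one predictor (marriage rate) on the other (median age of marriage).
m5.4 <- map(
    alist(
        Marriage.s ~ dnorm(mu, sigma),
        mu <- a + b * MedianAgeMarriage.s,
        a ~ dnorm(0, 10),
        b ~ dnorm(0, 1),
        sigma ~ dunif(0, 10)
    ),
    data = d
)

# Expected marriage rate for each state, at the MAP estimates...
mu <- coef(m5.4)["a"] + coef(m5.4)["b"] * d$MedianAgeMarriage.s
# ...and the residuals: observed minus expected, one per state.
m.resid <- d$Marriage.s - mu
```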
So they have a high rate of marriage for their median age of marriage; other states with a similar median age of marriage are getting married more slowly than they are. There's error in both directions, so there are two different kinds of cases. The residuals give you information about the deviation, and it's a directed deviation: there's a sign to it, negative or positive. And that's the information we want. That's the information the model uses to think about the additional value of knowing one variable once you know the other.

So let me try to walk you through that. Now we ask this question: how is divorce associated with the residuals of marriage rate? As a plot, maybe it'll make more sense. On the left is the thing we already saw: marriage rate against median age of marriage, with the regression line, where the vertical lines are the residuals. Above the line we have states that are fast for their median age of marriage, and below, states that are slow for their median age of marriage. This is what you just saw. Now let's take those residuals, those vertical lines, and put them on the horizontal axis of the plot on the right, treating them as data points (again, this is just for visualization), and plot them against the divorce rate, to ask what the relationship is. All the variation in marriage rate that was left over, after we accounted for the association between marriage rate and median age of marriage, is embodied in the residuals; it's called residual variance. So now we're plotting that residual variation, the positive and negative deviations, the fast states and the slow states, against the divorce rate, and asking: what's the relationship now? If there's any relationship, that's the additional marginal value of the variable, once you've taken out the joint information it shares with the other one. We'll do this in the other direction and come back to it, but this is how the model, inside, is partialling things out; these are like partial correlations.

So let's look at the graph on the right. Marriage rate residuals are on the horizontal axis, centered at zero, where there's a vertical dashed line. To the left of the dashed line you have the slow states: they marry slowly for their median age of marriage. We've already taken the median age of marriage out of those values, all the joint information, via the regression on the left. Joint information assuming a linear relationship, mind: the relationship may not always be linear, but that's an assumption of this model. States to the right of the dashed line are fast: they marry fast for their median age of marriage. Now we plot those against the divorce rate and ask whether there's any relationship. And the answer is: not much. A very mild negative correlation there. This is what the model learned before: once you've accounted for median age of marriage, there's not much variation left in marriage rate that informs the divorce rate. I'll say that again: once you've accounted for the association between median age of marriage and marriage rate, there's not much variation left in marriage rate that is associated with the divorce rate. Just a little bit, slightly negative, like the regression coefficient we got before, though it could possibly be positive; and that's what the bow tie is showing. Good? At least enough to march forward. Let's do it the other way; I always like to do things both ways.
So: how is divorce associated with the age-of-marriage residuals? This is my summary slide. States with fast or slow rates of marriage, for their median age of marriage, do not on average have correspondingly high or low divorce rates. That's what this residual plot shows you. These residual plots are often great for helping readers visualize a coefficient: the way the model sees the bivariate relationship between an individual predictor and the outcome. It sees it like this, at least in a purely linear model; in a nonlinear model you can still compute residuals, but it's more complicated.

So let's do it, as I said, the other way, and think about it from both perspectives, since there are two predictors. We'll go through the plots on this slide one at a time, so we can focus on each plot and then understand the whole gestalt of it. In the upper left, first, we're looking at the regression of marriage rate on median age of marriage: we're seeing what variation is left in marriage rate after we've accounted for its association with median age of marriage. That's how we got the fast states and the slow states; that's what we just did. Then, in the residual predictor plot, we've got the residuals against the outcome, and we see that there's not much information left: once you've taken the median-age-of-marriage information out, marriage rate tells you very little.

Now we do it in the other direction. Exactly the same procedure, but now we flip the regression that's in the upper left and swap the axes: what was the outcome is now the predictor, and what was the predictor is now the outcome. Now we're taking the marriage rate out of median age of marriage, so now the residuals belong to median age of marriage. Above the regression line, we've got states where people get married older, for their rate of marriage. Imagine a bunch of states that all have the same rate of marriage in the total population, but in some of those states people are getting married older, and in some, younger. That's what this plot is telling you: above the line are the states where people get married older, and below the line, younger. Then we compute the residuals, those vertical lines, and plot them on the horizontal axis of the plot in the lower right. Again, the vertical dashed line is the zero point. To the right of it we have states where people get married older, for the rate of marriage in the state, and on the left, younger. We plot them against the divorce rate, and now we see that there's a lot of variation, a lot of association between these two variables, even after we've taken out the information about marriage rate. And this is what the multiple regression told you, of course; this is just visualizing what you saw before. The slopes of those lines are the slopes you had in the table; it's the same information, but you're seeing it like the golem sees it. This is the interview with the golem: you focus on a bivariate relationship the way the machine sees it. Does this make sense? Is this useful?
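The reverse direction follows the same recipe; a sketch (the model name m5.4b here is mine, not the book's):

```r
# Regress median age of marriage on marriage rate...
m5.4b <- map(
    alist(
        MedianAgeMarriage.s ~ dnorm(mu, sigma),
        mu <- a + b * Marriage.s,
        a ~ dnorm(0, 10),
        b ~ dnorm(0, 1),
        sigma ~ dunif(0, 10)
    ),
    data = d
)

# ...compute its residuals, and plot them against divorce rate.
mu <- coef(m5.4b)["a"] + coef(m5.4b)["b"] * d$Marriage.s
a.resid <- d$MedianAgeMarriage.s - mu
plot(d$Divorce ~ a.resid,
     xlab = "Age of marriage residuals", ylab = "Divorce rate")
abline(v = 0, lty = 2)  # right of the line: marry older than expected
```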
I remember that when I was learning this as an undergrad, I thought this presentation was useful; that's why I put it in here. It's a very traditional presentation of multiple regression, and I don't tend to do traditional presentations, but when they're useful I'll do them. I'm not an anti-traditionalist.

So let's try to summarize. What we're doing here is often called statistical control: we're "controlling" for other predictors, and I put control in quotes because, of course, control is really a concept from experimental design, where you actually can control things, or at least we hope so. In statistics you don't really control anything; it's a magical, counterfactual kind of control. It imports experimental-design language into model description, but it is the standard language. So: multiple regression answers the question, how is each predictor associated with the outcome, once we know all the other predictors? It's just association. It's not telling you about causes; those depend upon external information, and we'll problematize this a lot more this week. It just uses the model to build the expected outcomes, which are the lines. It's not magic; it's all model-based. This is all small world. There's no direct access to reality here; that all comes from your large-world interpretations. So don't get cocky. The marriage rate could still be associated with divorce for some subset of states, even if it isn't across all the states. Now, this is not to encourage you to do endless subgroup analysis, hunting for some particular cluster of states in which there is a meaningful correlation. That happens; it's called p-hacking: you take a dataset and slice it up into a bunch of subgroups. If you've got a theory that motivates a subgroup analysis, fine, you can justify it. But when people's paychecks depend upon finding p-values, they will find them; they have inventive ways of doing it. Part of my job as a statistics educator is to discourage you from doing things that mislead you. I want to help you discover true things; that's my goal.

We also can't make strong causal inferences, especially in this case, because these are averages, and there are lots of hazards in doing regression on averages. We'll talk about that later in the course; there are lots of potential fallacies that arise from regressions like this. So I'm setting myself up to complicate this lecture later on, when we talk about Simpson's paradox in particular: averages can be related to one another even when, at the individual level, the variables are not related. It's a fun fact about the world. Again, the universe is hostile to human life; that's why it's an achievement, and you can feel proud, that you've gotten out of bed every morning. You're a winner.

Okay: counterfactual plots. The goal in a counterfactual plot is still to conduct the interview with the golem and draw out the implications of the model, but now we're not going to do it in residual space; we're going to do it on the actual measurement scales of the variables in the dataset. We're going to force the golem to make predictions for any fictional combinations of predictor values we like, and we do this because it helps us understand what the model thinks. You can visualize the slopes, and later on you can visualize interactions, which will be very important when we get to chapter 7. So let me introduce these now, where they won't necessarily add a lot of new information, but they'll help you see the slope and its magnitude on the natural scale of the data. So how do they work?
You pick some predictor you're interested in; say you want to view the bivariate relationship between marriage rate and divorce rate as implied by the model. Then you fix the other predictors: you imagine some set of states that all have exactly the same median age of marriage, say the average, which here is zero, since it's a standardized variable. And then we vary the marriage rate freely, as if you could vary these things independently. But you can't, right? Not in a natural system, at least, and especially not in anthropology. If there's some intervention you could do to a state that would change one of these things, it would plausibly change the other. This is why it's a counterfactual plot: in a natural system you cannot freely manipulate each predictor on its own. It's just not how it works. But in the machine you can do this, and it's really useful for understanding what the model thinks. Then you view predictions across some range of values of the chosen predictor.

At the bottom here are two examples. On the left, we vary marriage rate, holding median age of marriage constant at its average, and you get to see the slope: it's mildly negative. The darker bow tie in the middle is the uncertainty in the expectation, mu, and the lighter bow tie outside (not too bowed here) is the total 89% prediction interval, using sigma; the whole scatter of states is expected to fall within that region. On the right, the same thing, but for median age of marriage: we imagine a fictional set of states that all have the same marriage rate but vary in median age of marriage, and this is what the model expects them to look like. The model can do these things without complaining, because it doesn't know the meanings of the variables; only you do. But wait, you can't make a state that really varies one of these things alone! True, and yet this helps you understand what the model thinks, and that's very useful. You always know more than your model. So again: on the left we change the marriage rate without changing median age of marriage; on the right we change the median age of marriage without changing the marriage rate.
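A sketch of the left-hand counterfactual plot, close to the book's code: hold median age of marriage at its average and sweep marriage rate over a fictional range.

```r
# Fictional states: marriage rate varies, median age of marriage fixed at 0.
A.avg <- mean(d$MedianAgeMarriage.s)
R.seq <- seq(from = -3, to = 3, length.out = 30)
pred.data <- data.frame(Marriage.s = R.seq, MedianAgeMarriage.s = A.avg)

# Posterior distribution of the expectation mu over the counterfactual sweep.
mu <- link(m5.3, data = pred.data)
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI)     # darker bow tie: uncertainty in mu

# Simulated divorce rates, which also include sigma.
R.sim <- sim(m5.3, data = pred.data, n = 1e4)
R.PI <- apply(R.sim, 2, PI)   # lighter bow tie: 89% prediction interval

plot(Divorce ~ Marriage.s, data = d, type = "n")
mtext("MedianAgeMarriage.s = 0")
lines(R.seq, mu.mean)
shade(mu.PI, R.seq)
shade(R.PI, R.seq)
```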
The last thing I have time for today is posterior prediction plots. This is a thing I think is nearly always useful, if for nothing else than checking that the model worked, that it fit correctly. We compute implied predictions for the observed cases, to check retrodiction: what does the model think about the cases that were used to train it? The posterior distribution was updated on the basis of some set of cases that you fed it, and we want to see what it thinks about those cases. Sometimes it really can't retrodict them, and then it's a really bad model. And then there are cases where it mismatches. The goal is not a perfect retrodiction of your sample, but rather to inspire you to think about which cases the model does badly on, because that gives you a way forward: think about what's wrong. Often a model will do a great job on the majority of cases and a terrible job on a few, and those few give you ideas, because states have particular histories. We'll have an example of that coming up in a second, so bear with me. When you do this, you need to average over the posterior distribution, always. We don't just use the MAP values; we embrace the uncertainty embodied in the whole posterior distribution, propagate it forward, and communicate it in the graphs. We'll do that with intervals here, but sometimes you can do it with full distributions as well.

So there's a function in the rethinking package I wrote for myself years ago, because I like to do this with every model; it's the "did it fit?" sort of plot. I run the model, then run this function, and it makes an admittedly ugly graph, but bear with me, it has to work for everything. Along the horizontal axis is each case, each row in the data, each little i index; here, that's each state. On the vertical axis is the outcome. For each case there's a blue dot, the observed divorce rate; an open dot, the model's expected value, the mu; a little line segment around the open dot, the uncertainty around mu for that case; and two plus signs giving the full prediction interval. How you read this depends on your case and what the data mean; sorry, that's just how it is. But the first thing I look at is: is it even getting close? Did it fit at all? That's the first check. One caution: regressions of this sort will nearly always get all of the cases inside the outer plus-sign intervals, because that's what sigma does. The model expands sigma until the majority of the data points fit inside the envelope. So don't be comforted by that; it's really good at doing that. The expectations, and especially the misfits, are more informative than the places where the model gets things exactly right, although there are some cases it does get exactly right.

So let me show you one last thing and then I'll let you go. Looking at these same posterior predictions in a different arrangement is often informative. Now we're plotting the observed divorce rates on the horizontal axis against the predicted divorce rates on the vertical, and this diagonal dashed line is the line of equality: the places where the model gets it exactly right. And it does get it almost exactly right for average states, states in the middle, because that's what linear models do: they get the average cases right. They're good at that.
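Drawing that observed-versus-predicted plot takes a few lines, close to the book's code, assuming m5.3 from before:

```r
# Calling link/sim without new data uses the original states.
mu <- link(m5.3)
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI)
divorce.sim <- sim(m5.3, n = 1e4)
divorce.PI <- apply(divorce.sim, 2, PI)

# The model's expected divorce rate against the observed divorce rate.
plot(mu.mean ~ d$Divorce, ylim = range(mu.PI),
     xlab = "Observed divorce", ylab = "Predicted divorce")
abline(a = 0, b = 1, lty = 2)  # line of equality: perfect retrodiction
```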
The misfit cases here are easy to pick out. There are two states in particular where the model is doing a really, really bad job: Idaho and Utah. Those of you who have spent some time in the United States are laughing, because you know there's a particular feature of these states, unique demographically and culturally, that explains this result. What's going on with them? The model expects these states to have higher divorce rates than they actually have: according to the model, they have really low divorce rates for their median ages of marriage and marriage rates. What is up with these states? They have a lot of people who are members of the Church of Latter-day Saints, the LDS church, known as the Mormons, and the Mormon church is very good at enforcing the continuity of marriage; divorce is a very, very bad thing in the Mormon church. Utah has one of the highest proportions of Mormons of any state, and that's the fact that explains the misfit. But you only pick it up from a plot like this; the model didn't originally know it needed Mormons in it. If you really want to retrodict all the states, you would want to put that in. Now, maybe you don't care about the relationship between being LDS and divorce rate. Fine. But the idea is that you'd get a better estimate of the underlying relationship between median age of marriage and divorce rate if you accounted for this confound that's getting in the way; right now it distorts the estimates from what they would otherwise be. All right. With that, I apologize for holding you over for a few minutes. Thank you for your attention, and I'll see you on Friday.

[Waffle House commercial playing over the credits] "We're asking people: why do you like Waffle House?" "You know what I really like best about a Waffle House? I can stop by here anytime, day or night, after I've been fishing, and get a good breakfast, lunch, or dinner. It doesn't matter what time I leave in the morning or what time I get done in the evening." "You know what I really like about it? When I end up in the dog house, I can always come to Waffle House and get something to eat."