If any questions about the Seahawks come up later, please let me know. This week we're going to do multivariate models. Let me give you the flavor, with an example, of the kinds of problems that we want to solve. Those of you who have lived on the East Coast know about Waffle House? Yeah? Yeah. Okay, so anyone here not been to Waffle House? Well, you must rectify this at some point. There's some kind of reality distortion with Waffle House: the food is not actually good, but it tastes really good. Maybe it's only because I eat there at 2 a.m. when I'm drunk. But no, it's like IHOP food. It's hash browns and stuff, I don't know. So here's the thing about Waffle House. I pulled this data down myself. There's a pretty reliable correlation between the number of Waffle Houses per million people in the different states in the United States and the divorce rate of those states. So is Waffle House causing divorce? Of course not. This is one of those spurious correlations. There's a great website, which just mechanically finds spurious correlations by comparing pairs of time series; just Google "spurious correlations" and you can find it. This is not the kind of thing I would hope that anybody would try to publish a paper about, because it's obviously spurious, right? There's no credible theory about how the presence of Waffle House could cause divorce. If you think really hard, maybe you can confabulate one; people are creative, so you might be able to do so. But I assert that it's an accident. It's what we call a spurious correlation. And one of the things that we like in multivariate statistics is the ability to partial out the effects of particular variables and discover the one that is really driving the outcome. And that's what we lean on multiple regression to do, and that will be the topic of this week. There won't be a lot of new tools introduced, really.
We'll be using similar kinds of model forms, Gaussian linear models, but there'll be a lot more conceptual work and a lot more plotting this week. We'll love the plotting. So the major idea with multivariate models is that we bring in additional predictor variables, the ones that are actually driving the relationship, and then the spurious ones, like the Waffle House density in a state, are no longer partially correlated with the outcome. So in this case, the story will be this: those of you who grew up on the East Coast know that Waffle House started in Georgia in recent history, and it has grown organically out of Georgia over time. Eventually it'll take over the world, but right now it's still mainly a southern phenomenon, as shown by this map of the density of Waffle Houses. You can kind of see it here. Georgia has the most: there are 40 Waffle Houses per million people in Georgia, which is reassuring in some sense. South Carolina also has a lot, Alabama and so on. And then all these ones with zero, well, that's the white stuff up there: New England, the West Coast, and part of the northern states. So it's just an accident that Waffle House ends up being associated with states with high divorce rates, because the South has higher divorce rates. We're going to explore why today, and how. This is an example not because we care about that problem, but because it's a good way to think about multiple regression. Lots of spurious correlations are routinely published, though, because sometimes it's not so obvious. Here's one; I collect these things. Here's one of my favorites: the effect of country music on suicide. If you've listened to much country music, and you really should sometimes, it's a unique American kind of music. It's like culture-ology of a sort. It's worth doing. It is often depressing. It is. But somehow that makes you feel good. People like horror movies.
They don't like being scared, but they like being scared in their living rooms. It's like that with good country music: you don't like being sad, but you like listening to songs about people who are sad. It's an interesting thing to understand. So it turns out, if you get the right data set at least, you can find a pretty strong correlation between country music listening habits in aggregated metropolitan areas in the South and rates of suicide. Not attempts at suicide, actual suicides. I suggest this is spurious, although really, who knows. Right? Here's one which is definitely spurious. I think this is a joke paper, actually: "Male organ and economic growth: does size matter?" The paper is very tongue-in-cheek. It's one of those things that's so well done that you can't tell it's a joke. I think it's a joke. It's got to be a joke. Other papers are quite serious, so I think this is a joke. Anyway, if you want the key joke, the punchline in the paper is in the lower right graph: if you regress the difference in economic performance between 1985 and 1990, measured by GDP across nations, on available data on the length of the male member among individuals in those nations, there is a strong negative relationship. I assert that there is no plausible causal explanation for this. I don't trust either axis on this graph, either; there are serious data quality issues involved here. Just to say, you can find relationships, and the fact that a correlation exists does not then require that we believe it, because it's easy to imagine that there's something else that's correlated with both of these things and is really driving it: accidents of history or systematic measurement error or something like that. So we're going to take a peek this week at the machinery that is most often used to deal with these kinds of issues, to pull us out of the trap of believing spurious correlations. And this is part of what I call the regression march.
It's a long march to the sea, burning cities along the way and such. So last week you got the introduction to basic linear regression, and most of you had some introduction to it before, but I hope my reintroduction, the rethinking of it, was useful to you. We did it much more mechanistically. This week we're going to do multivariate linear regression, the same sort of tool, the same sort of models, but it gets conceptually harder because there are more moving pieces. And there are many more ways to visualize the predictions of the models, because there are more things driving the predictions now. Next week we're going to do model comparison, information criteria, and your introduction to information theory. It'll be the same sorts of models still, but now we'll be looking at a formalized way to compare more than one model. And that'll set us up for week 5, where we do interactions, again with Gaussian models, multivariate Gaussian models, and spend the whole week on interactions, because interactions are really subtle. And after our Markov chain Monte Carlo week, we're going to need to understand interactions and how to visualize them, because when we get to generalized linear models, which are, if you like, a fancy form of regression, everything in them interacts, even if you don't explicitly make it interact, which is part of the fun of generalized linear models. So that's why I set it up this way. Usually interactions get this tiny little footnote: oh, and by the way, you can multiply some predictors together. We're going to obsess about them instead. So indulge me in this march. This paves the way, and then starting in chapter 7 we can introduce new model types much more rapidly, because you'll have gotten, I hope, a strong foundation in the basic mechanics of these things. That's where we're going. So this week, here's my ambition.
We'll introduce the structure of multivariate Gaussian models in the context of a spurious correlation example. We're going to work through the good things that we'd like these models to do for us, and that they can in fact do. The first is to reveal spurious correlation, that is, get rid of the Waffle House effect: convince your friend who's really convinced that Waffle Houses are ruining marriages that it's not that, that it's something else. And the other, which I may get to start today but may not finish, is to uncover masked association: the case where there could be multiple predictor variables driving the phenomenon simultaneously, but none of them may appear to do so, because they hide one another's effects. And this happens a frustrating amount of the time. So we'll get to that. I think we'll start it today; we probably won't finish it today. And then on Thursday we'll talk about some bad things with multivariate models. Everything comes with costs. You have to be careful when you have predictors that are strongly correlated. These models are not magic, and they can't tell you everything about how the world really works, because they're just little machines, remember. And we'll get to introduce this problem of overfitting, which will be our major work next week when we talk about model comparison, which is designed to deal with the issue. Just to set it up in the shortest way I know how: overfitting is the problem that models that have more parameters always fit better. Always. If you fit them correctly, that is; if you make a mistake, they won't. But if you get them to fit right, they will always fit your sample better than a model with fewer parameters. Obviously this sets up a system that leads towards madness. Eventually you get a parameter for every data point. Then you'll have a perfect fit to your data, but you'll understand nothing.
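To make that "more parameters always fit better" claim concrete, here's a minimal sketch, in Python rather than the R used in the course, with invented data and seed: we fit polynomials of increasing degree to the same ten points and watch the in-sample residual sum of squares only ever shrink.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
y = 2.0 * x + rng.normal(0.0, 0.3, size=10)  # the truth is a straight line plus noise

sse = []
for degree in range(1, 6):
    coeffs = np.polyfit(x, y, degree)          # least-squares polynomial fit
    resid = y - np.polyval(coeffs, x)
    sse.append(float(resid @ resid))           # in-sample residual sum of squares

# sse is non-increasing: each extra parameter fits the SAMPLE at least as well,
# even though the extra wiggles tell us nothing true about the world.
```

With a parameter for every data point (degree 9 here), the fit would be perfect and the model worthless for prediction, which is exactly the madness the lecture describes.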
And so we've got to cope with this somehow, and that's why we'll spend all of next week on that issue. So let's deal with a spurious association. Let's leave Waffle House behind and instead focus on something a little more serious, where you might actually be fooled. There is a correlation in these 50 states. Apologies to the international listeners; I have you in mind, but just think about the regions of your own country, if you will. I don't know if this is true in all places, but in the United States it is true that states with high marriage rates also have high divorce rates. So does marriage cause divorce? Well, in the most trivial, joking sense, of course it does. You can't get divorced if you don't get married, I think, though I hear that may be coming. But in this case, this is almost certainly a spurious correlation, and I want to show you what's actually going on here. In the book I say more about how it's not obvious; it could go either way. It could be that states with high marriage rates value marriage a lot and therefore wouldn't have as much divorce. So it's not clear that just the stock of marriage should necessarily be associated with more divorce at all. It could easily turn out differently. But as I also want to show you, in this case it is almost certainly spurious. The alternative explanatory variable is the median age of marriage in the different states. Both of these variables are associated with the divorce rate, in different directions. So: places where people get married at a higher rate, that's the marriage rate. By the way, the ".s" means it's standardized, so there are zeros on the axes. It's not that there are negative marriages in some states; that's a z-score of minus one. Divorce rate is positively associated with marriage rate, and it's negatively associated with the median age of marriage, also standardized. This is the average of the median ages across the states.
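Here's a small simulation of how this kind of spurious association can arise, a Python sketch rather than the course's R code, with all numbers invented: median age of marriage A drives both marriage rate M and divorce rate D, so M correlates with D even though M has no effect of its own, and a multiple regression recovers that structure.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
A = rng.normal(0.0, 1.0, n)             # standardized median age of marriage
M = -0.7 * A + rng.normal(0.0, 1.0, n)  # marriage rate: driven by A
D = -0.9 * A + rng.normal(0.0, 1.0, n)  # divorce rate: driven ONLY by A

# Marginally, marriage rate and divorce rate are correlated anyway,
# because they share the common cause A.
r_MD = np.corrcoef(M, D)[0, 1]

# Multiple regression D ~ A + M partials out the shared cause:
# bA comes out near its true value, bM near zero.
X = np.column_stack([np.ones(n), A, M])
b0, bA, bM = np.linalg.lstsq(X, D, rcond=None)[0]
```

The bivariate correlation r_MD is solidly positive, yet the partial coefficient bM hugs zero, which is exactly the behavior the lecture is about to walk through with the real state data.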
So, far out on the right of the right-hand graph on this slide, what you're seeing are states where people get married late, and in those states there's a lot less divorce. And on the left are places where people get married young, and there's a lot more divorce in those places. Question? Yeah, it's per 10,000 I think, per population. That's right, standardized by the state population, as is the marriage rate; they have the same denominator, I believe, in these data. I assembled this data set myself, just from U.S. statistics I think; I can't remember exactly what the source was, but in the book I give you the URLs, so you can go and pull it all down and discover my error, if there is one somewhere. So these are both associated, so the question is which is really associated, if either of them, or maybe they both are. Maybe they're both driving this. And this is a case, as I set it up this way, where one of these is spurious and the other isn't, and there's no suspense. Median age of marriage, there's a big literature on this, median age of marriage is the best predictor. It's still a terrible predictor, by the way, but it's the best predictor of divorce. Basically, people can't predict divorce very well, but when individual couples get married young, the risk of that marriage ending in divorce is a lot higher. So you can even control for things within individuals that way. These data are aggregated across individuals, so it's the worst kind of data to reason with, just averages within states, but it still shows the same effect. Median age of marriage is the best predictor of divorce in the United States. It's still not great, but it's the best. And marriage rate really gives you almost no leverage at all. So let's think about what we want our multivariate divorce model to do for us: to answer this kind of question. We want to ask, for either of these predictor variables: once we know the other predictor, is there any additional value in learning this one?
So, for example, if we already knew the marriage rates in the 50 states, having already gotten all the predictive leverage out of that variable that we could from the correlation between it and the outcome, which is divorce rate, what's the additional value in learning the median ages of marriage in each of the states? And that's what multiple regression does: it answers that question, and it does it for both predictors simultaneously. So it also does the other one. As I put them both on this slide: what is the value of knowing marriage rate once we already know the median age of marriage, and simultaneously, what is the value of knowing median age of marriage once we know the marriage rate? It does both. And the parameter estimates in these models answer those questions. They are partial effects, the marginal value of learning this thing once you already know the other thing. And on Thursday, that phrasing, that translation of the meaning of these beta coefficients in multiple regression models, will explain some of the funny behavior they have. Just to preview a little bit: we're going to look at a model where we predict someone's height with their legs, both legs, the right and the left, which have highly correlated values. And you'll see that the model gives you the right answer to a stupid question. I will ask the stupid question, it will be my fault, but the model will give exactly the right answer, and the answer will seem mystical: it'll tell you that neither leg predicts height. So I'll let you just think about that in the context of this case, which will behave really well. At the bottom of this slide I'll show you, without priors, what these models look like. There's probably nothing new here for most of you. The top line looks the same: the likelihood is still ye olde Gaussian likelihood. And now in our linear model of the mean, mu sub i, for state i, in this case we
have an intercept, still just one intercept, and then there are two terms, one for each predictor. There's a coefficient, one of these slope parameters, now we'll call them coefficients because it's not clearly a line, a beta coefficient for each predictor variable. And then M sub i here is the marriage rate in state i, and A sub i is the median age of marriage. Are you with me? Makes sense? Okay. And then we choose priors too; I'll put those up in a minute. Let me take you through the anatomy here a little bit, just to make sure there's nothing mystical about it. Divorce rate in state i is capital D sub i, and then our two predictors get their own additive term. The plus here, the fact that they interact only additively, you can think of that as meaning they have independent effects. I'll say that again: the fact that they're just combined additively means they have independent effects on the outcome, and that's what we're trying to estimate. Later on, when we look at interactions, we'll look at cases where the effect of, say, marriage rate depends upon the median age of marriage, which could easily be true; it could matter in different ways. If it's a bunch of really young people getting married rapidly, that could make divorce even more common. But if instead it's mainly like San Francisco, where if anybody gets married at all they wait until they're in their 40s, then the marriage rate could be high, say everybody reaches 40 and they get married, so the marriage rate would be really high, but the median age of marriage would also be high, and then the divorce rate might be low in some way. It turns out not to be the case in these data. And then we have these two coefficients, and I put "slope" in quotes now, because it's cognitively hard for most people to think about these as slopes, because there's not clearly a line now. Although mathematically there is, mathematically there are lots of lines, what we actually have is a plane of predictions. And I won't indulge in trying to plot that, because I
can't see 3D stuff on the screen, and so I assume other people can't either. People try to teach me, but I can't do it, so I won't do that. Instead I'll go back and forth between calling these slopes and coefficients, so that you know they're the same sort of thing. What's important is not the word you use but understanding the equation, because the equation is what's producing the predictions. That's why we're going to focus on that. So when we fit these things, now I'll add some priors, just very quickly. These are the same general horoscopic kinds of priors: in this case you want a broad prior on the intercept, because who knows where it'll end up, and weakly regularizing priors on the coefficients, centered on zero, and then the uniform prior on sigma. And these will work just fine here. I do encourage you, when you go through these examples at home, to alter the priors and see what happens. You're going to have to make them really strong to nudge these effects away very much, but it's worth exploring. And then fitting this is exactly like before: you just change the mean line and add your additional prior for the additional coefficient, and it works fine, no problem at all. Does this make sense so far?
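In the course this fit is done with the rethinking package's map function in R. As a rough stand-in, here's a Python sketch of the same idea under one simplifying assumption that map does not make: with a Gaussian likelihood, Gaussian priors, and sigma treated as known, the MAP estimate has a closed form (ridge regression), where the broad intercept prior barely shrinks anything and the weakly regularizing slope priors pull gently toward zero. Data, seed, and prior widths are all invented.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200
A = rng.normal(0.0, 1.0, n)              # median age of marriage (z-scores)
M = -0.7 * A + rng.normal(0.0, 1.0, n)   # marriage rate (z-scores)
D = -1.0 * A + rng.normal(0.0, 1.0, n)   # divorce rate: driven only by A

X = np.column_stack([np.ones(n), M, A])  # columns: intercept, bM, bA
sigma = 1.0                               # pretend the residual sd is known
prior_sd = np.array([10.0, 1.0, 1.0])     # broad intercept, weakly regularizing slopes

# MAP with Gaussian priors = penalized least squares:
# minimize ||D - X b||^2 / sigma^2 + sum_j (b_j / prior_sd_j)^2
penalty = np.diag(sigma**2 / prior_sd**2)
b_map = np.linalg.solve(X.T @ X + penalty, X.T @ D)
alpha, bM_hat, bA_hat = b_map
```

As the lecture says, you'd have to make prior_sd on the slopes really small before these estimates move much; with weakly regularizing values like these, the MAP answer is essentially the least-squares answer.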
No new tools this week really, just a lot of plotting; you'll get very flustered with your plotting. So let's look at the parameter estimates. The first pass at interpreting these models, and often in journals all people give you, are these tables of coefficients, and as I said before, I think these are terrible. They're really hard to understand. But let me run you through the standard interpretation. Nobody cares where the intercept is, so just ignore it; that's the usual thing. Sometimes you'll see this crazy null hypothesis test of whether the intercept is different from zero. It's like, what is going on there? That's just software being ridiculous. We don't care. But the interpretation would be: you read the posterior means of the coefficients as an indicator of the effect, people will say, of that predictor on the outcome. It's the cryptic golem answer to the question: what's the value of knowing this predictor once you know the others? So in this case bM, that's for marriage rate, is slightly negative, but notice the standard deviation is twice the size of the distance from zero, and as a consequence the percentile interval has a lot of probability on both sides of zero. So this is like, well, if it has an effect, it's probably pretty small, but it's highly uncertain. There's nothing really strong to say about marriage rate. Does that make sense, how you can get that from the table? We're going to look at this graphically on the next slide, where it'll be much easier to see what's going on, but I'll run you through the awkwardness of just a table of coefficients so you don't do it to your readers. And then bA, for median age of marriage, is also negative, but much more so; by coincidence it has the same standard deviation, and so it is reliably negative: almost all the posterior probability is below zero. So this model is quite confident, conditional on this model and these data, your machine is telling you: if you want to predict divorce rate, knowing median age of marriage is really useful,
and knowing marriage rate not so useful: if it matters, it's probably only a little bit, and it might be positive, might be negative. Does that make sense, how you can get that from these tables? And again, most people ignore sigma too. So what you can do is plot these. All I'm doing here is plotting marginal posterior distributions. If you just plot the precis output, it will make this dot chart for you. And this is the kind of thing, when you get good with your R skills, you can make yourself, and make it prettier. I make my graphs ugly just to inspire you to do better. It's not because I'm lazy, no, it's to inspire you. It might be a little bit about that, but I like these Spartan graphs. So all it's doing is taking the means and the percentile intervals and putting them on this common estimate axis. Usually we focus on zero for the coefficients, and you see the coefficient for marriage rate: the mean is close to zero and there's a lot of probability on both sides, so it's not a very big factor, if it does matter. And then for median age of marriage: very reliably negative. And what does negative mean? You're already trying to do the gymnastics in your head: increases in that predictor decrease the outcome. But we're going to do a lot of plotting today so you don't have to do that kind of gymnastics as we go through. Does this make sense so far?
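In the course this summary comes from precis in R. As a rough Python analogue, here's how you'd compute the mean and a central percentile interval from posterior samples; the draws are fake, with means and spreads chosen only to mimic the pattern on the slide (the exact numbers are invented).

```python
import numpy as np

rng = np.random.default_rng(3)
# Fake posterior samples, shaped like the pattern in the lecture:
bM = rng.normal(-0.1, 0.2, 10_000)  # marriage rate: small effect, very uncertain
bA = rng.normal(-1.1, 0.2, 10_000)  # median age of marriage: reliably negative

def summarize(samples, prob=0.89):
    """Posterior mean plus a central percentile interval."""
    tail = (1.0 - prob) / 2.0
    lo, hi = np.quantile(samples, [tail, 1.0 - tail])
    return samples.mean(), lo, hi

mean_M, lo_M, hi_M = summarize(bM)
mean_A, lo_A, hi_A = summarize(bA)
```

bM's interval straddles zero, so there's nothing strong to say about marriage rate; bA's interval sits entirely below zero, which is the "reliably negative" reading of the table, and plotting these intervals on a common axis gives you the dot chart described above.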
Alright, so let's review what we know so far. Once we know median age of marriage, there's little additional value in knowing marriage rate; there might be some, it's uncertain. Once we know marriage rate, there's still value in knowing median age of marriage. And here's the thing I want to emphasize: if you don't know the median age of marriage, there's definitely value in knowing the marriage rate, because it's correlated with the median age of marriage. That's how this spurious thing arises: things are correlated. States where people get married young, so the median age of marriage is low, also have higher marriage rates. But higher marriage rates are not actually driving the variation, and I'm going to try to show you how the model figures that out in a moment. A question was: are these betas standardized, or do they have the units of the predictor variables? It's the latter; these are not standardized coefficients. Standardized coefficients don't come out of the model unless you standardize the predictors first. In this case I did standardize the predictors, so the betas are in z-score units, but that's just because those are the units the predictors were in. That's a good question. I'm not a big fan of standardized predictors, because I care about the outcome scale, but other people are; economists really love them, and economists don't listen to me, so I won't bother to give them advice. Alright, so it's a rate: these beta coefficients have units. There's a box in the book about this, I think in chapter 4, about units. If you have physical science training, you're used to carrying units through; it's a great way to check your math, actually, to check your algebra. So in this case these beta coefficients will be in units of the outcome per unit of the predictor. Right, so what does that make them? What are their units?
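To see what standardizing does to a coefficient's units, here's a tiny Python sketch with invented numbers: the slope on a z-scored predictor is exactly the raw slope times the predictor's standard deviation, i.e. "outcome units per z-score" instead of "outcome units per year."

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
age = rng.normal(26.0, 1.2, n)                        # median age at marriage, in years
divorce = 30.0 - 1.0 * age + rng.normal(0.0, 0.5, n)  # divorce rate (made-up units)

def ols_slope(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

b_raw = ols_slope(age, divorce)   # units: divorce-rate units per YEAR of age
z = (age - age.mean()) / age.std()
b_std = ols_slope(z, divorce)     # units: divorce-rate units per Z-SCORE of age
```

Since z is just a linear rescaling of age, b_std equals b_raw times sd(age) exactly, and carrying the units through is what lets you check bookkeeping like that.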
It's a rate, right? It's something per something, and the denominator here is z-score and the numerator is the outcome. But I think Katrina was asking about the denominator part, at least that's the way I'll read it. Think about it. But yeah, good question. If you're confused by this, there was a box in chapter 4 about it; it may help you. If it doesn't, don't worry about it, you don't strictly need it. But if you come out of a physics or chemistry background, you're used to carrying units through all of your work. Sometimes people do it in biology, but usually not, right? Units just evaporate in biology for some reason, and the social sciences don't ask us to carry units on anything; it's just numbers. But it can be useful for checking your equations sometimes, for figuring out how things work. In a lot of models with many parameters, some of the parameters are probabilities, and those are unitless, so you can confuse yourself sometimes in these things. Okay, right. So the tough part of the work this week is dealing with all the different ways you can plot these models to help understand them and do posterior predictive checks. I'm going to show you a number of different ways, and this is kind of horoscopic advice again, in the sense that in the context of your own work there'll probably be a way which is most useful given your purpose and the things you're interested in. That's why I want to give you some experience with different forms of plotting the predictions of these models. Even with a model this simple, and it's just about the simplest multiple regression you can have, there are only three variables, one outcome and two predictors, there's still a really large number of useful ways to plot the implications of the model. So I'm going to show you several. All the code for producing the graphs is in the book; there'll be a couple of points where I actually pause and talk about the code a little bit, but mainly not, because we're just doing the stuff we did before: link and
sim, or doing it yourself with your own custom link function. It's the same tools as before; the models are just slightly longer. The only trick, and I'll emphasize this when I get to it, is for a particular kind of plot, number two in this list, the counterfactual plots. Those are cases where we hold some set of predictors constant while we vary one, in order to get counterfactual predictions about that thing. I'll come back to that when we get there and show you how to do it in code, so just bear with me. Otherwise it's just calculating stuff and drawing it, and the code is in the book to do it. But by all means, if you have questions, bring them in at the start next time and I'd be happy to spend some time on it. Okay, so I'm going to go through three useful types and show you multiple examples of each. The first is actually just an excuse to try to explain to you how the model does what looks like magic, but isn't, by showing you something called predictor residual plots. How does this model figure out that marriage rate is not actually driving the outcome? I gave you a hint before: marriage rate and median age of marriage are correlated with one another. States where people get married young also have a lot of people getting married, and one of the two predictors, median age of marriage, is strongly correlated with divorce rate. So how does the model figure that structure out? The answer is what are called partial correlations. Instead of emphasizing the algebra of that result, I want to show you graphically how to make predictor residual plots, which have that partialling built in, and which I hope will help you understand what the model is doing, somehow. Then we'll look at counterfactual plots, which are mainly useful for understanding the implications of the model. They're counterfactual because the predictions you make can be completely bonkers: you can invent a state with any combination of median age of marriage and
marriage rate you like, to see what the model says. That's why it can be bonkers: in reality the two must be tied to some extent, demography guarantees that they will be associated, and you can't perfectly decouple them. And yet in our plots we can, because we're gods there. And then we'll get to posterior prediction checks; I'll show you a little garden of varieties of them that you can do with these models, that show you different things. And number four I won't show you, but I want to encourage you, as you get more experience and confidence in this business, to always think about what you want to do; you'll have opinions about this for the kinds of problems you work on. Okay, so, predictor residual plots. This is a very standard kind of plot; lots of software packages make it automatically for you. As usual in this course, we're going to do it ourselves. The goal of these is to show you the association of each predictor with the outcome, having "controlled" for the other predictors. Control means answering that question that was the motivation of multiple regression: once we know all the other predictors, what's the value of learning this one that I'm focusing on? That's what control means here. I've put "control" in scare quotes, and it's meant to scare you, in the sense that control is a term that comes from experimental design. There's no experimental design here; I've just harvested a bunch of data from the internet and run a regression on it. There's no random assignment of people to states or marriage ages, right? That would be the best thing: we randomly assign each of you to get married at a certain age, and then we could really pull these things apart experimentally. That has not been done here. But there's this contamination of experimental-design language into statistics, and if you let it creep into your mind, it's like a mind-control device: it'll convince you to be overconfident in the model. There's no real control going on here. This is just
statistical control; it's partialling out. So if these are the only variables that matter, and this model has the right structure, then it is telling you how the world works. But that's a lot of ifs, right? So that's an issue. And how your sample arises could matter as well; we'll have some examples of that later in the course. Alright, so here's the basic recipe, and we'll run through it in a lot of detail. First, you regress each predictor on the other predictors. The outcome is not involved in the first step at all, just each predictor on the others. So this is measuring the association between, in this case, marriage rate and median age of marriage. That's the first step: measure that association. Then you compute something called residuals. Some of you already know this term: this is the variation left over after you've accounted for the association among the predictor variables. Then you take these residuals and you regress the outcome on them, and you predict the outcome using just the variation that's left over once you've taken out all of the association among the predictors. This is what the model has already done, but we're going to focus in on it and do it step by step. This will allow us to make a plot, and you'll be able to see the data in a different way, I hope. So here's the first part of that, step one: we're going to regress marriage rate on median age of marriage. This is asking: how well can we predict the marriage rate for a state when we know the median age of marriage in that state? I've already hinted that they're associated quite strongly, for reasons of demography: there are more young people alive, right? I mean, if we decreed that you can only get married if you live to 80, I assert that the marriage rate would go down, just because not everybody will live that long, I'm sorry to say. Especially anthropologists. Sorry, anthropologists, I looked over at you guys at the same time I said that. That was macabre. But the field work is
dangerous; things lay eggs in your body, and all kinds of stuff happens in the field. So, we're going to regress marriage rate on median age of marriage. No surprises about this model. Are there any questions about that model? Makes sense? The action, again, is in the mu line. This is a simple bivariate regression; there's only one predictor. The outcome is marriage rate, M sub i, and the predictor is median age of marriage, A sub i. The code is at the bottom. This is what the relationship looks like: the solid line there is the MAP regression line that comes out of this model, plotting marriage rate against median age of marriage. Strong relationship. Now, the next step is to compute residuals, which is the distance of each outcome value from the expectation, where the expectation is taken from the MAP regression line. You can get the posterior uncertainty into this too; I'm not going to do that in this example, because I want it to be cognitively transparent, but you certainly can. You can incorporate the uncertainty about this as well, just by running over all the samples from the posterior, but we'll focus on the MAP line because it will make the lesson easier to get in this case. So here I'm just pulling things out. There's a mu assignment line here: I'm using the function coef to pull the MAP values out of the fit model, and then I'm using subsetting to pull them out by name. This is a trick that can save you some grief sometimes. But that is just the linear model that I put in. So for each state, its median age of marriage goes into that line, and we get 50 predictions, because there are 50 states in the dataset. And then we compute residuals by just subtracting that expectation. There are 50 expectations in the symbol mu; for each of those, we subtract it from the actual observed marriage rate in each state, and that gives us the error. Sometimes people say it's the error of the model, though you should be careful with the word error, because it can sound bad. It's just a leftover. I like "residual"
because it's like a leftover thing. Does this make sense so far? That's what a residual is, and it's that easy. For the canned model-fitting routines in R, like lm, there's a command for the calculation of residuals; I haven't used it in a long time, but I think it's resid.

So let me show you what those residuals look like. They're the line segments on this graph. Each of those blue points is an observed combination of marriage rate and median age at marriage across states. The solid black line sloping down to the right is the MAP regression line; it's the mean posterior alpha and beta out of all the possible lines that have been ranked in the posterior distribution. Remember, the Bayesian golem says: okay, I'm going to consider all the possible lines, and I'll give you a posterior ranking of all of them. This is just the one at the mean. Then we draw these vertical line segments, which we got by doing the subtraction on the previous slide. The lengths of those segments are the residuals: how far the actual outcome is from the expected, central prediction of the model. Make sense?

So let's focus in for a second and zoom in on one little part. That animation took me 30 minutes, by the way; such are my technical skills. Here we've got a little cluster of states, some below the line and some above it. How can you interpret this? For the ones below the line, the observed marriage rate is less than what the model expected. So you can think of these states as having a low rate of marriage for their age at marriage. I'll say that again, because if you're paying attention, this will be confusing; so if you're not confused, please pay attention. These states have a low rate of marriage for their age at marriage, according to the model. People there are getting married slowly, relative to the age at which they get married, across the whole sample. In contrast, the states above the line have a marriage rate that exceeds what the model expects on average: these states have a high rate of marriage for their age at marriage. People are getting married fast, relative to the age at which they get married. You with me?

So this regression is accounting for the association between these two variables, and the residuals are what's left over, in a sense unexplained by the model. We're going to look at whether there's any additional value in the variation that's left over. We should be able to see that in any correlation that remains between the outcome and these residuals, and that's what we do next. Step 3 asks: how is divorce associated with residual marriage rate? On a per-state basis, is the residual left over in each state's marriage rate associated with divorce? We've taken out the effect of median age at marriage; we've still got some variation in marriage rate, and we're going to ask whether there's any association between that remaining residual variation and the observed divorce rates. This directly addresses the question we started with about multiple regression: once I know all the other predictors, is there any value in knowing this one? We've taken out the association with the other predictors, and we've only got some additional residual variation; now we want to see if that's associated with the outcome.

Graphically, here's how this looks. On the left is the residual plot we had before: still marriage rate on the vertical against median age at marriage on the bottom. We took out the association between these two, so what's left over in marriage rate is the part that hasn't been explained by median age at marriage. Speed of marriage, once age at marriage is accounted for, is one way to think about it. Above the line are states that are getting married
faster than expected; below the line, slower than expected. Now look at the graph I've added on the right-hand side of the slide. The vertical axis is now the outcome variable of interest, divorce rate, so the scale has changed, and the horizontal axis is the vertical axis from the plot on the left: the residuals, the marriage-rate residuals relative to the expectation of the model we ran. There's a dividing line at zero residual, the vertical dashed line in the right-hand graph; that's where the MAP regression line between the other two variables sits, in a sense, now that its effect has been taken out. States to the right of it are ones where people get married faster than expected for their age at marriage, and states to the left are ones where people get married slower than expected for their age at marriage. Then we just run an ordinary regression between these, predicting divorce using the residuals. The code is in the book, but I think you can imagine what it looks like, and I probably don't have to persuade you too hard that there's not much action here. There's nothing: the points on the right-hand side of the vertical dashed line are basically in the same range of divorce rates as the ones on the left. There's no imbalance between the two halves, so there's no apparent linear relationship between these two things. It's the number of people getting married per capita, so it's a rate, and that's why I say faster or slower rather than more or less. Does that answer your question? It is confusing; I've tortured myself over how to word this, and it's tough. You still don't understand; can I give an example? To be down there means that there are more old people getting married than we expected? Okay, so you mean, like, over here?
Yeah, in the left-hand graph, this state with the big residual up here has a high median age of marriage, so people are getting married old. I wish I knew which state that was; we'll plot some labels later. And in this state, the rate of marriage, the number of marriages per capita, exceeds the model's expectation by a lot, because on average, states where people get married old don't have that much marriage. So this state has a lot of marriages; people are getting married at a fast rate relative to the age at which they marry. It works for me to just think about rates and say fast and slow: there are lots of marriages per capita, despite the fact that people are getting married old. You could say higher or lower; I think fast and slow. Maybe this is just me being autistic, which is what you should learn to expect from me occasionally. But did that help? Did it make sense? So maybe this is like Vegas; people go there for their second marriages. Is that still largely true? Florida, yeah. So Florida, southern Florida, might be like this as well: the average age of marriage is pretty high, but there's still a lot of marriage, whereas in most places where the average age of marriage is high, people don't get married as much. Does that help? All right, other questions about this? Yeah. So this was the joke: that's right, the question was that this shows there's not much effect. That's right, and in fact the slope of that black line in the right-hand graph is the slope we got from the multiple regression, because this is, invisibly, how the multiple regression works internally; it's just rescaled, in a sense. Now, if instead we did this with the other predictor, would it just be a tighter correlation, or does it matter which one, left or right hand side?
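The three-step recipe just walked through (regress the predictor on the other predictor, compute residuals, regress the outcome on those residuals) is the conceptual heart here, so a minimal sketch may help. The lecture's own code is R, using the rethinking package; this is a plain NumPy stand-in, with invented numbers for five hypothetical states, and with ordinary least squares standing in for the MAP fit (the two coincide under flat priors):

```python
import numpy as np

# invented standardized data for five hypothetical "states":
# A = median age at marriage, M = marriage rate, D = divorce rate
A = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
M = np.array([ 1.2,  0.7, 0.1, -0.3, -1.6])
D = np.array([ 1.0,  0.8, -0.1, -0.5, -1.1])

# step 1: regress the predictor of interest (M) on the other predictor (A)
b, a = np.polyfit(A, M, 1)          # slope and intercept of the fitted line
mu = a + b * A                      # expected marriage rate for each state

# step 2: residuals are observed minus expected
resid_M = M - mu

# step 3: regress the outcome on those residuals
slope_resid, _ = np.polyfit(resid_M, D, 1)

# the same slope falls out of the full multiple regression D ~ A + M,
# which is the point made above about the black line in the right-hand graph
X = np.column_stack([np.ones_like(A), A, M])
coefs, *_ = np.linalg.lstsq(X, D, rcond=None)
print(np.allclose(slope_resid, coefs[2]))   # prints True: the slopes agree
```

The exact agreement between the residual-regression slope and the multiple-regression coefficient is not an accident of these numbers; it holds for any (non-degenerate) data, which is why the residual plot is a faithful window into what the multiple regression is doing internally.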
I'm going to show you that next, so, exactly right. The question was: if it were the other predictor, what would it look like? We're going to do that, so it's a great question. This is like talking to the oracle at Delphi, right? The oracle sees it but can't talk to you; it just gives you some prophecy, like: when the moon is in the house of whatever, if you kill a goat, then you can be king. That's the kind of advice we get here. I was a classics minor in college, as some of you know, so Greek nonsense will spew out of me at any moment. Sometimes this stuff is hard, so I'm willing to spend a lot of time just going through it. The hard part is not the mathematics; the hard part is the logic of the world, and then your criticism of the model. And it's hard to figure out in the first place just what the model thinks, because it doesn't speak English; it speaks probability distributions, and that's all it speaks. So it's a little hard to work through, and you've got to be patient with yourself. It does take time to get adapted to this, and you really learn this stuff with your own problems. It's your domain knowledge about what the variables mean that makes it make sense to you; you know things about the data that ground your interpretations. With my examples, if you're familiar with the nature of the data, you'll be doing better than your classmates, and I just ask you, first of all, to bug me about it, and I'll try my best, but also to help one another. I've tried to vary the examples in the course so that everybody's uncomfortable sometimes, but I'm very sympathetic to the fact that this is weird, and it's your domain knowledge of the empirical matter, the nature of the measurements and where the samples came from, that resolves a lot of the ambiguity in model interpretation when it's your own work. So it will seem easier when you do
your own stuff. It really will; this happens over and over again. This is a little more terrifying and weird than it will seem when you have your own data in hand.

Okay, let's do the other case. How is divorce associated with residual marriage rate? To summarize: not much at all. There's a very slight negative correlation, but you see the bow-tie uncertainty is broad. That negative correlation, the MAP line, is minus 0.18, which is what we got in the multiple regression. It's the same slope, even though everything has been scaled differently; the residuals are on a different scale than the original predictor, but we get the same inference, because internal to the multiple regression, this is what it's doing. States with high or low rates of marriage for their age at marriage do not, on average, have high or low divorce rates. I'm not going to run through the calculation for doing it the other way, but I'll show you on the next slide, graphically, what's implied; all the code is in the book. It's the same idea; we just flip the predictors around. So let's take a look at that, and I'll run you through the summary real quick. On the left, the left-hand column has the analysis we just did. In the upper left, we've got some animation here that makes it move; yes, there we go. That is the first regression we did, where we predict marriage rate using median age at marriage, and then we computed those residuals, which are the little line segments. In the bottom left, we then predict divorce rate using those residuals, and we see that there's not a lot going on: it's highly uncertain, and if there is something going on, it's not super important. That's what we learned. To do the same thing for median age at marriage, we just flip the axes. So now, in the upper right, it's just like the plot on the upper left, but rotated: now we're predicting median age at marriage using marriage rate. This gives us a different coefficient, although it's in
a sense the same, and we get a different set of residuals. Now it's the error left over in median age at marriage after having accounted for marriage rate. Let me say that again: it's the residual error left over in median age at marriage after having accounted for the association between median age at marriage and marriage rate. So up in that graph in the upper right, there are some states above the MAP line, where people get married old for states with that rate of marriage. Think about what's weird about these models: every state has a different marriage rate, for the most part (there might be a couple that are tied), and every state has a different median age at marriage, for the most part. Yet what the model is doing, through the regression line, is constructing an imaginary set of states with the same marriage rate, and then asking how much variation is left over in their median age at marriage, and whether what's left over is associated with the outcome. That's what it's doing. I'll say it again: the model constructs this from your assumptions. It's not magic; it doesn't actually know how the world works. It says: okay, you think it's a line that relates these things together. Okay, your funeral. Assuming it's a line, then I can imagine a mythological Georgia and a mythological Alabama that have the same marriage rate. The model imagines manipulating them, making the people in those states get married at the same rate, and then asks: if there's any difference in their median ages of marriage, are those differences still associated with divorce rates, after you've accounted for that? That's what the model imagines. It does these little internal experiments, based on the assumptions you put into the model, the linear associations. So there are a bunch of states above that line, where people get married old for the rate of marriage in that state, and then there are a bunch of states below the
line, where people get married young. So this takes out the association between the predictors, and then, in the final plot in the lower right, we predict divorce rate with the age-at-marriage residuals. Again, that vertical dashed line is sort of where the MAP line was before: any state right on that vertical line was exactly on the MAP line, and there are some that are pretty close. Some states get really close to the expectation, with almost no error left over. There are lots of states on both the left and the right. In the states where people get married younger than expected, they tend to get divorced more; there are higher points over there, higher divorce rates, on the left-hand side of the line. And in states where people get married old for their rate of marriage, there tend to be lower divorce rates. So when you compute the regression line, it's reliably negative; the slope of that line is about minus one. Now, a question: in the event that there were more predictors to throw in, would you take the residuals of these bottom values, then plot the residuals of that against another variable, and then take the residuals of that and plot against yet another? Maybe I'm not sure I understood your question. You'd do all of these regressions, but the ones at the top would be multiple regressions: you take all of the other predictors you want to take out, put them all in that model at the same time, and you get residuals for one particular predictor. Then you'd have five bivariate regressions at the bottom, and you could do the visualization that way. Was that what you were asking?
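The answer just given, put all the other predictors into one model at the same time, residualize the predictor of interest on all of them at once, then run one bivariate regression of the outcome on those residuals, can be sketched directly. This is a NumPy illustration with three invented predictors, not the course's R code; the coefficients and random data are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
# three invented standardized predictors and an outcome built from two of them
X1, X2, X3 = rng.normal(size=(3, n))
y = 0.5 * X1 - 1.0 * X2 + rng.normal(scale=0.3, size=n)

# "the ones at the top would be multiple regressions": regress the predictor
# of interest (X2) on ALL of the other predictors at the same time
others = np.column_stack([np.ones(n), X1, X3])
coef_others, *_ = np.linalg.lstsq(others, X2, rcond=None)
resid_X2 = X2 - others @ coef_others

# then one bivariate regression at the bottom: outcome on those residuals
slope, _ = np.polyfit(resid_X2, y, 1)

# same number as the X2 coefficient in the full multiple regression
full = np.column_stack([np.ones(n), X1, X2, X3])
b_full, *_ = np.linalg.lstsq(full, y, rcond=None)
print(np.allclose(slope, b_full[2]))   # prints True
```

With five predictors you would repeat this once per predictor, giving the five bivariate residual regressions mentioned above.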
I'm not sure. I toyed with the idea of doing this example with three predictors, and I thought, that's hard to visualize. It is, but that's how it would be. When you reach a problem like that and you have that question, just bug me about it and I'll give you my opinion. Another question was: when you get a result that some predictor is stronger, do you always need to do this? No, this is just a way to visualize it. This isn't the same thing as a posterior predictive check; I'm going to do those in a bit, to check that the model actually fit correctly. You do want to do that, and for it you need to generate predictions for each state using all the predictors simultaneously, so we'll look at that separately, but it's a good question. This is an interpretational thing, and it's often really useful. You'll see this in papers a lot: there will be some residual axis, showing you the association, in a multiple-regression context, between the outcome and one of the predictors, particularly the one that supports their theory, and it will be on some mystical scale. This is that scale: the residual scale, having partialed out the others. Most software packages will do this for you; the plots magically appear. This is how it's done, so now you can cook them up yourself.

So here's my little sermon about statistical control again. Always remember: linear multiple regression answers the question, how is each predictor associated with the outcome, once we know all the other predictors? On Thursday we'll have an example of how this explains even confusing behavior of these models. Sometimes you do have to be careful: this is still conditional on the model, and the model could be a really dopey thing. The model gives you the answer, in the form of a posterior distribution, to the question you asked, and the question is embodied in the model. So you always have to do model criticism; you can't get cocky about this. For example, is this really
linear? I doubt there's a linear relationship between these things. In fact, in this case, one of the things I like about this example is that it's really easy to do model criticism, in the sense that we know something about, if you will, the physics of this. There's demography involved, so why not start with the demographer's equation? The stock of marriages: you do a stock-and-flow model. Those of you who've done some demography or population biology training know what I'm talking about. The stock of marriages at any point in time in a state is, well, the number of new marriages minus the number of divorces and deaths. So you've got basic demographic equations that define these variables, not the mystical linear regression, and we could do a lot better really quickly here. The same is true very often in biology: if you're trying to study population dynamics, then use the equations for population dynamics. Linear regressions, you can get away with sometimes, but it's very easy to do better when we know the physics, if you will, of the system. So I like this case, because it's one of those rare cases in the social sciences where demography tells us how to structure the equations, if we care to do it that better way. In this dataset, since it's all aggregated averages, it would be hard to do, but you could get the finer-grained data and do a better job with it, and that's how people actually study these things, at least I hope; I'm really not sure about that. So, yeah, I have this point on here: don't get cocky. The marriage rate may still be associated with divorce for some of the states. This model is just talking about the average; it's not allowing different regions to have different relationships among these things, and so you could get misled. It may be that these variables behave in different ways in different places. In fact, when I show you the posterior predictive checks, I think you might be convinced, like me, that they probably do. And still we want to make
causal inferences, but these are aggregated data. What we really want are individuals: the ages at marriage and the half-lives of the marriages, so to speak. We'd like to know the half-life of a marriage for every age at which people get married, given the marriage rates in particular states. We don't have that data; in fact, I don't think that data is available at all. You'd have to put it together with a lot of footwork, going to courthouses and things, copying records and writing them down, things like that.

Counterfactual plots. The point of the previous plotting, well, really it was an excuse to help you understand how multiple regression works, and I hope it did that a little bit; you'll really get it later. We still have the issue of interpreting the model, in the sense of: what does the model say is the impact of changing a predictor value on the outcome? If you're really good, you can read that off the coefficients, but you don't have to be good; you can just make pictures and get it that way, and also do calculations. I call these counterfactual plots because we're going to do the impossible in generating these predictions: we're going to hold all the other predictors constant at some value we choose, and then vary the one of interest. And again, I think for these predictors that's not possible in the real world, because if you change the age at which people get married, you change the rate of marriage, just because of demography. It may not change it a lot, but it's got to change it some. And in the real world that's often true: you don't have magical control over all the predictors, even if you could do the experiment. But in the small world of the model, we can do the impossible. We can hold marriage rate, for example, constant, and then change median age at marriage and see what the model says would happen. These are called counterfactual plots, and I'll show you two examples here. So we can compute, for some
change in marriage rate, without changing the median age at marriage, the impact on divorce. That's the plot in the upper left. At the top of this graph you see I've labeled median age of marriage equals zero; zero means the average, because it's a standardized predictor. So it's like we're saying: for some state with an average median age at marriage, if we manipulated its marriage rate, and that's what's being manipulated on the horizontal axis of this graph, what does the model say would happen? Again, the black line is the MAP prediction; it goes down a little, but not a lot. The bow tie there is the confidence interval of the MAP line, and then the lighter gray area is the prediction interval for the actual observations, where sigma is used. So you might think of it as: the dark bow tie is link, and the lighter envelope is sim, from last week, the functions you're fighting with in your homework. Make sense so far? Again, what's the value? It lets you see what the model says if you could experimentally manipulate states; this is the experiment you'd like to do, the data you'd like to observe, which you can't. Holding median age at marriage constant, adjusting marriage rate isn't expected to have hardly any effect on the divorce rate, according to these data and this model. In contrast, in the upper right, we adjust median age at marriage without changing the marriage rate: we set marriage rate to its mean, and then we adjust median age at marriage from minus 2 up to 3, in z-scores, standard deviations. We see a big expected change in the divorce rate. That makes sense; you already knew this is what I was going to say. This is a clear indication of what the model thinks, according to these data. A clear question here is: what do we set the unobserved predictors to? In this case, on the left, median age at marriage is unobserved, and at the top we set it to 0, its mean; and then on the right we set marriage
rate to 0, its mean. For linear regression, it really doesn't matter, because the predictors don't interact at all; it'll just change the absolute level. Whatever you set the others to, you might as well set them to their means, because it won't change the shape of the line at all. That will not be true when we get to generalized linear models. In generalized linear models, all predictors interact, always, and it will matter what you set things to; we'll revisit this issue then, and we'll have anxiety about it, and that anxiety will last you your whole life. Well, no, your domain knowledge will help you, but I'll also explain why everything interacts when we get there. This is just a friendly promise for now. Right now it really doesn't matter; it'll just change the units on your vertical axis, and everything will look the same, until we get into the fun, constrained spaces of generalized linear models, which is where most of the data you'll have in your career will probably live. So, you're welcome. Does this make sense? The code for making these is in the book; it's very much like the other things. There's an example coming up in the posterior predictions where I show you how to set some of these equal to fixed values; you can imagine how to do it, though. It's coming up in a little bit.

Okay, we still need to do posterior predictive checks, because if your software is like mine (actually, your software is mine), even when the software works correctly, you don't always, and so things may not go right. Sometimes it's the software's fault, and sometimes just doing it over again randomly works; there's a sunspot or something that makes your computer mess up. So we always need to check our work with posterior predictive checks, or you might think of them as retrodictive checks. But the other function is to stimulate imagination and to think about model deficiencies: for which states, in this case,
is the model doing a particularly bad job? That gives us hints about what's missing in our understanding of these phenomena, and that could lead to new rounds of data collection and modeling. That makes it progressive, a productive exercise, part of the design loop that I talked about in the first week. When we do posterior predictive checks, we want to use all the uncertainty in the posterior, so we can see when the model is saying: I can't predict anything; there's just this big flat field of equally probable stuff, and I don't know what's going on. If you ignore the uncertainty in the posterior, you'll miss the fact that the model actually doesn't expect anything. I used this example last week: when you have the right model of primary sex ratio in humans, you can't predict it. You expect a half, but any particular birth is maximally uncertain; entropy is at its maximum at a half. So it's very hard to predict primary sex ratio in human populations, and a model that gets the right answer will know that. Posterior distributions can be highly uncertain, and you only see that uncertainty when you use the whole distribution to generate predictions. So we're going to carry that strategy forward, and again, it sounds hard, maybe, but it's easy to do: just use samples. Then you're doing integral calculus, really fancy integral calculus, without knowing it. That's all it is; integrals are just sums. Take a bunch of samples, sum over them, and you're doing integral calculus. We just need enough samples to approximate the area. The first thing to note is that the rethinking package provides this unglamorous, well, let's just say ugly, default kind of plot for fitted models. If you use postcheck on a fitted map or map2stan model, in every case it arrays the data on the horizontal axis and then computes the model expectations for the outcome on the vertical. The little blue points are
the observed raw data for each state; each case on the horizontal axis is a state, from 1 to 50. The open circles are the MAP expectations, and the little vertical line segments are the percentile intervals of the expected value, mu. So the open circle is where mu is for that state, given its predictor values; the little line segment is the posterior percentile interval on mu; and then the little plus signs show you the prediction envelope from sim, using sigma, for where the actual observations should fall. So if the model is doing a good job, well, as I'll show you next week, we could always get those blue dots right on the open circles if we had enough parameters. Give it a parameter for every state and I could bullseye it every time; we'll do that next week. It gets really close for some states. Like case 9: the blue dot is basically right on the expectation. That's a well-behaved state, right on the regression line. These predictions take into account all the predictors; in this case there are only two of them, but if you had 100, it would use them all. The model sees the data it was fit on and makes these predictions. Notice that for other states it does a pretty bad job; sometimes the observation is even outside the plus envelope of the predictions. So you can see some of those cases. That said, these plots are pretty ugly. The default postcheck doesn't know anything about your purpose, and you can do better, so let me show you a few alternatives that are a little better; they're in the book, so if you're interested, you can step through the code and figure it out. Yeah, quick question on postcheck: would the graph be a lot prettier if you just sorted the cases by divorce rate? Would that make it easier to see the pattern? Yeah, so the question was: what if you sort by divorce rate? We're going to do that in a couple of slides, and I agree completely. You can always do better
than postcheck. postcheck is like, well, I made it ugly, and I refuse to make it prettier, because I want to motivate you to do better. People say my graphs are ugly, that I'm gross, and that's no accident. It's not only because I'm lazy; it's because I'm also spurring you to draw your own fancy stuff, using what you know about the nature of your data. You can make it better; absolutely you can. I don't have any kind of Tufte-esque ideology about making my graphs ugly, like it's something Spartan. Edward Tufte is actually a hero; I think he's a really smart person, but he's got this Spartan attitude toward ink, like every drop of ink is precious, and I don't have that ideology. I think graphs have lots of functions, and it depends on your audience. But in this context, for your own analysis, usually your knowledge of the variables helps you make a more effective graph. So, exactly: in this case you want to sort them, so you can see the pattern. Here's a case where all I've done is, for the most part, the same calculations that were on the previous one, although for the moment we're just looking at the mu values and their confidence intervals for each state. On the horizontal is the actual observation, the divorce rate that was really observed in each state, but on the vertical is the model-based prediction. So the vertical has the uncertainty; that's where those bars come from, the posterior uncertainty for each state. Notice that it varies. Why?
Because some states are close to the middle of the regression line and some are really out on the end, so some states will have less confident predictions than others. That's a consequence you can really see here. Then this diagonal dashed line shows where the two are the same; that's the equality line. For any state right on it, the model is doing a perfect job, though it could still be a wrong model. One second, I'll get to your question. For states above the line, their predicted divorce rates are too high; the model over-predicted for those states. Another way to say that is: the states above the line had lower divorce rates than the model thought. Notice that Idaho and Utah are up there. Does anyone know something about Idaho and Utah, other than that they grow potatoes? Mormons, that's right. These are anomalous states because they have an anomalous population; anomalous is maybe the wrong word. I have some dear friends who are Mormon, and anthropologists have a complicated relationship with the church in a lot of ways, but it's mainly a good one. They are different populations; the difference is the norms, and the community is much more involved. Actually, I think that's the main thing. The model does a really bad job with them. So this is a case where the posterior predictive check helps you notice cases, and you realize: of course this model does a bad job with those states, because I didn't take account of the fact that they have different community structures. Question over here: why does this one have really big intervals? It still depends on its particular combination of predictors. There's this plane with the two predictors in it, and what matters is where you're dislocated on that plane from its center of gravity. There could be combinations of median age at marriage and marriage rate that get you on the line but at extreme values, so for that point we'd have to pull it apart; this plot is a collapsing of those two dimensions. But it's a good question. I don't have a
visualization that I think shows that very well, but it's a great question. So yeah, it depends: you can end up on the regression line because you have a particular combination of the two predictors, or because you have average values of the predictors. There's an infinite number of combinations of the two predictors that will get you on that line, and they'll have different uncertainties depending on where they are. Does that help? I hadn't thought to do that visualization; that's a good question. The course gets marginally better every year, I think, as a consequence of questions like this, so I do appreciate them. Okay, anything else about this? No? Alright, I'm running out of time, but I'm right on schedule. This is beautiful.

So here's the sorting; this is what Cody wanted, I think. Here's a more rational way to look at it. I only apologize that this plot is very compressed; it probably would have been better to do it in two columns, but I wanted to show you the sorting, show you this nice S-curve that you get out of it. Here we're computing residuals. Again, you compute a prediction for each state using both predictors, and then look at the difference between the observed and the predicted, and that's a residual on the outcome scale. These are the model residuals, the total model residuals. Each row on this plot is a state; you can probably tell from the state abbreviations. And there's a zero line, a thin hairline going up, where a state sits right on the model-based expectation. Any state that's right on there is exactly average according to the regression relationship. Ones with negative residuals have less divorce than the model expected. So then there's Idaho, down here, our far outlier. Idaho has very little divorce. I think Idaho has the highest proportion of LDS members of any state, even more than Utah now, because it used to be that almost no one lived in Idaho except for, like, the bunnies that ate potatoes; the main animal population of Idaho, I think, was like potato weasels
or something like that. Apologies to anyone from Idaho; it's a beautiful place, actually, just don't go there in the winter. But it's a beautiful place. So Utah is down here too, down at the bottom. And then there are states like Maine, bless their hearts, with a very positive residual: much more divorce than expected. It doesn't have the highest divorce rate of all of them, but there's a lot of divorce. Maine is a very unusual state in lots of ways, though. My political scientist colleagues, talking to me about Maine, say it's the case that no model predicts Maine; it's really, really odd. They have a lot of progressive ideals mixed in with conservative ideals, and they don't look like the national partisan landscape; their local politics are different. But they anchor the other extreme of bad model performance here; the model does a bad job of predicting their divorce rate as well. And now you can see that sometimes you get different amounts of confidence depending upon the combinations of predictors. You can go through and audit those and see; I think North Carolina will have an extreme value on one of those predictors, so it will sit out on the plane somewhere, marginal on the plane. Washington D.C. is just average, except for homicide. So at the end, we compute a prediction for each state, an expected divorce rate for each state out of the model, say the MAP prediction, and then subtract that from the observed divorce rate, and that's all I'm plotting on the horizontal axis here. Again, the code for this is in there, and when you play with the code you'll see what's going on. And again, this is not necessarily the right graph, but it does show you something: it helps you do a quick audit of where the model is doing well and where it's doing worse. The goal isn't to get all these right on the line, because next week I'll show you how to do that, and that would be madness; it's a very bad thing to do. The point is rather to ask: is it fitting
right? In this case, it is. And then it inspires the imagination, by noting, for example, that the LDS population is clearly a thing we left out.

A question from the back: on the last slide, and probably somehow here too, the higher the actual divorce rate, the more likely the model is to under-predict divorce, and vice versa. Is that something we're just fine with, or should we be looking for a model where the residuals are randomly distributed around the line?

So let me try to translate; you tell me if I got it right. What we observe here is that this cloud of points is kind of lazy. If it was really clinging to the diagonal, it'd be right up here (apologies to those of you watching at home, you can't see my gesticulations). Instead it's kind of droopy: it droops below the line on the right, and then it rises above the line on the left. That's a typical thing that regression models do; that's the regression-to-the-mean phenomenon. You don't actually want a model that exactly replicates your data, because then you can't learn anything from your data. So this will happen a lot with regression, and it's a symptom, again, of the regression phenomenon. If you had infinite data and an infinite amount of complexity, you'd get all those points exactly on the line, but we don't want that. So "this is fine" is one way to think about it. But still, the posterior predictive checks help you figure out, for the cases where the model is doing a particularly bad job, whether there's something obvious about them. That's why I've highlighted Idaho and Utah. If I knew more about the demography of marriage, maybe there'd be something systematically obvious about the states on the far right as well. But this nearly always happens with regressions, especially if we do some regularization, which we'll talk about next week; it's a very important thing to do.

I'm right on schedule, so let me show you this, though. You can take the residuals that are in that big column of posterior predictive checks there
and you can regress Waffle Houses per capita on them. There's still a residual association: even after all this work, after I've taken out both of the marriage variables, it's still true that Waffle Houses stick to divorce. It's just not very much; it's not a huge effect. You can see that the 95% confidence envelope there nearly includes horizontal, but it's still associated. And I still don't believe it. So there must be something else lurking in here; I'm not sure what's going on. We're going to come back to these data in the last week and do something which I think actually totally removes this. There's measurement error on these variables, and it varies a lot across the sample: in small states there's a lot of uncertainty about the rates, and in large states there's very little uncertainty about the rates. And population size is associated with the presence of Waffle Houses, and I think that's actually what's going on here. We'll come back to that, and I'm going to teach you how to put measurement error into your models in the last week, measurement error on all these variables, because we don't actually sample everybody; these national statistics are done by sampling, because that's how bureaucracies work.

This might be a great place to stop. Yes, it probably is. I'll give you your four minutes free, and instead of starting into masking associations now, when you come back we'll pick up with this slide and we'll deal with the other case. We just dealt with spurious correlation: a case where there's some predictor that seems associated with the outcome but really isn't. Rather, it's associated, but not once we account for something else which is really driving the outcome. There are other cases where multiple things are all causing the outcome, but their effects are in different directions, so they antagonize one another. If you don't account for both of them in the model, you end up concluding that neither of them is associated with the outcome, and they both are. I'm going to call that masking, and I'll walk you through it when you come back on Thursday.
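The audit described above, predict each state from the two marriage variables, subtract to get residuals on the outcome scale, then check whether Waffle House density still tracks those residuals, can be sketched in a few lines. This is not the course's actual R/rethinking code; it's a minimal stand-in using ordinary least squares on simulated data (all variable names and data-generating numbers here are invented for illustration), so there is no posterior uncertainty, only point predictions.

```python
# Sketch of the residual-audit workflow, with simulated data standing in
# for the real state-level divorce dataset.
import numpy as np

rng = np.random.default_rng(1)
n = 50  # number of states

# Simulated predictors: median age at marriage and marriage rate
median_age = rng.normal(26, 1.2, n)
marriage_rate = rng.normal(20, 2.0, n) - 0.5 * (median_age - 26)

# Simulated outcome: divorce rate driven by median age at marriage only
divorce = 35 - 1.0 * median_age + rng.normal(0, 0.8, n)

# A "spurious" predictor tied to divorce only through a shared regional
# factor (a crude stand-in for "being in the South")
south = rng.normal(0, 1, n)
waffle_density = 10 + 5 * south + rng.normal(0, 2, n)
divorce = divorce + 0.5 * south

# Fit divorce ~ marriage_rate + median_age (point estimates only)
X = np.column_stack([np.ones(n), marriage_rate, median_age])
beta, *_ = np.linalg.lstsq(X, divorce, rcond=None)

predicted = X @ beta
residuals = divorce - predicted  # observed minus predicted, outcome scale

# Audit: regress the residuals on waffle density to see whether any
# association survives once the marriage variables are accounted for
Xw = np.column_stack([np.ones(n), waffle_density])
beta_w, *_ = np.linalg.lstsq(Xw, residuals, rcond=None)
print("slope of residuals on waffle density:", beta_w[1])
```

In the actual analysis, the prediction for each state would come from posterior samples of a Bayesian Gaussian linear model, so each residual would carry an uncertainty interval rather than being a single number.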