Welcome to the fourth lecture of Statistical Rethinking 2023. We're going to continue full steam ahead with the theme from the previous lecture, where we started learning about linear regression as a way to estimate the associations we need to address the scientific estimands we have. Just to remind you a little bit, I introduced the geocentric model as a metaphor for linear regression, in the sense that it is fantastically wrong in its structure. It doesn't represent the solar system at all, but it's fantastically successful and accurate in doing its job, which is to predict where Mars is in the night sky. This strategy of using epicycles to produce high-quality approximations of natural phenomena can be extended arbitrarily. There's a nice video from 3Blue1Brown where the same strategy of using epicycles is used to draw any kind of cyclical path, any kind of orbit, no matter how weird and wiggly. If you have enough little circles, embedded on enough little circles, you can draw anything. And likewise, linear models, linear regressions, can be used to estimate lots of complicated processes, even if they're not linear. They're scaffolds, statistical scaffolds, and we use them to get at scientific estimands, but we need to design them with a background scientific model in mind that is external to the linear regression. And this is because, just to say it again, the linear regression can approximate anything, and so we need to design it with the knowledge that it will accommodate any kind of epicycle we ask it to. The new element in this lecture is that we will now, for a single generative model, have multiple estimands. And this will imply multiple estimators, which means multiple statistical models that are structurally different. Their structural differences are justified by the estimand in each case, reflected through the generative model. I'll show you how this goes.
The second thing that will be new is that, quite often, the estimand we want is not something that's going to show up in a summary table. And that's because it will depend upon multiple unknowns in the posterior distribution, or upon making some particular assumptions about the population as well. And so when we develop our estimates, in most cases we need to do some post-processing of the posterior distribution. The new statistical tools in this lecture are categories, that is, categorical variables, and curves. Linear models can handle both, even though neither of these things is linear. Categories are discrete, and usually they show up in data tables as indicator or index variables. And there are lots of ways to draw curves with linear models. I'm going to talk about splines, but there are many, many other kinds of additive structures, which are fundamentally built up of little lines, like the scaffold on the right of the slide. But they can make nonlinear structures. And we really need tools like this, because they give us the conditioning, the stratification, that we need to get at the estimands we desire. Okay, categories. So there are lots of things in the natural world which are not continuous. They are discrete, unordered types, like the seashells on the right. Each of those types of shell is different. There's not a continuum among them, and none of them is more shell than the others. They're all shells. Individual people and nations are also examples of categories that may be more familiar to you. And we will want to stratify by categories in our data, because we will need to get at the estimands we'd like. And in the context of a linear model, this usually means that when we stratify by a category, we fit a separate regression line for each of the categories. So, an example. Think again of the Howell data, Nancy Howell's height and weight data that we worked with in the previous lecture.
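Since this section leans on the idea that curves can be built up out of little lines, here is a minimal sketch of that idea in Python (not the lecture's R code; the tent-function basis below is a simple stand-in for the splines discussed later, and all names are my own):

```python
import numpy as np

def tent_basis(x, knots):
    """Piecewise-linear 'tent' basis: each column is a little pair of line segments."""
    width = knots[1] - knots[0]
    B = np.zeros((len(x), len(knots)))
    for j, k in enumerate(knots):
        B[:, j] = np.clip(1 - np.abs(x - k) / width, 0, 1)
    return B

x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)                                  # a nonlinear target curve
knots = np.linspace(0, 2 * np.pi, 15)
B = tent_basis(x, knots)                       # design matrix of little lines
w, *_ = np.linalg.lstsq(B, y, rcond=None)      # the model is still linear in w
yhat = B @ w
print(np.max(np.abs(yhat - y)))                # small: a linear model drew a curve
```

This is the same trick splines use, just with smoother basis functions.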
If you look at that data frame, you'll see that there's also a column for age and a column for sex. It's labeled male, and this is an indicator variable: it's one if that individual is male, zero if they're female. And I've re-plotted the height and weight data just for the adults on the right of this slide, coloring each point blue if the individual is male and red if female. And you'll see that there's a difference. How would we develop a statistical model to estimate the influence of sex on height and weight? Well, the first thing we need to do, you won't be surprised, is add it to the DAG. We add it to the DAG first to get some assumptions on paper, and then we're going to develop a generative model that also includes sex. So we want to address the question: how are height, weight, and sex causally related? And we're going to focus on the causal influence of sex on height and weight. But simultaneously, we know that height influences weight, so we can't ignore height now when asking these questions. So how do we justify the expansion of the DAG and think about it? Let me emphasize this point again. I know I've said it before, but it bears saying probably in every lecture: the causes aren't in the data. You can't just look at the scatter plot, even with the color distinction, and know which direction the arrows are supposed to go. Think about, even in the previous lecture, which direction should the arrow go that relates height to weight? I say something about this in the book. You have to think about the kinds of interventions you're willing to consider to determine which direction the arrow goes. And so I prefer, in all of the examples we're doing here, to think about height influencing weight, because most interventions you can think of on height will necessarily influence weight, but there are fewer interventions on weight that would modify height. People can exercise and lose weight, but they don't get shorter.
But if people grow taller for some reason, they must get heavier, because there's more person, and the geometry of the human body requires it. This isn't to say there are no interventions on weight that would also influence height; it's just that we're not considering those. And part of this is that we're confining ourselves to adults. Likewise, we're thinking about the causal influence of biological sex on height or weight: which direction should the arrow go? Because of human biology, it goes from sex to height, right? Height doesn't influence sex, at least not in humans. In some fish it does, but not in humans. And so this gives us a directionality to the arrows, and the same logic applies for weight. So we'll consider arrows going from sex to height and sex to weight. But already this makes a graph which has lots more connections in it. This little triangle is one of the most common structures in causal modeling. It's called a mediation graph. What this graph is saying is that height influences weight, which is what we had from the previous lecture, and that sex influences both height and weight at the same time. Therefore weight is influenced by both height and sex. But the influence of height is direct, while the influence of sex is both direct and indirect. There are two ways that sex causally influences weight: there's the direct arrow, and there's this path over the top that passes through H. And that is a different kind of influence. It's still an influence, but it's a different kind than the direct influence. So one of the things we do when we translate these DAGs into generative models is read them as implied functions: height is a function of sex, and weight is a function of both height and sex. Once you translate it into this function notation, you can read off what's going on in terms of direct causes, but not indirect causes.
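The functional reading of the mediation DAG can be made concrete with toy deterministic functions (the numbers and functional forms here are illustrative choices of my own, not the lecture's model). Note how the total effect of switching sex decomposes into a direct piece and an indirect piece through height:

```python
# toy deterministic versions of the implied functions H = f_H(S), W = f_W(H, S)
def f_H(S):
    # height is a function of sex only (unobserved causes omitted)
    return 150.0 if S == 1 else 160.0

def f_W(H, S):
    # weight is a function of height (direct) and sex (direct)
    return 0.5 * H + (0.0 if S == 1 else 10.0)

W1 = f_W(f_H(1), 1)                          # female: 75.0
W2 = f_W(f_H(2), 2)                          # male: 90.0
indirect = f_W(f_H(2), 1) - f_W(f_H(1), 1)   # only H responds to sex: 5.0
direct = f_W(f_H(1), 2) - f_W(f_H(1), 1)     # only the S -> W arrow: 10.0
print(W2 - W1, direct + indirect)            # total effect = direct + indirect
```

The decomposition is exact here only because the toy functions are linear, but the direct/indirect distinction it illustrates is general.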
Indirect causes don't show up in these function declarations. To make a generative model, though, we're going to write out these functions. It might help you understand what I mean by an indirect cause if we do a little animation. So here I've drawn the DAG again, and I'm going to set this in motion in a moment. I just want you to understand what's going to happen. Little particles of causation are going to move from sex to both height and weight, and then height will have its causal influence on W as well. Its particles will be in a different color; they'll be in red for height. And what I want you to see is that when the particles arrive at the train station of body weight over there on the upper right, the sex effect will have been transmitted in two kinds of bundles. And this is what happens in generative models as well; this is just a heuristic, cartoon way of saying it. So you'll see that the little sex particle arrives at height and gets packaged with the red particle, which then moves on to weight. And so the causal effects that are influencing weight are both direct sex and, indirectly, sex through height. In a sense, sex and height cooperate to influence weight. We could draw, as I did in the previous lecture, the unobserved other causes on each of the variables. In every DAG, these things are implied. You don't have to draw them every time, and maybe this will be the last time I draw them, but it's good to remember that they're there. When you write the generative function, typically these variables will be stochastic, because we're imagining there are unmodeled, unobserved, and often unmeasurable other influences which generate variation in each of the measured variables. These are the unobserved causes. Luckily, these unobserved causes are ignorable unless they're shared among the measured variables. I want to give you an example of that and what I mean.
Not in humans, though; humans have really boring biology, especially when it comes to sex determination. Most vertebrates are much more interesting, like fish and reptiles. There are lots of vertebrates that determine sex by temperature, by ambient temperature while they're gestating in the egg. So, for example, turtles and lizards and alligators have temperature-based sex determination. For turtles, when it's warmer, more females are hatched, and when it's cooler, more males are hatched; and the opposite for most types of lizards, interestingly. So suppose we were going to draw a DAG that had temperature as an unmeasured influence on sex. Well, temperature also influences other things, like body weight, directly, right? Because it changes the ambient ecology and the availability of food and how much of a struggle it is to gestate and hatch and so on. So in this case, the red T on the left is an unmeasured common cause of both sex, for turtles, and their body weight. And now it's not safe to ignore temperature, because it's a confound. We'll talk more about confounds and what to do with them in future lectures, but I just want to warn you about this now: when you decide to ignore something, you're making strong assumptions about how many arrows it would insert into your graph, and where. Okay, so we'll proceed. In humans, at least, temperature, as far as we know, does not influence sex, and so we can ignore that. And as far as we know, there are no common causes here that are unmodeled. There are almost certainly common causes of height and weight, like nutrition, that we're not modeling; maybe we'll take that up in a future lecture. Okay, so we have our generative model, and we need to write it. Here's a very simple version, continuing on with the non-biological spirit of the previous lecture, just to keep things simple. This is not a course in human physiology.
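To see why a shared unobserved cause is not ignorable, here is a small simulation sketch in Python (toy numbers of my own, loosely inspired by the turtle example): temperature influences both sex and weight, sex has no effect on weight at all, and yet a naive comparison of weights by sex finds a large difference:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
T = rng.normal(0, 1, n)                          # unmeasured incubation temperature
# warmer nests hatch more females (S = 1), as in many turtles
S = (rng.random(n) < 1 / (1 + np.exp(-2 * T))).astype(int)
# temperature also influences weight directly; sex itself has NO effect here
W = 10 + 2 * T + rng.normal(0, 1, n)

naive_diff = W[S == 1].mean() - W[S == 0].mean()
print(naive_diff)   # well above zero: the confound masquerades as a sex effect
```

The apparent "sex effect" is entirely the unmeasured common cause doing the work.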
We have a function that simulates height and weight, and it takes as inputs a vector of sexes, sex data, where one indicates female and two indicates male. Then we need B, that is beta, our proportionality of weight to height from the previous lecture, and a new variable, A, which is going to represent the direct effect of sex. And the way I've set up the simulation, you can read the code there: if s equals one, indicating that the synthetic person is female, then on average height is 150 centimeters; otherwise they're male, and on average height is 160. And then we add to that some Gaussian noise with rnorm, the random normal, with a standard deviation of five centimeters (a variance of 25). And then weight is simulated. This is just like the previous lecture; if you go back and look at the code, it's the same approach as before: weight comes from height. There's a proportionality using B, but now there are going to be two values of B. The bracketed s on B, on the line that starts with W on the left, indicates that for each value of s there's possibly a different B value, and so men and women can have different proportionalities of weight to height. And then the A that's added to it is just a constant change in average body weight that, again, can be different for each value of s. And then all the data is returned in a data frame. This will make a lot more sense if we just run it and you see what the output looks like. So here I make an s vector of 100 individuals, male or female drawn at random, and simulate their heights and weights. As input, I make both A values zero, so that there's no direct effect at all of sex on body weight in this simulation. And there are slightly different slopes, that is, slightly different proportionalities: 0.5 for women in the simulation and 0.6 for men. And then you get a data frame which has a column for sex (that was what you input) and columns for height and weight.
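Here is a Python rendering of the kind of simulation described on the slide (the lecture's actual code is in R; the function and argument names here are my own, while the specific numbers follow the lecture: mean heights of 150 and 160 cm and noise standard deviations of 5):

```python
import numpy as np

def sim_hw(S, b, a, rng=None):
    """Simulate heights (cm) and weights (kg) for a vector of sexes S
    (1 = female, 2 = male). b[s-1] is the slope of weight on height for
    sex s; a[s-1] is the direct effect of sex on weight."""
    if rng is None:
        rng = np.random.default_rng()
    N = len(S)
    H = np.where(S == 1, 150.0, 160.0) + rng.normal(0, 5, N)  # mean height by sex
    W = a[S - 1] + b[S - 1] * H + rng.normal(0, 5, N)         # sex-specific line
    return {"S": S, "H": H, "W": W}

rng = np.random.default_rng(0)
S = rng.integers(1, 3, 100)    # 100 people, sex drawn at random
d = sim_hw(S, b=np.array([0.5, 0.6]), a=np.array([0.0, 0.0]), rng=rng)
```

With these inputs, average weight comes out near 0.5 * 150 = 75 kg for women and 0.6 * 160 = 96 kg for men, with no direct sex effect at all.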
And why are we doing this? It helps us think through what we believe about the system and the relationships among the variables. It helps us develop an estimator in a little bit. And it will help us test that estimator as we go. Okay, so we're going to think scientifically and define our questions. What are the questions we're going to ask of this system? The first question is: what's the causal effect of height on weight? We already asked that in the previous lecture, and we don't need to revisit it in any detail. But it turns out this is still going to be in these models, because of the indirect effect of sex through height. The new questions are what we're going to spend most of our time on today. What is the causal effect of sex on weight? If you think about this question for a second, one thing you realize is that it involves two kinds of sub-causes, the direct and the indirect. I've tried to highlight the paths on the right that are relevant now: all the arrows in this graph are relevant for addressing this particular causal question. And finally, what's the direct causal effect of sex on weight? If we want to partial out just one arrow, the one I've highlighted on the right, we're going to need a different statistical model than the one we need to address the previous estimand, the causal effect of sex on weight, which you might call the total causal effect of sex on weight. And we're going to develop two models, one for each of those questions. Okay, so let's get started. For both of these, what we need to do is stratify by s, the sex variable, so that we can get a different estimate, a different association between sex and weight and between sex and height, for each sex in the data set. How do we do this mechanistically in a linear model, given that it's composed of lines? Well, I have a preferred way to do it, and that's what I'm going to teach you.
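The first of these estimands, the total causal effect, can be measured directly from the generative model by intervention, as the lecture does shortly: simulate one sample where everyone is female and one where everyone is male, and compare average weights. A Python sketch under the same generative assumptions (the helper function and names are my own):

```python
import numpy as np

def sim_w(S, b, a, rng):
    # same generative assumptions as the lecture's simulation
    H = np.where(S == 1, 150.0, 160.0) + rng.normal(0, 5, len(S))
    return a[S - 1] + b[S - 1] * H + rng.normal(0, 5, len(S))

rng = np.random.default_rng(0)
b = np.array([0.5, 0.6])   # slopes of weight on height, by sex
a = np.array([0.0, 0.0])   # no direct effect in this simulation
# intervene on sex and ONLY sex: two counterfactual samples
W_F = sim_w(np.ones(10_000, dtype=int), b, a, rng)   # everyone female
W_M = sim_w(np.full(10_000, 2), b, a, rng)           # everyone male
total_effect = W_M.mean() - W_F.mean()
print(total_effect)   # about 21 = 0.6*160 - 0.5*150 for these parameter values
```

This is the "about 20 or 21" target the lecture tests its estimator against; changing b or a changes the target, and a good estimator should track it.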
There are a number of ways to do this, because linear models are very flexible, and if you're clever with how you code the variables that you use as data, you can do this lots of different ways. One of the most common ways, a default for lots of software, is to use indicator variables. These are sometimes called dummy variables, but that's not very nice to say; the variables didn't do anything to you. Let's call them indicator variables. They're zero/one variables that indicate turning on some parameter, some unknown that you add to the model. As I said, a lot of linear-modeling software constructs indicator variables for you, invisibly, in the background, when you pass it a factor variable. We're not going to do that here, because in this course we build everything ourselves. We don't rest on automation, and that's not because I'm against automation. It's because, when you're learning, automation is poisonous: it prevents you from learning what's actually going on and understanding the model. So we're going to build it all ourselves, and when we build it ourselves, there's another strategy we want to use, and that's the strategy of index variables. Index variables are numbers: one, two, three, four, five, six, seven. They're like ID numbers, and all they do is let us access particular parameters in the model. They're addresses, if you will, postal addresses. We're going to use index variables because they have a lot of advantages over indicator variables. First of all, if you code up a model using index variables, you can change the number of categories and the code doesn't have to change. The same code will work with three categories as with a thousand, and that's a huge advantage. With the indicator variable strategy, you would need 999 indicator variables for a thousand categories, and that's no fun. With index variables you just need one. You just need the index variable.
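Mechanically, an index variable is nothing more than an address into a vector of parameters. A short Python sketch with toy numbers of my own:

```python
import numpy as np

# four unordered categories: cyan, magenta, yellow, black -> indices 1..4
color = np.array([1, 3, 2, 4, 1, 2])        # index variable, one entry per row
alpha = np.array([0.20, 0.45, 0.30, 0.05])  # "alpha street": one parameter per category
mu = alpha[color - 1]                       # mu_i = alpha[color_i] (1-based to 0-based)
print(mu)                                   # 0.2, 0.3, 0.45, 0.05, 0.2, 0.45

# the indicator (dummy) variable strategy needs K-1 zero/one columns instead:
dummies = np.eye(4)[color - 1][:, 1:]       # columns for categories 2..4
```

Note that the index-variable lookup works unchanged no matter how many categories there are; only the length of `alpha` changes.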
It's better in most cases for specifying priors as well. If you have scientific information you want to put in the prior index variables it's easier and I'll show you why when we get to the example a little bit later. In the second half of the course when we learn multi-level models, multi-level models very naturally use index variables to talk about clusters in the data like locations or batches or things like that. How do they work mechanistically and how are they going to get coded up? Let's think about a toy example. Imagine where our categories are colors, a cyan, magenta, yellow, black, these primary pigments I think these are called and what we do is for each category we just give it an index in this case 1, 2, 3, and 4 and the ordering of this index doesn't mean anything. These are unordered categories and the 1, 2, 3, 4 are labels but they're going they're labels that let us look better positions inside a vector so we're going to have now a set of a vector of parameters. Vector just means a list casually speaking and we're going to have a vector of parameters alpha here which is of links four because there are four different colors and when we want to look up the estimate the the unknown for the corresponds to each color we look it up by its index value which is an address, a location on the on alpha street if it were that cyan lives at address one and magenta lives at address two and yellow lives at address three and black lives at address four and so mechanistically inside the code we use this index to look up the particular alpha so for example here if this was a linear regression we in a linear regression all the action goes on in defining the mean which is usually indicated as mu sub i or i is the i-th observation or the i-th row in your data frame and we have alpha street here and there's a variable called color in your data frame which is the index variable and we end up looking it up that way so let me let me show you how this works in a bit more 
detailed sense again alpha in these linear models is usually the intercept right it tells you what the average is and in this case it'll be the average for that category so in the context of weight body weight we develop a linear regression to estimate the effect of sex on body weight we're going to have alpha which is our intercept and in the model on your screen all this model will do is estimate the average weight because that's what alpha will be it will be the average weight in the sample but if we subscript alpha by sex which is a column in our data frame which has ones and twos in it i show you this on the left then you think of it as the sex of the i-th person where i is a row and then there's actually in the model again there's alpha street it only has two addresses on it this time because there's only two sexes in this data set and so what happens is you think about being on row one when the model runs i equals one in this case because it's the first row in the data frame and s equals two that is s on row one equals two and so what the code does when it looks up alpha and the and the subscript is now two so it pulls out the second value which is the second unknown and that's parameter alpha sub two will get estimated but it will only get estimated using observations where the value of the s column is two and likewise for rows for the value of the s column is one alpha sub one will only be estimated using information from those rows where the value of the s column is one and this does our job this is there's nothing fancy or mathematical or really interesting going on here this is just a mechanism for getting the machine to run correctly and do the stratification that we need to address our scientific goal okay well we need to specify priors and now we have this vector of of unknowns of parameters on alpha street alpha sub one and alpha sub two and very often we want to assign them all the same prior like in this case we're going to start out with the idea 
that we're completely ignorant of human biology and we're going to give both sexes the same prior distribution with a for body weight which is a normal distribution with mean of 60 and a standard deviation of 10 and then we're going to let the sample update those priors and show us if there's any difference at all now you don't have to assume that the same but this is often a situation we find ourselves in if you use this is easy if you use index variables because you have this list and you can just give every element of the list the same prior if you use indicator variables to do this modeling this becomes much harder because now there's not the symmetry right that you have in this model where the sexes are modeled the same way if you use indicator variables one of the sexes becomes the default and the other is an adjustment i'll say that again if you use indicator variables one of the sexes becomes the default and the other becomes an adjustment and then there's an adjustment parameter that gets a prior and it can't have the same prior as the default effect because they're all one of them is an average and the other is an adjustment to an average now i'm not against that in all cases sometimes that's more natural you want to put a prior in the adjustment because you have information about it but usually what i show you on the screen here is much more natural and it's easier to express scientific knowledge this way okay let's test our code this is what we always do what is the total causal effect of sex the causal effect of sex is something we need to measure from simulation in this right because there's two influences there's the direct and the indirect so you and in this case it's simple enough that you could even compute it mathematically but this is not a mass stats course so i'm not going to teach you that i want to teach you something that works even for the complicated cases where it can't be computed analytically you can just simulate it so you've got a 
generative model and you can do experiments with it and because you can do experiments because you you you are god of your simulation you can measure any causal effect you want through experiments and experiment here means intervening on a variable and only one so what we do to find the causal the total causal effect of sex through bros pass is to construct two samples one where everyone's female and the other where everyone's male and then we look at the average difference in in body weight and that's the total causal effect of sex of changing sex and only sex on the generative process that produces weight and for i do it here for two samples each of size 100 individuals and we get that the the mean difference between the the male sample and the and the female sample for these particular parameter values is about 20 21 and so when we test our estimator this is what we want to be able to find for these parameter values and then you can come back here and change these parameter values do your simulation experiment again compute what we're looking for test it on your estimator and so on and that's the cycle of testing and here's the estimator the estimator is the model that we developed that has the indicator variable so i'm going to simulate a sample now now we're we're using the same simulation code is on the previous slide but we're not doing the experiment we're getting an observational study where the sex is a random Bernoulli variable there from 100 individuals so there's a mix of men and women in the population in our sample and we observe their height and weight now and now we try to estimate the causal effect of sex through both paths the total effect using the estimator redeveloped and so we run this model you look at the precy output at the bottom you'll see that there are two a variables there because there's one for category one which is women and one for category two which is men and each of these is the average weight in that sub sample and then the 
difference between these two is the estimate we're looking for the total causal effect both direct and indirect through height on on weight and you'll see that it's it's a good estimator it's getting at the right area you can back up to the previous slide change the parameter values and then we run the estimator and assure yourself that it tracks as the causal effect changes as you change the input to the simulation the estimator will track and get the right answer now let's analyze the real sample where we don't know the right answer but now we're we feel confident that this is the proper estimator if the the sample is generated according to the assumptions we've sketched out in the DAG and programmed into the generative model so the code is identical as before we set up the sample I construct the s variable by adding one to the male variable and we're only looking at adults here and we get our estimate now let's do some post-processing as I keep saying usually but these models the the unknowns themselves are not what we want so the a1 and a2 in this model is not what we're after we have to do comparisons of those posterior distributions in order to get what we want so let's consider for example thinking about the difference in mean weight yeah the difference in mean weight is not in the posterior the mean weight of each category is but we'd like to look at the contrast that is the difference between category one and category two so let's compute that and when you do such calculations you have to use the whole posterior distribution so what I'm showing you here are just the posterior distributions of a sub one and a sub two that is women in red and men in blue these are the means these are not the distributions of weights in the sample we'll look at that next that looks like this in order to simulate the posterior distribution of weights you've got to use sigma you have to simulate normally distributed people in their body weights and that's what I'm doing here we 
use our norm and we use the posterior distribution of sigma as well the whole posterior distribution of sigma as well and then you see that there's scatter and there's a lot of overlap in body weight this is not new information to you between men and women in human populations also this population yeah men are on average heavier and the difference in those means is quite reliable but we haven't showed you that yet but the difference in means is quite reliable but there's still lots of overlap in actual realized weights so but the overlap you see between the blue and the red is not an indication of its difference what we still have to compute the contrasts in both of these cases the contrast on top and the contrast on bottom to say what the posterior distribution of the differences between category one and category two and that's usually the estimate we're after is that contrast and nearly always you have to compute it by taking a difference in posterior distributions or doing some simulation so here's how that works you must always be contrasting compute the contrast because it is not legitimate statistically to decide if two things are different or the same just because their distributions overlap there's a particular important reason for this and that's because well it simply doesn't work let me show you an example a toy example imagine you had two unknowns for a model that is two parameters and their posterior distribution looks like this as you run the model you draw samples you plot the scatterplot of those samples and it looks like the plot on your screen these these two parameters are very strongly related to one another they have a strong positive correlation i'm sure you can see that by looking at the screen if we plot their densities i plot parameter one which is on the horizontal axis in blue and parameter two on the vertical axis in red and you see the densities on the right they overlap a lot they they span the same ranges quite a lot one is on average 
bigger than the other but they overlap quite a lot nevertheless their difference is reliably below zero yeah i'll say that again the the distribution of each of these parameters or the distributions individual distributions of each parameter overlap a lot and that's what you see in the blue and the red that overlap doesn't tell you that these things aren't reliably different because if you take the difference of each pair of parameter one and parameter two and that and and take all those differences and then stack them up in a distribution and plot it and that's the black density on the right the the difference is reliably negative yeah and in a narrow range that's because i computed it to be that way so the what's the take home message here overlap of distributions doesn't indicate that they're the same yeah doesn't indicate that they're different either but doesn't indicate that the same you have to compute the difference the contrast so that's what we do we get a causal contrast here let's talk about the difference in means now the the densities of the means on the top there don't overlap at all that is a pretty reliable indication that they're reliably different but we still have to compute the contrast because that's that's the estimate we want and so you compute the contrast just by taking the difference in the two variables in the posterior distribution and that's what the code on this slide does you just subtract a two or subtract a one from a two and you get a distribution of differences and that's what we plot on the bottom here and that's the posterior distribution of difference and so that's our knowledge about the average difference between women and men in this sample in body weight and it's between five and nine right but centered around a little bit less than seven kilograms now what about in realized individual people not just the averages because there's a lot of variation within each category in body weight and so the lived experience of people 
as it were is not governed by those means but by people they actually encounter right and so you showed you that these distributions overlap a lot what's the contrast between these distributions and so same kind of calculations we simulate individuals from category one and individuals in category two using the whole posterior distribution and then we take the difference between those two groups and that's the contrast and then we plot that and that's what I show you at the bottom of this slide I've added some coloring focused on zero there because you might ask okay but what what proportion according to this sample in this model and the posterior distribution that we've we've gotten derived from them what proportion of men are heavier can expect to be heavier than women and the answer is 82% yeah or or another way to say this if you randomly select an adult man from this population and an adult woman from this population how often will the man be heavier than the woman and the answer is 82% of the time yeah there's a lot of overlap but still most of the time around 80% of the time men are heavier than women in this population okay so we've addressed the first estimate the causal effect the total causal effect of sex on weight and that's on the left and updated it and we've got these two contrasts which answer this question one just about the means and the other about the whole weight distributions and this this is a causal effect because it lets you address the hypothetical intervention of changing someone's birth sex on their weight yeah and it's a distributional answer notice both of these plots are distributions and that is the estimate there is no point that you want to use as a summary because points are not estimates points or decisions that you might take action with the statistical estimate that contains all the scientific information are these distributions and that's what you want to communicate to your colleagues now we're going to address the second 
What is the direct causal effect of sex on weight? Now we need another model, because we have to somehow partial out the indirect effect that passes through height. Sometimes people say "block" it or "control for" it; I don't like the word control, because control is something you do in experiments, not something you do with statistics. But we can stratify by height to statistically block the association between sex and weight that is transmitted through height. That's our statistical goal: the estimate we want partials out that particular pathway.

What does that mean? In our generative model there are parameters for the indirect effect and for the direct effect. Those parameters aren't the indirect and direct effects themselves, but they govern them and create them. The b parameters are the slopes, one for each sex, and then there's a, which carries the direct effect. So here's a synthetic example in which I make the slopes the same, so that the causal effect of height on weight is identical for men and women, while the direct effects differ: men are on average 10 kilograms heavier regardless of their height. I simulate a sample and plot it, and you can see that there's a difference. Height influences weight the same way for both men and women in the simulation, and men are heavier on average. If we fit regression lines independently to men and women, they have the same slope, but the blue points are still higher, and that's the effect of the 10-kilogram boost, as it were: men are 10 kilograms heavier on average than expected for their height. That's what we mean by a direct effect. So how would we develop a statistical model that can estimate that bonus, beyond what we'd expect from height? We use a linear model, augmenting the previous one.
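The synthetic example described here can be sketched in a few lines of Python. The particular heights, noise level, and slope value are my assumptions, not the lecture's; the essential features are the shared slope and the 10 kg direct effect:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

S = rng.integers(1, 3, n)            # sex index: 1 = female, 2 = male
H = np.where(S == 1,                 # heights (distributions assumed for illustration)
             rng.normal(150, 5, n),
             rng.normal(160, 5, n))

b = np.array([0.5, 0.5])             # SAME slope of height on weight for both sexes
a = np.array([0.0, 10.0])            # direct effect: men +10 kg at ANY height

W = a[S - 1] + b[S - 1] * H + rng.normal(0, 2, n)

# Men should be ~10 kg heavier than expected for their height:
gap = (W[S == 2] - b[1] * H[S == 2]).mean() - (W[S == 1] - b[0] * H[S == 1]).mean()
```

Because the slopes are equal, the only systematic difference left after accounting for height is the 10 kg "bonus", which is exactly what the direct effect means here.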
Remember, we had a model for the total causal effect of sex on weight with just an intercept unique to each sex. We're going to update that model to include height, because we need to stratify by height, and that means including height as a variable in the regression. This lets the model say: for individuals of the same height, what are the differences by category? So imagine we were ignoring sex for a second and doing the regression of weight on height from the previous lecture. We'll start with that and add sex to it, and that's what I'm showing on this slide. We have alpha, the intercept, and beta, the slope, but I've changed one thing, because it will make the task a little easier, and there's no trick here, nothing illegal. It's called centering. Look at the term that multiplies beta: we have h sub i, which is individual i's height, and I'm subtracting h bar. The line over the h is pronounced "bar" in mathematics, like a flat hat on top of the h, and it almost always indicates an average, so h bar is just the average height in the sample. Subtracting it is called centering, and it's extremely common; I'd say it's the default in linear regression for most variables. You don't have to do it, but it makes a lot of things easier. It makes the software run better, first of all, and it usually makes the priors easier to think about, because it makes alpha the average weight: the expected weight when an individual's height is average. I'll say that again: centering makes alpha mean the expected weight at the expected height, because when h sub i equals h bar, that difference is zero, beta has no effect on the prediction, and alpha is all that is activated.
This is a very convenient way to think about alpha, because then, first of all, it's in the data: it's easy to think about, and it's not off your screen when you plot the data, as it was before. So when we develop the priors, for example, we can reuse the prior we used a little earlier for the average weight, which was centered on 60. One way to think about this: take any scatter plot of two variables you're going to fit a line through, and draw a vertical line at the average on the x-axis, the average height here, and a horizontal line at the average weight. There's a point where they meet, right in the middle, the center of gravity of all the points, and it's a fact about linear regression that the best-fit line will pass through that point. Now, we're doing posterior distributions of lines, so they'll jiggle around, but their center of mass also pivots around that point. If you just drew one line using least-squares regression, it would pass right through the grand mean, as I tend to call it. Why must that be true? Because linear regressions just model expectations, and if all I told you about a person from the sample is that they're of average height, and I asked you to predict their weight, there's no better guess than the average weight. An average person of average height should have average weight. And it doesn't matter what the relationship between the two variables is; even if there's no relationship between height and weight at all, you should still guess that they're of average weight, because that's the best guess.
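Here's a small Python illustration of centering, using ordinary least squares as a stand-in for the Bayesian fit; the simulated heights and coefficients are assumptions for the demo. With a centered predictor, the intercept comes out as the mean outcome, the expected weight at the expected height:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

H = rng.normal(150, 10, n)                  # simulated heights (values assumed)
W = -52 + 0.75 * H + rng.normal(0, 4, n)    # simulated weights (coefficients assumed)

# Least-squares fit with the CENTERED predictor (H - H_bar)
Hbar = H.mean()
X = np.column_stack([np.ones(n), H - Hbar])
alpha, beta = np.linalg.lstsq(X, W, rcond=None)[0]

# alpha is now the expected weight at average height, i.e. the mean
# weight -- the fitted line passes through the grand mean point
```

Without centering, alpha would be the predicted weight at height zero, far outside the data and hard to put a sensible prior on; centering moves it to the middle of the scatter.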
I hope that makes some sense: regression lines pivot through this grand mean point, and if alpha has the meaning of the location of that horizontal line, a lot of modeling tasks get easier.

Now we can just put s subscripts on everything, and that's really all there is to stratifying by categorical variables. We're going to stratify both effects here: we'll allow the slope to differ by sex, and we'll allow the so-called intercept alpha, the direct effect, to vary by sex as well. This is no problem in the code. We now have a vector called alpha and a vector called beta, each with addresses one and two, and all the indexing and addressing works exactly as before. The first individual in the table is male, so they get address two in both vectors, in alpha and in beta; the second individual is female, with s equal to one, so they get address one in both vectors.

Now we can analyze the sample. You'll want to test the model first as well, and I encourage you to do that with a synthetic-data simulation in the comfort of your own home; you can imagine how that works. I'm going to jump ahead to analyzing the sample and show you what we get. The model programmed into quap, shown on this slide, should hold no surprises: we just bracket by s, and quap interprets that to mean you want a vector and constructs it all for you. Notice that I've constructed h bar, the average height: you need to compute h bar and pass it in as data, which I do on this slide as just the mean height. The results are plotted on the right as the posterior mean lines, which is something I don't usually encourage, but for the sake of easy illustration it shows that the regression lines for men and women in this sample have almost exactly the same slope.

Very similar slopes, but this doesn't address our estimate yet. Remember, we want the direct effect, so we have to compute the difference in expected weight at each height, above and beyond any average differences that are due to differences in height. For men and women of the same height, are the weights still different? Now, there aren't a lot of men and women of the same height, but the regression lines let us imagine there were, for any arbitrary height. So we can compute the contrast: for each height, we simulate expected weights for men and for women from the posterior distribution, and then look at the difference. Here's what the code looks like. I know there's a lot of it, but it's fairly simple. I'm using the link function from the rethinking package, which makes this kind of simulation easier: you pass it the data you want simulations for, in this case an s vector of all ones first, and it computes expected weights, because that's what the model is set up to do; then the same again for men; and then we take the difference between the two sets of simulations. We get a posterior distribution of the difference in expected weight at each height between men and women, according to the model. I plot this on the right, with a horizontal dashed line at zero, and you can see that the center of gravity is right around zero. There's not much difference at any height between men and women of the same height. There's just a little: for really tall heights women tend to be a little heavier, and for shorter individuals men tend to be a little heavier, and that's due to the small difference in the slopes of the two lines.
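As a sketch of what `link` is doing here, in Python with a fabricated posterior (all numbers are my assumptions): compute the expected weight for each sex over a grid of heights, once per posterior sample, then subtract:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000          # number of stand-in posterior samples
Hbar = 160.0      # assumed mean height used for centering

# Fake posterior: nearly identical slopes, slightly different intercepts,
# for women (column 0) and men (column 1) -- values assumed
a = np.column_stack([rng.normal(45.0, 0.5, n), rng.normal(45.5, 0.5, n)])
b = np.column_stack([rng.normal(0.63, 0.02, n), rng.normal(0.63, 0.02, n)])

heights = np.linspace(130, 190, 7)   # grid of heights to evaluate

# Expected weight at each height, per posterior sample and sex,
# then the contrast (men minus women) at each height
mu_F = a[:, [0]] + b[:, [0]] * (heights - Hbar)   # shape (n, 7)
mu_M = a[:, [1]] + b[:, [1]] * (heights - Hbar)
contrast = mu_M - mu_F

# Summarize LAST: the mean contrast at each height on the grid
contrast_mean = contrast.mean(axis=0)
```

Each column of `contrast` is a full posterior distribution of the direct-effect difference at one height; summaries like means or intervals are computed only at the very end.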
But the fact is that this weird gray bow tie straddles zero, and the conclusion we draw is that nearly all of the causal effect of sex on weight acts through height. We observe big mean differences in weight by sex in this population, but that's because men are taller. So we've got our second estimate now: we've partialed out the direct effect, and we've learned something extra. The total causal effect is estimated on the left, and the direct causal effect on the right, which is almost nothing. It's not that there's none, but there's almost no direct effect of sex on weight in this population.

To summarize a bit: categorical variables show up a lot in modeling, and you want to get used to them. If nothing else, your experimental units are categories, and you usually want to include those in your models as well. I think it's nearly always better to use index coding with categorical variables. We care about them because we want to stratify our estimates by the relevant categories. Once the model has run, you extract samples from the posterior distribution to compute the relevant contrasts, which are the actual estimates of interest: the parameters are rarely the estimate itself; they're little latent things you use to compute the estimate after the model runs. And remember to use the whole posterior distribution, not a point from it. Don't take the posterior means of two parameters and compute their difference; instead compute the difference between the distributions, and then take the average. Any kind of summary, whether a mean, an interval, or a standard deviation, should be the last thing you do in your calculations. The same goes for reporting: you have to take the difference between the whole distributions, because you get different answers otherwise. The right way is to summarize last; the wrong way is to summarize first and then take a difference.

Okay, that was a lot. Let's take a break. You should review the material up to this point, see where you're confused, and try to work through that confusion. Go for a walk, have a cup of coffee, listen to some music, and whenever you come back we'll pick up again and talk about curves. I'll be waiting.

Welcome back. In the remainder of this lecture I want to talk about some ways that linear models can produce nonlinear shapes. This part of the lecture is not going to be as detailed and code-heavy as the first part; I'm sure you've had enough of that for today. But there are more details in the book, and lots of details in the code examples online, if you want to implement these curves for yourself.

First of all, many natural processes are not linear in a strict sense over any reasonably large scale. For example, if we plot height and weight for the whole human lifespan in this sample, not just the adults but the kids too, we see that the relationship is not a line. For adults it's approximately linear, but over all ages it's not. How could we model something like this, if we want to correctly stratify by height when we examine, say, the effect of sex on weight? We don't really need new technology. We can use linear models to do this, because lines are like epicycles, and we can cobble them together into big additive functions that can bend in any way we like. This strategy is not mechanistic, in the same sense that the geocentric model was not mechanistic, but you knew linear regression was like that already; it's geocentric, and as long as we use it wisely, that's fine. There are two popular strategies for these non-mechanistic, generalized ways to make linear models bend. The first is to use polynomial functions.
Polynomials are probably the most common strategy, and it's a bad one. You should never do it. I've done it, I feel bad about it, and I will try never to do it again. I'll explain what polynomials are, and I'll explain why you shouldn't use them. The second big family of strategies is additive functions called splines, which are relatives of generalized additive models; there are related methods like Gaussian processes that do essentially the same thing, and we'll talk about those in the second half of the course. These are a lot less awful. They're quite useful, in fact: if your goal is a flexible curve that passes through the center of gravity of the points, absent some more mechanistic model, this is the route to go.

All right, what are polynomials? They're the functions you met in secondary school, where we multiply the observed x-axis variable by itself multiple times to make curves of various shapes. A polynomial is a series: there's an intercept, then the so-called linear term, beta-one times x sub i, where x is the x-axis variable we're using, and in the simplest polynomial beyond a line we get a parabola by adding a squared term, beta-two times x sub i squared. Beta-one and beta-two are just parameters, part of the posterior distribution, and x sub i is just a single column of data, but we construct multiple terms from it inside the linear model. This model is still technically linear, because it is an additive function of the parameters. I'll say that again: it's still linear because it's additive in the parameters. The parameters aren't exponentiating anything; they're just factors in front of each term, and the terms are added together. So polynomials can make lots of shapes: you can add a cubed term, x to the fourth, x to the fifth, and so on; the sky is the limit.
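To see concretely why polynomial regression is still "linear", here's a Python sketch using ordinary least squares as a stand-in for the Bayesian fit (the data-generating values are assumptions): the design matrix simply gains columns for x squared, and the fit remains an additive function of the parameters:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100

x = rng.uniform(-2, 2, n)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0, 0.3, n)  # assumed true curve

# "Linear" means linear in the PARAMETERS: the design matrix has
# columns 1, x, x^2, and the estimation is still ordinary least squares
X = np.column_stack([np.ones(n), x, x**2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
```

Adding x cubed, x to the fourth, and so on just appends more columns; nothing about the fitting machinery changes, which is exactly why the model stays linear however wiggly the curve gets.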
You can fit all kinds of data this way, and I'll have a particularly shocking example in a future lecture; the curves on the right of this slide give you an idea of what polynomial shapes can do. But polynomials are very limited, and they're hardly ever a good idea. One reason is that they have symmetries that are undesirable from a scientific standpoint: the parabola I'll show you on the next slide, for example, is always perfectly symmetric, but often we don't want the curve to be symmetric; that's not a scientifically reasonable assumption. The major issue, though, is the way uncertainty is represented: you get explosive uncertainty at the edges of the data, where you wouldn't with a more scientifically reasonable model. And the fundamental reason for all of these problems is that polynomials don't smooth locally. They don't determine the shape of the curve by looking only at local regions of the x-axis; they determine it by looking at the whole x-axis at once. Maybe that sounds good, in a megalomaniacal sort of way, but actually it's really bad, because it means a data point in any region of the x-axis can change the shape of the curve arbitrarily far away from itself. I'll say that again: any data point, anywhere on the x-axis, can change the shape of the curve arbitrarily far from where it is, and that's bad news. Instead we want local smoothing, and that's what splines provide, which I'll get to in a minute.

Let me show you how polynomials behave in Bayesian updating; they behave the same way in non-Bayesian statistical paradigms too, of course. Here's a posterior-updating animation like the one I showed you in the previous lecture for linear regression. On the left we have a prior distribution for beta-one and beta-two, which determine the shape of a parabola. The equation for that parabola is shown above the plot on the right, and the curves there are three parabolas sampled at random from the prior. You'll notice one of them looks like a line; that's because it bends much, much farther away, off the slide. Now I'll introduce a few points and let it update. From the prior the curves wiggle all over, but with even one data point they've already bent down. Why? Because parabolas must bend. As you'll see as the data pile in, most of the data are on the left with just one point on the right, and from a completely general perspective there's basically no evidence that this relationship is really parabolic, or symmetric for that matter, and yet that's what the parabolas must do. You'll also notice that the uncertainty, the width of the gray region, flares out at the ends of the data. All of this makes parabolas, and higher-order polynomials, quite undesirable from a modeling perspective: they impose assumptions you don't really want, and they learn too much from data in regions far from where the data lie.

What if we model height and weight this way? You can fit the relationship between height and weight with a parabola, and I do it here. I want you to notice what happens on the far left of this graph: after fitting the parabola to the height-weight data, the model thinks that babies get heavier, for some reason, as they get shorter. Below about 50 centimeters, the model predicts weight goes up. Now, obviously this is ridiculous and you don't believe it, but if you could use a curve that didn't do this, wouldn't you prefer it? So please don't use parabolas. This enforced symmetry is quite bad news, and it means you can't make predictions outside the range of the data with any kind of credibility. You can fit higher-order polynomials too: here's a fourth-order polynomial with a height-to-the-fourth term on the end.
This curve bends three times, and it makes lots of absurd predictions too, especially for the tallest individuals, who it says will become catastrophically thinner. Notice there's also a lot of uncertainty: the predictions are quite weird at both ends. You could do a lot better in this particular case by thinking scientifically, which is probably unsurprising. Much later in the course I'm going to revisit these exact data, and we're going to use a more biologically inspired model of growth to redo all the modeling of the causal influence of height on weight. I'll show you how extremely powerful that can be: we're going to get a really good curve that explains the shape of these data with a very small number of assumptions and essentially no unknowns, that is, no fitted parameters needed to relate these two variables. But that comes in lecture 19, so it's a promise I'll deliver on later.

Let's talk about splines. Splines are really useful when you don't have good scientific background information and you just want a good locally inferred function that lets you stratify by some continuously varying predictor variable. The simplest ones to explain are called basis splines, or B-splines, but there are lots of different kinds of splines, and they all work basically the same way: by adding together a bunch of locally trained terms. That is, there are parameters that are trained only on local regions of the data, and those local regions are smoothed together to make a continuous function. The word spline comes from drafting, and lots of people still do this; you do it in carpentry too. If you want to cut a smooth curve in the right place, you take a flexible metal or wooden bar and attach weights to it, and by adjusting the positions of the weights you can bend the bar into a smooth curve that you can then draw repeatedly and accurately on a surface. That's what a spline is, and where the term comes from. We're using metaphorical splines, in the sense that they're flexible, but also in that there are anchor points, where the weights sit, that act as control points for the spline, and the algorithm learns what happens at those anchor points, but only in the local region around each one.

Just to build some intuition: all the code for fitting splines is in the book, and you can find lots of fantastic tutorials, and even whole books, online about how to work with splines and generalized additive models; they're a major branch of applied statistics. What I want to show you here is just some animations to prime your intuition about how they work. The example featured on the cover of my textbook is a historical record of cherry blossom blooms in a particular region of Japan, where the day of first bloom has been recorded for a long time, about a thousand years, with some gaps. We have other records from these trees too, but I'm just going to plot year on the horizontal axis and day of first bloom on the vertical, and we're interested in a local approximation of the trend so we can visualize it. This is one of the things splines are really nice for; you could then stratify by the trend and look at other associations, but here we're just going to characterize the trend with a flexible spline. The dancing lines you see are what exist in the prior distribution of the spline, before the data populate the graph. In the absence of data, almost any kind of wiggly curve is possible, and this is the thing about splines: they can take an infinite number of shapes, so the posterior distribution for the spline contains an infinite number of wiggly functions. It really does, and I'm just sampling a few of them for you here.
The blue and the black are just different samples from the spline's distribution, shown together so I can display more functions simultaneously. Now we populate the data and train the spline on it, and then we sample splines from the posterior distribution; the blue and the black are now two samples from the posterior, shown at the same time so you can see the variation. The result is a wiggly function, with some remaining uncertainty about where the mean is at any particular time, but you can see that it's locally trained, and so you get these reliable little hills and valleys, periods of history when the blooms were earlier or later in the year. This is the power of splines: no scientific information goes into the spline, yet it's a flexible curve that can find trends, and that helps you do stratification by continuous variables when you need it.

How are they built? Just a little bit about mechanism; I'm not going to get too far into the details, and there's a lot in the book. B-splines, or basis splines, are just linear models with a bunch of additive terms built from little synthetic variables. It's literally a linear regression: mu sub i, the expected value of the outcome for case i, is a long, in fact arbitrarily long, series of additive terms. There's an intercept, and then these w parameters, which are parameters to be estimated, w for weight, like the weights hung on a drafting spline; and then the B terms, which enter the model like data but are actually artificial. Each B is just a basis shape, and its weight determines how important that particular shape is in its particular region. You can think of the B terms as coordinates on the x-axis that allow curves to form. An animation might make this a little easier: the weights w are like slopes, and the basis functions are synthetic variables marking local positions, so the particular weight parameters get turned on in different regions of the x-axis, and this is what allows you to make a very large number of different flexible curves.

So here's an example, and I know it's super weird. What you're looking at is maybe the world's simplest spline. There's an x-axis variable, though I haven't drawn the axis; just imagine the x-axis is some variable like year, or weight, or height. The vertical axis is the weighted basis function value, the way the spline sees things. What is a basis function? On this plot there are four of them, the colored curves, and the B variables in the spline are just the values of those basis functions at any particular point. What the weights do is adjust their heights: a weight multiplies the whole basis function and makes it stronger or weaker. The black curve is the spline, and it's the sum, the vertical sum at each point on the horizontal axis, of all the weighted basis functions. So: red is basis function one, green is basis function two, blue is basis function three, cyan is basis function four, and black is the spline. There are four weights, giving the importance of each of these basis functions, and for this example I have them set to one, minus one, one, and minus one, which makes this particular shape of spline by turning the bases up and down. To appreciate what the weights are doing, I can change their values, and you can play around with this experimentally and see what happens. So imagine I take the first weight and flip its value to minus one. On the far left, where the first basis function matters, that devalues basis function one, making it negative.
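This toy spline can be sketched in Python. Real B-spline bases are a bit more involved, so I'm using simple tent-shaped functions as stand-in local bases; the point is only that each weight acts in one region of the x-axis:

```python
import numpy as np

def tent(x, center, width):
    """A tent-shaped local basis function: nonzero only near `center`."""
    return np.clip(1 - np.abs(x - center) / width, 0, None)

x = np.linspace(0, 3, 301)
centers = [0.0, 1.0, 2.0, 3.0]            # four local basis functions
B = np.column_stack([tent(x, c, 1.0) for c in centers])

w = np.array([1.0, -1.0, 1.0, -1.0])      # weights, as in the toy example
spline = B @ w                            # the spline: vertical sum of weighted bases

# Flipping the first weight changes the curve ONLY where basis one
# is nonzero (the left side, x < 1); everywhere else is untouched
w2 = w.copy()
w2[0] = -1.0
spline2 = B @ w2
changed = spline2 != spline
```

Because each basis is exactly zero outside its own neighborhood, a weight can pull the curve up or down only locally, which is the local smoothing property that polynomials lack.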
Making it negative pulls the whole spline down, but nothing else happens in the other regions of the curve, because weight one applies only to basis function one, and basis function one is zero everywhere except the left side; it has non-zero values only in that region of the x-axis. Now think about basis function two, the green one, and say we flip its weight from minus one to one. Basis function two is zero on the far left and zero on the far right, with its maximum somewhere in the middle left, so changing its weight, making it very negative or very positive, pulls only that region of the curve far down or far up. By flipping it to one here, we've pulled the spline up, but the far right and the far left have stayed the same, because basis function two can't affect those regions. Same for the blue curve, basis function three: flip its weight to minus one, and only its region gets pulled down. And finally basis function four, on the far right: flip its weight from minus one to one and it pulls only that region of the curve up, and now we've essentially inverted the spline from where we started. The weights don't have to be ones and minus ones; they can be any numbers, so you can make a huge variety of curves, and the more anchor points and basis functions you have, the wigglier a curve you can get.

Okay, so just as a toy example, we could fit a spline to height as a function of age. We haven't looked at a plot of height against age for Nancy Howell's data yet, but this is what it looks like: these are humans, so they're born, they grow rapidly, they have a long adolescence, and then as adults they mainly stay the same stature, although we do get a bit shorter in old age. This is obviously not a linear relationship; at best it's piecewise linear, which would mean some line segments stuck together at their ends. We want a function to approximate it, so that if we needed to, we could stratify by age. I'm fitting a spline just as an example; I would never actually do this, because we know a lot about the biology of height and how humans grow, and we'd rather build models that way. But as an example of how splines can fit arbitrary curves quite well, it works.

Just as a comment: what we're implying here is that age is a causal influence on height, so we could add it to our DAG. It's a little bit weird to say that age is a cause, because you can't imagine an intervention; you can't experimentally adjust someone's age, not without time travel, and time adjusts everybody's age anyway. Some people think that if you can't find an intervention, you can't talk about something being a cause. I think it's fine to talk about causes you can't intervene on; there are lots of things like that. But it is important to think carefully about what you really mean when you say age is a cause. Most of the time we mean that age is a marker for time passed, during which other causes accumulate, and those are the real causes. If we could measure those accumulated causes and add them to the model, age would have no association left afterwards. But we can't measure those things, all the accumulated growth events, and so age is a proxy, and it can be extremely useful to treat it as a cause.

Okay, now I'm going to train a Bayesian spline on the height and age data. I'll start with 10 individuals, and here's a sample from the posterior distribution trained on those 10. Now I'll start the animation, which layers in data as it goes, and you can see what happens.
it's extremely wiggly and we add some more points it's still a bit wiggly you'll notice it's very very wiggly outside the range of the data you can do anything it wants and one of the real things here is the spline knows nothing about biology so it's happy to let individuals get shorter by an arbitrary amount at any age which is something you would not let a biological model do right any particular individual is expected to grow until older age when they might get slightly shorter but you'll notice that the spline learns the path through the expect expectation at each age very well and has no problem having that little bin there and it's locally smoothed right and this is what lets it do it unlike a polynomial which would go crazy with data like this because in order to have that sharp bend around age 20 it's got to bend in other places too here are the basis functions I'm going to show that animation again this time with the basis function so you can get an idea of what's going on to remind you so the basis functions are these hills that are under the blue spline and what the weight parameters that are getting learned do is they make these hills shorter or smaller and then the spline is the sum of the hills at any vertical slice any vertical point on the x-axis and you can see that a little better here so you can see how the basis functions are learning from the data and then adjust their height and then if you add together the black hills at any particular vertical point you get the blue curve and this is what splines are so this is how they're fundamentally additive or linear underneath because there are some of these curves okay splines are great we often need curves and if you don't have a strong generative model with which to constrain the function splines are great and this makes them a very common tool in applied statistics Gaussian processes likewise very much like splines but a more Bayesian flavored spline in the case of height we do a lot better by 
thinking biologically, and the scientific literature on height does not use splines, and well it shouldn't. If we thought about human height, one strategy that I'm a fan of is to think about growth phases. We know that humans have distinct, biologically controlled growth phases. There's infancy, where there's very rapid growth; there's childhood, where there's slower growth overall, with rapid growth of the head and slower growth of the body; and then there's puberty. The brain has essentially stopped growing by the onset of puberty: you get all the brain you're going to get by about age 12. It still reorganizes and you still learn things, of course, but it's not going to get any bigger, and during puberty you grow into your head, as it were. Then there's adulthood, in which you level off, although there tends to be a slight decline in stature in the long run, because gravity wears us down. Modeling each of these phases is a much more productive strategy, because you can put in the right constraints: individuals get taller, not shorter, during the first three growth phases. And there would be parameters, unknowns, in each phase with biological meaning, so you could do meaningful comparisons between them, and so on. Okay, thank you for your attention. I hope some of that was clear, at least. I encourage you to review these two lectures, at least the slides, before next week, and before the lectures next week, because there's been a lot in them; I probably don't have to tell you that. These are foundational in terms of tools. You've learned how to fit lines, and you've learned how to use categorical variables. Starting next week, we're going to take those foundational tools and add no more for a while; we're going to keep using linear models with categorical variables, and we're going to examine new estimates and new problems
and develop estimators. But all of the statistical tools we're going to need for a couple of weeks are in place now. I emphasize that so you know you should review a bit and make sure you've got these, and also to relax you a bit: from here on out, we're going to focus on more examples, not so much on statistical machinery, because that's in place. Okay, I'll see you next week. ... Oh, I didn't see you there. Are you still here? Well, as long as you're here, maybe you'd like some more. How about some bonus content? All right, just a little bit. One of the things I left out of the main lecture, because it honestly just got too long, is a style of causal modeling that I call full luxury Bayes. This is just my joke term for it; it's meant to be a bit ironic. It's equivalent to what I showed you in lecture, which is to say the typical approach: when using linear models to satisfy causal estimates, each estimate needs a different statistical model. That's what I did in the main lecture when I analyzed the causal influence of sex on weight; each estimate may need its own statistical model because it needs to stratify by different things. But there's an alternative, equivalent approach that needs only one model: a model that represents the generative model. Then you can run simulations from that model, after you fit it, to get each estimate you require. These approaches are perfectly equivalent, and they require similar amounts of work; it all depends on which style you prefer. This is the style I tend to use in my own work, but it doesn't necessarily have advantages or disadvantages relative to the other. There are fewer models to write, but it typically requires more simulation: the other approach requires defining more models, and this approach requires writing more simulations. So what would this look like in the case of what we did in the main lecture? Same generative model, and we're just going to express
the generative model, shown schematically by the DAG on the right of the slide, as a single statistical model that contains multiple relationships, all the relationships inside it. One way to think about this is that we've got a weight sub-model: weight as a function of stuff. Remember, the DAG implies that weight is a function of height and sex, so we write a linear model to express that inside the quap model. That's the part of the formula commented as the weight model, and you'll see it's the same weight model we used in the main part of the lecture: a regression of weight on height, stratified by sex. That's all it is. But we also have a model for height, for how height is influenced by sex, and this is something we did not do in the main lecture at all. When you do full luxury Bayes, you will do this, because you program the whole DAG, the whole generative model, inside one big Bayesian network model like this. The height model is super simple: it just stratifies height by sex, so we get a mean height for each sex. Why would we do this? Because if we estimate both of these models simultaneously, we're using the sample to estimate the whole generative model, all of its features, and then we can run simulations out of that estimated generative model to get the causal estimates we want, in exactly the same way that we ran simulations from our generative model counterfactually to explore what causal effects would be in any particular scenario. But now we've got estimates, so it's like getting estimates of the parameters and plugging them back into the generative simulation. Think of it like this: the top sub-model, the weight model, is those two red arrows, H and S pointing into W, and the bottom sub-model, the height model, is just the one arrow from S to H. But these are
both sub-models of the full DAG, and when we run the simulations, we run the whole DAG. You'll get a posterior distribution that contains a bunch of stuff. Don't worry, your computer won't complain; it'll do this very fast. Now you want to think about causal effects as interventions, and you're going to simulate those interventions and look at the effects. That's how you get your causal estimates in this approach. So let me show you how to do what we did in the main lecture, but from this approach. We've run the full luxury Bayes model that was on the previous slide. First we extract the posterior distribution; that's what the code on the left does. Then there's some bookkeeping: I define Hbar as the average height in the population, and n as the number of simulations, the number of simulated people I'm going to run, where 1e4 is 10,000. Now, with the posterior distribution in hand: the with command in R lets you define a chunk of code that runs in a particular scope. It's a weird word to use, but that's what it's called, scope. What this means is that post, the posterior distribution, will be in scope for all of the code in that block, so you don't have to put post$ in front of all the parts of it; you can just name the parts, and R will find them. This is very convenient, and it makes code easier to write, to debug, and for other people to read. The first thing we do inside the with block is simulate weight for women, artificial women, and we simulate 10,000 of them. We simulate their heights using the symbol h from the posterior distribution. If I back up, you can see the little h in the height model at the bottom there; that's just the average height for women or men, depending on what S equals, so it's the mean, and tau is the standard deviation of height. Then we simulate weight for these women using, again, the same formulas as in the generative model. Now we have those simulations, and then we do the
intervention: we define a sample where we've intervened on sex counterfactually, so we get a population where everyone is male instead, using the right parameters for that. Then the contrast is the causal effect we want, and we calculate it directly. So we can get all the same curves this way that we got in the main lecture, but we fit one model; we just have to do simulations like this at the end. We get the total causal effect of sex on weight from this calculation. You could also get the partial effect, just as before. Remember, the partial causal effect would be a contrast of the difference in weight between men and women at each height, so we could do simulations to look at that as well. Sometimes these causal effects are written like the notation I have on the slide here: p(W | do(S)). What does this "do" mean? I'm going to talk more about it later, but "do" represents an intervention; this is one form of notation for causal inference. The little p represents a distribution; it's like a probability, but continuous. So you read it as the distribution of weight conditional on intervening on S, and that's our statistical definition of the causal effect of sex on weight. You can automate this simulation: luckily, the sim function in the rethinking package, as long as you're using quap, knows the model formula, so it can do the simulation for you, and here's an equivalent way that gives you the same estimate of the causal effect. Okay, that's all I wanted to say about full luxury Bayes. I think there'll be another example of this just after the middle part of the course. It's equivalent to the approach of defining multiple models; you have to do the same amount of work, it's just a matter of where. In the full luxury Bayes approach, the work is at the end, in doing more simulations, but you define only one statistical model, and it looks like the generative model, so it makes it easier to
develop the estimate, because the estimate is always defined through the generative model; you pack it all in there. In the other approach, as I showed in the main lecture, you state each estimate and then, using separate logic for each, figure out which variables you need to stratify by, and so you get different statistical models. Many of the examples in the course are going to use that approach, because it's valuable to know that logic for other reasons, and it's the logic we're going to focus on a lot more starting next week.
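To make the full-luxury-Bayes workflow concrete, here is a minimal Python sketch of the final simulation step. The lecture's actual code is in R, using quap, with, and sim from the rethinking package; this is only a stand-in for the logic. It simulates the fitted generative model under do(S=0) and under do(S=1) and contrasts the simulated weights to get the total causal effect. All parameter values below are hypothetical point estimates; in the real workflow, each simulated person would use a different draw from the posterior distribution rather than a single fixed value.

```python
# Sketch of the full-luxury-Bayes simulation step: run the whole fitted
# generative model under two interventions on S and contrast the results.
# All numbers are hypothetical stand-ins for posterior draws.
import random

h     = {0: 150.0, 1: 160.0}   # mean height by sex (height sub-model, S -> H)
tau   = 6.0                    # sd of height
a     = {0: 45.0, 1: 50.0}     # weight intercepts by sex (weight sub-model)
b     = {0: 0.6,  1: 0.7}      # weight-on-height slopes by sex
sigma = 4.0                    # sd of weight
Hbar  = 160.0                  # average height in the population

def sim_weight(S, n):
    """Simulate n people under do(S=S): height responds to sex,
    then weight responds to sex and height, exactly as in the DAG."""
    weights = []
    for _ in range(n):
        H = random.gauss(h[S], tau)
        weights.append(random.gauss(a[S] + b[S] * (H - Hbar), sigma))
    return weights

random.seed(42)
n = 10_000
W0 = sim_weight(0, n)               # counterfactual population, do(S=0)
W1 = sim_weight(1, n)               # counterfactual population, do(S=1)

# Total causal effect of sex on weight: contrast of the two simulated
# distributions, i.e. a summary of p(W | do(S=1)) versus p(W | do(S=0)).
effect = sum(W1) / n - sum(W0) / n
```

With these made-up parameters, the expected contrast is a(1) minus a(0) plus the part flowing through height, b(0) times (h(0) minus Hbar), about 11 kg here; the point is only that the intervention and contrast are ordinary simulation code once the generative model is estimated.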