Welcome back to lecture 15 of Statistical Rethinking 2022. Let's take a trip. To a first order of approximation, humans are 50% of all the mammal biomass on the earth. The other 50%, to a first order of approximation, is our livestock. All other mammals are rounding error. And that's a testament to the extraordinary success, the overwhelmingly powerful and terrifying success, of our species. In almost any environment we thrive, and we dominate in difficult conditions. The community of Arang Dak, which you're seeing scenes from here, is like all others: it makes a great living by making gardens in jungle territories, by keeping domesticated animals like chickens and pigs, and by hunting, which requires a tremendous amount of skill and practice and, most of all, patience. And all of this production, which lets people survive with high levels of self-sufficiency, is shared. Human communities are incredibly cooperative, and that is a kind of technology as well that helps us thrive as a species. Households are not independent. They cooperate extensively, and that extensive cooperation among households varies tremendously cross-culturally. It's a huge topic of scientific research, because it helps us understand how societies function and how we can sustain cooperation to adapt to changing times. We're going to spend this lecture in Arang Dak, or rather, we're going to spend this lecture in some data from Arang Dak.
The data we're going to look at are relevant to the problem of sharing, a particular slice through the lives of these people, and of all peoples really. A large section of the literature in the behavioral sciences, the human sciences, is taken up by the study of sharing and cooperation, because sharing is found in a very conspicuous way in all human societies. Here's one of the most famous quotes from the anthropological literature on exchange and sharing, from Peter Freuchen's 1961 book on the Inuit: "Up in our country we are human, and since we are human we help each other. We don't like to hear anybody say thanks for that. What I get today, you may get tomorrow. Up here we say, by gifts one makes slaves and by whips one makes dogs."
This passage echoes a sentiment we find in many societies: that there is generalized reciprocity, generalized exchange, and those who are in need receive aid. At the same time, we know empirically that there are favored trading partners, and that people are jealous and selfish at the same time that they are generous. Cooperation and exchange are a social technology that allows communities to thrive; if they were less cooperative, they would be less successful. But that cooperative technology is always under threat from internal, individual forces, and so it's a big, important topic in the behavioral sciences to understand how societies sustain this kind of cooperation. The data set we're going to look at today is focused on the motivations for sharing, and the people of Arang Dak, like all human societies, share a lot. These data are in the KosterLeckie data frame that you'll find in my rethinking package, and they're a year of food transfers among 25 households in Arang Dak. So for 25 households, we're going to look at all of the dyads among those households, and we have data from Dr. Jeremy Koster's fieldwork there for all 300 dyads.
Our focus, our estimand as it were, is going to be how much sharing we can explain by dyad-level reciprocity, that is, to what extent exchanges are balanced within dyads: there are friends who exchange with one another, and what plausibly sustains cooperation in those circumstances is that balance. And how much can we explain instead by generalized giving? In the data, as you can anticipate, there is a mix of these things, so how do we statistically unmix them in a principled way, so that we can make causal inferences about these different causes of giving? On top of this is the question of which dyads and which households: what is actually causing some to give and others not to? If you just plot the outcome variable of interest here, it's very difficult, unsurprisingly, to see what's going on. What I'm plotting is A gives to B on the horizontal axis, where A is just some household in the data and B is another household, and on the vertical axis, B gives to A. So this shows you, in the crudest form, a visual inspection for balance, and you'll see that there's really no pattern here. There's no strong positive correlation in the data. But what does this mean? If you've been with me through the course this far, you're prepared to hear that we cannot possibly see the causes in the data this way. We need a generative model, and then we need to follow the logic that allows us to build a statistical model from those generative principles. And that's what we're going to do. We're going to draw the social sharing owl today. So let's start. The first thing to understand is what happens when household A gives to household B: G_AB is going to be our outcome variable today.
It's in the center of this tiny little DAG. G_AB is influenced both by some features of household A and of household B. This is not necessarily just their identity; it's everything except their relationship. It could be how generous household A is, how needy household B is, or some other aspects of them. Then there are things that have to do with the social relationship between households A and B, and I'm going to call this a social tie. This is the T_AB node at the bottom of this DAG. This social tie is directed from A to B: it's A's relationship with household B, from A's perspective. And it is also a cause of giving. If they are friends, for example, or family, that's a kind of tie, and it motivates giving. This is also symmetric: from household B's perspective there's another tie, from B to A, now drawn at the top of this slide, and it can also be a cause for household A to give to household B if reciprocity matters. Both of these ties, T_AB on the bottom and T_BA on the top, are influenced by the identities and features of households A and B. The reasons households A and B might be friends, share social relationships, be kin, be in-laws, any of those things, can be partially influenced by their wealth or their locations or any number of other things, their hobbies, for example. So this DAG on the screen is the quintessential kind of social network DAG, and it exhibits some basic problems in social network analysis that we're going to explore in the lecture today and develop some statistical solutions to. Oh, yeah, I should have done this already. When we say features of household A and household B, we could mean anything from their wealth to their locations, because households that are closer to one another are more likely to exchange with one another and also to share ties with one another. And then there are other reasons that households can share ties, which have to do with dyadic features of those households instead of individual features, things like kinship and friendship, which are not properties of individual households but of pairs of households.
Okay, so this is just a DAG, and we can apply our standard logic to it, build up to a generative model, and then make a statistical model, and that's what we're going to do today. But there's a problem to observe from the beginning: the things we want to make inference about, T_AB and T_BA, these social ties and their influence on giving, are not observable entities. It's impossible to observe a social network tie. What do I mean by that? A social network is just a statistical concept: it means a pattern of directed exchange. Social networks are abstractions. They're not data. And that's not to insult them; abstractions are great. They're essential; in fact, you can't do data analysis without them. But it's important to note that these are things that are always uncertain. They're latent variables, and we're going to treat them properly that way today. So what is a principled approach when you can't see the social network, but it's a target of inference?
Well, we'll get to that, but first let me say there are a lot of unprincipled approaches, and unfortunately unprincipled approaches to social network analysis are incredibly common. I call these sorts of approaches ad hocery, as a pun on the Latin phrase ad hoc, meaning they're not principled. They don't start from any generative model of the phenomenon, and so they're incapable of justifying any causal analysis of the system. Examples of these sorts of ad hoc approaches include permutation approaches to social networks, where social networks are randomized in some particular way in an attempt to build a null model of the network that you can reject by looking at the data. Unfortunately, this is not going to get us anywhere near an effective causal inference, for all the reasons we've been talking about since the beginning of this course. These methods have other problems beyond that: they don't even have the type I error control properties that they claim to have. So what do we do instead? Well, the field of social networks has lots of solutions that do work, and I'm going to tell you about a central example today, but we're going to build it up logically. I'm not going to justify it because it's common; I'm going to justify it because it makes sense from a causal perspective. We're going to draw the social owl, going through our four steps here. Focusing on the estimand, we've established that we'd like to estimate the extent to which giving is motivated by social ties, that is, whether there's reciprocity within dyads because they share their ties. And we'd also like to look at what explains that level of reciprocity: what are the features of dyads that are associated with reciprocity, and do they plausibly cause reciprocity?
We're going to build up a generative model next and go forward. The first thing to note is that we've got confounding. You've seen a lot of DAGs in this course, so let's analyze this one with our eyes; we've done this a lot. I've highlighted in red the causal inference we're after: we'd like to know the causal effects of the social ties on giving, and in particular the extent to which you get more giving when both social ties are present. But there are these backdoor paths connecting the social network ties at the top and bottom through the left and right of this graph. The household features H_A and H_B are confounds; literally, they're forks, common causes of the social ties on both sides. And so we have to stratify by these as well in order to do the analysis. We can't only focus on the social network; we have to deal with individual properties of the households at the same time in order to get at our estimand. However, that's going to be complicated. So in this course we're going to do it the way real research is done when we have a complicated model to build. We're not going to go straight to the end, to the analysis we intend to arrive at. We're going to build it up incrementally and slowly; we're going to climb the ladder.
So we're going to leave out some effects to start. What I'm saying is, we're going to get into the middle of our drawing-the-owl steps, bounce back and forth between steps 2 and 3, and gradually fold in all the complexity of the DAG to get where we want to go in analyzing the data from Arang Dak. But we're not going to do it all at once, and the reason is that for projects like this one, you can almost never get it right if you try to do it all at once. Instead, you build in one layer of complexity at a time. You start with a simple generative model that has only one force in it. Then you design a statistical model appropriate for that, and you test it with the synthetic data. Then you back up to step 2 and fold another force into the generative model, proceed to step 3 and add that force to your statistical analysis, test with the synthetic data, and iterate back and forth between 2 and 3 until you've reached the complexity of the project you intended. But by no means should you try to do a mature research project all at once, with all its features. I'm going to emphasize this now because in this course, up to this point, that's the way I've made it look. The examples have sprung forth from my forehead fully formed, as it were. But that's not how scientific projects actually work. That's not how I work when I do my own science. I take really small steps, and I test each one very aggressively to make sure it works before moving on to the next. So I want to show you how that looks, in a stylized way, in this lecture.
That means that to begin, I'm just going to delete the arrows from the household features to giving, and we're going to start here. We're going to make a generative model of this. This is not a realistic DAG, and it's not the DAG that I want to convince you is appropriate for the research problem, but it's where we're going to start, because it's a simpler place to start, and it's already complicated enough just to model the social network ties. So let's start there and write the generative simulation. I'm going to simulate 25 households, just like in the real data set. This gives us 300 dyads. If it's not obvious to you how to produce a list of all those dyads and their members, that's fine: there's a function in R called combn, for combinations, which will give you all the combinations of n objects taken, in this case, two at a time. I make a matrix, dyads, with two columns for the members and one row per dyad. I show you what it looks like for just the first 91 of the 300 dyads implied in this analysis. You can see that the first dyad contains households 1 and 2, the second 1 and 3, and so on, and by the time we get to dyad 91, it's households 5 and 6, and on and on. This is the kind of thing computers are really good at: listing combinations.
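The lecture's code is in R; as a rough cross-check of the dyad listing just described, here is a minimal Python sketch in which the standard-library `itertools.combinations` plays the role of R's `combn`. It assumes households are numbered 1 to 25, as on the slide.

```python
from itertools import combinations

N = 25  # number of households, as in the real data set

# Each dyad is an unordered pair of households (A, B) with A < B,
# listed in lexicographic order, the same order R's combn(N, 2) uses.
dyads = list(combinations(range(1, N + 1), 2))

print(len(dyads))   # 300 dyads
print(dyads[0])     # first dyad: households 1 and 2
print(dyads[90])    # dyad 91 (index 90): households 5 and 6
```

Both functions enumerate pairs in the same order, so the 91st dyad here is households 5 and 6, matching the slide.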
Great. There are 300 of those. The first thing I do is simulate some friendships between the households. Just for the sake of this example, I'm going to say that 10% of the dyads are friends, meaning they're going to share reciprocal ties. There are also going to be ties in the simulation that are not reciprocal. They have to do with other forces that cause social relationships, and these relationships will be directed. There will be cases, for example in families, where sons bring their mothers resources but not vice versa, so sharing may only flow in one direction. These are also very important in real data sets; they're omnipresent in human societies. I'm just going to set a base rate of these for our data: across the 300 dyads there will be 600 directed ties, and we're going to set approximately 5% of these to be present. Then I make an empty matrix, y, to hold the social network, all the ties. This is an N-by-N matrix, and we fill it up by looping over all the entries. We skip the diagonal because that's self-to-self, and we assume every household has a good sharing relationship with itself. In each case, we simulate ties from household i to household j. There's a bunch of annoying index witchcraft in this kind of code; you do it different ways in different languages, and it's never pleasurable. I'm not going to spend time explaining that. Instead, I want to focus on the important line here, the probability of a tie. It comes from two possible options. Either the pair are friends, which is what f[the_dyad] means: we look up the particular dyad in the friend vector f, and if they are friends, then each will have a tie to the other. Or they're not friends, and that's the second part, after the plus on this line, in which case we roll the dice and give them a directed tie about 5% of the time. Then we sample ties as a 1-0 Bernoulli variable, just to keep the concept simple. But this could be weighted; these ties don't have to be on or off. They can have intensities.
Okay, now we have a simulated network that contains both reciprocal friendships and directed ties, and we're going to simulate some gifts, some giving, within it. So I make gifts from A to B and gifts from B to A within each dyad; these are just empty vectors with length equal to the number of dyads. Then I set some base rates for giving in the absence and presence of a tie, and that's this 0.5 and 2. These are going to be Poisson rates. Gifts are given at an average rate of one half when there's not a tie, because there's some giving in the absence of social ties, and then at a rate of two, substantially higher, when there is a tie. Then we just loop through all the dyads and simulate some Poisson results for each by looking up the means in that lambda vector there. That's the generative simulation, and now we have some synthetic gift giving. Before we move on to design the statistical model, let's plot the network that we've simulated. We'll look at a number of network plots in this lecture, and I've plotted one simulated network from the code that I've just given you; you can see it there on the right of this slide. Try running this code. If you don't want to copy it out of the lecture, you'll find the script on the course website. Play around with the variation you can get: change the simulation settings and look at how that changes the shape of the network. You can see that in some parts of the network there are reciprocal ties, where the arrows go back and forth between nodes, and those are the friendships that we've simulated. One thing to notice: this network may look pretty dense, since there are lots of lines, but most of the connections are missing. There are 300 dyads here, and most of them don't have relationships at all.
Okay, let's move on to the statistical model. We need a statistical model that models the social network ties. What does that mean? I think this may be the first or second time in the course that we're going to make a statistical model which is not exactly like the generative model. The reason is that the generative model had things like friendships and directed relationships in it, and we're going to represent this in a way that tries to uncover that structure, but does it in a different, more anonymous way. It's not going to posit friendships and such specific things; it's going to take a more, well, statistical approach. So the first thing we need, unsurprisingly, is a Poisson GLM. We've got a Poisson outcome variable: it's a count, it starts at zero, and it has no obvious upper limit. So we have this observation G_AB, the gifts from household A to B. It's a Poisson variable with some rate lambda_AB that we're going to model with a log link. We're going to have a parameter alpha, which is the average rate of giving over the whole community, across all dyads. And then we're going to have a parameter T_AB, which you remember from our DAG, representing the social tie between households A and B, or rather, from household A to B.
It's a directed tie. So this is the tendency for household A to give to household B, on the log scale, and we're going to estimate it. We do the same symmetric thing for the other outcome, the other direction within each dyad, the gifts from B to A. Everything here is the same, except you'll notice that it's T_BA now, the other direction. So for each dyad we have two parameters to estimate, T_AB and T_BA. These are parameters, latent variables, but they're structural aspects of the social network, and we're going to try to estimate them from the data. We need some statistical machinery to do that, and you will be completely unsurprised to learn that we're going to use varying effects. Why? Because we want to model the correlation within dyads between the tie strengths. I'll say that again: we want to model the correlation within dyads, that is, looking across the whole population of dyads, how correlated is the strength of the two ties within each dyad? For example, if T_AB is a large value, is T_BA also a large value? If T_AB is a small value, is T_BA also a small value? That would correspond to friendships being important in driving exchange in the network. If instead the correlation between the T parameters is small, near zero, within the community, that would indicate that reciprocity within dyads is not important for driving exchange; instead, it's generalized exchange of some kind. Okay, so what we're going to construct is a varying effects population prior, just like in the lectures last week. Well, not just like in the lectures last week.
This one's actually simpler. It is a covariance prior, though, because we've got two parameters to draw from a bivariate normal. For each dyad we need two parameters, T_AB and T_BA, which are just the ties in each direction, and we give them a covariance matrix, on the right-hand side here, that has a correlation in it: the parameter rho, which will help us estimate the correlation, and therefore the covariance, within dyads between the social network ties. What makes this covariance matrix different from the general ones we used last week is that it's simpler, because it's symmetric. A and B are just labels, so everything about the statistics of dyads has to be symmetric. The sigma squared terms here are the variance among ties, and they have to apply equally to household A and household B within each dyad. You can't have a different variance for household A than you do for household B; it would make no sense. So this actually lets us have fewer parameters than we would in a general application, and this is a good thing. It's also scientifically correct. Okay, so to complete our partial pooling for the network ties, we need to define some priors. For the correlation, I'm going to use the LKJcorr family of priors that I introduced last week; there's much more detail on that in the book. For sigma, the standard deviation of ties, which measures how much ties vary in the community, we're going to use an exponential. And then we use a Normal(0, 1) for alpha. Let's code it up.
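To make this population prior concrete, here is a minimal prior-predictive sketch in Python with NumPy; it is not the lecture's ulam model. It fixes alpha, sigma and rho at illustrative values (assumptions, rather than draws from the Normal(0, 1), exponential and LKJ priors), builds the symmetric covariance matrix with one variance shared by both directions, draws the 300 pairs of ties in non-centered form (standard normal z-scores scaled by a Cholesky factor), and then generates Poisson gifts through the log link.

```python
import numpy as np

rng = np.random.default_rng(15)

n_dyads = 300
# Illustrative fixed values (assumptions, not draws from the model's priors).
alpha = 0.0   # average log rate of giving; the model uses Normal(0, 1)
sigma = 1.0   # SD of ties, shared by both directions; the model uses an exponential
rho = 0.5     # within-dyad correlation of ties; the model uses an LKJ prior

# Symmetric covariance: one variance for both directions of a dyad,
# covariance rho * sigma^2 between T_AB and T_BA.
Sigma = sigma**2 * np.array([[1.0, rho],
                             [rho, 1.0]])

# Non-centered draw: independent standard normal z-scores, scaled by the
# lower-triangular Cholesky factor L (Sigma = L @ L.T).
L = np.linalg.cholesky(Sigma)
z = rng.standard_normal((n_dyads, 2))
T = z @ L.T                      # column 0 is T_AB, column 1 is T_BA

# Poisson gifts with a log link, one rate per direction.
g_AB = rng.poisson(np.exp(alpha + T[:, 0]))
g_BA = rng.poisson(np.exp(alpha + T[:, 1]))

# Sample correlation of the simulated ties; it should land near rho.
print(np.corrcoef(T[:, 0], T[:, 1])[0, 1])
```

The symmetry is visible in `Sigma`: a single sigma and a single rho cover both directions of every dyad, which is why this prior needs fewer parameters than a general 2-by-2 covariance matrix.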
So in the code here there's a little bit of complexity, but I introduced this complexity last week: we're going to use a non-centered parameterization, which works much, much better in this problem. The reason is that there's not a lot of data for each dyad to inform the dyad parameters, and there are a lot of dyads. In that sort of situation the prior is typically very important, and we'll get much better sampling if we use a non-centered form of it. In contrast, if you've got an analysis with a very small number of clusters and lots of data for each, then quite often the centered parameterization is better; not always, but quite often. But this is a quintessential case for non-centering: we have 300 dyads and 600 tie parameters to estimate, and we have two observations for each dyad. So we have 600 observations and more than 600 parameters to estimate, and the non-centered parameterization is going to be better. The code here in the middle, which you saw some examples of last week, looks quite messy, but it's just the non-centered Cholesky magic that we use to express the multivariate normal you see in the upper right in code, so that the Markov chain can sample it efficiently. And it does sample it efficiently. Here's just one screen of trace rank plots for all these T parameters; you get 600 of them, because there are 300 dyads and there's a social network tie going in each direction within each dyad. So there are 600 social network parameters estimated by this model, and it does it with grace and style. We know the truth underlying these data and these estimates, because we wrote the truth: we made the generative model. So let's take a little look at this and think about a casual validation of what's going on in this model. The first thing to note is on the left here.
I'm plotting the posterior distributions of the tie strengths. Remember, these are parameters that are estimated for the additional or reduced giving in a particular direction within each dyad. I have two densities here, blue and red. The red are those ties where, in the simulation, we know there's really no tie, so the true social network value there is zero, and I'm showing you the posterior densities in that case. You'll see they're quite jagged, and that's because the estimates are piling up on particular values that correspond to observed counts in the outcome variable. The outcome is discrete, and so at the latent level it creates this jagged, comb-like structure in the posterior distribution. This is extremely common in Poisson models. It's common because it's correct; it's what probability theory demands. The blue density is for those ties where there truly is a tie, and you'll see that on average the model assigns larger values in the cases with ties. I'm not showing you the contrast between particular ties, but I'm sure you can believe that the discrimination is not a hundred percent. With more data, and you can try that experiment yourself because you have the simulation, you can eventually separate these cases very cleanly. But in any particular analysis, there is typically a lot of uncertainty at the end about the values of particular social network ties. I'm going to revisit that point multiple times in this lecture: social networks are not observable, and often they are largely uncertain. Nevertheless, we can often still get good answers to research questions despite that uncertainty. What we must not do is discard or ignore that uncertainty by treating the social network as data. On the right of this slide, we have the parameter for the correlation within dyads, that rho parameter. I'm showing its posterior distribution here, and you'll see almost all the mass is above zero.
There's a strong positive association across dyads. That's the influence of the friendships: remember that 10% of the dyads are friends, and their ties are reciprocal. So a lot of the giving in this simulation is reciprocal, and that's what we're picking up here by recovering that reciprocity. But notice that our model doesn't know anything about friendships; it's just estimating the correlation within dyads. You can also look at the household A tie values plotted against the household B values within each dyad. I'm just showing posterior means here, in the plot on the right, and you'll see a positive correlation in the plot that arises from the positive correlation on the left of this slide. In purple here, I've colored the dyads that are actually friends in the truth, in the generative simulation, and you can see they tend to be assigned higher tie values, and the correlation between their ties is quite obvious. Okay, if you were really going to validate this, you'd run a bunch of simulations, move the parameter settings around a lot, and make sure that you can recover the true effects under various scenarios. You'll probably find some scenarios where the data are insufficient to recover the truth for certain strengths or weaknesses of the generative model. But we're going to keep moving ahead and fold in the next bit. Before we do that, why don't we go ahead and analyze the real sample? I think that'll be useful. We're going to take the actual Arang Dak data now, the real 25 households, and feed it directly into the model we just designed, with no modifications, and look at what it says.
Now, we know it's going to give us the wrong answer, whatever answer it gives us, because we know the analysis is confounded. Or rather, we believe it is, because in the real data there are backdoor paths through the features of the households that we have not yet accounted for in this model. So even though our statistical model was just validated on the synthetic data, the synthetic data are still missing important causes that we eventually want to fold in. But it'll be nice to see what happens in the real data set right now, so that later you can appreciate how the estimates change when we account for the backdoor paths. So we just set up a new data list with the real KosterLeckie data and put it right into the same formula. I'm not repeating the big formula here, because I have it saved in an object, f_dyad, so I can just pass that formula list to ulam, as you see here, and run the thing. Now, I don't like to look at precis tables for models of this sort, because they have hundreds of parameters and huge amounts of clutter, but in this case I wanted to show you one example, at least, to show you what's going on. For the correlation matrix parameter you're going to get four entries, as you saw already in the examples last week. Only the [1,2] and [2,1] entries are the correlation parameter being estimated. The other entries, the ones showing NaN for n_eff and Rhat, are constant, and that's why they have not-a-number results for the effective number of samples and R-hat. That's okay. There's nothing wrong.
That's supposed to happen. Just look at the off-diagonal elements in the correlation matrix, and you'll see here that we have a positive posterior mean of 0.35, and the whole posterior distribution is above zero, indicating a positive result. It's not nearly as positive as in our simulation, but this is not a simulation; this is real data. So the question is how this is going to change when we account for the confounding from household features. It's going to change, so keep this number in mind, 0.35, and we'll come back to it after we revisit the statistical model. What does it mean to revisit the statistical model? The blue paths on this slide are the backdoor paths, and we block them by stratifying, or conditioning, on the features of household A and household B within each dyad. Thinking about that causally, it will not just block the backdoor paths; it will also allow us to estimate the direct effects on giving of the household features themselves, that is, dispositions of the households and things of that kind: generalized giving that has nothing to do with the other household and their social ties. That's going to turn out to be really important in this data set. But let's take a break; that was already a lot. I encourage you to take a quick review of the slides up to this point and then take a break. Have a beverage, go for a walk, and whenever you come back, I'll still be here. Welcome back. Let's get back into the model.
We're going to draw the social owl and fold in another aspect of the generative model. First we'll do it in the simulation, and then we'll incorporate the same assumptions into the statistical model. Our goal here is to modify the generative simulation to include features of households, so that the statistical model can account for them, which will block the backdoor paths through the households and also give us estimates of the effects of those household features. This simulation code is exactly the same as in the first half of this lecture. We're still generating a social network: there are reciprocal ties among friends, and there are non-reciprocal ties as well. Nothing has changed in this. What we're going to add to it is that households vary in their overall levels of wealth. This is what we're simulating at the top of this block, and wealth will influence how much they give and how much they receive. In the top part, highlighted in red, we're simulating a standardized relative wealth in the community with the variable W, one value for each household. And we're defining two effects: bWG is the effect of wealth on giving, and it is positive, 0.5.
So this means that richer households give more in general. Then there's bWR, the effect of wealth on receiving, and this is negative, which means that richer households receive less in general; another way to say this is that poorer households receive more. Then we simulate the gifts as before, except that we've added these causal effects of wealth to the random Poisson sampling at the bottom. For gifts from A to B, household A's wealth affects giving and household B's wealth affects receiving, and it's the reverse for gifts from B to A. So these are general causal effects, which are features not of the dyads but of the households, independent of one another. This is the backdoor we talked about before, or rather an example of the way a household feature creates a confound when we try to understand the effects of the social network ties. Let's modify the statistical model to incorporate this.
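A minimal Python sketch of the wealth-augmented simulation may help fix ideas. Only bWG = 0.5 comes from the lecture; the value bWR = -1, the friendship probability, the base probability of a one-way tie, the tie effect, and the helper names `poisson` and `simulate_gifts` are assumptions for illustration, and this is not the lecture's R code. Poisson draws use Knuth's multiplication method, applied to chunks of the rate so that large rates don't underflow.

```python
import math
import random
from itertools import combinations


def poisson(rng, lam):
    """Draw from Poisson(lam) with Knuth's method, applied to chunks of the rate
    (a sum of independent Poissons is Poisson with the summed rate)."""
    total = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        lam -= chunk
        limit, k, p = math.exp(-chunk), 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
    return total


def simulate_gifts(n_households=25, p_friend=0.1, p_one_way=0.05,
                   b_tie=0.5, b_wg=0.5, b_wr=-1.0, seed=1):
    """Return (wealth, gifts); gifts holds (A, B, gifts A->B, gifts B->A) per dyad.
    All values are assumed for illustration, except b_wg = 0.5 from the lecture."""
    rng = random.Random(seed)
    dyads = list(combinations(range(n_households), 2))

    # Directed ties: friends are tied in both directions (reciprocal);
    # other pairs may have a non-reciprocal tie in either direction.
    tie = {}
    for a, b in dyads:
        friends = rng.random() < p_friend
        tie[(a, b)] = 1 if friends else int(rng.random() < p_one_way)
        tie[(b, a)] = 1 if friends else int(rng.random() < p_one_way)

    # Standardized relative wealth, one value per household.
    wealth = [rng.gauss(0.0, 1.0) for _ in range(n_households)]

    gifts = []
    for a, b in dyads:
        # The giver's wealth acts through b_wg, the receiver's wealth through b_wr.
        lam_ab = math.exp(b_tie * tie[(a, b)] + b_wg * wealth[a] + b_wr * wealth[b])
        lam_ba = math.exp(b_tie * tie[(b, a)] + b_wg * wealth[b] + b_wr * wealth[a])
        gifts.append((a, b, poisson(rng, lam_ab), poisson(rng, lam_ba)))
    return wealth, gifts
```

Summing each household's gifts given and received from `gifts` should show the richest household giving more and receiving less than the poorest one, which is exactly the household-level pattern the statistical model has to separate from the dyadic ties.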
Here's the statistical model from the first half of the lecture, just as it was then. To remind you, it's essentially two simultaneous Poisson regressions, one for each direction in the dyad, and we have a population distribution of varying effects for each dyad, the social network ties in each dyad. This allows us to estimate the correlation within each dyad as well, and to do so with partial pooling. That accounts for the fact that there's often not much evidence for each individual dyad but a lot of total evidence in the whole sample, and through partial pooling, the varying effects strategy, we can leverage the total evidence to improve the estimate for each dyad.

To modify this, we'll change the linear model and then add a few more distributions to support it. The first thing we need to do is augment the linear model to include generalized giving, G sub A, and generalized receiving, R sub B. It's the generalized giving of household A because lambda AB is the rate at which household A gives to household B, and it's the generalized receiving of household B. When I say these are generalized, I mean that the giving of A doesn't depend upon B, and the receiving of B doesn't depend upon A. They're the expected deviations in giving and receiving that these households cause in any dyad they're in; they're aspects of the households, not of the particular dyad. These will be parameters, and we insert them into the model for giving from B to A as well, but of course reversed: now it's the giving of B and the receiving of A, as you see in the two linear models highlighted in red.

Then we need to define a prior distribution for these G and R parameters. They're going to be varying effects clustered on households, very much like the varying effects we did last week: an ordinary multivariate normal prior, where each household, A for example, has two parameters associated with it, with some covariance structure between them. This is a typical covariance structure, where the variation in G and the variation in R can differ, so they have their own standard deviation parameters, sigma sub G and sigma sub R, and there's still a correlation between them. The vector on the left is A's giving and receiving vector, and then we have a covariance matrix for household giving and receiving. It'll be convenient to rewrite this prior like we did last week, because we'll write the statistical model using a separate correlation matrix and vector of standard deviations; this is just another way to parameterize a multivariate normal. Putting it all together, we need prior distributions for that correlation matrix and vector of standard deviations, and I'll use the conventional ones I've been using in all the examples so far.

Okay, this is a pretty big model, at least by parameter count; it's one of the biggest in the course. To summarize, going from the top: there are 25 households, which implies 300 dyads (all the pairwise combinations of households) and 600 gift observations that we're analyzing. Each of these observations is the number of gift transfers from a particular household A to B in a year in Arang Dak. In blue, we have all the social network parameters. There are 602 of them: 300 dyads, each with two directed network ties (the T parameters on the left), plus two parameters that manage the partial pooling, sigma and rho. That's how we use the whole sample to improve the estimate of each social network tie. At the bottom, in red, there are 53 household parameters: 25 households, each with a G and an R parameter for generalized giving and generalized receiving, plus three parameters that manage the partial pooling of the household features, two standard deviations and a correlation.

We can write the code, and as you might expect, this means adding yet another population prior for partial pooling, which we'll again do with the non-centered parameterization. Going from the top, we have the code that corresponds to the linear models and the Poisson probabilities for the gift observations. You'll see I've just added the gr parameters to this line. What's a bit different is that in the code, gr is a vector of length two for each household, so for household A I'm pulling out element one of that vector, the G part, the giving; and just after it in lambda AB, for household B, element two, which is the R part. I understand it's a bit confusing, but using indexes to pull vectors of varying effects into the right part of the model is very convenient. Then we have the dyad effects, which are completely unchanged from the first half of the lecture. And then there's another partial pooling prior distribution for the giving and receiving parameters. This is the same as in the examples from last week, no new tricks; it's just this odd-looking non-centered parameterization code.

Okay, let's analyze the simulated data, which now has generalized giving and receiving. What I'm plotting on the left of this slide are the individual households: the posterior means, with ellipses showing, I think, 50% compatibility regions for each household's estimated combination of generalized giving, the G parameters, on the horizontal, and generalized receiving on the vertical.
Those are the R parameters. You'll see that these are negatively correlated. The generative simulation I wrote didn't include a covariance matrix, but it did include simulated wealth, and the effects of wealth work in opposite directions: rich households give more because of their wealth, and poor households receive more because of their lack of it. That produces the negative association between giving and receiving you see in the posterior distribution here. The household near the bottom of the plot, farthest to the right in generalized giving, is one of the wealthiest households, and it receives the least on average of all the households because it's wealthy; that's just what we assumed in the simulation. Then there are a larger number of poorer households; they tend to give less, and the poorest ones receive a lot. In the blue density, I'm showing you the correlation parameter between giving and receiving, estimated as part of the partial pooling of the generalized giving and receiving parameters. You'll see it's pushed up against the bottom, against minus one. Again, this parameter wasn't part of the generative simulation, but it was implied by it; it measures the association between generalized giving and receiving. We can look at the other aspects of the simulation as we did in the first half of the lecture, such as the discrimination of true ties and false ties in red and blue. Again, this is very similar to what we had before.
I think it's marginally better. The model is recovering friends, shown in purple, with some accuracy. It's doing much better now with the correlation within dyads, which reflects the importance of friendship in the data. In the bottom right is the posterior distribution of reciprocity implied by the simulation. It's pushed all the way up against one: very, very high levels of reciprocity. Why is it so high? Because this is reciprocity after accounting for generalized giving and receiving. I'll say that again: the red density in the lower right shows how much reciprocity there is within dyads, having stratified by generalized giving and receiving. And the answer is that almost all of the exchange left after you account for generalized giving and receiving is due to friendships. That's exactly what the simulation assumed, and that's what we're getting back from the statistical model.

Okay, and now the real data. We can look at it very similarly, but in this case we don't know the truth, right? We don't know the generative process that produces the real patterns of exchange in Arang Dak, but we will interpret the results in light of a range of imagined generative models. You see the analogous plot for the households here. There's also a negative correlation between giving and receiving, but it's not as extreme as in the particular synthetic example I used. And you can see that the blue posterior distribution is again negative; almost all of the posterior mass is below zero. The households that give the most receive the least, and vice versa. We haven't put wealth in this model, right? So we don't know whether this is due to wealth in this case. We're going to take a look at that in a little bit, though.
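One way to see why stratifying by household features can change the reciprocity estimate so much is a small calculation. If a dyad's two log gift rates are T + G_A + R_B and T + G_B + R_A, with different households independent, then the correlation between the two directions is (σ_T² + 2 cov(G, R)) / (σ_T² + var(G) + var(R)). So a negative correlation between generalized giving and receiving, such as one induced by wealth, pulls the apparent dyadic correlation down even when the ties T are perfectly reciprocal. The Python sketch below checks this by simulation; the effect sizes, noise levels, and function names are illustrative assumptions, not estimates from the KosterLeckie data.

```python
import math
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_household_confound(n_households=2000, b_wg=0.5, b_wr=-1.0,
                                noise=0.3, sigma_t=0.5, seed=7):
    """Illustrative (assumed) values; returns (corr(G, R), dyadic correlation)."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n_households)]
    # Generalized giving and receiving both depend on wealth, in opposite directions.
    g = [b_wg * wi + rng.gauss(0, noise) for wi in w]
    r = [b_wr * wi + rng.gauss(0, noise) for wi in w]
    ab, ba = [], []
    for a in range(0, n_households - 1, 2):
        b = a + 1
        t = rng.gauss(0, sigma_t)  # perfectly reciprocal tie: T_AB == T_BA
        ab.append(t + g[a] + r[b])
        ba.append(t + g[b] + r[a])
    return corr(g, r), corr(ab, ba)


def expected_dyad_corr(b_wg=0.5, b_wr=-1.0, noise=0.3, sigma_t=0.5):
    """(sigma_T^2 + 2 cov(G, R)) / (sigma_T^2 + var(G) + var(R))."""
    var_g, var_r = b_wg ** 2 + noise ** 2, b_wr ** 2 + noise ** 2
    cov_gr = b_wg * b_wr
    return (sigma_t ** 2 + 2 * cov_gr) / (sigma_t ** 2 + var_g + var_r)
```

With these defaults the expected dyadic correlation is (0.25 − 1) / (0.25 + 0.34 + 1.09) ≈ −0.45, even though every tie is perfectly reciprocal. Removing the household terms leaves only T, whose correlation across directions is 1, which is the same qualitative shift the stratified model shows.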
We don't know the true ties in this case, so instead I'm just showing you the social network, and we'll think about that a little more in a moment. You can see that the dyadic ties now show a huge amount of reciprocity; almost all the posterior mass is right up against the maximum. So what we're finding here is very similar to the synthetic simulation. The model seems to think that after you account for generalized giving and receiving, all of the remaining exchange is reciprocal, due to these quote-unquote friendships. Of course, we don't know if they're friendships, but exchange is very balanced once you account for households that in general give a lot to all other households and households that in general receive a lot from other households. Once you've stratified by those differences, the remaining exchange in this community is extremely balanced.

Let's think a little more about this social network. What I'm showing you here is just the posterior mean network, because of course we don't know the network. The network is not data; I keep saying that. There's a posterior distribution of networks that we get as a result of running a model like this: a large number of networks, an infinite number in fact, each weighted by its relative plausibility compared to all the others. We can sample networks from the posterior distribution and think about their compatibility regions and so on, though that's not an easy thing to do, because a network is a complicated structure. Here I'm animating some draws from the posterior distribution to show you the variation. What I want you to see on the right part of this slide as the animation plays is that some components of the network are extremely stable: these little cliques of reciprocal households in the lower left, at the bottom, and in the upper right. Then there are ties between those little cliques, but the model has much less confidence about those. What does that mean?
It means the model has much less confidence that those are really regular features of the community's pattern of exchange. Remember that the social network here doesn't really exist. It is a latent construct meant to capture regular features of the patterns of exchange in the community and to help us make better predictions and causal inferences. The real patterns of exchange aren't caused by this abstraction we've drawn; it's like a compression of the complex patterns of behavioral exchange we have measured in the data. But you can see that there are things we can be quite confident about, in terms of reciprocal exchange between certain sets of households. This is very important to keep in mind if you want to calculate anything from the social network, like centrality, in-degree and out-degree, or betweenness; there's a huge number of network statistics that people like to calculate. Since we don't know the network, we don't know those statistics either. The network is uncertain, and so are all of the network statistics you might want to calculate from it. The network has a posterior distribution, and so do its centrality, its betweenness, and everything else you might want to calculate. Any analysis that treats the network as known is discarding uncertainty and risking quite damaging overconfidence. We have the technology to do better now, and we should.

Okay, let me try to summarize that a little and make some points about where we go next. I keep saying that social networks don't exist. That's not a really shocking claim. Of course they don't exist; they're abstractions.
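Here is a minimal sketch of what it means for a network statistic to have a posterior distribution: compute the statistic once per posterior draw of the network, then summarize the resulting values. The draws below are fake, generated around made-up tie means as a stand-in for samples you would extract from a fitted model, and counting a directed tie as present when its strength exceeds 0.5 is an arbitrary, hypothetical rule; in-degree is used only because it is the simplest of the statistics mentioned above.

```python
import random


def in_degree_draws(tie_draws, threshold=0.5):
    """For each draw {(giver, receiver): strength}, count incoming ties per node.
    The threshold is a hypothetical rule for calling a tie 'present'."""
    out = []
    for draw in tie_draws:
        nodes = {n for pair in draw for n in pair}
        deg = dict.fromkeys(nodes, 0)
        for (giver, receiver), strength in draw.items():
            if strength > threshold:
                deg[receiver] += 1
        out.append(deg)
    return out


def summarize(values, prob=0.89):
    """Posterior mean and a simple percentile interval."""
    xs = sorted(values)
    lo_i = int((1 - prob) / 2 * (len(xs) - 1))
    hi_i = int((1 + prob) / 2 * (len(xs) - 1))
    return sum(xs) / len(xs), xs[lo_i], xs[hi_i]


# Fake posterior (hypothetical): made-up tie means plus posterior noise, 1000 draws.
rng = random.Random(3)
tie_means = {(0, 1): 1.2, (1, 0): 1.1, (2, 1): 0.4, (1, 2): 0.3, (0, 2): 0.0, (2, 0): 0.1}
fake_draws = [{k: rng.gauss(m, 0.4) for k, m in tie_means.items()} for _ in range(1000)]

degrees = in_degree_draws(fake_draws)
post_mean, lo, hi = summarize([d[1] for d in degrees])
```

Household 1's in-degree comes out as a distribution, mostly 1 or 2, rather than a single number, because its tie from household 0 is well supported while its tie from household 2 is not.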
Social networks are structural abstractions. There are networks that do exist, like the internet or the plumbing in the building you live in. Those are actual networks, in the sense that for anything to travel along the network, it really has to go along the edges and through the nodes in its structure. Social networks aren't like that. They're statistical abstractions that represent, at a minimum, dyadic connections, and possibly also triadic or higher-order connections. But they're data compression schemes. They're varying effects, and like all varying effects, they're placeholders for other causes, like anonymous bookmarks for entities that generate variation in the data. Having estimated a social network, what we'd like to do, as with all varying effects (and social networks are varying effects, structured ones), is to try to explain them away through measured things about the community. In this case, for example, we can model the network ties, the T's that represent friendships and directed unreciprocated ties, using features of the dyads. Why do some dyads have reciprocal ties? Is there something we can learn about those dyads that, in a sense, explains away these varying effects? And then for the features of the households, the generalized giving and receiving: what explains all the variation in them? There's a lot, and we're going to look at wealth, because that's what we suggested in the generative model. There are things about social networks, though, that are in a sense endogenously causal. Relationships do exist: people have states in their brains that lead them to treat other people differently, and those states are often shared between brains. And relationships can cause other relationships.
For example, if I'm friends with somebody, and somebody else is also friends with that person, then there's a greatly increased chance that I will become friends with that other person, my friend's friend. I know that sounded a bit funny; I'll explain it again later in the lecture. So I don't want to suggest that because social networks don't exist, because they're abstractions, social relationships themselves can't be causes. Of course they can, and that's worth studying. But that's a different issue from what we're going to do in the remainder of the lecture.

Okay. We need to modify this model one last time. We're going to take the linear model again and expand it, to try to explain away some of the variation captured by the varying effects, on the idea that varying effects are just anonymous placeholders for other causes that generate variation. I'm going to take each of the parameters in it other than alpha (alpha stays the same) and turn each into its own linear model. So now we have fancy T sub AB, fancy G sub A, and fancy R sub B, and these represent linear models for the tie from A to B, the generalized giving of A, and the generalized receiving of B. Let's start with the tie from A to B. This is now a function of a varying effect, the same partially pooled tie parameter from A to B that we had before, parameterized exactly as before. But now we add to this linear model for the tie strength from A to B the effect of what's called the association index between A and B. This is a variable in the KosterLeckie data. Speaking crudely, the association index measures how much time individuals from household A spend with individuals from household B. It's independent of the gifts; the gifts are not included in this measure. It's other things.
It's socialization, working together, other kinds of activities. We might expect that this has something to do with causing gifts, or at least with generating an association, and we have a coefficient, beta sub A, that will attempt to measure the association between the size of the association index for a particular dyad AB and the ties within that dyad. Same strategy for the generalized giving of household A: we now have a linear model for it, and it's a function of the same parameter as before, G sub A, parameterized exactly as before with the same partial pooling prior. Again, we've added something to this linear model to try to explain away some of the variation in the varying effect, and this time it's wealth: W sub A is the wealth of household A, and beta sub WG is the effect of wealth on giving. Then we have the analogous thing for the generalized receiving of household B: there's the varying effect R sub B, again parameterized exactly as before, plus the wealth of household B, whose effect we'll try to measure with a different coefficient, beta sub WR, the effect of wealth on receiving. We can put all of this right back into the model. We need to do the same thing for the other direction in the dyad, from B to A, with all the necessary changes made to flip A's and B's in the right places, because B becomes A and A becomes B everywhere in the bottom. The model is symmetric everywhere, because A and B are just labels. We also add priors for the new wealth and association coefficients. I'm just showing you the part of the new model code that's relevant here; the rest is unchanged. To make things convenient, in ulam you can define little temporary symbols in your linear model that represent other linear models.
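For reference, the expanded model described over the last few minutes can be written compactly. The notation is a reconstruction from the description, not copied from the slides: y_AB is the count of gifts from A to B, Assoc_AB is the association index (assumed symmetric, so the B-to-A tie uses the same value), 𝒯_BA, 𝒢_B, and ℛ_A are defined analogously to their A-to-B counterparts, and the priors on α, the β coefficients, and the standard deviations are omitted because they aren't spelled out here.

```latex
\begin{aligned}
y_{AB} &\sim \operatorname{Poisson}(\lambda_{AB}), \qquad
\log \lambda_{AB} = \alpha + \mathcal{T}_{AB} + \mathcal{G}_{A} + \mathcal{R}_{B} \\
y_{BA} &\sim \operatorname{Poisson}(\lambda_{BA}), \qquad
\log \lambda_{BA} = \alpha + \mathcal{T}_{BA} + \mathcal{G}_{B} + \mathcal{R}_{A} \\
\mathcal{T}_{AB} &= T_{AB} + \beta_{\mathrm{Assoc}}\, \mathrm{Assoc}_{AB} \\
\mathcal{G}_{A} &= G_{A} + \beta_{WG}\, W_{A} \\
\mathcal{R}_{B} &= R_{B} + \beta_{WR}\, W_{B} \\
\begin{pmatrix} T_{AB} \\ T_{BA} \end{pmatrix} &\sim \operatorname{MVNormal}\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \sigma^{2} & \rho\,\sigma^{2} \\ \rho\,\sigma^{2} & \sigma^{2} \end{pmatrix} \right) \\
\begin{pmatrix} G_{A} \\ R_{A} \end{pmatrix} &\sim \operatorname{MVNormal}\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \mathbf{S} \right), \qquad
\mathbf{S} = \begin{pmatrix} \sigma_{G} & 0 \\ 0 & \sigma_{R} \end{pmatrix}
  \mathbf{R}_{GR}
  \begin{pmatrix} \sigma_{G} & 0 \\ 0 & \sigma_{R} \end{pmatrix}
\end{aligned}
```

The dyad prior carries the two network pooling parameters, σ and ρ, and the household prior carries the three household pooling parameters, σ_G, σ_R, and the single correlation inside R_GR, matching the parameter counts given earlier.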
On the slide, in the A-to-B code block, the log-lambda line has alpha plus TAB plus GA plus RB, and each of those symbols, TAB, GA, and RB, is its own linear model just below, containing the parameters, coefficients, and data relevant to each case. Then the same for B to A. You don't have to do it that way; you could put it all in one long linear model. But for me at least, this makes things simpler to read, and it helps reduce errors.

Okay. We can run this model, and I give you the full code for it in the script file for this lecture. I encourage you to give it a try; it doesn't actually take that long to run. What you get out of it are posterior distributions for all the relevant bits. The first thing I want to do is compare this model to the previous one, before we added wealth and the association index. What we're looking at here is the standard deviation, sigma, of the ties within dyads, that is, how much the strength of ties varies in the community. In red I show you the previous result, before adding the association index to the model. Now we've added the association index, and the standard deviation has gone down. That doesn't mean, of course, that there's any less variation in the data. It means the association index has explained, only statistically, some of the variation in ties, so there's less left for the varying effect parameters T to explain. That's all it means. It's the standard effect we've seen in earlier models with varying effects: as we add predictors, you expect the variance components of the model to get smaller.

Let's look at the giving and receiving effects from the new model. These are the beta coefficients for giving and receiving. In red is beta WG, the effect of wealth on giving. You'll see it's slightly positive, but it's incredibly uncertain. It spans a huge range.
It could be mildly negative, or it could be hugely positive. There's not a lot of certainty, but it tends toward being slightly positive. In contrast, in blue, beta WR, the effect of wealth on receiving in general, is negative; almost all the posterior mass is below zero. So it really looks like, in Arang Dak, some of the differences in how much households receive are due to the fact that they need more, that they have less wealth than others. Wealthier households appear to give a little more, although that's quite variable, and poorer households certainly receive more.

Okay. There are things we haven't done in this model, and unfortunately we won't have time to do them in this lecture, but they are equally important: the other structural features of the network. All we've analyzed are dyads. But it's something of a truism in parts of sociology that society is built out of triangles, which is to say that two people are not a society, but three are; then you have a society.
It's a bit of a joke, but it makes a lot of sense: triadic and higher-order relationships are much more complicated, social networks can contain a lot of that structure as well, and human societies definitely reflect much more social complexity than dyadic relationships. We're aware of other people's relationships, and that's triadic awareness. This can arise for the reasons I briefly mentioned before: relationships can really be causal, and you can become friends with somebody simply because your friends are friends with them. This is called triangle closure, and I try to represent it schematically in the middle of the left side of this slide: if A is friends with C and B is also friends with C, then it's more likely that A and B will become friends, or maybe enemies, but they'll have some kind of relationship. Social network models try to get at these things in a number of ways, because they're just as important as the dyadic relationships: you want partially pooled estimates of triangles as well, and of higher-order relationships, cliques, and clustering in the data. There are a number of different strategies. One that is increasingly common and easy to work with statistically is called a block model, or stochastic block model. In block models, ties are more common within certain groups in the data, and those groups can represent things you've observed, like families or offices or a Stammtisch. But they can also be unobserved, latent things: you can simply assert that there are a certain number of blocks in the community and then try to detect them.
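A stochastic block model is easiest to see generatively. In this minimal Python sketch, block memberships are fixed and known, and the within-block and between-block tie probabilities are arbitrary assumptions; community detection runs the other way, treating memberships as latent and inferring them, which this sketch doesn't attempt. The triangle counter is included because dense blocks are one simple way a model produces the clustering and closed triangles just described.

```python
import random
from itertools import combinations


def simulate_sbm(block_of, p_within=0.6, p_between=0.05, seed=11):
    """Undirected ties whose probability depends only on whether two nodes
    share a block. Probabilities are assumed, illustrative values."""
    rng = random.Random(seed)
    edges = set()
    for i, j in combinations(range(len(block_of)), 2):
        p = p_within if block_of[i] == block_of[j] else p_between
        if rng.random() < p:
            edges.add((i, j))  # stored with i < j
    return edges


def tie_density(edges, block_of, same_block):
    """Fraction of within-block (same_block=True) or between-block pairs that are tied."""
    pairs = [(i, j) for i, j in combinations(range(len(block_of)), 2)
             if (block_of[i] == block_of[j]) == same_block]
    return sum((i, j) in edges for i, j in pairs) / len(pairs)


def count_triangles(edges, n):
    """Number of closed triangles i-j-k (all three ties present)."""
    return sum(1 for i, j, k in combinations(range(n), 3)
               if (i, j) in edges and (i, k) in edges and (j, k) in edges)


blocks = [0] * 10 + [1] * 10 + [2] * 5  # 25 nodes in three hypothetical blocks
edges = simulate_sbm(blocks)
within = tie_density(edges, blocks, True)
between = tie_density(edges, blocks, False)
triangles = count_triangles(edges, len(blocks))
```

Within-block pairs end up tied far more often than between-block pairs, and most closed triangles fall inside blocks. Inferring `blocks` from `edges` alone, with partial pooling, is the harder problem that real block models solve.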
Detecting latent blocks like these is sometimes called community detection, and it is also done with partial pooling estimators.

Okay, let me say just a couple more things before ending this lecture. I keep saying that the social network is partially pooled, and it'd be nice to have a bit more intuition about that. On the left of this slide, in blue, I'm showing you the raw data: I've just taken the counts of gifts and built a network out of them. You'll see it has a lot of connections, and some are darker than others because the counts are higher. But it's completely unregularized, in the sense that it's just a raw empirical estimate. Building a network out of just the sample takes for granted that there's no variation to be expected in the exact gifts that were observed, that they represent the expectation, and that those are the true relationships. Furthermore, it assumes that all of the observed gifts are due to social ties. But I hope I've convinced you, through the modeling in this lecture, that that's not a good assumption: there are also features of the nodes, the households I should say, that are general to their behavior with other nodes and have nothing to do with particular dyadic relationships. So what we've inferred instead, on the right, is a regularized network. This is just the posterior mean; I showed you the animations before, so you know it varies. But all the samples from the posterior are much sparser than the raw data, because that's what regularization does.
It's skeptical. For cases where there are only a couple of gifts between a pair of households in the whole year, what regularization, what partial pooling, does is say: I'm going to bet that's not a regular feature of exchange in this community. It's just something that happens once in a while, and those households don't share a social tie in terms of what we're trying to estimate, which is stable exchange relationships.

Let me try to summarize. Social networks are abstractions; I keep saying that they don't exist. But they're incredibly important statistical technologies. We use them to discover and express the regularities in the data. The data are very high-dimensional and complicated, and they contain noise and irregular features that will help us neither predict future samples nor learn causal effects about how the community functions. So we use regularization, skeptical forms of estimation, when we infer networks. In this sense, the social network is regularized just like the varying effects last week, but it's a very complicated, structured varying effect, right?
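The skeptical behavior described above can be sketched with a deliberately simpler model than the lecture's: a one-level Gamma-Poisson model for per-dyad gift counts, with the population mean and between-dyad variance estimated by the method of moments. This is a swapped-in empirical-Bayes shortcut, not the lecture's multilevel model with household and dyad effects, and the counts are made up. Each dyad's estimate is pulled toward the overall mean by a weight that depends on how much of the spread across dyads exceeds what Poisson noise alone would produce.

```python
import statistics


def pooled_rates(counts):
    """Shrink per-dyad counts toward their mean.
    Simplified method-of-moments Gamma-Poisson, not the lecture's model."""
    m = statistics.mean(counts)
    v = statistics.variance(counts)            # sample variance across dyads
    tau2 = max(v - m, 0.0)                     # spread beyond Poisson noise (var = mean)
    w = tau2 / (tau2 + m) if m > 0 else 0.0    # weight on each dyad's own count
    return [m + w * (y - m) for y in counts], w


# Made-up counts. Pure noise: spread no bigger than Poisson noise.
noise_only, w_noise = pooled_rates([0, 1, 2, 1, 0, 1, 2, 1])
# Made-up counts. Two strong dyads stand out clearly from the rest.
strong, w_strong = pooled_rates([0, 0, 0, 20, 20, 0, 0, 0])
```

In the first case the weight is zero and every estimate equals the mean count of 1, so the occasional one- or two-gift dyads are treated as noise rather than ties. In the second, the weight is about 0.94, so the two 20-gift dyads keep estimates near 19 while the zeros move only slightly toward the mean.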
A social network has much more structure than the examples we looked at last week. There are lots of varying effects problems analogous to this. Take phylogenies: phylogenies are simpler than social networks because they only branch, but they are also networks, they're also not observable, and they're also uncertain, and if we estimate them with partial pooling, we expect to do better. Spatial problems are analogous in a number of ways as well. Entities in your data can share unobserved confounds because they're close to one another; this is called spatial confounding, and we're going to work on that in the next lecture. Heritability: organisms that are closely related can share complex patterns of phenotypic covariance, which is also a problem that has to do with regularization. Knowledge: something we often want to do is give tests to people to see how much they know. We do it with students, but we also do it in research, because we're trying to assess how much people have learned about a topic or how much they believe about it. This is also not observable, and knowledge is highly structured; at least sometimes it has different latent dimensions. When we try to estimate how much people know or what they believe, we also use regularization and varying effects, in the family of models known as item response theory (IRT) models. Personality measurement, likewise: personality, like social networks, doesn't really exist. It's a set of statistical regularities that we compress out of high-dimensional data and use to try to predict the behavior of people and other animals in the future, and also to understand the causes of variation between people in these statistical abstractions. Okay, but these analogies raise more statistical problems. What happens when the clusters in the data are not discrete?
What if they are continuous instead? Households are discrete, tadpole tanks are discrete, departments are discrete, and in all the examples we've looked at so far, the varying effects are clustered by discrete entities like that: dyads, households, departments, tanks. But some of the examples I just gave imply kinds of clusters that are actually continuous, like age, or distance, or time, or just similarity. In these cases there are no clear discrete categories, and also, the closer values are to each other, the more partial pooling you'd like to do. For example, ages that are closer together will share more unobserved causes, locations that are closer together will be more similar than locations farther apart, and likewise for times closer together; and obviously objects that are more similar may share similar confounds. In these cases we want some kind of structured partial pooling that takes distance into account somehow, and that's what we're going to do in the next lecture.

Okay, this has been lecture 15, on social networks, for Statistical Rethinking 2022, week eight. In the next lecture we'll look at Gaussian processes. This material is covered in chapter 14. It's good to follow along in the book, because there's a lot of detail I just can't get to in the lecture; it would be too long, I apologize. Anyway, I hope you'll join me for the next lecture, where we'll talk about Gaussian processes, which are an increasingly popular and powerful form of statistical inference. I'll see you there.