Today I'm going to talk about how you can use molecular genetic data, or whole genome data, but particularly from a social science approach. I have a background in demography and sociology. I do have to say we're funded by NCRM for our SOCGEN project, and that grew out of an ERC project that I had called Sociogenome. So we were doing a lot of genetic research, and I'll show you some of that from Sociogenome, but what we realized we were really missing was teaching material: actually describing what we're doing, and building some statistical models that could fit with our sociological and demographic questions. So the NCRM has funded that.
We had a summer school last year on molecular genetics and social sciences. We only promised to do one, but we've had about 20 requests asking, when is it being held again? Are you doing it again? So we'll probably hold it again this year too because of the demand.
We're also producing a textbook. So for everything I'm showing you how to do today, we're producing a textbook that will have a lot of online open source information about how to do this. We're very pleased that we have a donation from 9,000 people who've agreed that we can use their genetic data for the textbook. And I can talk to you a little bit more about crowdsourcing genetic data and citizen science if you're interested, because that's another project that we're working on as well. So you'll be able to use that textbook to do some of the things that we're doing today. It'll be open source. You can use actual large-scale genetic data.
And I just wanted to say that it's going to be a very introductory textbook. So if anyone's seen, and I'm sure you haven't, but if you've seen my introduction to survival and event history analysis in R, I've written a textbook. And my mother tested the introduction to R chapter. She had some problems with the confidence intervals around the bar graphs, but that's okay, you know?
So I just wanted to say that we produce things at a very accessible level. My mom's smart. I don't mean it like that, but she's the only one that will watch this video as well. But I didn't mean it. Of course, there'll be many people watching it. But we go through things that you often don't encounter. So you get this large-scale genetic data, but it's huge. The UK Biobank is seven terabytes. What do you do with that? How do you download it? You need to work with a cluster computer, all of these things. So we have a very accessible sort of introduction about how you make those different steps, because often we get these little mini, well-prepared data sets. So we go through the sort of blueprint in the textbook. So look for that.
I'm not alone. I work together with a team. Michaela has stolen Nicola Barban, who I've worked with for the last seven years. He's now in Essex, but that's great. We're happy for him. I've worked together a lot with Nicola over the last years, with Felix, and there are many different people in the team. So it's a real group effort. It's also my Brexit slide to show we've got Charlie. So if we don't have freedom of movement, my team will be affected by this. OK.
So just to give a little bit of background, and Michaela's already talked about it, but what do I mean by genetics? Because I thought this was more of an introductory sort of session. So I'll just back up a bit. And throughout my presentation, if you have any questions or comments, just put your hand up, and it's really fine. We can stop, and you can ask questions.
So I think some of you have heard about behavioral genetics. Those are the twin studies. And I'm sure some of you have heard of these twin studies. What they do is compare monozygotic twins, who have almost 100% genetic similarity, with dizygotic twins. Those are like siblings, so they can share between 40% and 60% of their genetic material. And from that, they try to partition out the variance.
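As a rough illustration of how twin studies partition variance, here is a minimal sketch using Falconer's classic approximation. It assumes identical twins share 100% and fraternal twins on average 50% of segregating genetic material, and that both twin types share their environments equally. The function name and the correlations are hypothetical and are not taken from the talk or from any study.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Split trait variance into A, C and E shares from twin correlations.

    r_mz: trait correlation between monozygotic (identical) twins
    r_dz: trait correlation between dizygotic (fraternal) twins
    """
    a = 2.0 * (r_mz - r_dz)   # A: additive genetic share (heritability)
    c = 2.0 * r_dz - r_mz     # C: shared family or household environment
    e = 1.0 - r_mz            # E: unshared environment plus error
    return {"A": a, "C": c, "E": e}


# Hypothetical twin correlations, for illustration only.
shares = falconer_ace(r_mz=0.6, r_dz=0.4)
for component, share in shares.items():
    print(f"{component}: {share:.2f}")
```

The three shares always sum to one, which is why a twin design can only say how much of the variance is attributed to genes, not which genes are involved.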
So that's your R squared, how much you can explain: how much is related to genes, how much to your shared and family environment from growing up in the same household, or the same neighborhood and environment, and how much to your unshared environment, so things such as your partner, or where you move further on in your life, and a large error term as well. So this is where the field was for very, very many years. And it's still a very active field. But I'm not doing that. I'll talk a little bit about twin studies.
What I'm talking about is molecular genetics. That's where we actually look at the whole genome data. We look at something called single nucleotide polymorphisms, but in short, we usually call them SNPs. And these SNPs vary, and this is what we examine. So we're about 99% similar, but on these small A, G, T, C variants, we can see these differences. And then what we do is we look at the differences between individuals. But then we want to see, OK, well, what do these genes do? What's the function and the structure of them? So that's the difference between behavioral genetics and molecular genetics.
So we can think about heritability of traits. And I'm sure some of you have heard some of these things. There's a really broad difference in what has a genetic basis. And this is based on SNP heritability, so whole genome data. From what studies have found, we can see that things such as eye color, freckles and height largely have a strong genetic basis. But as we get further into complex traits, and these are my kids, by the way, when we get into complex traits such as age at first birth, number of children, type 2 diabetes, BMI, we see larger confidence intervals. But we also see a difference and a lower genetic basis. Well, why is that? We'll continue talking about that.
So when I first started entering into this, I thought, why am I even doing this? Can we ignore some of this genetic research?
And I'm going to say yes and no, actually, in my presentation. So maybe you thought I was going to sing all the praises of it. But you'll see towards the end, there are some critical sides to this research as well. And I think as social scientists, that's why we're here, to really have an open mind and be critical.
So there were a lot of studies in the 90s and the 2000s that were based on candidate genes. Some of you may be familiar with this, and I think it's just important to reflect on it. These were published in major journals in psychology, but also in the American Journal of Sociology and Population Studies. Some of the major social science journals continue to publish these studies. They were often hypotheses based largely on animal models, thinking about dopamine and serotonin and reward functions. And they compared people that had certain genes, so certain genetic loci, with those that didn't. It was easily conducted, and many of the studies collected about six or 10 markers. So this is at the end of the 90s, beginning of the 2000s.
It produced a lot of bad research. Patrick asked about that, and I'll talk about it a little bit later as well. So there were these studies that came out. There was a criminologist, Beaver, who found the gangster gene. That was published, and later somebody looked, and it's like, well, 40% of the population has that. So it might not be. And there are these human warrior genes and aggression genes. And there have been a lot of studies that just haven't been able to be replicated. So I think it's really important, when you're surveying the literature and you look at some of these older studies that look at one or two candidate genes, to check: have they been replicated? Many have not.
I'm sort of picking a sample of gangsters and comparing them to whatever your group is and saying, can we see that this group has a particular propensity to have this gene?
Yeah, so basically, it's like a case-control model. You look at what they call in genetics a phenotype, what we call a dependent variable or an outcome. And they see, OK, people that have these genetic markers or these candidate genes have a higher likelihood of being in a gang. So that's what they found. But it was in a very small sample, and it hasn't been replicated. And if 40% of the population has that, I mean, we'll talk about gene-environment interaction. Maybe we all have the gangster inside of us. We can talk about that. But those studies just weren't replicated.
If you're interested, and there have been more, Duncan and Keller did an analysis of some of these early studies. They showed that they often focused on one small set of SNPs. They were very selective, so they had selective populations. They rarely replicated. So about 10 replicated, 27 had no replication at all, and then this one was purely negative. So most of the studies just could not be replicated. They were done on one population. You transferred it, tried it on another population, and didn't find the same thing. There was also a strong publication bias. So, you know, people have "the gene for" this or that. And that influenced the results.
So the field has gone through quite a lot, and you might have heard of this, and people say, oh, that stuff is nonsense. Yeah, some of it is. And this is amazing to me, but this led the editor of Behavior Genetics in 2012 to say the following: the behavior genetics literature has become confusing, and it now seems likely that many of the published findings of the last decade are wrong or misleading and have not contributed to real advances in knowledge. I mean, I was editor of European Sociological Review for about six years. Can you imagine if I published that? It's amazing, OK?
So the field really recognized the problem, and this is a really high-impact top journal. So this is a field that moves quite quickly, and it's a field where, if you're going to work in an interdisciplinary manner, you really have to talk with the geneticists and understand the shape of the field. Social sciences, unfortunately, got a bit of a bad name as well from some of these studies. But now there are demands. If you do something, you have to replicate it. And for a study I'm about to show you, you have to post your analysis plan on the Open Science Framework first so you don't look like you're fishing, you know. So you post your analysis in advance and say, I'm going to do this. We have two quality control centers where we both analyze the data separately, and then we come together to see if we have the same results. It'd be great to do this in social sciences, by the way. So it's really a much more careful analysis now.
So why have we avoided this for so long? Well, it has a very dark history when you think about genetics and eugenics. And I always think it's really important to focus on this. I haven't focused too much on ethics, but maybe we can come to that later in the panel discussion. So it's got a very long history. And also, I think this was raised in the last discussion, there have been some people that have used genetics as a basis to make some claims. I'm sure some of you have heard of The Bell Curve. They were arguing that social stratification is based on innate genetic endowment. So that would make all of us sort of useless social scientists, because you'd never be able to have a policy intervention if everything's genetic. I hope to convince you by the end of this, but you're social scientists, so maybe I don't have to: not everything is fully genetic, and it's much more complicated. But I also think, and I'm grateful to the NCRM for this, that there's a lack of interdisciplinary training.
I mean, how do you even approach this? So that's why we're trying to spell it out as well. So we've ignored it because there's an allergy to it for good reason, but also because we're just not trained to be able to know how to handle this data. So there's the other side. So I think there are some things you can ignore and some approaches that are not helpful or scientific. But I think there's actually some very interesting things that we can look at as well. So social behavior and outcomes, things that we study, such as social mobility, educational attainment, fertility, which I study quite a bit, well-being. We generally study them in relation to this and sometimes this. So we think of things in our action, at least as a sociologist, we think of our action and behavior as situated within certain childcare availabilities or laws or contraception regulations or housing markets. These are the kind of institutional, structural things that we think are important and they are important. We also think in terms of choice and individual characteristics and personality and drive that individuals have. So we've looked a lot at this, but there's this other side of the literature that argues that there is a biological basis for behavior. And I remember meeting with a colleague of mine who's from genetics, this was many years ago, and he said, oh, what do you study? And I said, oh, I look at reproduction, age at first birth and number of children. And he said, oh, that's interesting, I do that too. So we started talking. And by the end of the conversation, he said, you're entirely socially deterministic. You think, and then we went out for some drinks and his friends joined and he said, oh, I want to introduce you to Melinda. She thinks that fertility has no biological basis and everybody laughed, huh? So, and it was funny. 
So why on earth, when I was looking at reproduction, had I looked at it in terms of just availability of childcare and work-life reconciliation and all of these things, and gender equality? Why hadn't I considered, particularly with postponement, that there could be some biological factors? And if you look into the literature, work looking at biology is relatively sparse.
So one of the examples I'm going to show you, and that I'll focus on in more detail, is about some of the things we look at: the timing of when you have your first child and the number of children ever born. And I'll talk also about education, because that's another area where there's been more genetic work.
So there's been a strong postponement in the age at first birth. This is women's mean age at first birth by different countries. From the 1970s until 2012, and it's gone up even more since, we see about a four to six year postponement in having a first child. So that would make us think, okay, well, at a certain age, biological and genetic factors may become more important than they have been in the past. And what's even more interesting, as I'll show you in a minute, is that it's likely just as important, if not more, for men, which we often don't look at.
We've also had, if we look from people born in the 1900s to those born in 1970, this sort of U-shape in childlessness, which is very interesting historically. So this shows the European average, but we see that childlessness, so people that don't have children by the end of their reproductive period, is up to 20% or 25% in many societies. And it was also very high at the turn of the century. So there are some historical factors, and I'll come back to this too, but there are other reasons to think, wow, why do we have this growth in people not having children? I'm going to talk now about some of the methods.
If you're interested in what I think is an accessible introduction to them, we have a short article in the Proceedings of the National Academy of Sciences where we outline some of these techniques. So we talk about how we used to do the classic twin and family designs, and now I'm going to move on to talk about GREML methods and polygenic scores from genome-wide association studies. It's always a mouthful, all of them. Okay. I haven't put the mathematics into it, but if you're interested in that, I can also give you some sources. I just thought it would be nice to have an introductory talk about it.
So GREML is really interesting. This is a genomic-relatedness-based restricted maximum likelihood estimator, and it's done in a package called GCTA. Now, basically what this does is, instead of using twin data where you compare monozygotic twins, who are 100% related, with dizygotic twins, who vary in relatedness, it allows you to look at unrelated individuals in a data set to see how much they might be related. So we might be 55% related, and Gabby and I might be 30% related. Who knows? So that gives you an idea, but we're all in the same data set. And then I'm going to talk to you about genome-wide association studies. From those, we draw what are called polygenic scores. So does anybody have any questions or things they'd like to ask? Not about these techniques, don't worry. Okay, just put up your hand if you want.
Okay, so in 2015, we did what's called a GREML analysis, where we looked at whole genome data from various populations, and we wanted to see: what is the heritability of age at first birth and number of children ever born? So let's do some guesses. What do you think the genetic variance explained, or the genetic heritability, of age at first birth would be? I showed you some of the heritabilities. Oh yeah, I kind of showed you a little bit, for those of you who were paying attention. So what do you think? 40%? 40%? Anybody else? 20%?
What about number of children ever born? Do you think there'll be a difference between those? So it's when you have your children and the number of children you have. No? Could be some differences? Okay, this is what we found. Ta-da. 15% of when you had your first child could be explained by the SNP heritability, and around 10% for number of children ever born. So this seems to be rather high. But remember, with the GREML methods, the GCTA analysis we did, we can just say, similar to twin studies, that it might be 15% or 10% heritable, but we don't really know what's going on. This finding actually got quite a bit of attention, because we didn't know before that these kinds of behaviors would have this high a genetic basis. So we got all excited. But then the journalists and everybody said, okay, it's 15% or 10%, but what are the actual genes? And do they do anything? So what are you actually finding? And this is where you have to dig a little bit deeper.
So what we did is we worked on what's called, excuse me, a genome-wide association study. This beautiful graph shows all the different chromosomes, and it shows where genetic loci have been located. So on chromosome one, we see all of these different genetic loci, things related to, oh, it's really hard to see, sorry, things related to cardiovascular disease, cancer and everything. So you can see all of these different traits. And to date, there have been about 3,100, so over 3,000, of these genome-wide studies. So if you see "genes found for this are implicated in that", it's these studies, and they have a catalog where you can look it up and see if a genetic basis has been found for certain traits. We have one of these dots now, no, 13 actually.
But so what is it? You look across the human genome, so the whole genome, this molecular genetic data, and you want to identify associations. So it's what Michaela said very nicely.
It's zero-one, it's dummy variables. I remember asking one of my colleagues, I was so excited, I'm going to enter this new field, can I look at the data? And you look at it, and it's a whole bunch of zero-one dummies, and you think, oh. It's a massive amount of data, but you're looking at correlations along the genome. So, what I showed you here, zero-one with your outcome. In my case, one of the studies we conducted was on age at first birth and number of children ever born. So you look for associations. The goal is to find an association between your whole genome and your phenotype, as they call it, or in our language, outcome or dependent variable. And you look at the statistical association between them.
And then what you do, and this is what we did, is you take the results and turn them over to experts in biology and molecular genetics and bioinformatics. They take the results, they have a pipeline, I'll show you it in a minute, and they look at what the function of the genes is. Some early social scientists thought they could do that part, and I would strongly advise against it. I was saying before that I got all excited, and I was talking to the biologist, and then I called him two months later and he said, oh, that technique? Oh, no, no, that one's so last month. So the techniques also just change very rapidly, and you'd never be able to keep up with that.
So why do we need it? Well, I think what we've realized now is that it's never one gene. It's always polygenic, so multiple genetic loci that are predicting behavior. Certainly for some diseases, there are Mendelian diseases, so Huntington's disease, where there is a gene mutation. But most complex outcomes that social scientists look at, educational attainment, well-being, depression, neuroticism, age at first birth, all of these things, are complex or what they call distal, so outcomes far away from biology. So it's usually multiple genes.
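To make the idea of looking for associations along the genome concrete, here is a minimal sketch of a GWAS-style scan on simulated data. It is a toy under stated assumptions: SNPs are coded as allele counts (0, 1 or 2) rather than the zero-one dummies mentioned in the talk, each SNP is tested in its own simple linear regression with no covariates such as sex, age or ancestry components, p-values use a normal approximation, and a Bonferroni correction stands in for the multiple-testing adjustment. The function name `gwas_scan` and all numbers are hypothetical.

```python
import math
import numpy as np


def gwas_scan(genotypes, phenotype):
    """Regress the phenotype on each SNP separately.

    genotypes: (n_people, n_snps) array of allele counts
    phenotype: (n_people,) array, the outcome or dependent variable
    Returns per-SNP slopes and two-sided p-values.
    """
    n, m = genotypes.shape
    y = phenotype - phenotype.mean()
    betas = np.empty(m)
    pvals = np.empty(m)
    for j in range(m):
        x = genotypes[:, j] - genotypes[:, j].mean()
        sxx = x @ x
        beta = (x @ y) / sxx                      # least-squares slope
        resid = y - beta * x
        se = math.sqrt((resid @ resid) / (n - 2) / sxx)
        z = beta / se
        betas[j] = beta
        pvals[j] = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    return betas, pvals


# Simulated data: 2,000 people, 200 SNPs, only SNP 7 has a true effect.
rng = np.random.default_rng(0)
n_people, n_snps = 2000, 200
G = rng.binomial(2, 0.3, size=(n_people, n_snps)).astype(float)
y = 0.5 * G[:, 7] + rng.normal(size=n_people)

betas, pvals = gwas_scan(G, y)
threshold = 0.05 / n_snps                          # Bonferroni adjustment
hits = np.flatnonzero(pvals < threshold)
print("SNPs passing the corrected threshold:", hits)
```

A real scan runs over roughly a million SNPs and is done with dedicated software on a cluster, but the logic is the same: one regression per SNP, then a very strict significance threshold.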
There's a great paper that came out a few months ago saying, well, it's probably omnigenic, which means that all of the genes might have an influence. So the field moves very quickly. The cost of genotyping has been reduced, and we've had an explosion of biosocial data sets. And I was so happy to see Understanding Society presented, because it's really one of the gems in the sense that it's representative and longitudinal, and it's very rare to have that. So for social scientists, that's a treasure trove, because many of the studies are very selective. The UK Biobank, for example, has a 5.7% response rate, and it's fairly selective.
So most of the things that we look at are complex, and I've already talked about this. And I think that there's a lot of gene-environment interaction, which I'll tell you about. What we've seen is that it's much easier to genotype data, and if I have some time at the end I'll tell you about the new citizen science and crowdsourcing approach to this, because the cost of genotyping individuals has just gone down rapidly over time, and no one would have expected that. We've also had the rise of direct-to-consumer data and companies, like 23andMe and AncestryDNA, where you can get your own genetic data.
So I already talked about these SNPs. But I think the most important thing to note from this is that it's never just one SNP, except for these rare diseases. It's a combination, and it's polygenic. If you can take one thing away, it's that.
So we conducted a GWAS, and that's what we call data mining and what the geneticists call hypothesis-free research. It's just a different approach. You regress your outcome a million times, on all of these different SNPs, and you adjust of course for multiple testing. And there's been an explosion in the number of GWAS. So this is just to give you a timeline.
So the candidate gene studies I talked about were coming out sort of around here, maybe a little bit here, unfortunately, but around 2007 we started to have this explosion of genome-wide studies. So these give you the polygenic scores that you can use. Now we're over 3,000, and I'll reflect on those later.
So let me give you some examples that might be relevant for you. There was one that I was involved in as well, published last year in Nature Genetics, that produces polygenic scores for depression, well-being and neuroticism. And it shows the genetic overlaps between some of those traits. So those things you could actually use as polygenic scores, and this could be interesting for some of your analyses. This is being replicated at the moment in a larger sample, so we'll probably have more predictive power.
The one that many of you may have heard about, and I know that Dan Benjamin will be coming and talking about it, he was one of the people involved in this: in Science in 2013, they first isolated around six genetic loci that were related to educational attainment. So they looked at how many years of education you have and whether you attend college or not. Some of you may have heard of this. It was replicated and published in Nature last June, in 2016, and they found 74 loci. There's now a new, larger one under review, and they're at around 1,300 loci. Dan allowed me to show some of the preliminary results from that as well.
And this is our study, where we found 12 loci related to human reproductive behavior, so the timing and number of children you have, published in Nature Genetics last October. And I'm going to talk a little bit more about that now. So this study, this is not for the faint of heart.
If you get excited and you think, I'm going to conduct a GWAS after I leave, it's a huge amount of work, not just in terms of the mass of data, but in terms of the communication between people, and also just trying to get all of these different data sets together and analyzing them. It's not a glamorous thing, because a lot of it is emailing to try to get the data, and "you said you'd deliver it and you didn't deliver it", so it's a lot of that. But we realized that we needed this in order to get a large sample, and out of the 3,000 studies, we're one of the top 10 largest genome-wide studies. It wasn't our goal. We just realized this later.
We got our data from medical companies, medical studies, insurance and direct-to-consumer companies, and we can talk about that later if you have questions. But what's interesting is that we have a lot of men. Anyone that's examined things in relation to children and fertility will know that in many surveys, they don't even ask men about fertility. The excuses are that they don't know how many children they have for sure, and these kinds of things. So we were really, really excited. Yeah, oh, we've got a man in the back, yeah.
So, what you're doing is you're finding lots and lots of studies that have been done, data sets that have been put together, that potentially have the same outcome. And the genome is turned into these SNP binary indicators, and you sort of combine them. And that gives you more power to, that's the...
That's exactly it, so I'll just repeat in case you didn't hear. What you need to locate is data that has genetic data, but also has your outcome variables. So education in years and, in our case, the age at first birth and number of children ever born. And actually that's often collected as a basic demographic control. So for our data, there was actually quite high availability.
You combine all of those together and you produce what's called a meta-analysis. So everyone does their separate analyses, and sometimes we do them ourselves. You combine all of their results together in a meta-analysis, and then you produce what I'm about to show you, these genetic loci that you've located. So that's how it's done. With a lot of emailing in between.
So for age at first birth, we found 10 significant loci, and maybe I'll describe these plots first. These are called Manhattan plots because they sort of look like the skyline of Manhattan. Or we could rename them Oxford plots, with the spires. This is age at first birth, and this is number of children ever born. These are the chromosomes, and these are the ones that have hit a significance level. So this level, 10 to the minus six, means that it's suggestive, and if it's above this line, a p-value of around 10 to the minus eight, then it means that you have a significant finding and a hit. And I'm sorry, it's so small. You'll see that estrogen comes up a lot. That's why I was asking you. It comes up in most studies in genetics.
So what you'll see is we found all of these different loci, but what was interesting is that we found some only in men and some only in women. And don't worry, I'm not going to go through all of these. But we then turned it over to the biologists, the bioinformatics people and the molecular geneticists to figure out, okay, so what are these genes doing? You know, what's there? Are they doing anything causal? Are they related to methylation? We heard about epigenetics previously. What's been found about them before? How do they relate in terms of a network, in terms of their similar functions? And indeed, some of the more interesting findings were related to men. So it looks like men might have something to do with fertility. And they might have something to do with infertility, and I'll return to that in a moment.
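The pooling step and the two lines on a Manhattan plot can be sketched as follows. This is a minimal illustration, assuming a fixed-effect, inverse-variance-weighted meta-analysis, which is one common choice and not necessarily the exact procedure used in the study. The cohort estimates are made up, the function names are hypothetical, and the default cut-offs are assumptions: 5e-8 is the conventional genome-wide significance level (the talk describes it as roughly 10 to the minus eight) and 1e-6 is the suggestive level mentioned in the talk.

```python
import math


def meta_analyze(betas, ses):
    """Pool one SNP's effect estimates from several cohorts.

    betas: per-cohort effect estimates for the same SNP
    ses:   per-cohort standard errors
    Returns the pooled effect, its standard error and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]        # precise cohorts count more
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))     # normal approximation
    return pooled_beta, pooled_se, p_value


def classify(p_value, suggestive=1e-6, significant=5e-8):
    """Label a p-value the way the lines on a Manhattan plot do."""
    if p_value < significant:
        return "significant"
    if p_value < suggestive:
        return "suggestive"
    return "not significant"


# Hypothetical results for one SNP from three cohorts.
beta, se, p = meta_analyze(betas=[0.10, 0.12, 0.08], ses=[0.03, 0.04, 0.025])
print(f"pooled beta={beta:.4f}, se={se:.4f}, p={p:.2e}, {classify(p)}")
```

The pooled standard error is smaller than that of any single cohort, which is the gain in power that comes from combining data sets.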
So things were found related to sperm quality and sperm production, which was really quite interesting and hadn't been seen before. It was related to some findings from mouse models, but also to hormones: estrogen, follicle stimulating hormone and hormones related to development. So that makes sense when we think about fertility. But there were also some overlaps with infertility factors found for endometriosis and polycystic ovarian syndrome, which are known causes of infertility in women. So with our behavioral outcomes, we had clearly found some biological things that hadn't been found before. And something that we're following up now in another study is that the lead locus on chromosome three was linked to the methylation and expression of genes in relation to sperm function and sperm count and quality. So it was really exciting to find some things related to older men and mutations, but also genetic expression.
So this will be coming out. We're finalizing it now in our different analysis centers. So look for it in, probably, Nature Genetics, or hopefully some similar type of journal, where we again look at these things, but we also look at childlessness and at age at first sex. This is crazy, but if you look at the literature, there are virtually no studies, or very few, that connect sex and fertility. And we're thinking that there might be a link. So that was another hunch that we decided to explore. And this is really interesting. I'm not going to show you the results today, but this one also has overlaps not only with development, but with risk behavior as well. So it's got some really interesting findings, and that will be coming out soon. We imputed it on more dense data, which gives us better resolution, and we're also looking at the X chromosome. So our new coverage is almost up to a million for some of the outcomes.
And this is age at first birth, age at first sex, and we actually have a reasonable sample for age at first sex, childlessness and number of children ever born. So you can look for that coming to a study near you. And then we produce the polygenic scores, and you can add them into Understanding Society and all of the different data sets. So if you want to include that as a variable or control variable in your model, you're able to do that and say, okay, I'm looking at a fertility model, or age at first sex, or I want to look at smoking, and I want to look at risk. You can then use these genetic variables as controls or as interactions.
So what we do from these genome-wide studies is we produce what are called polygenic scores. It's a weighted average, and it's easier if you see the math, but I think you should think about it as just a single quantitative measure of genetic risk. So your genetic propensity to have more or fewer years of education, or to have your child later, or to have more children. Think about it like that. And I'll now show you some results using these polygenic scores.
So in the old Nature article from last June, they were saying that once you include all of the SNPs and weight them, if you include this polygenic score for education in years, you'll have a predictive power of about 6% from that alone. So it will predict around 6% of educational attainment in white European populations, and I'll come back to this in a moment too. The new study that should be coming out soon is incredible in the sense that, and this is in Add Health and the Wisconsin Longitudinal Study, their polygenic score now predicts about maybe 10 or 11% of years of education. And if we compare it with the usual suspects that we include in our models, so things like your parents' education, your dad's and mom's education, your cognitive ability, it has a fairly high predictive power.
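The weighted combination behind a polygenic score can be written in a few lines. This is a minimal sketch, assuming genotypes are coded as allele counts and the weights are per-SNP effect sizes taken from a published GWAS. The function name, the toy genotypes and the weights are all hypothetical, and real pipelines add steps (for example handling correlated SNPs) that are left out here.

```python
import numpy as np


def polygenic_score(genotypes, weights, standardize=True):
    """Combine many SNPs into one score per person.

    genotypes: (n_people, n_snps) array of allele counts (0, 1 or 2)
    weights:   (n_snps,) array of GWAS effect sizes, one per SNP
    """
    scores = genotypes @ weights                   # weighted sum over SNPs
    if standardize:
        # Express the score in standard deviations so that "one standard
        # deviation higher" has a direct reading in a regression model.
        scores = (scores - scores.mean()) / scores.std()
    return scores


# Toy example: four people, three SNPs, made-up effect sizes.
G = np.array([[0, 1, 2],
              [2, 0, 1],
              [1, 1, 1],
              [0, 2, 0]], dtype=float)
w = np.array([0.10, -0.20, 0.30])

raw = polygenic_score(G, w, standardize=False)
print("raw scores:", raw)
print("standardized:", polygenic_score(G, w))
```

The resulting column can then be merged into a survey data set and entered in a model like any other control or interaction variable.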
So, we have maybe been ignoring some important factors. So, we wanted to see what the predictive power of our genetic scores was. So, we looked at it in all of these different countries and cohorts, and I'll speed up a bit so I can tell you why I think it's interesting to look across countries and cohorts. So, what we found is that our age at first birth score actually explains around 1% of the variance. So, remember, you saw before that the SNP heritability from the GREML methods was at about 15%. And this happens with all of the genetic studies when you get to complex behavior; you get what's called the missing heritability problem. We actually were able to predict around 1%. I'll return to why this could be the case in a minute. But if you just enter one variable into a model, you'll see that social science variables, if you don't take a multivariate approach, will predict around 6% or 10%, so the gap isn't as dramatic as you think. You can think about it as a one standard deviation change in our polygenic score being related to a postponement of around six months for women and around four months for men. It gets a bit trickier to think about the effect sizes of polygenic scores for number of children ever born. We haven't been able to visualize it. So, I've done it on my son. It predicts around 0.04 of a child. It's hard to describe this effect size, but I'll show you in a moment how you could use it in a model where it'll make more sense. So, another thing we wanted to look at: are the polygenic scores that we looked at associated with other fertility traits? Or not just fertility traits, but all sorts of other kinds of traits. So, the timing of when you have your child and number of children ever born, what other traits or outcomes could they be associated with? What do you think? Psychological traits or what? Yeah? Income? So, what do you think?
It's my interactive moment. I learned it from Michela. I thought, oh, that's a good idea. Sexuality? Sexuality? So, we can come back to that question. That'll be an interesting one for after: has there been a GWAS done on sexuality? No. We'll come back to that after. Okay, let me ruin the surprise. You can't see it anyways, so I'll have to tell you. This is the genetic correlation of 27 different genome-wide association studies that were done and how they correlate with our age at first birth and number of children ever born. Red is age at first birth, blue is number of children ever born. If it has a star, it's significant. This shows your genetic correlation, lower or higher. This shows the 27 different traits. I'm just gonna talk about one set of traits. I can't go through all of them, don't worry. And it's things related to development, related to lifestyle, related to, you know, the Big Five, Alzheimer's, subjective well-being, and related to some health factors, height and BMI. So, what we found was really interesting: there's a strong correlation and a link between human development and fertility. So, it actually kinda makes sense, right? So, we did something called LD score regression, and you can look it up or I can send you some of the references if you're interested. But what we found was there was a very strong link, a correlation of, you know, almost over 0.7, between age at first birth and age at first sexual intercourse. So, this can be related to development. It's also related to age at menarche and age at menopause. And for boys, it's hard to get a measure, but there was a significant relationship with voice breaking in boys, from the genetic studies that had been done there. So, it shows some sort of relationship in terms of human development.
But then, what we also thought was, wait a minute, there's a group of individuals who seem to have a shifting of their entire reproductive period: later menarche, later age at first sex, later age at first birth, also later age at menopause. So, then we thought, okay, it would be interesting to see if it's linked with longevity. And we're following up with epigenetic studies too. Do these people that have this genetic profile live longer? But somebody else did it first. And here are the results, from one of our co-authors even. But it's all nice. It's published in PLOS Biology now. They used our age at first birth polygenic score and they related it to mother's age at death. So, you'd have to make the jump and the assumption that death has intergenerational transmission, but still these are some interesting results. So, it appears that we have a shifting for some people, and that's really interesting in terms of the way we think about fertility in the life course. So, our polygenic score for number of children ever born. What does it do? Well, if you include it in a model, a higher polygenic score for having more children actually decreases the probability that you'll remain childless, for women. So, it does have some predictive power in models. We also started thinking about what's called sexual dimorphism. And this was just published earlier this year. And I'm sure some of you, you can be honest, have been thinking, how on earth is it possible that infertility and childlessness can be transmitted? Right? Some of you have been thinking that. Because by nature, if I'm infertile, how could I pass it on to my children? Right? So, this is the question somebody always asks, you know? So, I get that question so often that I thought, okay, let's do an article. So, we published this in the European Journal of Human Genetics.
And it looks as though it's related to what's called sexual dimorphism. And you can look at this in more detail if you're interested. So, basically, what we're seeing is that genes that are related to male childlessness seem to be passed on via the male lineage, and vice versa for women. So, they are just passed down and they skip a generation. And if you're interested, you can look more at that study. But basically, and I'm just gonna go a bit faster, it appears that there are different sets of genes implicated for men and women. And I think as sociologists or social scientists, you must sort of think, yeah, of course, it's biology and fertility. They must be different sets of genes. But often the studies are combined for men and women. So, we got the question, why are you dividing men and women? You know, and it's really interesting. So, we come in with a very different perspective on this data, and I'll show you, because we just think, yeah, okay, there are differences. We know there are differences here as well. So, we showed that. So, if we use our polygenic scores, and I'll just give the example of childlessness, but we have other studies we're working on too, how good are they as predictors when you include other social factors? So, I showed you the results when they were alone. So, we looked at it using the Health and Retirement Study, and then we replicated it on the Wisconsin Longitudinal Study. We can replicate it on Understanding Society too. And we wanted to see what the probability is that you remain childless. And we related it to different social science causes, the usual suspects that we know, but also to infertility. So, just to see if you're awake, do you remember some of the infertility factors? What do you think for men and women? What are some of the genetic factors that could predict infertility? Faulty sperm. Faulty sperm, yeah? Yeah, the polycystic ovarian syndrome. Endometriosis.
So, we wanted to see, okay, if we look at these different factors, you know, how does our polygenic score relate to them? And this is hopefully coming out soon in Social Science & Medicine. So, for infertility, you know, we know that there are some genome-wide association studies that have found things for endometriosis and polycystic ovarian syndrome. But we also know, and this is examined less often, that it's related to male sperm quality, and also to individuals that have had chlamydia. So, there's some mutation that particularly has effects for older men. So, we included our polygenic scores. We included some of these scores that we could get for individuals for genetic male infertility and some of the female ones. So, this is all genetics here, and we include that in the model. I won't show you the whole model story. And then we include the usual suspects in sociological research. So, things related to, you know, whether you have a partner or not, educational attainment, occupation, birth year. And it's really small, but this is just showing it for the Health and Retirement Study, and it replicates almost exactly on the Wisconsin Longitudinal Study. What we found is that education's important for whether you remain childless. This is for women, this is for men. So, education is really our predictor. So, if you stay in education longer, you have a higher probability of remaining childless. So, you know, all the stuff I did for 20 years wasn't exactly wrong. You know, work-life reconciliation, motherhood penalty, gender equality, all of these aspects are still here. But we see that the age at first birth polygenic score actually remains significant when all of these social factors are added. And the same holds for men, but for men it seems that this low sperm count is a really, really important predictor of childlessness.
And now I'm working on a two-stage model, because selection into partnership is actually really important as well. Okay. So, I've talked about, you know, the relationship with social sciences, but there's one thing I didn't think about when I entered into this. I was thinking how important genetics would be for my research, but it never dawned on me, and I don't have an inferiority complex or anything, that we actually have quite a bit to offer to the field of molecular genetics. And that's what I'm starting to realize, and I hope I can bring some of you over to help me, because I think we have so much to offer. As I showed you before, we think about the social context and the environment, and we measure it in just excellent ways. We pay a lot of attention to group differences, socioeconomic status, sex differences, gender differences and ethnicity. And that's something that's missing, not in all of it, but in a lot of this research. So it's very blanket. So this is from twin studies, and it's an eye test, I see. It's showing age at first birth, number of children ever born, and the heritability from twin studies. So we did a review, and this was before we did the genome-wide association study. We see that there's heritability. This is different countries and different cohorts. So what do you see when you look at this? Even though you can't see it, if you can remember what I told you it was. This is different cohorts and countries, and this is the level of heritability. Yeah, there are differences across countries, right? There are also differences across cohorts. And there are differences between men and women. And I mean, this is the social scientist in me thinking, wait a minute, you know, there are all of these differences. So, yeah, you see this huge variation in heritability, as it's a population-specific trait.
So, you know, it's estimated within populations. So there should be some variance, but it really differs by country and cohort. There are virtually no studies of men, I've said that before. So that was something that really bothered us, because we were combining 60 or more data sets together, across cohorts that grew up in different periods and in different countries. And we were thinking, can we do this? We wouldn't do it in the social sciences, but can we do this? You know, and that was something we kept asking. Everybody was like, why do you keep asking that? So, you know, and I've shown you this before. This is birth year. This is women's mean age at first birth. This is the famous U curve. I keep showing it to you, sorry for the repetition. These are different countries, and you'll see, you know, that there's quite some variation in the age at first birth across these countries. And whether you're born in 1940 or in 1980, you know, you have a different social context and a different age at first birth, as do your peers. So, oh yes, and here's an example. This is my mom. Hi, mom, if you're watching. And this is her mother-in-law and this is her grandmother. This is me. You can see from this that myopia and eyeglasses are heritable, but what you'll also see, this is her with her first child and this is me. And I just wanted to show you the differences. You know, you can actually just see it in this picture of when she decided to have her children. Look at the social control in the room. Look at me. I don't know, I'm out on the street somewhere holding a baby. You know, the differences are dramatic. So she had her children in Canada and I had mine in the Netherlands. So it's really just a very different social environment and very different levels of social control. So the genes that are related to her having a baby might be different from the genes related to me having a baby, right?
So that's what we thought as social scientists. And we had many years of rejection, but last month we got the cover of Nature Human Behaviour, where we actually explore this, you know, and this was years of trying to show these effects. So we're arguing that when you combine all of these data sets together, you might be missing a lot of heterogeneity, so differences across birth cohorts and populations. And you're all social scientists and you're thinking, yeah, of course, but this really hadn't been discussed. So this meta-analysis and all of these things I was showing you, it just lumps everything together, and it kept bothering us, you know, that we were doing it. So, remember I showed you the heritability estimates from GREML? They were 15%, but when we got to GWAS, they were 1%. We wanted to see, well, could that difference be because we're just sort of missing something, something that's hidden because of this heterogeneity? So we estimate it using real data from multiple data sets, and then we engage in a series of very detailed simulations. It's going through different matrices across populations and countries, and if you're interested, you can look into our fascinating supplementary material that includes all of the mathematics but also the information on the simulations. I'm not gonna talk about that here. We looked at education, age at first birth, number of children ever born, BMI and height. We wanted to see, if you just look at genetics but you add cohort or the country they come from, or the interaction between the three, how much do you explain? So what do you see? Well, this is height. It looks like a lot of it is explained by genetic factors. But what's happening over here? It looks like, and that's the non-blue, the cohort you're in or the country you're in actually seems to be explaining more of the heritability and the variance. So we're arguing that it was hidden.
And we show for each of the phenotypes that for things such as number of children ever born, by combining all of these multiple data sets, you've actually been missing the importance of cohort and country. And you can go into that article if you're interested. But for some of these things, such as height or some of the harder medical outcomes, you might not be missing anything by not considering environment. So it's a plea to think about this, and this is just really simple, right? Cohort and country. You can think about other things, socioeconomic things, but there's more going on here. And that's why I think it's important for social scientists to just question very basic aspects of these types of analyses. So just to give you an example, and Michela has already introduced it. So it's perfect. So we often think also about socioeconomic differences. So we wanted to know, if you're living in a poor neighborhood but you have a really high genetic disposition for high education, does that influence you or not? And this is actually an old idea and people have looked at it before. We know that in poor families, the genetic component in twin models was actually almost zero. And this is a study that's been replicated for many things. So if you come from a poor environment, you might have a high cognitive ability or a high genetic ability, but that might not be realized. Conversely, if you come from a high socioeconomic environment, other things might not be suppressed or realized. So the example that's always given is genes related to aggression and self-control. So if you come from a low socioeconomic environment and you express this sort of aggression or lack of self-control, you may get put into prison. If you come from a high socioeconomic environment, you could become a politician or a CEO with the same sort of traits. So that just gives you an example.
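The idea that the same genetic propensity plays out differently depending on the socioeconomic environment is usually tested with an interaction term in a regression. The sketch below shows that general technique on simulated data; it is not the model from the study described in the talk. The variable names (`pgs`, `high_ses`), the function `fit_interaction` and every number in the simulation are assumptions made up for illustration.

```python
import numpy as np


def fit_interaction(pgs, high_ses, outcome):
    """OLS fit of: outcome ~ 1 + pgs + high_ses + pgs * high_ses."""
    X = np.column_stack([np.ones_like(pgs), pgs, high_ses, pgs * high_ses])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    names = ["intercept", "pgs", "high_ses", "pgs_x_high_ses"]
    return dict(zip(names, coef))


# Simulated data only: a standardized polygenic score and a 0/1 neighborhood flag.
rng = np.random.default_rng(1)
n = 4000
pgs = rng.normal(0.0, 1.0, size=n)
high_ses = rng.integers(0, 2, size=n).astype(float)

# Made-up truth: the score matters little in low-SES neighborhoods (slope 0.2)
# and much more in high-SES neighborhoods (slope 0.2 + 0.6).
education = (12.0 + 0.2 * pgs + 1.0 * high_ses
             + 0.6 * pgs * high_ses + rng.normal(0.0, 1.0, size=n))

fit = fit_interaction(pgs, high_ses, education)
slope_low = fit["pgs"]
slope_high = fit["pgs"] + fit["pgs_x_high_ses"]
print(f"Slope of education on the score, low-SES:  {slope_low:.2f}")
print(f"Slope of education on the score, high-SES: {slope_high:.2f}")
```

A positive interaction coefficient here means the two lines fan out as the score increases: the gap between environments is largest among those with the highest genetic propensity. A real analysis would also need controls such as ancestry principal components and appropriate standard errors, which are left out of this sketch.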
So children growing up in poor environments often face isolation in terms of adult role models, and disorganization; there's lower social control and monitoring of their homework. They might be in areas with low-quality schools. We've seen this white flight, or people moving around into catchment areas that have higher school quality, and other people can't afford to live in those areas. And, as was discussed before, there's environmental impact. So there might be a lot of noise or something that influences the children when they're trying to study. So we look at it in an American data set. It's longitudinal, using Add Health data. And we wanted to see, if you have this higher or lower genetic propensity for education in years, does the environment that you grow up in matter? We looked at different census blocks and we did a principal component analysis on neighborhood quality. I won't go into that in detail, but we looked at it at wave one, so when they were about 10 or 11 years old. So if you grow up in an impoverished or a high-status neighborhood when you're 10 or 11, what is your education when you're 24 to 32? So do you have a higher education or have you attended college? We run various regression models, but I'm not gonna go into those now. We can talk about them later if you're interested. And this was kind of amazing when we saw it. This is the polygenic score, and this is the prediction. These are the children that grew up in a high socioeconomic neighborhood. And these are the kids that grew up in a low socioeconomic neighborhood. This is your polygenic score, from lowest to highest genetic propensity to have more years of education. So what do you see? Yeah, you see an interaction, that's for sure. And you were gonna say something too? So social inequality, and I'm a social scientist. But look at this: these two groups both have a high score, and this has a high correlation with cognitive scores, right?
So these two groups have a very high cognitive score. These ones come from poorer environments. These ones come from wealthy environments. Don't tweet it though, because it's not published yet. But it's under review. But you can just really see that these ones are able to realize their genetic potential. And this is why, when Patrick asked about these groups that divide up racial differences, so-called racial differences, they're usually picking up a lot of socioeconomic differences that aren't related to ethnicity or race at all. So we wanted to think about it: is it this whole environmental thing, and is it the educational aspiration of parents? So the higher socioeconomic parents probably want their kids to go on to higher education. And indeed, this is the question, how disappointed would you be if your son or daughter did not graduate from college, plotted against this polygenic score. So you see that these ones are maybe more realistic about their low cognitive children, that they won't be disappointed. But this is flat, even though these are really, really bright children on the very right end of the spectrum here. So this is really interesting in terms of social aspects. Okay, now the elephant in the room. Everything I've been showing you so far is about white people, almost all of it. So there's an article from last year in Nature that shows, and you can't see it here, that in 2009, 96% of the genetic studies, of these 3,000 I was talking to you about, were on people of European ancestry, white European ancestry. And it was 81% in 2016; I'll show you something else in a minute. And then this is Asian populations. There have been some large Chinese and Han Chinese studies, mostly there. This is African ancestry: very few studies of African ancestry and other groups. What's going on? So there's something related to the population stratification and the structure of the data.
So there's a technical aspect here related to minor and major alleles. Funding. So I'll tell you, we have a paper hopefully coming out soon, we call it the conspiracy theory paper, where we actually analyze all of these studies. We analyze the authors of these studies, we analyze where they're funded, we look at their gender. We look at different aspects and we look at the different outcomes. And this isn't published, but this is something we're working on. And if you look over time at all of these different studies, blue is European ancestry, red is Asian, and then we find very little of anything else in these genetic studies. And what we found is actually a turn back with the UK Biobank, so back up to 95% of these studies looking at European ancestry. And so why is this important? It's important because drug targets and many sorts of medical developments are being developed based on these, and they might not be as useful or relevant for other populations. So if you're interested, we're working on a paper right now related to this, and we look at the authors and their networks and the different data sets across time and the different sorts of groups that are examined, because I think this is really something where, as a social scientist, you come in and you think, oh yeah. So we can discuss that later if you like. So can we ignore it? I guess just to summarize, I think there are a lot of non-replicated gene studies that you can ignore. And I think you can get excited about some of these results. But as I showed you, some of the things are still not all biological and they might not have a high predictive power. And I think our social science predictors are actually pretty good. So a lot of the things that we've been using to predict inequality and outcomes are actually very good predictors.
But it's when you do the interaction, and I tried to show you with that neighborhood example, that you see, wait a minute, there are some people, and think about it in relation to disease, I showed it in relation to education, there are some people not realizing their potential, or having genetic predispositions triggered that are detrimental. So I think it's really interesting because it really challenges us to think in a different way. I know that in the social sciences, where I've worked until now, everything's in terms of choice and agency and structure. But what if you throw the genetic component in there? How does that challenge our theory? So there's a lot of work for theorists here. But also I think we'll probably produce some new methods and findings. So for looking at these GREML methods across country and cohort, the R package just didn't have anything. It doesn't now. But it didn't have anything that could do that. So we had to do all of our matrix algebra and calculate that ourselves. But I think we have something to offer too in terms of developing new models and pathways. I think it's really exciting about all the different polygenic scores. So the one for educational attainment, well-being, neuroticism, all those traits I showed you, people make them available. It's part of the condition, some of them faster than others. But we published ours, for example, on the day of the publication. So you could get all of our results; we make it all open. And I think it's really exciting to think about gene-environment interaction. And it's a good moment for the social sciences because I think we have a lot to say in this field. So what happens if we ignore all of this? And we think, okay, let's not combine social sciences and molecular genetics. I tried to find the most suspicious-looking lab technician I could. He does look a little sneaky. I don't know if you can see it.
But if we don't bring in the right measures, because I've seen studies where they say, oh no, socioeconomic status doesn't matter. And then you go and you look at it and you think, well, that's not measured properly. So I think we have very good expert measurement. And I think, just as they're better at measuring lab things and different genetic aspects, we're experts at measurement. And I think we often forget the strengths of our own field. So I think we can introduce very good measures. I think we're the masters of measurement. I've said this already. But I think, and maybe we'll have time to talk about it in the panel, there are some really interesting policy and ethical issues about everything I've talked about today and what was discussed before. So what are the policy implications of this? And Jason Fletcher did a really interesting study. If you're interested, he looked at something that happened in the US in relation to tobacco control. So this happened in the US, but it also happened in many countries in Europe, where they put high taxation on tobacco and they limited smoking in public places. And what he found was that for the people that had the nicotine addiction, so they had the polygenic scores related to high addiction to nicotine, no matter what you did to them, they were gonna smoke. But for the other ones, those policies worked. So the social smokers, or the people that just weren't hard-wired to be addicted to smoking, it actually worked for them. So that's interesting in thinking that, instead of these 1980s or 1990s kind of policies that blanket everybody, we might wanna have tailored policies. So for these people that are really addicted to smoking, you might wanna think of pharmaceutical patches or something for them, because no matter how much you tax them or forbid them, they will have difficulties in stopping.
So I'm not convinced it's gonna overturn everything we know, but I think it will complement our substantive findings. So that's it. So thank you very much.