All right, so our second session is on anatomy, physiology and behavior, and it's being chaired by my dear friend, Naomi Wray.

Okay, well, welcome to this second session, which is about genetic architecture. As we've heard this morning, genetic architecture can be defined as the number of variants that affect a trait, their effect sizes, their frequencies in the population and the way they interact with each other. For the introductory talks before the panel discussion, we were tasked with providing empirical evidence for the genetic architecture of traits and then giving insights into the evolutionary forces that shape it. The panel discussion that follows was tasked with a broader set of questions building on those foundations. In this session we have three speakers: Michel Georges from the University of Liège, Neil Risch online, and yours truly, because somebody dropped out at the last minute and I was asked to take their place. A little bit about myself: I've just relocated to the University of Oxford, where I'm in the Department of Psychiatry, but my first career was in livestock genetics. One of the things about livestock genetics is that you have very, very intense selection, and one of the incredible things is that, even with that intense selection, so much progress happens every generation. The only way you can explain that is by polygenicity and new variants: there's so much polygenicity that the response to selection continues in each generation. That really underpins my approach to human genetics as well; it doesn't make sense to me that the genetic architecture of livestock traits would be different from the genetic architecture of human traits. So it's a great pleasure to introduce Michel Georges, who is Professor of Genetics and Genomics at the University of Liège in Belgium. He is well known in the livestock genomics community for his functional genomics work, and he also works on inflammatory bowel disease in humans.
I think his story about muscling in cattle is going to be very insightful, so I'm really delighted that Michel is here today to join us.

How do we advance the slides? Okay. It's a real honor, and a bit of stress, to be here. I thank the organizers, Nancy, Molly and Naomi. I'm not thanking Peter. I asked Peter a few weeks ago why I was on the program of this incredible conference, and he said it's Naomi's fault. So I'm thanking Naomi. For the genetic architecture of anatomy, physiology and behavior, I looked around the lab at what we have done, and this is the closest I could get to anatomy and physiology. There's a bit of a connection with the beginning of my career, as you will see from the phenotype, but this is primarily work from a group in the lab led by Tom Druet, who is in charge of genetic selection in this breed. It's primarily the work of a PhD student, Chan Yuan, with contributions from a postdoc, [inaudible], and supervision by other PIs, Carole Charlier and Haruko Takeda. So I'll tell you about this beast. It's a cattle breed, the Belgian Blue, that is well known in Belgium, where it accounts for approximately half of the herd. In fact, there has been, unconsciously, a sort of selection experiment: strong directional selection that has been going on for approximately 60 years. The breed was there before; around the 1900s it was what we call a dual-purpose breed. People sometimes observed what we call in English double-muscled animals, and these create problems because they cause dystocia. Despite that, in the 1960s Belgian breeders decided that they were going to fix this qualitative double-muscling phenotype, and they started a selection process towards that goal in the largest fraction of the population, which I'll call population one, but they maintained, if you will, a control population.
You will see that the corresponding mutation was quite readily fixed. Since then, from about 1980 until now, they have continued to select for increasing muscularity, but now dealing with a quantitative trait, if you want. So I'll tell you a bit about the observations we have made during this first phase of the selection experiment and during the second one. The first phase started before the era of DNA markers, with segregation analysis, and Roger Hanset, my predecessor, if I may say so, at the University of Liège, was convinced that there was a major locus at play, although the Naomi Wrays of the world, from Edinburgh and all these places, were saying it's impossible: major genes don't exist, it's all polygenic. Here the segregation analysis was essentially saying: there is this locus, it's partially dominant, and it gives you an idea of the magnitude of the effects distinguishing the homozygous wild types from the heterozygotes and the homozygous mutants, in units of residual standard deviations. If you look at the actual phenotype, muscle mass is increased by, let's say, 25%, but the weight of the other organs is actually reduced by a larger fraction. To make a long story short, we cloned and identified the corresponding gene, benefiting a lot from Se-Jin Lee's work in the mouse. The gene encodes a chalone, a hormone secreted by a tissue to restrain its own growth, and it's called myostatin. We found a mutation that disrupts that gene; it is now fixed in the Belgian Blue and has essentially the effects that were predicted by the segregation analysis. It left a strong sweep in the breed: at present the region is entirely monomorphic, although the swept region is not that long. It's somewhere between a hard and a soft sweep, because the mutation was probably segregating in the population for a while.
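The segregation-analysis result ("partially dominant", effects in residual standard deviations) can be expressed in the classical additive/dominance parameterization. The sketch below is only an illustration of that bookkeeping: the genotype means are hypothetical values in residual-SD units, not the published Belgian Blue estimates, and `dominance_params` and `locus_variance` are names introduced here.

```python
def dominance_params(mean_wt_hom, mean_het, mean_mut_hom):
    """Split three genotype means into additive (a) and dominance (d) effects.

    Classical parameterization: the homozygotes sit at midpoint -/+ a and the
    heterozygote sits at midpoint + d.
    """
    midpoint = (mean_wt_hom + mean_mut_hom) / 2.0
    a = (mean_mut_hom - mean_wt_hom) / 2.0
    d = mean_het - midpoint
    # d/a = 0: additive; 0 < d/a < 1: mutant allele partially dominant;
    # d/a = 1: fully dominant; d/a < 0: mutant allele partially recessive.
    return a, d, d / a


def locus_variance(p_mut, a, d):
    """Additive and dominance variance from one biallelic locus in
    Hardy-Weinberg proportions; p_mut is the mutant allele frequency."""
    p, q = p_mut, 1.0 - p_mut
    alpha = a + d * (q - p)  # average effect of an allele substitution
    v_additive = 2 * p * q * alpha ** 2
    v_dominance = (2 * p * q * d) ** 2
    return v_additive, v_dominance


# Hypothetical genotype means in residual-SD units (wild type, het, mutant)
a, d, ratio = dominance_params(0.0, 2.0, 3.0)
print(a, d, round(ratio, 3))      # 1.5 0.5 0.333
print(locus_variance(0.5, a, d))  # (1.125, 0.0625)
```

A ratio between 0 and 1 is what "partially dominant" means in this parameterization; the variance function shows why the same locus contributes different amounts of variance as the mutation rises in frequency.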
At that time people thought that this mutation actually came from England. The phenotype had been described in multiple breeds, and the story was that all these breeds shared the same mutation, identical by descent. In fact, when we looked at all the European breeds in which this double-muscling phenotype had been described, every time it was a distinct mutation. So there was clear locus homogeneity: it was always the same gene coming out, but the mutations had been independently captured by selection, so there was a lot of allelic heterogeneity. That's quite remarkable, because you would imagine that muscular hypertrophy can arise through a variety of pathways and genes, but the fact that we always find this same gene indicates that there are not very many ways, at least in this species, to obtain that phenotype. In the majority of these breeds, because of the dystocia problems, the mutation is maintained by a form of balancing selection: people were not aware of the mutation, but they were trying to keep an intermediate phenotype. After that, mutations in the same gene were found in a series of other species, but interestingly it seems to be species specific: the situation is similar in some species but not in others. For instance, if you knock out this gene in the pig, the pig is sick. I'll also briefly mention that hypomorphic alleles were detected, and there's a nice one in sheep, where the mutation creates an illegitimate target site for microRNAs, which is a fun story. So that's essentially the first phase of the selection experiment: the fixation of knockout mutations in one very specific gene, and the demonstration that there are not many pathways that can lead to this double-muscling phenotype.
They were apparently not satisfied with the increased muscle mass due to this gene, and they continued to want more muscular animals, as you saw in the first slide I showed you, so they moved on to what they call quantitative selection. But in fact, the quantitative part for them is that they define a certain number of phenotypes, which they score quantitatively by eye: they have specialized technicians who go and visit the cows. They score the animals for 22 characters on a scale from one to 50, and only one of them, height, is actually measured. These 22 phenotypes cluster into stature, muscularity, and another group that is more related to ossification. I won't talk about that last group; I'll only look at stature and muscularity today. You would say, my God, this can't be very heritable. In fact, these are the heritability estimates from the population over the last five years. I've organized the phenotypes with stature on the left and muscularity on the right. The first one, height, is the only one that is measured, and it has a heritability of around 40%. The other ones have reasonable heritabilities, and it looks like there is a bit more on the muscularity side than on the stature side. As expected, we still get the highest estimates when we use pedigrees to estimate kinship, but we get close with SNP-based heritability. This is with a medium-density array; if we go to whole-genome sequence we gain one or 2% more, but we're still below the pedigree-based heritability. You'd say, if it's heritable, it should respond to selection. Sorry for the quality of this slide: what I'm showing here in white, yellow and orange is the phenotypic distribution in the control population, sorted by genotype.
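The expectation that a heritable trait should respond to selection is usually formalized by the breeder's equation. This is a minimal sketch with hypothetical inputs: the heritability of 0.40 only echoes the order of magnitude quoted for height, and the selection differential, intensity and phenotypic SD are invented for illustration.

```python
def response_to_selection(h2, selection_differential):
    """Breeder's equation: expected response per generation R = h^2 * S."""
    return h2 * selection_differential


def response_from_intensity(h2, intensity, sd_phenotype):
    """Same equation written with the selection intensity i, since S = i * sigma_P."""
    return intensity * h2 * sd_phenotype


# Hypothetical: parents selected 10 score units above the population mean
print(response_to_selection(0.40, 10.0))
# Hypothetical: keeping roughly the top 20% (i about 1.4), phenotypic SD of 5 units
print(response_from_intensity(0.40, 1.4, 5.0))
```

Which heritability to plug in (pedigree-based or SNP-based) changes the predicted response, which is one reason the gap between the two estimates matters in practice.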
At the myostatin locus, this control population still segregates, so you still have the three genotypic classes, and the red is what we have now in the Belgian Blue. It gives you an idea of how much has been gained since 1980. There's a substantial gain, but the gain is actually relatively limited compared with what the myostatin mutation was doing on its own. So, essentially, there's still heritability, and I should show you that there is a response to selection; I rediscovered these things while preparing the talk. It is not spectacular. Compared with the major effect of the myostatin mutation, there is progress, there is variation, there is heritability, but it's not spectacular, and maybe we can connect that with the rest of the talk. So what is the architecture of this residual heritability, if you want? The data I will show you were obtained from phenotypes on approximately 15,000 animals, including these eyeball-scored phenotypes, and all these animals are genotyped and imputed at [number inaudible] positions. To go after the architecture, Tom and his crew first did a GWAS using very standard methods: GEMMA, in either a univariate or a multivariate way. When he looked at all the phenotypes separately, he detected a certain number of loci, but not that many, and it's the same loci for the different phenotypes. So he did a multivariate analysis to try to increase the accuracy. This is the result for muscularity and stature, and you can see that the patterns are nearly identical, so I allowed myself to merge them all into one. What it means for us is that it could of course be related to the way these people phenotype the animals, but it suggests a lot of pleiotropy.
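A simple way to ask whether a shared peak pushes two trait groups in the same or opposite directions is to correlate the signed association statistics of the SNPs in that region. This is a minimal sketch with hypothetical z-scores; `effect_direction` is a name introduced here, and a sign check like this is not a substitute for formal colocalization.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def effect_direction(z_trait_a, z_trait_b):
    """Correlate signed z (or t) statistics of the SNPs in one region for two traits.

    A positive correlation means the alleles push both traits in the same
    direction; a negative one means opposite directions.
    """
    r = pearson_r(z_trait_a, z_trait_b)
    return ("same" if r > 0 else "opposite"), r


# Hypothetical signed statistics for five SNPs in each of two regions
region_1 = ([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.1])
region_2 = ([1.0, 2.0, 3.0, 4.0, 5.0], [-0.9, -2.2, -2.8, -4.1, -5.0])
for stature_z, muscularity_z in (region_1, region_2):
    print(effect_direction(stature_z, muscularity_z))
```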
Before calling it pleiotropy, we convinced ourselves that not only were the locations the same, but the underlying variants driving these peaks were the same, so we did colocalization, which essentially confirms that these are the same mountains. I'm showing you two examples where we look at whether there's a correlation between signed statistics, t statistics for instance. What you see is that the slope in the lower triangle can be positive or negative in this case, while it's always positive in that one. Essentially, we have some loci for which the signs of the effects on the two groups, stature and muscularity, are the same, and other loci for which they are opposite. This is when I put everything together using a Z score, if you want. You have all these peaks, and the little numbers there are the minor allele frequencies of the top variants. So what comes out of this if we look at the peaks in detail? The first striking thing is that for eight out of the 15 peaks there's clearly a coding variant at play. I was actually surprised by how often it is the top variant; we don't expect that level of accuracy, but for nearly all of them a coding variant comes out on top. If I compare with the work in humans on IBD, out of 200-something loci we have 10 where there's clearly a coding variant at play, and all the rest is regulatory. Here, at least among the top loci, the proportion of loci with coding variants was very large. But maybe a bigger surprise was that we already knew some of these 15 loci, because we had detected them as responsible for genetic defects: three out of the 15 corresponded to a genetic defect, and for another one we don't know exactly what is happening, but there is a highly significant depletion of homozygotes.
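A "highly significant depletion of homozygotes" is typically judged against Hardy-Weinberg expectations. This is a minimal sketch of a chi-square goodness-of-fit check on hypothetical genotype counts; `hwe_homozygote_test` is a name introduced here, and for rare genotypes an exact HWE test would usually be preferred over the chi-square approximation.

```python
import math


def hwe_homozygote_test(n_hom_ref, n_het, n_hom_alt):
    """Compare observed genotype counts with Hardy-Weinberg expectations.

    Returns the expected number of alt homozygotes, the chi-square statistic
    (1 degree of freedom) and its p-value.
    """
    n = n_hom_ref + n_het + n_hom_alt
    q = (n_het + 2 * n_hom_alt) / (2 * n)  # alt allele frequency
    p = 1.0 - q
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return expected[2], chi2, p_value


# Hypothetical counts: 200 alt homozygotes observed where about 256 are expected
print(hwe_homozygote_test(7000, 2800, 200))
```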
We don't know why the homozygotes for this mutation disappear from the population; we don't have a phenotype attached to it, but there are many fewer homozygotes than expected, so it's a sort of genetic defect. So the first conclusion was that a significant part of the heritability, and we'll see in a minute how much, is due to variants where one dose increases muscle mass, but with two doses the animal falls apart. These were maintained in the population by balancing selection, until we started to select against them by marker-assisted selection. I should also say that these four defects are breed specific. The other coding variants, the ones shown in green here, are shared with other breeds, so they predate the creation of this breed. If you look at the genes underneath the remaining peaks, they are genes also reported in human studies of height, like LCORL for instance, and all of them have been reported previously. These are probably affected by regulatory variation, and we have started trying to identify the genes using eQTL information. For one I think it's quite clear, and it's again one of these loci that seem to act on stature across species. The other one is an interesting one: one of the coding variants that was neither a defect nor in a known human stature gene is in EZH2, and those of you doing epigenetics will say, oh, that's an interesting one. The gene that comes out at a regulatory locus encodes a binding partner of EZH2, so there seems potentially to be an effect on epigenetics, and we're keen to go and look at chromatin signatures.
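The defect alleles described here (one dose increases muscle mass, two doses are deleterious) are a textbook case of heterozygote advantage. This is a minimal sketch of the standard one-locus model, with hypothetical fitness values: non-carriers lose a fraction `s`, affected homozygotes lose a fraction `t`. It is not a model fitted to the cattle data, and "fitness" here stands in for breeder preference.

```python
def overdominant_equilibrium(s, t):
    """Equilibrium frequency of allele a with fitnesses AA: 1-s, Aa: 1, aa: 1-t."""
    return s / (s + t)


def iterate_frequency(q0, s, t, generations):
    """Deterministic one-locus selection recursion for the frequency of allele a."""
    q = q0
    for _ in range(generations):
        p = 1.0 - q
        w_bar = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)  # mean fitness
        q = (p * q + q * q * (1 - t)) / w_bar
    return q


# Hypothetical: carriers favoured by 10%, affected homozygotes never reproduce
print(overdominant_equilibrium(0.1, 1.0))
print(iterate_frequency(0.01, 0.1, 1.0, 2000))
```

Both calls should agree: the recursion converges to s / (s + t), which is why such alleles can persist at appreciable frequency until selection is deliberately applied against carriers.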
So, putting all that together, how much do these top peaks, with 15,000 animals, explain? Well, not very much: it's the red bar, so it can go up to maybe 20% of the heritability, but certainly not more. So obviously we are trying to dig into the rest, and I'll tell you about preliminary results here, so bear with me; they're not finished. The next phase after GWAS is to partition the heritability by genome compartment, and unfortunately when you work with cattle you don't have all these genome compartments ready to go, so you have to generate them. People in the lab, especially [inaudible], generated a catalog of bovine ATAC-seq peaks in 63 tissues. She obtained approximately 1 million peaks, and we assigned them in an unsupervised way to 16 components that correspond very well to anatomical systems. These are the key messages from that study. When we overlay the peaks with the genetic variants we know of in cattle, we find more genetic variants in these peaks than expected from the proportion of the genome they represent, which I think has now also been shown in other species. This is due to the fact that ATAC-seq peaks are mutational hotspots: the second panel here is the distribution of singletons as a surrogate for de novo mutations, and there's really a peak of de novo mutations in these peaks. Nevertheless, if you look at the site frequency spectrum, there's evidence for purifying selection. We then overlaid the ATAC-seq peaks with the QTL credible sets, and we find the expected enrichment in a tissue-specific way. I won't go into too much detail, except to say that we used the degree of overlap between the credible sets and the ATAC-seq peaks to try to answer two questions.
What fraction of regulatory variants map to ATAC-seq peaks? Our estimate is one in three. And what fraction of variants mapping to ATAC-seq peaks are regulatory? One in 25. So I asked myself, am I really going to get as much information as I thought out of these catalogs? Sensitivity and specificity are maybe not as good as we would have hoped. Why is that? For the first question, either we don't have all the ATAC-seq peaks, which is surely true, or you can perturb gene switches, causing QTL effects, with a SNP that lies outside the gene switch, which I think is very plausible. Contrary to what you see for coding variants, to a large extent perturbation of gene switches may occur through variants that do not lie within the switch itself. With regard to missing peaks, on the one hand we probably need to cover developmental stages, etc. But an interesting exercise is to go back retrospectively, take regulatory variants that have been unambiguously pinpointed by positional cloning, and ask whether ATAC-seq information would have identified them more efficiently. I took three such causative regulatory variants, and we know from genetic analysis that two of the three affect a silencer, not an enhancer. When we look in the ATAC-seq data, there are no peaks there. So it's possible, for instance, that silencers cannot be detected using the standard chromatin assays. We also have data we can't publish because nobody believes it: single-cell RNA-sequencing data, in this case from the retina, that point to repressor effects being very common. So I think there's a whole story about silencers that may be interesting to pursue. And I'll finish with the preliminary stuff, just a few slides.
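The two questions are the two directions of an overlap: how many causal regulatory variants the peaks capture, and how many peak variants are causal. This is a minimal counting sketch with hypothetical numbers chosen to reproduce "one in three" and "one in 25"; the talk's estimates come from probabilistic credible-set overlap, not from simple counts, and `overlap_summary` and the genome fraction of 5% are assumptions introduced here.

```python
import math


def overlap_summary(n_causal, n_causal_in_peaks, n_variants_in_peaks, genome_fraction):
    """Return (sensitivity, positive predictive value, fold enrichment)."""
    sensitivity = n_causal_in_peaks / n_causal       # causal variants captured by peaks
    ppv = n_causal_in_peaks / n_variants_in_peaks    # peak variants that are causal
    fold_enrichment = sensitivity / genome_fraction  # versus the genomic share of peaks
    return sensitivity, ppv, fold_enrichment


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): how surprising k causal variants in peaks
    would be if they fell in peaks only in proportion to genome coverage."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical: 300 causal variants, 100 in peaks, 2,500 variants in peaks,
# peaks covering 5% of the genome
print(overlap_summary(300, 100, 2500, 0.05))
print(binomial_upper_tail(100, 300, 0.05))
```

Strong enrichment and low positive predictive value can coexist, which is the tension the speaker describes.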
Once we have these compartments, we do what we've learned from you guys: if I look below the significance threshold, do I see enrichment in some compartments? The first thing we did was simply to go down the Manhattan plots and see, as we go down, whether specific compartments are enriched. I did that by taking the first peak, taking its credible set and storing it, throwing away all the rest, and then going to the next peak. So I don't use the SNPs that are in a peak but not part of its credible set; you'll tell me whether that's a good idea or not. As we move down, you see the different categories. What we see here is an enrichment of ORF-altering variants at the top, but that's due to the top peaks, the ones I showed you. I think more important is the lower line. If I look at the p-values, what I believe in here is a depletion of intergenic regions, so there is an enrichment of genic regions; whether I can distinguish between the compartments is another question. If I take away the top SNPs, the enrichment of coding variants disappears, and I have this funny mountain of synonymous variants appearing further down. I think other people have described similar things; I have no idea what the significance of that is. If we bring in the ATAC-seq subdomains, we see an enrichment of SNPs falling into muscle-specific ATAC-seq peaks. You would say, oh, fantastic, but if I look at where the signal is coming from, it's the same genes that give me the signal with the ORF-altering variants, so I'm not sure what that means, and I wouldn't believe it too much. Then of course we go to variance component analysis and BayesR; the details of the methods we can talk about over coffee.
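The peak-by-peak procedure relies on a credible set for each peak. One common way to build one, assuming a single causal variant per peak, is Wakefield's approximate Bayes factor: convert each SNP's z-score into a posterior inclusion probability and keep the smallest set reaching the target coverage. This is a sketch of that generic approach, not necessarily the lab's method; the prior effect variance of 0.04 and the z-scores and standard errors are hypothetical.

```python
import math


def wakefield_log_abf(z, se, prior_var):
    """Log approximate Bayes factor (association vs null) for one SNP."""
    v = se * se
    r = prior_var / (v + prior_var)
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def credible_set(zs, ses, prior_var=0.04, coverage=0.95):
    """Smallest set of SNP indices whose posterior probabilities sum to >= coverage,
    assuming exactly one causal variant in the region."""
    log_bf = [wakefield_log_abf(z, se, prior_var) for z, se in zip(zs, ses)]
    top = max(log_bf)  # subtract the maximum for numerical stability
    weights = [math.exp(lb - top) for lb in log_bf]
    total = sum(weights)
    pp = [w / total for w in weights]
    order = sorted(range(len(pp)), key=lambda i: pp[i], reverse=True)
    selected, cumulative = [], 0.0
    for i in order:
        selected.append(i)
        cumulative += pp[i]
        if cumulative >= coverage:
            break
    return selected, pp


# Hypothetical peak with four SNPs
zs, ses = [8.0, 7.5, 3.0, 1.0], [0.02] * 4
print(credible_set(zs, ses)[0])                 # [0]
print(credible_set(zs, ses, coverage=0.99)[0])  # [0, 1]
```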
The bottom-line message is that, whether I include or exclude the genome-wide significant signals, which are clearly enriched in coding variants, ATAC-seq peaks and coding variants seem to explain a disproportionate share of the variance, but at this point I really take this with a grain of salt. So I'll let you read the conclusions.

I'm cognizant that we've got panel discussions and a roundtable, and I don't want to take time from those, but I do want to have a couple of questions. Can we have a couple of questions for Michel? No? Well, if not, we will move on to the next speaker. I think that was a really wonderful talk, with a few attacks on me, which I won't rise to right now. Next is Neil Risch, who's joining us online. Neil, why aren't you here? We all know Neil well: he's a past president of the American Society of Human Genetics, and his 1996 paper with Kathleen Merikangas underpins whole fields, but of course he's also done a lot of modeling of disease architecture. Neil, I think you're going to be talking about major genes in complex traits, so over to you.

Am I sharing my slides, or are you going to do them? What would you prefer? Either way is fine with me. You share, then you're in control; I think that's much better. Okay. Thanks, Naomi. I guess I could spend 10 minutes explaining why I'm not at your meeting in DC and Bethesda today, but I've really enjoyed the meeting so far and listening to the conversations, especially this morning. As Xander said, the keyword of today is context dependency. I was thinking about how this conversation might be viewed if it were taking place at a meeting of epidemiologists; I think the comments and questions might have been somewhat different.
I don't know what they would have made of today, but I can tell you that many years ago, when I was at Yale, one of my epidemiologist colleagues asked me, why are you studying genetics? It's not a modifiable risk factor. I think he probably wasn't anticipating CRISPR. Anyway, when Xander asked me to talk, I said, well, for the last 10 years I've mostly been working on Mendelian traits; could I talk about that? And he said, well, if you want to argue that Mendelian traits are complex, then maybe. So I said, okay. It's easy to make the argument that Mendelian traits are also complex, but what I'm going to talk about is the relationship between the two, and as you can see I added something to my title: complex traits, the confluence of forward and reverse genetics. I characterize Mendelian genetics as reverse genetics. We're talking about rare disease diagnosis, which nowadays is done by identifying a pathogenic or likely pathogenic variant or variants. Individual variants in these genes have a, quote, large impact. This is primarily based on the exome and adjacent intronic sites. One perhaps critical difference between reverse and forward genetics, at least in my experience of the way it operates now, is that there's really no statistical inference: it's all based on subjective judgments about variant annotation using current criteria, such as those from the ACMG. The process goes from genotype to phenotype: we identify individuals who have pathogenic or likely pathogenic variants, and then we're in a position to characterize the phenotypic expression related to those variants. Often in Mendelian genetics the phenotype is syndromic; that is, it's a constellation of multiple features, not just a single one.
But another hallmark, which we all know from Mendelian genetics, is the extensive clinical and phenotypic variability associated with these disorders, ranging from severe to mild to unaffected, even within the same family, among individuals who have exactly the same genotype. In contrast, I would characterize non-Mendelian genetics as forward genetics. Here you start with a phenotype and then try to obtain a genotypic explanation; this would be typical of GWAS. The challenge here, and the focus, is trying to functionally characterize these variants as causal, and that's especially challenging because their effect sizes are modest and, as we've heard a lot, they are mostly non-exonic, suggesting that they act on transcription and regulation. Again, the contrast is that these studies are based on statistics: they require statistical evidence for association. A lot of the description of this session was about the genetic architecture of certain kinds of disorders, and, as Naomi said, really focusing on explanations for allele frequencies, effect sizes and potential interactions. For Mendelian traits, of course, we know that frequencies are determined by selection, which is probably most often directional but could be balancing, and genetic drift is also very important. The degree of selection determines the potential frequency. Dominant alleles, because they're directly exposed to selection, tend to be very rare. There are a few examples where they are not so rare, but usually that's related to late onset of the disorder or to incomplete penetrance. Recessive alleles tend to be more common, especially when carriers are little or not affected. As we now realize, both recessive and dominant variants can show founder effects.
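The contrast between rare dominant and more common recessive alleles follows from standard mutation-selection balance. This is a minimal sketch of the textbook approximations with a hypothetical mutation rate of 1e-5 per generation; real frequencies also reflect drift, founder effects and balancing selection, as the talk notes.

```python
import math


def equilibrium_dominant(mu, s, h=1.0):
    """q ~ mu / (h * s) when heterozygote fitness is reduced by h * s."""
    return mu / (h * s)


def equilibrium_recessive(mu, s):
    """q ~ sqrt(mu / s) for a fully recessive deleterious allele."""
    return math.sqrt(mu / s)


mu = 1e-5  # hypothetical per-generation mutation rate
print(equilibrium_dominant(mu, 1.0))   # dominant lethal
print(equilibrium_dominant(mu, 0.1))   # late onset: weaker effective selection
q_rec = equilibrium_recessive(mu, 1.0)
print(q_rec, 2 * q_rec * (1 - q_rec))  # allele and carrier frequency
```

With the same mutation rate and full selection, the recessive allele ends up roughly 300 times more common than the dominant one, because selection only "sees" it in rare homozygotes.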
If you go back 30 or 40 years, people would have thought that only recessive disorders show founder effects, but we now know that's not true. And balancing selection can actually lead to some pretty high frequencies for pathogenic variants; just a couple of examples are APOL1 kidney disease and, of course, G6PD deficiency, which we all know about. So, for Mendelian traits, there's a range of phenotypic expression and pleiotropy, and, as I'll show in a minute, the closer the phenotype you're studying is to the direct effect of the variant, the larger the effect is going to be. This is a classic paper from Lionel Penrose. I noticed this is one that wasn't included this morning; I wanted to make sure I didn't overlap with what he was talking about, because of course he covered the terrain pretty well. It was published in the Annals of Eugenics in 1951. Basically, he characterized phenotypes in those who carry a predisposing genotype, in this case for PKU, so someone who has two pathogenic mutations for PKU, versus a control sample with no such variants. What he showed in this diagram is that for individuals with PKU, who are on the right in the dark shade, blood phenylalanine levels differ from the controls by about 13 standard deviations. When you get to IQ, you're talking about a six standard deviation difference. When you measure head size, you're down to maybe one standard deviation, and then hair color: people with PKU tend to have lighter hair, and now you're talking about less than one standard deviation. The point here is that phenylalanine levels are most directly related to the missing phenylalanine hydroxylase enzyme, so that's a very strong, direct effect.
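One way to see why the standardized difference shrinks for more distal phenotypes is a toy model in which the downstream trait equals the proximal phenotype plus independent contributions from other factors. The variant shifts the proximal phenotype by a fixed amount, but the downstream trait's SD is inflated by everything else. This is an illustrative assumption introduced here, not Penrose's analysis, and the numbers are hypothetical.

```python
import math


def downstream_effect(delta_sd, var_other):
    """Standardized case-control difference for a downstream trait Y = X + E.

    X is the proximal phenotype (variance 1 in controls) that the variant shifts
    by delta_sd; E is independent of genotype with variance var_other.
    """
    return delta_sd / math.sqrt(1.0 + var_other)


for var_other in (0.0, 3.7, 40.0, 168.0):
    print(var_other, round(downstream_effect(13.0, var_other), 2))
```

The same 13-SD shift in the proximal phenotype becomes a one-SD difference once other sources account for most of the downstream variance.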
By contrast, when you get to hair color, there are so many other factors influencing it that the effect of that gene is greatly diminished. There's still a causal relationship, but it's swamped by the many other variants involved. Now here we go: context dependency. The most obvious example I could give of genetic context dependency is recessive variants. A recessive variant in a heterozygote, just a carrier, may have no effect whatsoever. In a homozygote it can be pathogenic, but that can depend completely on what the other allele is. For example, in Gaucher disease, homozygosity for the N370S variant can cause disease, but it's late onset and often mild, and some homozygotes are never affected. But if you're a compound heterozygote with a complete loss-of-function variant such as 84GG, that's now a pathogenic combination. So the pathogenicity of this allele is determined by the other allele the individual carries. Okay, that's for recessive variants, but for all variants, and we've already talked about this a bit, polygenic background, sex, age, environment and ancestry are all contexts that can modify the effects of those variants. The first thing I'm going to mention is a paper from my junior UCSF colleague David Blair, who looked at, I think, 50 different Mendelian disorders. He showed that polygenic background for the traits involved in a disorder actually determined whether an individual had the trait, given that they were carrying a pathogenic variant in the first place. So polygenic variation helps determine the clinical phenotype, even in a Mendelian disease.
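A generic way to think about polygenic background modifying penetrance is a liability-threshold model: carriers have their liability shifted by the pathogenic variant, and their polygenic score moves them closer to or further from the threshold. This sketch is an illustration under assumed parameters (variant shift of 2 liability SDs, PRS explaining 20% of liability variance, 1% prevalence in non-carriers); it is not the method of the study mentioned above, and `carrier_penetrance` is a name introduced here.

```python
import math
from statistics import NormalDist


def carrier_penetrance(variant_shift, prs_z, prs_var=0.2, prevalence=0.01):
    """P(liability > threshold) for a carrier with a given standardized PRS.

    Liability = variant_shift + sqrt(prs_var) * prs_z + residual, where the PRS
    and residual together have variance 1, and the threshold gives `prevalence`
    among non-carriers.
    """
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    mean = variant_shift + math.sqrt(prs_var) * prs_z
    liability = NormalDist(mu=mean, sigma=math.sqrt(1.0 - prs_var))
    return 1.0 - liability.cdf(threshold)


for prs_z in (-2.0, 0.0, 2.0):
    print(prs_z, round(carrier_penetrance(2.0, prs_z), 3))
```

Under these assumptions the same pathogenic variant has low penetrance in carriers with a low polygenic score and high penetrance in those with a high one.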
Here's an example of sex as context: in familial hypercholesterolemia, it turns out that women who have FH mutations tend to have higher LDL cholesterol than men do, and actually a higher incidence of cardiovascular consequences. Obviously age has a major role here as well. As I mentioned for Gaucher disease, N370S homozygotes have a late age of onset, whereas compound heterozygotes with a complete loss-of-function variant have a much earlier onset. Environmental background: another obvious case is phenylketonuria, right? The expression of the disorder depends completely on diet. If you're on a diet heavy in phenylalanine you're going to be severely affected, but if you remove it from the diet you're going to be much better off. Now we're getting to some of the work we've done at UCSF over the last 10 years. This is from the CSER Consortium, Clinical Sequencing Evidence-Generating Research, funded by NHGRI. Five centers did exome and genome sequencing for the diagnosis of prenatal, NICU and pediatric patients, over 3,000 patients in total. In the results I'm going to show you, there was no association with genetic ancestry, but there was a positive association with the number of indications. This is work done by my graduate student at UCSF in collaboration with many folks. Most of the variants discovered here were dominant; some were inherited, though many were de novo, and there were recessive and X-linked cases as well. If you look at the logistic regression results in the table on the right, you'll see that none of the ancestry terms are significant.
None of the ancestry results for any of these analyses were significant, but sex was, with males less affected. And for the number of indications, if you had more than one indication, that increased your probability of getting a genetic diagnosis. So now, what about common, or quote-unquote complex, traits? In terms of allele frequencies, low-risk variants are, I would say, largely unaffected by selection, so they can occur at all frequencies. That's not entirely true, but it's mostly true, perhaps. The issue is that detection power in association studies is really high for the more common low-risk variants. On the other hand, using Mendelian results, we can still characterize potentially pathogenic rare variants by annotation, for example classifying missense or loss-of-function variants and combining them in association studies, so that even at low frequency they can still be identified statistically. But I would suggest that some lower-risk rare variants may be largely missing from our results; I'd imagine they're quite prevalent, but they're missed because of statistical power. There are examples where higher-risk variants can be more common, when there's balancing selection. HLA would be an example of this. There are some moderately high-risk variants for type 1 diabetes, for example, which historically would have been under a lot of selection, but they still occur at high frequency, and I know there's tremendous balancing selection in that region. Actually, one thing I was thinking about during the discussion this morning: if you're talking about traits like blood pressure or lipids, there are potential disadvantages to being at either extreme. So the optimal phenotype is probably somewhere in the middle.
And that optimum can shift depending on the environmental situation, but selection could still be occurring at both ends of the distribution, which is going to favor polymorphism and higher allele frequencies by pushing people towards the center. In terms of effect sizes for common alleles, this is where pleiotropy and context dependency come in. Here's one example of pleiotropy, work with John Witte when he was at UCSF on HOXB13 G84E; this was actually work Tom Hoffmann did. This variant has a frequency of about half a percent in Europeans. The odds ratio for prostate cancer is about 3.6, but if you look at the column of odds ratios you can see a number of other cancers for which this allele is also a predisposing factor. Okay, context of sex: as I mentioned this morning, blood pressure has higher heritability in women, and overall it appears that some of the effect sizes are greater in women than in men. Age: I would imagine that for many polygenic risk scores the effect is greater at younger ages, and I think that's true for cardiovascular-related conditions. I would suggest that pharmacogenetics is a very good example of an environmental context for SNP effects. Basically, you compare the regression coefficients of a SNP in individuals when they're exposed versus not exposed to the drug. This is work from one of our junior faculty at UCSF, Akinyemi Oni-Orisan, looking at statins: whether the effect sizes of variants differ when an individual is on statins versus not. You can see there are three SNPs for which that's true; if you look at the beta coefficients, they're statistically significantly different.
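Comparing a SNP's effect in exposed and unexposed individuals can be made concrete with a standard z-test for the difference between two regression coefficients. This is a sketch, not the method used in the statin study. It assumes the two estimates come from independent, non-overlapping groups. If the same people are measured on and off the drug, the covariance between the two estimates would also have to be included. The function name and effect sizes are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def coefficient_difference_test(beta_exposed: float, se_exposed: float,
                                beta_unexposed: float, se_unexposed: float):
    """Two-sided z-test for a difference between two independent estimates.

    Returns (difference, z, p_value). Assumes the estimates are
    uncorrelated, e.g. from non-overlapping sets of individuals.
    """
    diff = beta_exposed - beta_unexposed
    se_diff = sqrt(se_exposed ** 2 + se_unexposed ** 2)
    z = diff / se_diff
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return diff, z, p_value


if __name__ == "__main__":
    # Hypothetical SNP effect on LDL (per allele) on and off a drug.
    diff, z, p = coefficient_difference_test(0.10, 0.02, 0.02, 0.02)
    print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p:.2g}")
```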
So I would argue this is a good example of an exogenous environmental exposure where the SNP effects differ, and it has great significance for pharmacogenetics. Okay, now what about ancestry? Here's an example of genetic ancestry: cutaneous squamous cell carcinoma. This is work by Eric Jorgenson in our Kaiser GERA cohort, which showed, first of all, that squamous cell carcinoma is dramatically more common in whites than in any other group. It has a moderate frequency in Latinos, if you look at the ethnicity frequencies here, but it's very uncommon in East Asians and African Americans. And it turns out there's a very strong genetic ancestry effect. On the right you can see that in Europeans the risk is highest the farther north you go in Europe, but it's also true in Latinos: the more European ancestry, the higher the risk of squamous cell carcinoma. Now, it turns out that in terms of genetic ancestry and genetically predicted skin pigmentation, that is, a polygenic risk score for it, all of these effects are actually larger in Latinos than in non-Hispanic whites. The genetic ancestry effects are stronger and the pigmentation effects are stronger, so there appears to be a genetic context, in terms of ancestry, in which the SNPs are operating. But here's an example of exactly the opposite, another skin phenotype: atopic dermatitis, which is twice as common in African Americans. This is again in our GERA cohort; you can see in the first row that it's twice as common in African Americans. When we looked at genetic ancestry within the African Americans, there was no relationship: no higher risk associated with African ancestry. When we look at African ancestry by itself, without including the race variable, it is highly significant.
When we include both the race variable and African ancestry in the entire cohort of whites and African Americans, only the race variable is significant. The same thing happens with genetically predicted skin pigmentation: by itself it's predictive, but once the race variable is included it is no longer predictive. And the polygenic risk score for atopic dermatitis is highly significant in whites but not at all significant in African Americans. So this is a race, ethnicity or ancestry-related context that appears to be entirely environmental. Now to the topic of the role of major genes in non-Mendelian traits, which I refer to as the confluence of forward and reverse genetics. Typically these variants are identified by reverse genetics: positional cloning or association studies. What that results in is a phenotypic characterization of the Mendelian subset of a disease. But the big question is how relevant this subset is etiologically to the remainder of the trait that doesn't have a major gene effect. Historically this was done by segregation in families, and genes were found that way: for example BRCA1 and 2, Alzheimer's, Parkinson's disease, FH. Maybe Naomi will talk about this, but that doesn't seem to hold for some psychiatric disorders: these kinds of pedigrees, or segregating variants, don't exist. But I would think de novo status is now a strong criterion for autosomal dominant disorders. It's not clear how much de novo variants contribute to disease overall, and they probably don't contribute to family-based heritability, but they may have relevance for finding variants that aren't classified as pathogenic or likely pathogenic (non-P/LP variants).
So, for example, we know from our analysis of lipids that all the Mendelian lipid disorder genes have common GWAS hits, suggesting common etiologic pathways. But if we look at blood pressure, that doesn't seem to be the case: the genes for the Mendelian blood pressure syndromes don't seem to harbor these common variants. For a more comprehensive look at this, my UCSF colleague Catherine Tcheandjieu, who I think is in the audience, did a study of phenome-wide associations in Mendelian disease genes, showing that both common and rare variants within those genes have phenotypic expression. In summary, I'd say all traits are complex, depending on how you define the trait, and forward and reverse genetics are complementary approaches that can help us understand genetic complexity. Major genes for common traits provide a confluence of these two approaches. There are people who only like to do Mendelian genetics because they love the biology, but that's not the reality of the world, because so much of disease is not Mendelian, and I think both can contribute to our understanding. And background genetic and demographic effects, as I've shown, can have strong impacts on these underlying traits. Okay, thank you. Thanks, Neil, that was a really lovely overview of many, many papers. Any questions for Neil? Hi Neil, Bruce Walsh. Excellent talk; I really like how you framed a lot of these issues from the population genetic perspective. One complication: if you have a trait that's under optimal selection, the trait itself has an intermediate optimum, but the underlying loci are actually under underdominant selection. Alan Robertson showed this in the 50s; it's very counterintuitive. If you have genes under optimal selection, selection actually tends to remove variation, not maintain it, and that removes the balancing selection.
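Walsh's point can be checked numerically with a one-locus caricature of Robertson's argument. This is a minimal sketch under simplifying assumptions adopted here, not Robertson's own derivation. It uses an additive locus with genotype values 0, a and 2a and Gaussian stabilizing selection on the trait with width parameter `vs`. The other loci are assumed to hold the trait mean exactly at the optimum. The trait is under selection for the middle, yet allele frequencies move away from 1/2 toward loss or fixation. This happens because selection against deviations from the optimum acts like underdominance at each locus. Under a quadratic approximation to the fitness function, the change per generation is roughly Δp ≈ (a²/vs)·p·q·(p − 1/2). Parameter values are hypothetical.

```python
from math import exp


def next_freq(p: float, a: float, vs: float) -> float:
    """One generation of viability selection at an additive locus.

    Genotype values: aa = 0, Aa = a, AA = 2a. Other loci are assumed to
    keep the trait mean at the optimum, so each genotype's deviation from
    the optimum is its value minus the locus mean 2pa. Fitness is
    Gaussian in that deviation: exp(-d**2 / (2 * vs)).
    """
    q = 1.0 - p
    locus_mean = 2.0 * p * a

    def fitness(value: float) -> float:
        return exp(-((value - locus_mean) ** 2) / (2.0 * vs))

    w_AA, w_Aa, w_aa = fitness(2.0 * a), fitness(a), fitness(0.0)
    w_bar = p * p * w_AA + 2.0 * p * q * w_Aa + q * q * w_aa
    # Frequency of allele A after selection.
    return (p * p * w_AA + p * q * w_Aa) / w_bar


def trajectory(p0: float, a: float = 0.5, vs: float = 1.0,
               generations: int = 200) -> list[float]:
    """Allele frequency over successive generations, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], a, vs))
    return freqs


if __name__ == "__main__":
    for p0 in (0.45, 0.50, 0.55):
        print(f"start {p0:.2f} -> after 200 generations {trajectory(p0)[-1]:.4f}")
```

In this sketch p = 1/2 is an equilibrium, but it is unstable. Starting slightly below it drives the allele toward loss, and starting slightly above drives it toward fixation, so the selection does not maintain the polymorphism.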
So, right. If you're doing selection for an extreme, as we just heard about with cattle, that's what's going to happen. But I'm not sure that's what happens in a natural population, because I think what happens in a natural population is not selection for the extreme but selection for the middle. For example, you're going to be in trouble if you have a blood pressure of 180 over 100, but you're probably also going to be in trouble if your blood pressure is 80 over 40 or something like that. So you don't want to be at either extreme; you want to be in the middle. That's why I was arguing for balancing selection, because that's going to push all these variants to be polymorphic, to keep people in the middle, because the middle is actually the... I'm not sure that's actually my point. If you have an intermediate optimum for the trait, the underlying loci are actually under underdominant selection. It's very counterintuitive, but Alan Robertson showed it in the 50s; I'll send that stuff to you. Yeah, that is counterintuitive. I just wonder if that's true for some of the human traits we've been talking about. Okay, I'm going to take two more questions, and then we can carry on in the discussion. Yeah, thanks. Lloyd. Great talk, thank you very much. I was wondering about some of the observations you showed us about effect sizes being different in different groups, and to what extent that could be predicted from knowing the prevalences. Some of your main measures are odds ratios, which depend on prevalence, and your starting point was that the prevalence differs. So I was wondering to what extent some of those effect-size differences are essentially scale-dependent. I don't know.
I'm not sure I could have predicted it. With atopic dermatitis, for example, which is twice as common in African Americans, I don't know that I could have predicted beforehand that that difference had nothing to do with genetic ancestry. Squamous cell carcinoma is different: it's skin-pigment related, so that result is sort of what I would have predicted. But with atopic dermatitis we're left bewildered in a way. I showed you that genetically predicted skin pigmentation, which is probably correlated with some social factors among African Americans, lost its predictive value once race was included. And I told you that the PRS for atopic dermatitis is null in African Americans. There could be differences in the allele frequencies underlying that PRS, but my suspicion is that the variants just aren't operating in the same way in that context. The point I was trying to make is that when you have an ancestry context, it could be genetic ancestry that's the issue, or it could be environmental correlates of genetic ancestry, and I wouldn't know beforehand. But are you suggesting you might know? I wasn't suggesting that. Okay, I wasn't sure. But that would be great: if you had some other information or evidence to suggest how to go about it, I think that would be really useful. Okay, if you don't mind, I think we'll move on and leave your question to the panel discussion. That's okay. And so, on to the next piece. Thank you, Neil; I hope you'll be there for the discussion. Moving on to my slides.
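The question about scale dependence can be illustrated with a liability-threshold model. The sketch below assumes, for illustration only, that a variant shifts liability by the same amount in populations that differ only in disease prevalence. It also assumes non-carriers' risk equals the prevalence. Under those assumptions the odds ratios still differ, so part of an apparent difference in effect size between groups can come from the scale of measurement alone. The function names and numbers are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def odds(p: float) -> float:
    return p / (1.0 - p)


def odds_ratio_from_liability_shift(prevalence: float, shift: float) -> float:
    """Carrier vs non-carrier odds ratio for a fixed liability shift.

    Liability is standard normal in non-carriers, disease occurs above the
    threshold that gives `prevalence`, and carriers' liability is shifted
    up by `shift` standard deviations.
    """
    threshold = STD_NORMAL.inv_cdf(1.0 - prevalence)
    risk_non_carrier = 1.0 - STD_NORMAL.cdf(threshold)
    risk_carrier = 1.0 - STD_NORMAL.cdf(threshold - shift)
    return odds(risk_carrier) / odds(risk_non_carrier)


if __name__ == "__main__":
    for prev in (0.01, 0.02, 0.10):
        print(f"prevalence {prev:.2f}: OR = {odds_ratio_from_liability_shift(prev, 0.2):.2f}")
```

With the same 0.2 SD shift, the odds ratio falls as prevalence rises, roughly 1.7, 1.6 and 1.5 at prevalences of 1%, 2% and 10%.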
So we didn't confer before the talks, but I think we've made a really nice session covering different topics. In my talk I'm going to try to contrast evidence on genetic architectures, introducing many papers with a very quick skim over the top, hoping to trigger things for the discussion later on; some of the things I'm talking about are not my expertise at all. Essentially we had two prompts: one about examples of genetic architecture and one about evolutionary processes. I'm going to start with genetic architectures, increasing the complexity of the phenotypes as I go, beginning with the eQTLGen paper; we heard about eQTLs this morning. In this paper of 32,000 people with gene expression measured in blood, they found that 88% of genes had a cis-eQTL, that is, within one megabase of the gene, and of those, 92% were within 100 kilobases; the genes without a cis-eQTL showed evidence of selective constraint. They also looked at trans-eQTLs, and I'll think about those in the context of this proteomics paper on protein QTLs. This was the UK Biobank paper published this year, with nearly 3,000 proteins. On the left-hand side is the chromosome position plot: the red dots are the cis-pQTLs and the blue dots are the trans-pQTLs. Again, about 82%, or I think 88%, of proteins had a cis-pQTL at the gene that encodes them. The bar chart across the top shows the number of proteins associated with different positions in the genome. In terms of SNP-based heritability, the average is about 16%; of that, about 20% was explained by the cis-pQTLs and 10% by the trans-pQTLs. In terms of sample size, on the right-hand side you can see that they weren't detecting more cis-pQTLs beyond a sample size of about 10,000, but they identify more and more trans-pQTLs as sample size increases.
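The cis/trans distinction used in these QTL studies is essentially a distance rule. A minimal sketch follows. It calls a QTL cis when the SNP is on the same chromosome as the gene and within 1 Mb of it. Here the distance is measured to the transcription start site, but studies differ on whether they use the TSS, the gene body or another anchor. The class, field names and example coordinates are hypothetical.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 1_000_000  # 1 Mb, the cis definition quoted in the talk


@dataclass(frozen=True)
class QtlHit:
    snp_chrom: str
    snp_pos: int
    gene_chrom: str
    gene_tss: int


def is_cis(hit: QtlHit, window_bp: int = CIS_WINDOW_BP) -> bool:
    """True if the SNP lies on the gene's chromosome within `window_bp` of its TSS."""
    return (hit.snp_chrom == hit.gene_chrom
            and abs(hit.snp_pos - hit.gene_tss) <= window_bp)


def fraction_within(hits: list[QtlHit], window_bp: int) -> float:
    """Fraction of cis hits that also fall within a tighter window (e.g. 100 kb)."""
    cis_hits = [h for h in hits if is_cis(h)]
    if not cis_hits:
        return 0.0
    return sum(is_cis(h, window_bp) for h in cis_hits) / len(cis_hits)


if __name__ == "__main__":
    hits = [
        QtlHit("1", 1_050_000, "1", 1_000_000),  # 50 kb away: cis
        QtlHit("1", 1_600_000, "1", 1_000_000),  # 600 kb away: cis
        QtlHit("7", 5_000_000, "1", 1_000_000),  # other chromosome: trans
    ]
    print(["cis" if is_cis(h) else "trans" for h in hits])
    print(fraction_within(hits, 100_000))
```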
In terms of what trans effects could look like, I like this example of fatty acids. Fatty acids come from the diet, but there are enzymes that change fatty acid composition, and you can see that in this known pathway. If we do a GWAS on the first fatty acid, one locus comes up. When you go to the next one down the pathway, the first locus remains and a new one appears; for the third one down the pathway, the first two are still there and a third one appears. I think that's a nice example of pathway results. Here I'm looking at vitamin D, a UK Biobank study that I led with John McGrath. Vitamin D is a blood biomarker, more complex than gene expression. As you can see from the genome-wide association plots, from the strength of the associations you can detect variants of large effect, but it's still very polygenic. What was surprising to John and me, because we're used to looking at psychiatric disorders, was just how interpretable those results were in terms of what's known about vitamin D metabolism, properties of the skin, lipid and lipoprotein pathways, and liver metabolism in general. I'm highlighting this paper from Jonathan Pritchard's group, which we talked about this morning: eQTLs map to genomic annotations such as promoters, but that's not so for GWAS results. Again I'm putting this up as a trigger for discussion; we had some discussion this morning about that and how it's interpreted. Moving on. The graph on the left-hand side is the old way we used to look at genetic architecture: on the x-axis the number of cases and on the y-axis the number of associations, here contrasting Crohn's disease and schizophrenia. Both are relatively uncommon, half to 1%, and both have similar heritability, about 80%. But we identified variants for Crohn's disease with much smaller sample sizes than for schizophrenia.
And then on the right-hand side we're further down the GWAS track for both of these disorders. I really like this review on inflammatory bowel disease, where I think we've got more than 200 loci. Those variants can be mapped to genes, and the genes can be interpreted in terms of what we know about inflammatory bowel disease. But it's still very complex, with many pathways reflecting the structure of the gut, interactions with the host microbes, the fact that it's an inflammatory disease, and infection responses. Then we've got the Psychiatric Genomics Consortium schizophrenia paper from last year, which now has 77,000 cases and 287 loci. Again, we try to interpret those results with post-GWAS analyses, in terms of synaptic biology, but for me the interpretation is less clear than in this example of inflammatory bowel disease. Taking this forward, I'm now comparing two common disorders, one of the brain and one of the gut. I've extended this plot a bit on the x- and y-axes. Again I'm contrasting two disorders that are similarly common in the population, 12% for diverticulosis and 15% for major depression, with similar heritabilities. Again, a much smaller number of cases was needed to identify a large number of variants for diverticulosis. On the right-hand side is major depression, the latest results from the Psychiatric Genomics Consortium: with 525,000 cases we've now identified more than 600 loci. You can see on the y-axis that the -log10 p-values go up to about 40, much lower than on the left-hand side for diverticulosis, where with just 77,000 cases you can see much higher -log10 p-values, indicative of larger effect sizes. So diverticulosis is very common.
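The link between effect size, sample size and peak height in a Manhattan plot can be sketched with the usual approximation for a quantitative trait. A SNP explaining a fraction q² of phenotypic variance has an expected association chi-square (non-centrality) of about N·q²/(1 − q²). This sketch treats both disorders as if they were quantitative traits and ignores case-control ascertainment and liability-scale conversion, so it shows only the direction of the effect. The function names and variance-explained values are invented.

```python
from math import erfc, log, log10, pi, sqrt


def neg_log10_p_from_z(z: float) -> float:
    """-log10 of a two-sided normal p-value, falling back to a tail
    approximation when the p-value underflows to zero."""
    p = erfc(abs(z) / sqrt(2.0))  # two-sided p = erfc(|z| / sqrt(2))
    if p > 0.0:
        return -log10(p)
    # Mills-ratio approximation for large z: p ~ 2 * phi(z) / z.
    z = abs(z)
    ln_p = log(2.0) - 0.5 * log(2.0 * pi) - log(z) - 0.5 * z * z
    return -ln_p / log(10.0)


def expected_neg_log10_p(n: int, variance_explained: float) -> float:
    """Approximate peak height for a SNP explaining `variance_explained`
    of the phenotype in a sample of size n (quantitative-trait approximation)."""
    ncp = n * variance_explained / (1.0 - variance_explained)
    return neg_log10_p_from_z(sqrt(ncp))


if __name__ == "__main__":
    # Hypothetical: fewer cases but a larger per-SNP effect ...
    print(round(expected_neg_log10_p(77_000, 0.002), 1))
    # ... versus many more cases but a much smaller per-SNP effect.
    print(round(expected_neg_log10_p(525_000, 0.0002), 1))
```

With these made-up inputs, the smaller sample with the tenfold larger per-SNP effect gives the taller peak, about 35 versus about 24 on the -log10 scale. This matches the qualitative contrast between the two plots.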
I don't think many people have actually heard of it, because it's not a disorder we talk about: bleeding from your backside is not the sort of thing you tend to tell even your family, let alone your friends. But it is very common. And what was again striking to me, in the work my PhD student did, was how interpretable those results were compared to my work on psychiatric disorders. We could map the SNPs to genes, and those genes are interpretable in terms of the structure of the intestine, gut motility, and elasticity of the gut. This is a disorder where you're usually told it's about diet, but clearly the genetic contribution is important as well. So in that first half of the talk I tried to whiz through traits of increasing complexity and show how results from these complex diseases are interpretable, even as the complexity increases. Now on to prompt two. This is a classic way we think about selection and the evidence for it: on the x-axis we've got minor allele frequency and on the y-axis effect size, and this shape of curve is exactly what we expect when selection is taking place. Variants of large effect are only likely to be present in the population at low frequencies; otherwise selection acts to stop them from reaching high frequency. Here I'm showing this for height on the left-hand side and schizophrenia on the right. My PhD student, with her primary supervisor Allan McRae, has a study under review where she identified variants with antagonistic effects. She looked for GWAS regions or SNPs associated with two traits where, if you map them to the direction you'd expect for fitness, the effects were opposite.
It's hard without a pointer, but on the height graph you can see two dots that seem to have a larger effect size than you'd expect from their minor allele frequency. What she found was that those tended to be the ones with antagonistic effects, which is, I think, what you'd expect from evolutionary theory. Just to say, evolutionary theory applied to GWAS results is a very active area of research and not one that I'm an expert in at all. I've put up a whole series of papers, with apologies if I've missed some. The point is to trigger people's memories of what's out there, as many of the senior authors are here in the audience and should be contributing to the discussion. So I'm going to talk about some evolutionary analysis we did in two papers, one in 2018 and the other in 2021: the first was based on individual-level data, and the second carried the method forward to GWAS summary statistics. This was done by the amazing Jian Zeng and Jian Yang. It's a Bayesian random regression model; most of you will be familiar with this type of model, where we relate phenotype to genetic effect sizes through the genotypes. Effects are drawn from one of two distributions: either a SNP has no effect at all, or, for a proportion of SNPs, the effect size is drawn from a normal distribution whose variance is related to the heterozygosity, that is, to the allele frequency. That relationship between effect size and allele frequency is measured by the coefficient S. From these methods we can estimate three key parameters for genetic architecture: the SNP-based heritability; the selection coefficient S, which relates minor allele frequency to effect size; and the polygenicity parameter, the proportion of SNPs that have an effect.
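The effect-size prior described here can be written as βⱼ ~ π·N(0, [2pⱼ(1 − pⱼ)]^S·σ²_β) + (1 − π)·δ₀. Here pⱼ is the allele frequency of SNP j, π is the proportion of SNPs with an effect, and δ₀ is a point mass at zero. The sketch below samples effects from that prior. It also computes each causal SNP's expected contribution to variance, 2pq·E[β²] = σ²_β·[2pq]^(1+S). It illustrates only the prior, not the MCMC fitting used by the actual method, and the function names and parameter values are hypothetical.

```python
import random


def sample_effects(freqs, pi_causal, sigma2_beta, s, rng):
    """Draw SNP effects from a point-normal prior whose variance depends on
    heterozygosity: with probability pi_causal, beta ~ N(0,
    sigma2_beta * (2p(1-p))**s); otherwise beta = 0."""
    effects = []
    for p in freqs:
        if rng.random() < pi_causal:
            var = sigma2_beta * (2.0 * p * (1.0 - p)) ** s
            effects.append(rng.gauss(0.0, var ** 0.5))
        else:
            effects.append(0.0)
    return effects


def expected_causal_contribution(p, sigma2_beta, s):
    """Expected variance explained by one causal SNP: 2pq * E[beta^2]."""
    h = 2.0 * p * (1.0 - p)
    return h * sigma2_beta * h ** s


def snp_variance(freqs, effects):
    """Sum of 2pq * beta^2 (assumes Hardy-Weinberg equilibrium, no LD)."""
    return sum(2.0 * p * (1.0 - p) * b * b for p, b in zip(freqs, effects))


if __name__ == "__main__":
    rng = random.Random(1)
    freqs = [rng.uniform(0.01, 0.5) for _ in range(10_000)]
    effects = sample_effects(freqs, pi_causal=0.01, sigma2_beta=1e-4, s=-0.5, rng=rng)
    print("causal SNPs:", sum(b != 0.0 for b in effects))
    print("variance from SNPs:", snp_variance(freqs, effects))
    for p in (0.01, 0.1, 0.5):
        print(p, expected_causal_contribution(p, 1e-4, -0.5))
```

A negative S gives rarer causal variants larger per-allele effects. At S = −1, every causal SNP is expected to contribute the same variance regardless of frequency.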
So in terms of results, on the left-hand side we're contrasting height and BMI, and the distributions are the posterior distributions from the Bayesian analysis. You can see that BMI is more polygenic than height, that height has a higher SNP-based heritability than BMI, and that the selection coefficient is negative for both traits but more negative for height. Then, contrasting the parameters across different types of traits (disease traits, reproductive traits, physical measures and cognitive traits), a very quick summary picks out a couple of things. The polygenicity estimate seems to be much lower for the disease traits than for the physical or cognitive traits. We've got a couple of outliers among the disease traits: this one is schizophrenia and this one is bipolar disorder. These are obviously traits of the brain, which links them to the cognitive traits down here. And the S coefficient is stronger for the disease traits than for the cognitive traits. Since I was previously contrasting the gut disorders and the psychiatric disorders, I made a plot of those too. Again it shows much higher polygenicity for the psychiatric disorders than for the gut disorders, and stronger, that is more negative, S coefficients for the gut disorders than for the brain disorders. So here we did some evolutionary simulations, and when I say we, that's Jian Zeng. There's a lot of information on here and I don't expect you to look at the detail. These were bona fide SLiM 3 forward evolutionary simulations. Over time there are new mutations, which can either be neutral or affect fitness, with pleiotropic effects on traits. The simulations ran for 58,000 generations, et cetera.
And then in the last generation we generate GWAS data, which we can analyze in the same way as the actual empirical traits. In these simulations the key input parameters that are varied are the selection coefficient (the relationship to fitness), the proportion of mutations that have a causal effect, and the phenotypic variance attributable to causal mutations. The key output parameters are the selection coefficient S, the polygenicity parameter, and the SNP-based heritability. Summarizing the results very quickly: the input parameters to the simulation and the output parameters are related to each other, but they're not exactly the same. A key finding was the interdependence of the underlying evolutionary input parameters, so the quantities we can estimate from real data can't be interpreted on their own; you have to interpret them as a package. For example, if we ran simulations with the same proportion of mutations causal for the trait, but with either high or low selection strength, the estimate of polygenicity would differ, even though the proportion of causal mutations was the same. For our empirical results, where the cognitive traits seemed to have a less negative S coefficient, that meant the pattern could arise from a large mutational target. If you account for the much larger mutational target, the simulations suggest that the selection strength, from an evolutionary point of view, would actually have been higher on the cognitive traits. Similarly, the very negative S estimate for disease traits arises because they have a smaller mutational target.
So those estimates of large mutational target size necessarily implicate widespread pleiotropy. I'm just going to end by acknowledging the moderator of the discussion panel, this guy here. He's just had a paper put on bioRxiv that also applies evolutionary modeling to GWAS summary statistics, and I wanted to give him a shout-out as another trigger for discussion. And with that, my acknowledgments of funding. Thank you. Any questions for me? You might disagree with what I said; I had to fly through lots of things very quickly. Essentially I was presenting an overview of a lot of work, so I'd be happy to leave it to the panel discussion unless there are burning questions. There's a question online for you: "Thank you for a great talk. In the SBayesS method, how does S relate to a locus-specific selection coefficient?" Well, there isn't a locus-specific selection coefficient, because S relates allele frequency to effect size; it models selection as acting through a set of variants and their relationship with a fitness-related trait. So I think the question is about genome-wide selection versus locus-specific selection. Yeah. Well, the variants act cumulatively across the genome to make a fitness-related trait, and selection acts on that trait, so I hope that helps. Yeah. Thank you for nodding at me, Molly. The fact that it's very hard to find GWAS hits for major depression: is that because it's lowly heritable, or do these simulation results explain why there are so few hits for major depression? Sorry, I flew through that; there aren't few. We've now got, I think, 600 hits for major depression from a massive sample of half a million cases. So we do have the hits, even though major depression is a very heterogeneous phenotype. And we can interpret those results in post-GWAS analyses where we integrate them with, say, single-cell sequencing results to see where they map.
They're enriched when you map them to expression in the brain and in particular cell types, so we can interpret them, but the point I was trying to make is that I feel they're less interpretable than the results for those disorders of the gut. And what is it about the architecture of depression that makes it so difficult to find those hits, so that you needed such a large sample size? Well, I think large sample sizes are needed because, essentially, we've got more variants and the effect sizes are small; that's what I was showing. And I think our evolutionary modeling also suggests that the mutational target size for brain traits is larger; that's how I'm interpreting it. I want to follow up on Mike's question. Do you think that for the cognitive and behavioral phenotypes the GWAS results are not interpretable in theory, or just in practice? Sorry, I don't want to say they're not interpretable; it's more of a relative thing. What does polygenicity mean? It means there are many backup routes, many, many ways to get to the same thing, so maybe the brain needs to be more protected. And of course, another reason they may be less interpretable is that it's harder to study the brain in general, so we've got less knowledge to map them onto. But by putting the results together with the evolutionary modeling, I was trying to say that the mutational target seems to be larger for those cognitive traits. So I think we should move on to the panel discussion. Yeah. Relating to this last point: in general, genes expressed in the brain are larger, so they may interact with more regulatory elements. If you correct for that, have you tried to see whether the number of genes... I mean, we somehow assume that polygenicity is high and therefore more genes are involved, but that has an embedded assumption that all genes are of the same size and of equal importance.
So do we really know that, or is it that the brain genes are longer and therefore the mutational target is larger? In general, when people study psychiatric disorders, we recognize that brain genes are longer, and all the post-GWAS analyses account for that. In this very quick talk I was trying to address the evolutionary modeling point we were prompted with, and I feel the evidence from our modeling is that these cognitive traits seem to have a larger mutational target. So I'm going to pass over to the panel discussion. Okay, just to get started, a couple of questions based on some prompts we got from the organizers. One question we could go through pretty quickly is, again: what do we know and not know about the genetic architecture of complex traits, defined as Naomi defined it earlier, in terms of the number of variants and the joint distribution of their frequencies and effect sizes? To elaborate: what do we know about the genetic architecture of common variants versus rare variants? Another is about how we learn about them. We measure the effect sizes and frequencies in these data sets, and then maybe there's some issue of fine-mapping. But the effect sizes we get in GWAS are also subject to confounding, and we might ask whether we expect that confounding to differ across kinds of traits. And then, what do we know about how this genetic architecture differs across human populations? That's one question. The second question relates to the topics of the other sessions here: okay, we've got this object, the joint distribution of frequencies, effect sizes and number of variants; how could we tie that into the biology of the traits? So, feel free. Yeah, why don't you start.
Okay, hold on. Yeah, I mean, I think I maybe agree with what you're saying, which is that confounding is going to turn up on every SNP, right? Population stratification should be this teeny little signal that hits every SNP, so the bigger the... Okay, yeah. Yeah, the traits with more stratification bias should have way more tiny little effects distributed across the genome, and that should look like polygenicity. I don't really know; maybe this is more of a question than an answer, but I think the different methods give quite different answers as to how polygenic things are. I don't feel like I have a good understanding of why different things are polygenic. I definitely feel like the complexity and heterogeneity of depression are likely contributors to why it's so polygenic, but I definitely don't know that that's the only thing. Yeah. Maybe the best way to frame this discussion is to think that, as organisms, we have two conflicting things we must do. Number one, we must have our developmental system buffered enough that we'll get to a reasonable outcome no matter how bad the situation is. But number two, we also must keep it flexible enough to respond to subtle changes, and getting that fine balance, I think, is very tricky. I think an understanding of that, of things like small-world and scale-free networks and the robustness they have, is kind of the underpinning to a lot of these architectural issues. Most of the pop gen models people have worked out... and by the way, the dirty little secret that no one will tell you is that evolutionary population genetics has been trying to figure out models for the maintenance of quantitative variation, and there are incredibly elegant models, none of which work.
There are little flaws if you look at them carefully, and it's still in the process of being resolved, but I think understanding the fact that systems are buffered is key. And just the final point I'd make is that there's some classic work from fly and mouse people in the '50s and '60s, where you take a trait that shows essentially no variation and you introduce a single major mutation, that's it. And all of a sudden it shifts the trait distribution, and a huge amount of hidden variation is uncovered. Systems are incredibly well buffered, and I think an understanding of that buffering, which is more systems biology, will give us some more insight into architecture. So that's a great non-answer for you. So, building on Naomi's nice presentation of some of the SBayesS work, one thing I really appreciated about it is that you get to look at consistently done analyses of GWAS that have been run in the same cohorts across similar traits. You've got a pretty comparable setup, and you can say how consistently or inconsistently polygenic different classes of traits are. So I think there's some nice leverage from the massive study designs we've had recently in big biobanks, where we can start to tease those apart, and from those some things become very clear. Biomarkers are, by and large, a whole lot less polygenic: things like LDL cholesterol, things that are just circulating in the blood, are a whole lot less polygenic than things like, say, schizophrenia and bipolar disorder, which seem to be at the other end of the spectrum along with height and other really complex traits that we treat as sort of model phenotypes; BMI is there too. But we're really interested in those really complicated things.
We're really interested in thinking about, you know, clinical models for cardiovascular disease, for example. We know that some of the causal risk factors include LDL, which is relatively simple, blood pressure, also on the simpler end, and BMI, which is really complicated. Putting all of that together, it's really hard to think about how you get towards modeling the genetic architecture of really complex traits that are related to other traits like that; pleiotropy is really going to affect, I think, the genetic architecture we're trying to measure for disease-oriented traits. So looking across traits and trying to get at measures of genetic architecture in a consistent setup is really nice and really helpful, but at the end of the day, what progress can we make when we don't fully understand, in absolute terms, the number of causal variants we're shooting for in trying to understand the full molecular basis of these diseases? So there are, I think, a lot of different pieces still open on that front. Suhini? Hi. I wasn't sure if I would be a disembodied voice, because I couldn't see myself on the screen. Yeah, I think everybody brings up really good points. As an evolutionary population geneticist, I also agree there's a lot of buffering in the system. On the second point, Alicia basically said what I was thinking, but another question that comes up for me here is this.
Michel touched on this: the role of heterogeneity in traits, too. You can talk about consistency, as Alicia did, in biobank-based studies, and I think that SBayesS work is a really nice example of that. But, going back to the earlier point on the hierarchical organization of SNPs, I think some of what we've seen in this session also touches on the fact that, even just at the sequence level, there's some heterogeneity in how the same trait can be generated, and that's something I think is interesting to bring to this question as well. Maybe I'll say one thing relating to the work Naomi mentioned, which is with Yuval Simons and with Jonathan Pritchard, who is somewhere over here. What we did there is look at 95 quantitative traits in the UK Biobank that have over 100 hits, so that we have enough power, and what we are pretty certain at this point that we found is that, when you look at the joint distribution of frequencies and effect sizes... Okay, let me start a step back. Imagine a situation where all complex traits have the same joint distribution of frequencies and effect sizes, and ask how you would expect them to look in a GWAS. You would expect to need a certain scaling, based basically on the heritability of the trait and the number of sites in the genome affecting it, just because of power in GWAS: to identify a given variant, your power drops the more variants there are, and environmental variance diminishes it further. So if everything had the same distribution, you would have to apply that scaling in order to compare GWAS of different traits. And, quite surprisingly, when you actually do that scaling, a lot of quantitative traits turn out to have extremely similar architecture.
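A minimal sketch of the power argument above, in Python. It assumes the common normal approximation in which a variant explaining a fraction q of phenotypic variance in a sample of size n gives a z-statistic centred near sqrt(n·q). Purely for illustration, it also assumes that heritability h² is split evenly across m causal sites. The function names and numbers are hypothetical, not taken from the study described.

```python
from statistics import NormalDist


def gwas_power(n: int, q: float, alpha: float = 5e-8) -> float:
    """Approximate power of a 1-df association test for a variant that
    explains a fraction q of phenotypic variance in a sample of size n.

    Normal approximation: z ~ N(sqrt(n * q), 1); reject when |z| exceeds
    the two-sided critical value for `alpha`.
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = (n * q) ** 0.5
    # P(z > z_crit) + P(z < -z_crit)
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


def per_site_power(n: int, h2: float, m: int, alpha: float = 5e-8) -> float:
    """Hypothetical scenario: heritability h2 spread evenly over m sites.
    A lower h2 (more environmental variance) shrinks each site's share."""
    return gwas_power(n, h2 / m, alpha)


for m in (1_000, 10_000, 100_000):
    print(m, round(per_site_power(400_000, 0.5, m), 3))
```

With n and h² held fixed, per-site power falls as m grows, and lowering h² lowers it further. This is why traits need rescaling before their GWAS results can be compared.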
So: similar architecture in the distribution of effect sizes and frequencies, but they differ a lot in target size, as different people have pointed out, and we would expect them to differ in target size because different traits are different, although I think there are deeper questions to explore about that. Maybe I can ask a second... okay, I'm done with that. Just to connect it to the session we saw this morning: we now have this architecture, defined in the GWAS community as the joint distribution of effect sizes and frequencies, plus the polygenicity. How do we relate it to the biology of traits? I'm just throwing this out there because we're supposed to point out gaps. It struck me that the scaling effect would relate to mutational target size, and the evolutionary effect would be that the stronger the selection on the trait, the tighter the association between effect size and frequency. If there's no association, then you have neutrality. So you have the underlying biology, which gives you the target size, and then you have evolution, which basically tells you how tight that connection is: do large-effect alleles tend to be rare? If there's stronger selection, that tendency should be amplified; that's the S that Naomi was talking about. If that tendency is weak, S should approach zero, meaning there's no association. I think that's one way to think about target size versus evolutionary history. It's a crude way, but it gives you something to think about. By the way, you should feel free to expand the perspective. I'm learning. Feel free to object, to offer contrary viewpoints; these meetings are only fun when there's respectful tension. If we all agree, we might as well go home. Yeah, so I think that biology is more about cell types and these things, right? I mean, the GWAS hits are...
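One way to make the "how tight is the connection" idea concrete is a frequency-dependent effect-size prior of the kind used in SBayesS. In that prior, the variance of a per-allele effect scales as [2p(1-p)]^S. The sketch below only evaluates that assumed form. It is not the fitted model, and the function names are hypothetical. S = 0 means effect size is unrelated to allele frequency. A more negative S means rarer alleles tend to carry larger effects, which is the pattern expected under stronger negative selection.

```python
def effect_variance(p: float, s: float, sigma2: float = 1.0) -> float:
    """Prior variance of a per-allele effect at allele frequency p,
    assuming var(beta) = [2p(1-p)]^S * sigma2 (SBayesS-style form)."""
    het = 2 * p * (1 - p)  # expected heterozygosity at frequency p
    return het ** s * sigma2


def expected_variance_explained(p: float, s: float, sigma2: float = 1.0) -> float:
    """Expected contribution of one site to genetic variance:
    2p(1-p) * E[beta^2]."""
    return 2 * p * (1 - p) * effect_variance(p, s, sigma2)


for s in (0.0, -0.5, -1.0):
    rare, common = effect_variance(0.01, s), effect_variance(0.5, s)
    print(f"S={s:+.1f}  rare/common effect-variance ratio = {rare / common:.2f}")
```

At S = -1, every site is expected to contribute the same genetic variance regardless of its frequency. Values between 0 and -1 sit between the "no association" and "strong association" ends described above.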
I'm thinking about something Brandon, on the computer, said earlier this morning, which was along the lines of: I spend all my time thinking about protein interactions and never actually explain anything about complex trait variation. I have the opposite problem: I spend all my time thinking about this sort of R², especially in-sample, and especially in European-ancestry individuals who are white British and live in the UK, right? And I spend zero time actually doing biology, and I'm trying to actually get to cells, to genes and to drug targets, to actually cure diseases. So that's, I guess, more the direction I'm trying to go: to move backwards from the complex trait associations and funnel these things back into mechanisms that are a little more tractable and that we could work on. And then, to make a G×E-ish point, presumably the E's are going to work through similar mechanisms. They're going to be funneled together through the thing that was shown earlier with the little pyramid of things funneling together, right? So I want to find these little nodes in that network and treat disease by hitting them. I guess there are a few potential directions to go with this. One: some of these polygenicity estimates, and methods to try to understand genetic architecture across traits, are of course a little sensitive to confounding, to how well you've controlled for population stratification, if that's even the overall goal. And when looking across the world, to your point about diverse populations, that can get a little tricky if population stratification has been controlled for in different ways, or in different ancestries in different ways. And then, to your point, you may also be interested in layering environmental contributions onto genetic contributions to better understand complex trait genetic architecture overall.
That's a fairly tricky problem, but I think we're not totally lost here. We can couple genetic tools with other multi-omic tools, as was already mentioned, and I've actually had a lot of hope and inspiration lately from some of the work Naomi mentioned on the proteomics front. She mentioned the massive discovery of pQTLs in proteomics: for example, the vast majority of genes have a local cis-pQTL associated with the protein level. But one thing I find incredibly cool is that the proteins themselves are not just related to the regulatory architecture of expression; they're also strongly related to environment. When you look at how much of the proteome is perturbed by smoking, let's say, which is one of the very easiest possible places to start when looking at the environment, the plasma proteome is just hammered: a third to half of the proteome is significantly responsive to it. So when trying to understand complex traits overall, obviously this is a genetic and environmental conversation we need to be having. We also need to appreciate the multi-omic tools we have at our disposal to try to dissect both inherited and modifiable risk factors, and that gives us some really cool tools to leverage for disease insights. So, maybe rambling on a little further: COPD is largely a disease of smoking, basically, and proteins that predict smoking also predict COPD. If you build a score to predict smoking from these proteins, you do a really good job of predicting smoking, but you can also predict COPD incidence in never-smokers, which is really cool.
That's something we can fundamentally learn from these multi-omic technologies about shared biological insults from our broad exposures. I would say it gives us a nice lever to dissect architectures of disease that are not necessarily always inherited but share a common biological process, and that in turn gives us a tool to understand and dissect the fundamental molecular mechanisms, tissues or pathways related to them. I'm sorry for the massive tangent, but it's all about G and E. Suhini, do you want to weigh in? Well, I kind of want to ask Alicia to continue her tangent, which relates to a point Neil brought up in the chat: we haven't talked a lot about defining traits in this meeting yet. Even as we talk about suites of traits, we've largely been looking at them one by one rather than in a joint or multivariate way, and Alicia has done really cool work on this recently, on predictability. I don't know if you want to talk about that a little. I feel like I have already absorbed a fair bit of rambling time, but I'm happy to keep expounding if you like. I think it's an interesting point about the gap, and it came up in Nasa's talk too: we're limited in some ways, especially with biobank-scale data sets, in the way we're defining traits, and that might also be affecting the insights we're getting. I have a question. Sorry, this is Captain. I was actually having a thought along the same lines, about a lot of what we do, especially GWAS in complex diseases.
And also, when we look at population diversity, we see that a lot of what we find, especially in Europeans, doesn't translate very well. We know, or we always say, that the biology of the disease is the same across populations, to a degree, so the main question is often what factors are driving these differences. But one thing I've been thinking about a lot, and wondering, that we haven't talked about, is how much the way we define disease susceptibility is actually biased by differences in risk factors. In the case of coronary artery disease, for example, which is the one I know best, the way we run GWAS is that a case is CAD, however we define that, whether atherosclerosis or whatever in ICD codes, and the control is non-CAD. But the reality is that, if you look at all of these data, type 2 diabetes is a very important risk factor for CAD, and if a person is not classified as CAD today, maybe that person already has some form of atherosclerosis or a related condition that really shouldn't classify that individual as a control. But that's what we do, and I wonder how much that is hindering our work, and how much it affects how we translate the effect estimates we have for the SNPs into a straight understanding of the biology. I'm just not 100% sure I understood the question. Are you talking about how classification into cases and controls affects the results of the GWAS? Yeah, so like an ascertainment issue. That is, if you ascertain things differently, you define traits differently. There's an element of time wrapped up in the question.
There are a lot of modifiable risk factors that we need to consider alongside our inherited risk factors. We have kind of a nice privilege working in the genetics field, in that we don't have to think much about time, with the exception of, mostly, age and other covariates we need to consider in our analyses, and maybe the context in which people live. But with other exposures in people's lives, and other risk factors people are exposed to, you really need to consider time with regard to disease incidence and progression, as opposed to just case-control status. Is that kind of what you're getting at? Yes, and how much of that is actually in the effect estimates that we see, and how much of that will vary across populations, because some of those risk factors have prevalences that differ across populations. So how can we actually combine all of this information to get a better overview of the genetic estimates we want to see? Well, I think that both Suhini and you raised a really important point, which is that a lot of the ways we define traits have a big arbitrary component to them. Also, there's a lot of overlap when you look at the biobanks: if you try to reduce the number of traits based on the genetic correlations between them, you go down from many hundreds to many fewer. I wonder if one interesting direction to go now in GWAS, which I know several people have started going into, is to use the fact that we have genetic data on multiple traits to try to define more meaningful biological traits, by way of some type of dimensionality reduction based on the genetic correlations between traits, but maybe one might also relate it,
I'd say, being biased as an evolutionary biologist, to fitness components, because fitness is an objective measure, or at least one objective measure, of the biological importance of a trait. And I don't know a lot about this field, but I think that, specifically when it comes to, say, psychiatric disorders, the degree of arbitrariness in definition is also a very big issue. So I'm just throwing this out there; please, anybody, jump in. So I have a question from the online Q&A. It relates to what we know about mutation rate differences across the genome, and whether that variation in mutation rate can explain variation in how much each individual locus explains of trait variation: so, the distribution of trait variation across the genome versus mutation rate differences. There's a really rich history in classical quantitative genetics of measuring the so-called mutational heritability, and it's easy: you take an inbred line, you break it up, and you look for divergence between the sublines due to new mutations. The surprisingly consistent result is that the mutational variance is about one-thousandth of the environmental variance per generation, with obviously some variation about that, so a lot of variation is generated that way. And remember, we're not looking at sequence variation; we're looking at sequence variation times effect size. But this gets to the broader question that really underlies a lot of these issues in architecture, and that's pleiotropy, because you might ask, well, why is my specific trait under selection? Your trait may not be under selection, but its underlying variation may also have pleiotropic effects on traits that are under selection. And one thing we really don't have a good feel for is pleiotropy. By pleiotropy I don't mean it affects two traits; I mean: of these 6,000 defined traits, how many does this one mutation affect?
That's really the underpinning: to get models of how evolution shapes architecture, we really have to have a good understanding of pleiotropy. A classic basic question: if you have a large-effect allele, it's generally assumed that it has more pleiotropic effects and that those pleiotropic effects are larger, but that's an assumption. As far as I know, there's some data on it, but not a lot. You're speaking about pleiotropy with fitness, really, rather than with many of the traits we're talking about. Yeah, that's a good point. If you want to look at the evolution, well, it's a two-step process. You have an allele with an effect on trait X; that allele also affects some other trait Y, and trait Y is under selection. So ultimately what shapes it is pleiotropy with fitness, but that's measured indirectly through pleiotropy with other traits that affect fitness components. With regard to the question about mutation, I think it's an excellent question, because we do know that, everything else being equal, if the mutation rate is higher in a region, then that region would contribute more genetic variation. And we actually know a lot today about the determinants of mutation rate in the human genome. I'm not completely sure, but I think it's quite an accessible question, if it hasn't been looked at systematically, whether there's a direct relationship between the mutation rates of regions of the genome and how much they contribute. I think that's a readily studiable question right now, but I don't know the answer as to how much it affects things. I think it's low-hanging fruit for anybody who chooses to pursue it. There seem to be a lot of other questions; I didn't look. I've got the microphone.
I have a question and a comment about pleiotropy. I was wondering if the panel can answer whether it becomes more problematic the rarer the variant or the bigger its effect. For example, if I interpreted Michel's talk properly, then if that myostatin mutation, or one of those mutations, had first been mapped in its homozygous state, say as an embryonic lethal or something, you would have called it a lethal gene. But in the heterozygous state it was a double-muscling gene. In humans we have all these Mendelian syndromes, usually named after the person who first described them, that often also have pleiotropic effects on height or growth or something, but we don't call them height genes, right? So we're missing something there when we're being very trait-specific. The second is a comment, or something to think about. I don't know which word comes beyond "omni", but the late Bill Hill's model for complex traits was that all mutations affect all traits; the question is by how much. Yeah, I like that model. That sounds right to me. Zero is a very magic number, right? It's really hard to have zero effect. Yeah, I guess I don't really know. The problem of pleiotropy is... especially, I guess you mentioned mental disorders, right? I don't think we know how to define mental disorders very well; I think that's fair to say. And so, to me, pleiotropy is more of a strength than a weakness there: you can use it to actually try to answer clinical questions. I'm not sure that applies as much to other disorders, but I think the move is to throw all of the traits into all of the analyses and try to find parsimony in a latent space, or in a mixture of latent spaces, or something like this.
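Here is a toy sketch of two related ideas from the discussion: finding parsimony in a latent space, and reducing many traits to fewer using their genetic correlations. It eigen-decomposes a genetic correlation matrix and counts how many components capture most of it. The 4×4 matrix, the 90% cutoff and the function name are hypothetical. Real multivariate methods are considerably more involved.

```python
import numpy as np


def n_components(rg: np.ndarray, threshold: float = 0.9) -> int:
    """Smallest number of principal components of a symmetric genetic
    correlation matrix whose eigenvalues reach `threshold` of the total."""
    eigvals = np.linalg.eigvalsh(rg)[::-1]   # eigvalsh returns ascending order
    eigvals = np.clip(eigvals, 0.0, None)    # guard against tiny negative values
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative, threshold) + 1)


# Hypothetical example: two pairs of traits, strongly genetically
# correlated within each pair and uncorrelated between pairs.
rg = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
])
print(n_components(rg))
```

Here four "traits" collapse to two latent dimensions. Whether such components correspond to more meaningful biological traits is exactly the open question raised above.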
And I feel like once you get the mechanism right, once you have whatever this magic gene is that's doing double muscling, you know it's going to increase height; you know what it's going to do because you actually understand the biology, right? I think, maybe with regard to Peter's comment: yes, zero is a magic number, but just saying that every variant affects everything to a certain degree is kind of throwing the baby out with the bathwater. Take a specific complex trait. There's a beautiful paper, for example, from Nasa, where he looked at complex traits, biomarkers, for which we know a lot about the underlying pathways. You can ask how much of the heritable variance we identify in GWAS comes from the things we know to be the underlying pathways. You discover that even in these cases, which are maybe at the lower end of the polygenicity range of complex traits, the things directly relatable to these pathways explain (correct me if I'm getting this wrong) maybe 15% of the heritable variance in the trait. So it's perhaps quite surprising that we find a lot of the heritable variance elsewhere. I want to add another comment on that. It's often claimed, and I think it was even mentioned here, that Fisher's infinitesimal model has been found to be correct. Strictly speaking, that cannot be true, apropos zero being a special number, because infinitesimal effects do not exist and the genome is not infinite. Otherwise we'd be in real trouble doing GWAS.
And I think there's a mistaken impression that, because quantitative geneticists have been assuming that model for a very long time, they actually knew it to be true, or a good approximation. Because, as far as I can tell, and maybe I'm risking myself here, but do educate me, the evidence that the number of variants contributing to heritable variation in traits is huge, rather than just assumed to be, has been very limited, right? As far as I know, there's direct evidence from experimental evolution, looking at the response to directional selection, and analyses by Bill Hill and others, where you could tell that maybe it's north of 20 variants contributing to the selective response, maybe north of 50 at best. But that couldn't distinguish 50 from 50,000, and now, after GWAS, we know that we're living in the 50,000 world, and beyond for some traits, a bit below for others. So I think we should be a bit more careful about saying we knew this all along and it's all an infinitesimal model. A really quick comment on that. The infinitesimal model is wrong because what we see in GWAS is small effects across the board, but those effects are measured as variances. Large-effect alleles are very rare, so you get this pattern of small, infinitesimal effects, but the effect is in the variance; large-effect alleles are out there, they're just rare. Zero is a magic number, but so is 4Ns. If the effective population size relative to s is such that 4Ns is not bigger than one, then those alleles are effectively neutral, so they're basically just bouncing around. Suhini, do you want to weigh in, because it's hard being remote? No, thanks. I do agree that 4Ns is a magic number. Maybe that's all I have to say.
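To make the "4Ns is a magic number" remark concrete, here is a sketch using Kimura's diffusion approximation for the fixation probability of a new additive mutation. This is a standard textbook result, swapped in here for illustration; the speakers did not present it. The sketch assumes N_e = N and a single new copy at frequency 1/(2N). When |4Ns| is well below 1, the fixation probability is essentially the neutral value 1/(2N).

```python
import math


def fixation_probability(n: int, s: float) -> float:
    """Kimura's approximation for the fixation probability of a single new
    mutation (initial frequency p = 1/(2N)) with additive selection
    coefficient s, assuming N_e = N for simplicity."""
    if s == 0:
        return 1 / (2 * n)
    # (1 - exp(-4 N s p)) / (1 - exp(-4 N s)) with p = 1/(2N), so 4 N s p = 2s.
    # expm1 keeps precision when the exponents are tiny.
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)


def relative_to_neutral(n: int, s: float) -> float:
    """Fixation probability divided by the neutral value 1/(2N)."""
    return fixation_probability(n, s) * 2 * n


n = 1_000
for four_ns in (-10.0, -0.01, 0.01, 10.0):
    s = four_ns / (4 * n)
    print(f"4Ns = {four_ns:+6.2f}  P(fix)/P(fix, neutral) = {relative_to_neutral(n, s):.3g}")
```

At 4Ns = ±0.01, the ratio stays within about 1% of 1, so selection is effectively invisible. At 4Ns = -10, the mutation almost never fixes, and at +10 it fixes roughly ten times more often than a neutral one.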
To me, the most shocking thing about all the studies of genetic architecture is how big the mutational target size is: for lots of traits it's 10^7 base pairs or bigger. That's telling us that, for more or less any complex trait you think about, there are 10 million sites in the genome where you can perturb that trait. That's surprising; do people have an explanation for that? And what does it tell us about the biology? It tells us that there really are lots of different ways in which you can affect more or less any trait you think of. Yeah, I think there's a lot of variation in that across traits, though. For body mass index you get numbers like the ones you're talking about, but when you look at biomarkers you get something about an order and a half of magnitude lower. So it's not that all variants are affecting everything, other than the fact that zero is a magic number, which keeps coming back. And I think the question of what determines the target size is a very interesting one. You could say things about the traits themselves; you can hand-wave and say, okay, body mass index involves a lot of tissues and a lot of developmental stages and so on, so it kind of makes sense that it has a larger target size than urate levels, for example. But being able to move beyond that and say something systematic about the target sizes we're identifying is, I think, a very interesting question. It might involve things Bruce mentioned before, like how modular the genetic system affecting the trait is, or buffering, and questions like that. But just let me re-emphasize how extraordinary this is.
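The per-gene arithmetic behind the "500 sites per gene" figure uses the round numbers quoted in the discussion: a target size of roughly 10^7 sites, spread evenly over about 20,000 genes as a deliberately generous assumption.

$$
\frac{10^{7}\ \text{sites}}{2 \times 10^{4}\ \text{genes}} = 500\ \text{sites per gene}
$$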
Consider the sites affecting body mass index: even if every one of the 20,000 genes had an effect, it would take 500 sites in each of those genes. That postulates that every gene in the body affects body mass index, and that in every gene there are 500 sites where you could have that effect. To me, that is a very surprising conclusion. Regulation is a wonderful thing. My view is that a lot of the variation you see is probably of incredibly tiny effect, and we just pick it up in these massively powered GWAS. Suppose you have a small RNA that binds something that regulates your gene. Now you have a random DNA sequence with a one-base-pair change, and at some very low frequency it can take up some of that small RNA and bind it by mistake. It changes the stoichiometry slightly, not a big effect, but there's lots of potential for that scattered throughout the genome. That's kind of my view; it's probably dead wrong. I would also like to add that BMI variant associations are maybe not all biologically similarly interesting or useful. We can look at BMI associations with type 2 diabetes across populations: some of those are going to line up with opposite effects rather than consistent effects. Where we find opposite effects, there's probably some interesting biology to be worked out. If instead you're looking at the most minuscule effect that's just barely genome-wide significant, just over your threshold, you have no idea what's going on, and that's not where you would start trying to understand biology. So I like this idea of using pleiotropy to try to understand where biology is pointing us in consistent versus different directions, and using that to bear down into biology and pathways. So, pleiotropy, but also selection: I was really taken aback by some of Michel's findings for these identified quantitative traits.
Not even measured, mostly just eyeballed. But still, for these size traits, a substantial fraction looked like they were consequences of balancing selection or some other pretty interesting selection effects. So I think there are potentially interesting parallels back to the biomarker space, where some variants actually do have big effects in some populations that reflect the important action of selection: Duffy blood group polymorphisms and white cell counts, and G6PD effects on things like hemoglobin A1c. So maybe we have to be a lot more thoughtful about trying to home in on variation that has clearly been seen by selection, as a way of grabbing hold of more of the really important biology. One quick comment, but I think we need to finish fairly soon; maybe we'll take another question. In that regard I agree with you, and it's been pointed out by many people, like Aravinda, for example, in the context of rare variants. Even though we know a lot about common-variant architecture now, imputation limits and various other technical issues mean we're actually missing a lot of what's happening in the rare variants. Those rare variants may be much closer to the biology that's specific to the trait and might tell us a lot about it. Yeah, I can go here. I just want to comment on the biological interpretation. Guy, you said earlier that all genes or all variants affecting all traits wasn't really plausible because of physical constraints on the genome, etc. But I was wondering whether that was actually a missed opportunity, because what if this is true? I was wondering whether we are being constrained by our ability to run biological experiments, which will change in, you know, 100 or 200 years. What if this is true?
Shouldn't we be thinking about how to expand our ways of running those biological experiments under a model where everything affects everything else? The question is how much, and how can we know. I'm thinking, for example, of CRISPR perturbation experiments, where you can alter multiple sites at the same time. Maybe that's what will help us link what we learn from characterizing the architecture to some biological end point. So, comments? Over to you.

So, no, I don't think everything affects everything in an effective way. Maybe in a minuscule amount, and I don't know whether we'll be able to measure that in 200 years, or whether we, as humankind, will even be here. And I think that when you get down to infinitesimally small effects, it also becomes less biologically interesting. We are seeing clear differences in polygenicity and in estimates of target size between traits: polygenicity for biomarkers is around the 10,000 range, and as you move up to height it's maybe a bit north of 100,000. When you go to target sizes, you have traits like body mass index that seem to cover more than half of what we estimate to be the functional human genome, but then we have other measures with target sizes that are much lower. So I think that everything affecting everything in an effective way is not what we're seeing, and I'm also pretty happy that's the case, because that would be a pretty difficult world to work in. That said, there are people at the top of the distribution who have 100 or 1,000 of those tiny effects; it can happen by chance, just as we see with Mendelian traits.
So for those people it matters, and they will have a big phenotype nonetheless. But for the sake of people getting coffee and so on, we might continue this over coffee, because I've already exceeded the time allotted to us. We'll have a quick 10-minute break and then come back at 3:40 for the round-table discussion.
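The closing remark, that some people land at the top of a highly polygenic distribution simply by carrying many tiny-effect alleles by chance, can be illustrated with a toy simulation. It is a sketch under simplifying assumptions the speakers did not state: independent diploid sites, equal and tiny per-allele effects, a single allele frequency, and no environmental variance. The function name and all parameter values (1,000 people, 500 sites, frequency 0.5) are hypothetical choices for illustration.

```python
import random
import statistics


def simulate_allele_counts(n_people: int, n_sites: int, freq: float, seed: int = 1) -> list[int]:
    """Hypothetical toy model: for each simulated person, count the
    trait-increasing alleles carried across n_sites independent diploid sites.
    Each of the 2 * n_sites allele copies is trait-increasing with probability freq.
    With equal per-allele effects, the genetic score is proportional to this count."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        sum(rng.random() < freq for _ in range(2 * n_sites))
        for _ in range(n_people)
    ]


scores = simulate_allele_counts(n_people=1000, n_sites=500, freq=0.5)
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)
top = max(scores)

# The most extreme person carries many more trait-increasing alleles than
# average, without any single large-effect variant in the model.
print(f"mean = {mean:.1f}, sd = {sd:.1f}, top = {top} ({(top - mean) / sd:+.1f} SD)")
```

Increasing `n_sites` widens the absolute spread of counts (in this model the standard deviation is the square root of 2 × n_sites × freq × (1 − freq)), so the person at the top differs from the average by ever more tiny-effect alleles.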