So for those of you who don't know me, my name is Carolyn Hutter. I'm the division director for the Division of Genome Sciences in our extramural program here at NHGRI, and I'm here to welcome you to this meeting. One of the things Xander and I were talking about is that we've got some international people here, and maybe some people who are new to NIH, so I thought I'd just do the one-minute version. So, what is NIH? Welcome to NIH. Although you're not on our main campus, you're in one of our satellite spaces, and we're really excited to have all of you here in Bethesda. I was just discussing with Nancy how nice it is that we're all able to gather together again. NIH's mission is to seek fundamental knowledge about the nature and behavior of living systems and to apply that knowledge to enhance health, lengthen life, and reduce illness and disability. For those of you who like to nerd out over mission statements, NIH is actually making some changes to its mission statement, and if you follow this link, you can provide input on what you think of the revised version. In a broader sense, NIH is obviously one of the major funders of biomedical research; it is the major funder in the US and worldwide. The 2023 budget was $47.7 billion. I do not know what the 2024 budget will be, but I'm happy to report that we will not be shutting down on Saturday. We comprise 27 different institutes and centers. This graph shows the size of the institutes and centers by their congressional appropriations. I don't expect you to read the small lines, but the top three are NCI, the Cancer Institute.
Then NIAID, Allergy and Infectious Diseases, and NHLBI, Heart, Lung, and Blood. Down at the bottom you have the Fogarty International Center, and NHGRI, the National Human Genome Research Institute, where I am, is sort of there. We debate whether we're the smallest of the medium-sized institutes or the largest of the small institutes. Across all of the institutes and centers, of the almost $50 billion, about 80% is awarded through competitive grants, contracts, cooperative agreements, and so on to fund biomedical research, supporting about 300,000 researchers in every US state and around the world. About 10% supports scientists in NIH labs, which we call the NIH intramural program, and the remainder supports infrastructure, some salaries, and things like keeping the buildings open. I had also meant to point this out: if you go up a little bit from us right here, you have NICHD, the Child Health Institute. They have graciously given us their space for today and are also providing some of the AV support, so thank you to them. And I heard some people were wondering about security: you have to go through security when there are enough federal employees working in a specific building. This building used to not have security, and then All of Us moved in upstairs and tipped it over, not because they're the All of Us program, just because of the number of people. As an institute, NHGRI's vision is to improve the health of all humans through advances in genomics research, and right now we're really driven by our 2020 strategic vision, published in Nature. If you break down that vision, we divided our activity into a number of areas, starting with the guiding principles and values that should underlie everything we do, then the need to build and sustain a robust foundation for genomics: the idea that if we're going to push things forward, we need to start with a solid base.
Then there is breaking down barriers, where there are clear obstacles or things standing in the way that we need to get past to move forward, and then the development of compelling genomics research projects, which is the most relevant to this meeting today. If we zoom in on these compelling genomics research projects, there are a number of them listed here, and I'm just going to call out one: we need to determine the genetic architecture of most human diseases and traits. If you want to read more about our strategic plan in general or about this particular project, you can go to the website; search for "NHGRI strategic vision 2020" and you'll get the document. But it's really this idea that the field is now poised to have a much more comprehensive understanding of the genetic architecture of human diseases and traits, and that we need to be in a position to understand the myriad complexities we anticipate, and maybe some we don't, underlying it. As we outlined in the strategic vision, what's needed are new methods that account for human diversity, coupled with growing clarity about genotype-phenotype relationships and innovative approaches to deduce associations, interactions with environmental factors, estimation of penetrance and expressivity, and so on. What we've been doing since we developed the plan in 2020 is working out how we can move forward on some of these different components, and that's part of why you're all here today. (Sorry about the graphics.) We really see today's workshop as helping us map out what types of activities we should be doing as we think about how to tackle that particular compelling genomics research project. So as I welcome you today, I invite you all to participate in this meeting, actively listen to one another, and commit to making this a safe and open environment.
More importantly for me, ask the hard questions. Think about what you need to do to facilitate interdisciplinary interactions among the amazing group of people in this room, and provide input to us at NHGRI, and to each other, on ideas you may have for investigator-initiated research and so on. What are the key gaps and opportunities, and how do we advance this framework? So with that as my introduction, I'm going to turn it over to the real driving force behind today's meeting, Xander, who is going to go over some logistics. Again, I welcome you all to Bethesda, to our NICHD conference room, to this NHGRI meeting, and to NIH. Thank you.

Thanks, everyone, for joining us. I just want to say a few brief words, more specifically about the scientific context of the meeting. As you'll notice throughout the program, one of the major themes of today's and tomorrow's workshop is biological levels of organization. Looking back at the history of quantitative genetics, we started by studying genetic influences on organismal-level traits, but more recently a lot of data have been generated at lower levels, on molecular and cellular features. (Somebody's phone is up here.) A lot of data are being generated with new sequencing technologies for these molecular-level assays, but we're also getting much more phenotypic data on large cohorts for which we also have genetic information. So one of the ideas is: can we take the same theories and models developed early in quantitative genetics and apply them to sub-organismal traits, as well as to super-organismal traits?
By super-organismal, I mean traits that are still measured on the individual, but on the individual in the context of the larger social and ecological environment. How do these quantitative genetic models apply across these levels, and can we integrate across them? That's one of the major motivating factors: we have all these data, so can we make sense of them all using the same theoretical and conceptual frameworks? As a bit of crowdsourcing in planning this meeting, we asked all of you to send what you thought were outstanding or interesting questions in different areas of genetics, and we noticed a few cross-cutting themes in many of your responses. One is that we're collecting all these data, but we're thinking mostly about traits that are static, like cell states or cell types, and now we're even getting into cell context. How can we incorporate more dynamic processes, across development and across evolutionary history, into our understanding of genetic effects? That's something to think about as we discuss everything over the next two days. Another thing we noticed is that our observations and data can often be consistent with many different types of models, so how can we distinguish which of those models is correct? We can refine our approaches and models, or we can generate more data to distinguish among competing models. That's again something to consider as we discuss various topics: the balance between updating our theories and models and simply needing better data. Related to that, we do have a couple of funding opportunities available now. These are for R21s and R01s, and they are mostly looking for new theories and methods development in the area of complex trait genetic architecture.
The links for these will be in the meeting booklet, which you can find by scanning some of the QR codes. These have standard receipt dates, the first being in February, and if you have any questions about these funding opportunities, please let me know. A few more words about levels of organization. One thing I've noticed in watching many discussions in human genetics and other areas is that people often talk past each other. Many disagreements really come down to people approaching the question from different levels of analysis, and they often see their particular level of analysis as privileged among the others. A lot of disagreements happen this way. Biomedical researchers especially have a tendency to look downward for explanation: we're looking for mechanism. It's usually turtles all the way down, but it's rarely ever turtles all the way up. So when we think about how we come up with mechanisms and explanations, we often look at lower levels, but we can often look at higher levels as well. One of the hopes of this meeting is to see whether we can all come to a common framework to work from, so that no matter what your entry point, vantage point, or level of biological organization, we're all basically standing on the same turtle. So that's something to think about as we go forward. Now some quick housekeeping. If you ordered lunch, it will be available at 12:30. Lunch is on your own, and you can eat wherever you like. If you didn't get a lunch, there's a cafe a few minutes' walk from here with hot and cold sandwiches as well as a food bar. In terms of Zoom, everyone received a panelist link, so if you want, you can log in while you're at the meeting.
If you're in person, please do not connect the audio (or disconnect it) so we don't get any feedback issues. The advantage of logging in while you're at the meeting is that you can engage with questions in the Q&A: if you see a question that particularly interests you, you can respond to it online. If you're a virtual participant, just raise your hand and turn on your camera so that we can call on you during the sessions. Webinar participants, you have access to the Q&A function, so you can enter your questions there. We'll try to get to as many of the webinar participants' questions as we can, and if we don't get to them, hopefully we can address them online. And now I'm happy to introduce our first speaker, Aravinda Chakravarti, from the Center for Human Genetics and Genomics at NYU. He is giving the first keynote, a broad-picture overview of genetic architecture.

(Otherwise you're within hearing distance. Okay.) All right, good morning. I hope this doesn't presage what's going to happen over the next day and a half, and that things go far more smoothly. Let me begin by congratulating the organizers, and thank you for inviting me. This is one of very few large conferences I have attended since COVID, and it's really great to see many old friends, as well as many new ones whose work I've only read about. I was asked, tasked really, to do a historical overview of genetic architecture, and any kind of historical view is fraught with all kinds of problems. You may disagree with my omissions, my inclusions, and my inferences, but we can sort this out in discussions throughout this meeting. I'm going to speak about two twin ideas. The first is mapping, which has in many ways been at the core of genetics since its inception. The second is that the purpose of mapping is, of course, to understand causal factors and mechanism.
And as I will outline, this again goes back to the very beginnings of genetics. This is just a Gedankenexperiment; I'm sure most of us have thought about it. Sample size is intrinsic to many of our discussions, as are sampling and whom we study. But to me at least, at first glance the answer is no, because we have become very good at finding associations. The hard part that remains is identifying causality, and this has been far more difficult than we had imagined. It's useful to begin with the definition of genetic architecture. There's a great article by Trudy Mackay in 2000 in the Annual Review of Genetics that you can go back and refer to. Some of us who remember that far back will also recall a meeting organized by Irene Eckstrand, who was a program officer at NIGMS here at NIH. That meeting was called The Genetic Architecture of Complex Traits. It was way back, even before the genome sequence, when we were trying to envision what the future would look like. Given our ages, I think Nancy Cox may be the only other person here who was at that meeting. There are really two things to think about. The first paragraph you can read as a simple definition of all the parts and details we would like to know. The second was an important observation of that meeting: genetic architecture is a moving target. It's not the same for all populations, and it's not the same for all time. That's because of evolutionary forces. In genetics we always have to remember that there are proximal causes as well as distal causes, and the distal causes are almost all evolution. So it's useful to begin with Gregor Mendel. We all know, of course, his primal work on figuring out the rules of inheritance.
Mendel's early work in peas was actually contrary to the observations of many similar breeding experiments. Perhaps those shouldn't be called experiments, because they were not scientifically designed, but nevertheless. The three-to-one segregation ratio and independent assortment were also contrary to many of the other experiments that Mendel did with other plants; peas were not the only thing he studied. His genius was to understand the fundamental nature of those observations and to recognize that there were exceptions. My interpretation, at least, is that he happened to have strains with the kind of genetic variation that let him isolate the effects of a major gene; in other words, polygenic variation was low. You can read more about this in his biography. With the rediscovery of Mendelism in 1900 (by the way, I don't have a timer here, so hopefully somebody will tell me), Mendelian inheritance work had as many exceptions as there were rules. The first experiment, by the first geneticist to point this out, was Johannsen's. He took a number of seeds and measured their weight, which had the much more usual normal kind of distribution. But he could self-fertilize plants grown from individual seeds and show that their offspring had a much more restricted distribution. Thus began the debate over the nature of the genetic variation underlying phenotypes.
There were other studies by people who went back and reanalyzed Mendel's own data from the white and purple crosses, pointing to gradations that were not the yes/no phenotype Mendel interpreted. Then there was some very beautiful work on grain color in wheat that clearly showed strong epistasis among a number of factors (it wasn't called epistasis then). A single gene pair gives three-to-one segregation, but with more genes some traits gave 15 to one or 63 to one, which we now recognize and all undergraduates learn. The definitive experiment was this study by Altenburg and Muller. By the way, Brian Charlesworth pointed this paper out to me, I think close to 30 years ago. This was the first study in which a trait that was refractory to Mendelian analysis was analyzed into three distinct gene pairs. They could even show that the one on the X chromosome was not the sex-determining factor. And Muller in particular, as has been written historically, was a person who recognized that this way of analysis could be used generally, including in humans. It used visible markers (in their case, markers that Mendelized) to identify linked factors. The last paragraph defines their features, and this, by the way, is what all the genetic variants we use now are. So they gave a way of analysis using linkage, but of course, as I will describe, this carries over to association as well. This period was in fact fraught with a lot of debate. The protagonists, as we know, were Pearson and Galton on one side and William Bateson on the other. And many in fact write that Ronald Fisher's work in 1918 was probably the first to show how multifactorial, or quantitative, inheritance was compatible with Mendel's rules.
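The segregation arithmetic behind these ratios, and the way many Mendelian factors blur into continuous variation, can be reproduced in a few lines. The Python sketch below is an illustration only, not a model of the original wheat data. It assumes an F2 cross in which n unlinked, equivalent factors are each heterozygous in the F1, so each of the 2n allele copies in an F2 plant is "colored" with probability 1/2. Scoring a plant as colored whenever it carries at least one colored allele gives the 3:1, 15:1 and 63:1 ratios for n = 1, 2, 3. Scoring it instead by the number of colored alleles (an additive model) gives a binomial distribution with mean n and variance n/2, whose shape approaches a normal curve as n grows; that is the intuition behind Fisher's 1918 reconciliation. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def f2_allele_count_distribution(n_genes: int) -> list[Fraction]:
    """Exact F2 distribution of the number of 'colored' alleles.

    Assumes n_genes unlinked loci, each heterozygous in the F1, so each of
    the 2 * n_genes allele copies in an F2 plant is 'colored' independently
    with probability 1/2. The count is therefore Binomial(2n, 1/2).
    """
    copies = 2 * n_genes
    total = 2 ** copies
    return [Fraction(comb(copies, k), total) for k in range(copies + 1)]


def colored_to_white_ratio(n_genes: int) -> Fraction:
    """Colored:white ratio when any single colored allele gives color."""
    dist = f2_allele_count_distribution(n_genes)
    white = dist[0]              # only plants with zero colored alleles are white
    return (1 - white) / white   # equals 4**n - 1


def mean_and_variance(dist: list[Fraction]) -> tuple[Fraction, Fraction]:
    """Mean and variance of the allele count (additive scoring)."""
    mean = sum(k * p for k, p in enumerate(dist))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(dist))
    return mean, var


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, colored_to_white_ratio(n))   # prints "1 3", "2 15", "3 63"
    for n in (1, 3, 10):
        m, v = mean_and_variance(f2_allele_count_distribution(n))
        print(n, m, v)                        # prints "1 1 1/2", "3 3 3/2", "10 10 5"
```

The same cross, scored two ways, yields either discrete Mendelian classes or near-continuous variation. Only the scoring changes, not the underlying genes.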
Of course, as I showed you on earlier slides, this kind of thinking already existed in genetics. But there were still many other questions: whether the observed variation was only genetic or also environmental, and what the magnitude of the variation was. Fisher's statistical model wasn't a molecular genetic model, but it has been very useful in many, many circumstances. There are other ways of viewing Mendelizing traits, though, and this one is from function. It dates from 1908, even earlier. Archibald Garrod in London was studying a phenotype that really puzzled him, called alkaptonuria, and with the help of Bateson he first showed that it was autosomal recessive, so this was the first disease really thought to be Mendelian. He did something else: his analysis of alkaptonuric urine introduced him to the idea that human beings (he was a physician) had inborn errors of metabolism. He thought that diseases were the result of missing or false steps in the body's chemical pathways, leading to chemical individuality. The full idea of genetic individuality hadn't quite set in, but in his mind genetic individuality translated into this chemical individuality, which was the basis of disease. The true biochemical function was figured out a few decades later. I'm going to take a big leap from there; there's a lot of history in between. But, at least for me as a human geneticist, the next set of events that are quite important are the work of three individuals many of you know: Jim Neel, Victor McKusick, and Arno Motulsky. In the late 1950s they established three departments of human genetics and medical genetics, in Ann Arbor, Baltimore, and Seattle, respectively. By their own examples and work, they made the study of genetics in human disease not just important but clinically important. Beyond that, they did two things.
First, they medicalized human genetics, with all of the consequences we see today, including our funding. Second, they made the point that geneticists could now take their place in medicine. Geneticists had their own organ, just as nephrologists have the kidney, and of course that organ is the genome, which we all believe. The other thing these three geneticists did was to search intensely for pathophysiology. As physicians, they wanted to know the causes of disease, not merely its inheritance. Of course, a lot of the work was on establishing modes of inheritance. You would think that determining whether inheritance is recessive or dominant would be simple. But human beings have small families, there are ascertainment biases, and classical methods of segregation analysis, starting with Wilhelm Weinberg of Hardy-Weinberg fame, could be used to try to sort out the mode of inheritance. And that's not the only thing they did. They established cytogenetics as a discipline within medical genetics. They brought in many biochemical assays of metabolites, serum proteins, and enzymes, which led to defects being found not only in patients but also in patient fibroblast cultures. And of course some of this led to treatments for many inborn errors, treatments that are still used today. They also understood that mapping was intrinsic to having a route into unbiased pathophysiology. The grand example was that of Joe Goldstein and then Mike Brown, who figured out the role of the LDL receptor in heart attacks, and more specifically in familial hypercholesterolemia. All of you know the story, so I'm not going to go into detail, but they started, at the top left, with epidemiological studies of heart attack survivors in Seattle.
Goldstein was actually a clinical fellow with Arno Motulsky. That work eventually led to studies of exceptional families that segregated the disorder and to the finding of rare homozygous patients. Those patients led to isolation of the defect in fibroblast cultures, which later came together with evidence for an effective treatment of this elevation of cholesterol. So mechanism was key to understanding it. It's apparent from Joe Goldstein's work, and from that of Helmut Schrott and many others in Seattle at that time, that the existing statistical models were wanting. They could find major genes, single genes if you will, but these required the effect of other genetic backgrounds, essentially polygenes. That led to the emergence of a whole host of statistical models, then called mixed models. These included the effects of major genes as well as a polygenic background, with provisions for all other kinds of genetic phenomena known to be important. This is what led to the mapping of major gene effects with the recognition that there were other genetic factors. A lot of studies by Newton Morton and his colleagues, by Robert Elston and his colleagues, and by Jurg Ott made that time in the early 1970s important and interesting. It was the first time we could do linkage studies on a large scale, using, at that point, serum protein and enzyme markers. Then came another leap: the finding of human noncoding variation. Until that time, almost every variant we had was assayable by electrophoresis in serum or plasma, or was a blood group. (Many of you know the fly literature that did very similar kinds of studies.) There was no general way of studying all proteins, let alone the rest of the DNA. We knew the genome was large, and of course there was debate about the number of genes. The first step was Y. W. Kan's work with Andrée Dozy on the beta-globin gene, the first human gene cloned.
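The mixed model idea described above can be written down compactly. This minimal sketch rests on stated assumptions: one biallelic major locus in Hardy-Weinberg proportions, genotype-specific means, and a single normally distributed background (polygenes plus environment) shared by all genotypes. The parameter values and function names are hypothetical, and this is not a reconstruction of the specific models of Morton, Elston, or Ott. Under these assumptions the phenotype distribution is a three-component normal mixture. Its variance splits into the background variance plus the variance among genotype means.

```python
from math import exp, pi, sqrt


def hwe_genotype_freqs(p: float) -> dict[str, float]:
    """Hardy-Weinberg genotype frequencies for allele A at frequency p."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def mixed_model_moments(p: float, means: dict[str, float],
                        background_var: float) -> tuple[float, float]:
    """Trait mean and variance under a major gene plus polygenic background."""
    freqs = hwe_genotype_freqs(p)
    mean = sum(freqs[g] * means[g] for g in freqs)
    # Variance among genotype means is the major-gene contribution.
    major_var = sum(freqs[g] * (means[g] - mean) ** 2 for g in freqs)
    return mean, background_var + major_var


def mixed_model_density(x: float, p: float, means: dict[str, float],
                        background_var: float) -> float:
    """Phenotype density: genotype-frequency-weighted sum of three normals."""
    sd = sqrt(background_var)
    return sum(
        w * exp(-0.5 * ((x - means[g]) / sd) ** 2) / (sd * sqrt(2 * pi))
        for g, w in hwe_genotype_freqs(p).items()
    )


if __name__ == "__main__":
    # Hypothetical values: rare allele A (p = 0.1) adding 2 units per copy.
    genotype_means = {"AA": 4.0, "Aa": 2.0, "aa": 0.0}
    mean, var = mixed_model_moments(0.1, genotype_means, background_var=1.0)
    print(round(mean, 3), round(var, 3))  # 0.4 1.72
```

With purely additive genotype means, the major-gene term reduces to the familiar 2pq a² (here 2 × 0.1 × 0.9 × 2² = 0.72).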
They recognized that there were polymorphisms in DNA, which was not a surprise. They also recognized that these polymorphisms could be used to track the effects of well-known mutations like the sickle mutation. That 1978 paper was a classic: it was the first recognition of a DNA polymorphism. And it did something else that many of you will recognize: it made a population-level association between a specific genetic marker and a disease. That led to the thinking that we could invert the process, using arbitrary genetic markers to clone our way to the identification of the disease gene. This is really largely the work, and collaborative work, of Lap-Chee Tsui, who was the hero in the positional cloning of CFTR and the identification of its major mutation, delta F508, work he did with Francis Collins. What I'm showing you is a very crude attempt at making a map of RFLPs, which was incredibly difficult to do. Essentially, we used the principle of linkage disequilibrium in this region of several megabases to try to find the mutation. On the back is our rendition of the ancestral chromosome that existed in patients, pointing to a disease location. The history of this kind of association test in human genetics is not recent; it has gone on for a very long time. Studies in the 1950s took functional variants, like the ABO locus, and showed their relationship to peptic ulcers. The statistical history around that is actually very interesting: there were false-positive findings because of population structure, and then statistical methods to remove them. And there were many other studies, of course. The most successful of these were the HLA associations in autoimmune disease, by many but mostly by Walter Bodmer and his colleagues.
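Linkage disequilibrium, the principle used above to narrow the CF region, is usually quantified from two-locus haplotype frequencies. The sketch below computes the standard textbook measures D, D' and r². It assumes the haplotype frequencies are already known; in practice they are usually estimated from unphased genotypes, for example by EM, and this sketch skips that step. It is not the procedure used in the CF work, and the function name is hypothetical.

```python
def ld_measures(f_AB: float, f_Ab: float, f_aB: float,
                f_ab: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) from two-locus haplotype frequencies.

    Alleles are A/a at locus 1 and B/b at locus 2. D' is returned signed;
    its absolute value is what is usually reported.
    """
    if abs(f_AB + f_Ab + f_aB + f_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab          # allele A frequency at locus 1
    p_B = f_AB + f_aB          # allele B frequency at locus 2
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    D = f_AB - p_A * p_B       # deviation from independent assortment
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return D, D / d_max, D * D / denom


if __name__ == "__main__":
    print(ld_measures(0.3, 0.0, 0.0, 0.7))     # complete LD: D' = 1, r^2 = 1
    print(ld_measures(0.4, 0.1, 0.1, 0.4))     # about (0.15, 0.6, 0.36)
    print(ld_measures(0.06, 0.24, 0.14, 0.56)) # equilibrium: D and r^2 about 0
```

D' reaches 1 whenever one of the four haplotypes is missing, for example on an ancestral chromosome that has not yet recombined. r² additionally requires matched allele frequencies, which is why the two measures are reported separately.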
Polymorphisms were initially used as functional probes, but it soon became obvious that they were indirect probes, as in the beta-globin work I spoke about. That was very quickly used to identify haplotypes that differed among patients and likely carried different mutations, work I associate with Kazazian and Orkin. It is that work that broadly suggested that linkage disequilibrium mapping was important. This became feasible once the scale of LD in the human genome was worked out by a number of individuals: Mark Daly, my group, and others did studies, and of course David Altshuler and others did this on a much larger scale across the genome in the early 2000s. It was not restricted to qualitative traits; there were also studies of quantitative traits. Harry Harris's work in 1964 clearly showed the relationship between structural variants of acid phosphatase, meaning its protein variants, and its activity. But of course the heyday of it started in the early 2000s, with studies in multiple organisms, including humans, that treated gene expression as a phenotype. So I think there was a lot of precedent for thinking that genome-wide association studies would be a natural progression. There's a very nice review of genetic approaches to mapping by Eric Lander and Nick Schork in 1994. But I find that genome-wide studies in particular are very clearly laid out by Neil Risch and Kathleen Merikangas in 1996, and you can read the quote, which makes it quite clear that large-scale testing was necessary. So what were the impediments? There were two major ones. The first is that population geneticists knew human population structure could be an impediment, and this of course was the reason many past studies hadn't succeeded.
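To see why population structure produced the false-positive associations mentioned above, consider a hypothetical population made of two equally sized subpopulations. They differ in both marker allele frequency (0.8 vs 0.2) and disease prevalence (20% vs 5%), but the marker has no effect on disease within either subpopulation. All numbers are invented for illustration, and the names are hypothetical. The sketch computes expected case and control allele counts. Within each stratum the allelic odds ratio is exactly 1, yet pooling the strata produces an odds ratio well above 1.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """A hypothetical subpopulation; all values are illustrative."""
    size: int
    risk_allele_freq: float
    prevalence: float


def expected_allele_table(s: Stratum) -> tuple[float, float, float, float]:
    """Expected allele counts (case_risk, case_other, ctrl_risk, ctrl_other).

    The marker has no effect on disease: cases and controls in the same
    stratum share the stratum's allele frequency.
    """
    cases = s.size * s.prevalence
    controls = s.size - cases
    f = s.risk_allele_freq
    return (2 * cases * f, 2 * cases * (1 - f),
            2 * controls * f, 2 * controls * (1 - f))


def odds_ratio(table: tuple[float, float, float, float]) -> float:
    """Allelic odds ratio (a * d) / (b * c) from a 2x2 count table."""
    a, b, c, d = table
    return (a * d) / (b * c)


def pooled_table(strata: list[Stratum]) -> tuple[float, ...]:
    """Sum the per-stratum tables, ignoring which stratum each person is from."""
    tables = [expected_allele_table(s) for s in strata]
    return tuple(sum(column) for column in zip(*tables))


if __name__ == "__main__":
    strata = [Stratum(10_000, 0.8, 0.20), Stratum(10_000, 0.2, 0.05)]
    for s in strata:
        print(round(odds_ratio(expected_allele_table(s)), 6))  # 1.0 within each stratum
    print(round(odds_ratio(pooled_table(strata)), 2))           # about 2.36 when pooled
```

Stratified analyses and, later, genome-wide corrections for ancestry address exactly this confounding. The sketch only shows why it arises.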
The second was the ability to genotype what we knew had to be at least hundreds of thousands of markers. Affymetrix, and later Illumina, developed a single-tube assay that could genotype a hundred thousand, or even a million, markers. That development overcame the real bottleneck to doing the genome-wide association studies that are so widespread today. I do have one of these original cartridges. It has 200 SNPs, and I wrote three exclamation points: we could do 200 in one assay! So a million was really great. What has this taught us? I think we're going to talk about this throughout the meeting. It really did solidify that Fisher's infinitesimal model was correct, at least for multifactorial inheritance. What was new is that it emphasized the importance of the noncoding genome and the ubiquity of small genetic effects in biology, and I don't know that we fully understand the magnitude and meaning of that. That's really quite important, but there are other features to it. I think Jonathan Pritchard has quite eloquently described an omnigenic model that we have to face, along with the questions it raises. I also recognize that there's a lot of dispute over that model and that we need to bring data to it. It has turned out that implicating specific variants, genes, and pathways on the mapping side has become much better. On the functional side, success has been far more halting, and understanding disease mechanisms has been even more difficult. I think we need to answer: why do these genes cause this phenotype or this disease? To me an important question is how cells and tissues count the mutational load. How does a cell in the diabetic kidney respond now that it has 100 risk-causing variants rather than three? Mechanisms for that, and mechanisms by which phenotypic differences lead to disease, are important to unravel.
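Testing on the order of a million markers, as described above, is also why genome-wide association studies use such stringent significance thresholds. The sketch below assumes a Bonferroni correction at a family-wise error rate of 0.05 and roughly one million independent tests, which is the usual rationale for the conventional 5 x 10^-8 cutoff. Markers in LD are not truly independent, so this is a convention rather than an exact correction. The function names are hypothetical; `NormalDist` is from Python's standard `statistics` module.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff keeping the family-wise error rate at alpha."""
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of null markers passing an uncorrected cutoff alpha."""
    return alpha * n_tests


def two_sided_z(p_value: float) -> float:
    """Absolute z-score corresponding to a two-sided p-value."""
    return NormalDist().inv_cdf(1 - p_value / 2)


if __name__ == "__main__":
    n_markers = 1_000_000  # order of magnitude of a modern genotyping array
    print(expected_false_positives(0.05, n_markers))  # about 50,000 at p < 0.05
    threshold = bonferroni_threshold(0.05, n_markers)
    print(threshold)                                  # about 5e-08
    print(round(two_sided_z(threshold), 2))           # about 5.45
```

With 200 markers per assay, the same arithmetic gives a far more lenient cutoff (0.05 / 200 = 2.5 x 10^-4). Part of the cost of scaling up genotyping is this much stricter evidential bar.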
We do study what you would call truly complex traits, such as cardiovascular disease of various sorts, but in my view almost all of the complex genetics I've learned has come through one model example: Hirschsprung's disease. I'm not going to tell you much about it, except that it is a disorder of intestinal motility in which children are born with part of the gut not innervated. There are many features of it, which I've described here. But the question is, what did we learn? We learned about many, many genes. We clearly now know of at least 38 genes that have multiple statistically significant pathogenic variants that lead to disease. We have at least three genes that have multiple enhancers with variants that are statistically associated in the population. We can explain about 63% of the attributable risk, mostly, as you can imagine, from the common variants, but that's quite a bit. The important lesson, though, is that almost all of the genes we know (not all) are functionally united into a single gene regulatory network. That network controls the expression of two genes, RET and EDNRB, which encode two different kinds of receptors, and it does so specifically in enteric neural crest cells. We know that RET and EDNRB are rate-limiting for the following reason. You can have mutations within these genes, in enhancers outside them, or in trans factors and other interacting molecules, and they always change the expression of RET and EDNRB. I think finding such rate-limiting steps in a giant network is important. We now know from mouse models that Hirschsprung's arises from disrupted interactions, both cell-autonomous and non-cell-autonomous, between these enteric neural crest precursors and the gut mesenchyme. And because we know the mechanism, we know the time in development, the kind of cell, and the network.
We can predict many other genes in which we can find pathogenic mutations, though not in all of them, because that depends on the mutational structure of the gene. And if any of you are interested, there's a very strong association between Hirschsprung's and trisomy 21, and we are now able to explain how that comes about. So I'm going to end by saying that we have become very good at defining genetic individuality. But, in Garrod's words, we've got to define the chemical individuality if we are to understand disease much better. There are many kinds of cellular mechanisms that can be affected by multigenic components. I think Jonathan's omnigenic model is the first line, but there are many other things that happen. Nancy suggested the words in yellow, though not in any one particular conversation: what are the known unknowns? There are the known unknowns of the regulatory networks (by the way, these are regulatory networks in Eric Davidson's sense, which I can explain later), but ultimately these have to alter some cellular phenotype. It could be tissue composition, proliferation (as it is in Hirschsprung's), apoptosis, or cell metabolism. We have to be able to understand that in order to know the relationship between variants and genes, and how they change a phenotype. One thing to remember is that it is increasingly obvious, from the genetics and other literature, that protein complex dosage is a rate-limiting step for many cell behaviors. This is particularly because of proteostasis: the synthesis, folding, and degradation of proteins and their rates. These can be disrupted in a generic way by many kinds of specific gene action, and cellular stress often results. So how the disease comes about may involve nothing specific to those genes, but rather far more generic pathways. There are many unknowns.
I think we, in many ways, still live in the world that Haldane described in his famous debate with Ernst Mayr at Cold Spring Harbor, "A Defense of Beanbag Genetics." I think we still live in his world, which is that we look at the genome one gene at a time. But the genome is hardly that; we know this from many studies of evolution. And so we do need to consider a much more systems-level response, and that's much more easily said than done. So I want to thank you for your attention and for giving me this opportunity. I can answer questions throughout the meeting, or now if there's time, but I want to particularly thank Nancy and Molly for helping me think through some of these issues. Thanks.
So, I think we need both. I think it's obvious we need both. When we know nothing about the specific genes or the biology, of course the reverse approach is the only one that we have, and genetic mapping is a wonderful, really reverse approach. I'm often awed by the power that it has: it reveals things we don't understand, but it does reveal them. I think that, increasingly, as we understand more about genes and pathways, a forward approach becomes far more compelling. I see this in the work of many, and so I don't have to choose one or the other.
So the question, if you couldn't hear, from Michael Goddard, is that epistasis appears to be far more important, but when we try to detect it we don't see much of it. Number one, I think whether epistasis is important depends on the scale at which we work. There is not one scale for the phenotype, and epistasis, much like penetrance, may be far more important on one scale than another. I ended by saying we need a systems approach, and the systems approach essentially concedes that most things interact with one another.
Now, whether we statistically detect it or not may be because of the design of the experiment, power, and many other things. The two receptors, RET and EDNRB, are very different, a receptor tyrosine kinase and a serpentine receptor, but they're very, very strongly epistatic. And we've been able to measure that epistasis in two very special circumstances: in mice, when you knock out the genes or even with hypomorphs, and in a particular isolate of Old Order Mennonites, because there they have a rare variant. There are lots of enhancers and lots of common variants in RET, and almost none in EDNRB. That's what evolution gave us in this case, but there's a rare mutation in EDNRB that's very common among Mennonites. So in that case we can show epistasis quite effectively.
Yeah. You know, I guess it's sort of like my food habits: I not only eat Indian food, but almost anything that's edible and really wonderful. So I think all model organisms have their value. I got into Hirschsprung disease because I sat on a student's defense, and she was describing all these categories, and I was surprised that the behavior was quite distinct from models that were already known in the literature. There were mouse models, a rat model, and there's even a horse model, lethal white foal syndrome. And they were almost all Mendelian, because they were all associated with pigmentary anomalies, and in humans, too, cases had been found because of those pigmentary anomalies. I think model organisms have different lessons to teach us, and they're very important. In this case it told us the cell type: it had to be of neural crest origin, because it affects both pigment and the gut. So I'm for getting data from any place that you can.
As for the exact model organism, I don't think we should stick to only flies and mice or C. elegans; it depends on the question that you're answering, and there are all kinds of new model systems.
So, Bruce Walsh. I actually was also with Nancy at that original meeting, and it's amazing how wrong we were collectively. But I want to say that an important model organism that people overlook, which I think has the best GWAS outside of humans, is maize. The nested association mapping designs in maize are incredibly powerful because they have genotypes that you can replicate, so you have tremendous power for G by E and can get a precise measurement of each genotype. And the conclusions from that are very much the same as in humans: rare variants tend to have large effects, but they are rare, and most of the hits are in non-coding regions. So I think we need to go even beyond animal model systems to answer questions about architecture.
Yeah, thank you, Bruce. I'm seeing you after decades, so pardon me if I didn't recognize you in the morning; I have age as an excuse. But I completely agree with you. I think there are two kinds of inferences we need to make. We need to understand the nature of a genetic system that can lead to phenotypic variation, and we should learn that from a whole variety of systems. But I think we do have an obligation, and really a great scientific interest, in doing this for specific human phenotypes and disease. If nothing else, it's an organism that's completely outbred that we can study in its natural habitat, so to speak. So when we do it for a specific instance, we need to be able to invoke all of those kinds of mechanisms, but I agree with you.
Okay, thank you. So we'll start our first session, which will be chaired by Shamil Sunyaev from Harvard.
Let's continue. It's actually humbling to stand here after Aravinda, and after being reminded about the many, many years of genetic research that brought us here. So I'd like to welcome you to the first session, titled "Cells, Tissues and Organs." Our speakers are Tuuli Lappalainen from KTH Royal Institute of Technology, Nasa Sinnott-Armstrong from Fred Hutch, and Francesca Luca from Wayne State. These talks will be followed by a discussion chaired by Barbara Stranger with a set of panelists, and we suggest that these are the questions that will feed into the discussion, in addition to the talks. As you see, the first revolves around the idea of context for individual cells: cell trajectory, developmental stage, cell type, cell state, because there is a growing realization that a genetic effect manifests itself in a given cell, at a given time, in a given state. The second question is about translating these cellular effects into larger systemic units, not the whole organism yet, but tissues, organs and so forth. And the third question reflects the belief that the way we think about this moving target of allelic architecture is defined by evolutionary biology, and I decided that I'm not going to quote Dobzhansky, because everybody does; we should avoid it at least once. So, going forward, I'd like to state very simple questions, which I believe are the three primary questions for the field and, in my opinion, major questions in life sciences today. First, we have all this non-coding variation being described, and we have no idea, in my opinion at least, about the proximal functions of these variants, the variants we call regulatory. So what are the mechanisms underlying their function?
Do we have the right molecular phenotypes? What we can collect and analyze is bulk gene expression, single-cell expression, some change in response to a stimulus, maybe rate of transcription. We know that rate of transcription should be a mediator for many of the effects, and splicing too, but what's happening at the molecular level? So this is a big question, and again there is context dependency, as we discussed in the first prompt: all these effects manifest themselves in a given cell, not in a vacuum. The second question is how the collective action of numerous variants translates into phenotypic variation. We have this observation that the independent action of multiple small-effect variants influences the phenotype, and this is a very unsatisfactory model, because if you open a biology textbook, it is filled with specific pathways and networks and mechanisms. It's not the independent action of tens of thousands of variants. So the question is, what are the low-dimensional biological units? What is the hierarchy? Are these pathways, networks? Are there endophenotypes, intermediate phenotypes? Is it core versus periphery? Is there a hierarchy of simple phenotypes feeding into complex phenotypes, or something else? And again, what's the role of context in this story? The last question is why we see this, from an evolutionary biologist's standpoint, because we know that natural selection usually favors a specific trait value. We know that monogenic forms of diseases like hypercholesterolemia or diabetes are rare, and we attribute this to the action of natural selection. However, polygenic forms are quite common. And even phenotypes that are known epidemiologically to be associated with a fitness loss are relatively common in the population. This brings us again to the question of evolutionary biology and how this genetic architecture exists in populations. So with this, I would like to invite Tuuli.
I haven't seen her in person, so I wonder whether she's... She's virtual. It is my pleasure to invite Tuuli, virtually.
Is it actually okay if I share my own slides? I made a few last-minute edits. All right, I think this works. So it's a huge pleasure to join you, even if virtually. I'm very sad that I'm not able to be there in person, but it's a pleasure to follow Shamil's thoughts and Aravinda's amazing keynote to discuss the functional architecture of genetic variation in complex traits, to think about how we characterize the regulatory effects of genetic variants, and then how we move forward from that to think about molecular phenotypes of the cell and beyond. So we can of course think about... I'm sorry, can I share my own slides? Do you see them? Can you stop sharing now? Okay, is this okay? All right. So we can think about functional effects of genetic variants, in particular regulatory variants, across multiple different phenotype levels, as the previous speakers already mentioned. We can think about molecular effects at the level of the epigenome (chromatin-level effects), the transcriptome and the proteome, and then cells, tissue function, physiological phenotypes and disease. There is, of course, a huge number of functional variants at the lower phenotypic levels which often don't have any downstream effects, and this affects the selective constraint that these variants are under. I want to point out, and I will return to this a couple of slides from now, that we have relatively rich data on genetic effects at the lower molecular levels. We also have very rich data from GWAS on disease and physiological variant associations, but we have relative sparsity in the middle: there is this valley of death of trying to understand variant effects on cellular and tissue-level functions.
So let's think about the cis-regulatory effects of genetic variants, which dominate the genetic architecture of complex traits. This is actually a whole set of questions once we have identified our associated loci. First, there's the question of which are actually the causal functional variants versus their LD proxies. There's the big question of what the proximal epigenomic effects of variants are, for example on transcription factor binding, enhancer activity and so forth. And then probably the most important single question: what are the target genes of these non-coding regulatory loci? For all of these, there's the question of what the relevant cell type or cell state is where causal, disease-mediating effects are actually taking place. There are a lot of questions. Luckily, there are also multiple approaches to tackle the variant-to-function challenge, and I tend to divide them into three groups. There is the oldest large-scale approach to characterizing regulatory variation, which is molecular QTL or eQTL mapping, where you associate genetic variation in a population sample with molecular readouts from those individuals. Then, somewhat more recently, there is a large-scale approach using different types of experimentally engineered perturbations of the genome, applied to in vitro cellular or other model systems, and then seeing how that affects molecular phenotypes. The third approach, which I won't talk about much myself, is something that I call the ENCODE approach, where you don't directly assess genetic perturbations of any type, but you have very rich molecular profiles from different cell types, and you can use that data to infer putative variant effects. So let's go a little bit deeper into the first two, starting with the molecular QTL landscape.
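As a rough illustration of the eQTL mapping idea described above (associating genotypes in a population sample with a molecular readout from the same individuals), here is a minimal Python sketch. It is not the pipeline used by GTEx or any particular study: it assumes one variant and one gene, genotypes coded as alternate-allele dosages 0/1/2, and ordinary least squares with no covariates. Real analyses typically add covariates such as genotype principal components and hidden expression factors, and correct for testing many variant-gene pairs. The function name `eqtl_test` is hypothetical.

```python
import math
from typing import Sequence


def eqtl_test(dosages: Sequence[float], expression: Sequence[float]) -> tuple[float, float]:
    """Regress expression on genotype dosage for one variant-gene pair.

    Hypothetical helper: plain OLS, no covariates. Returns (beta, t_statistic),
    where beta is the change in expression per copy of the alternate allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched genotype/expression values for at least 3 samples")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        # A monomorphic variant carries no genotype variance to test.
        raise ValueError("monomorphic variant: no genotype variance")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    beta = sxy / sxx
    intercept = mean_e - beta * mean_g
    # Residual variance with n - 2 degrees of freedom (slope + intercept).
    rss = sum((e - intercept - beta * g) ** 2 for g, e in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = beta / se if se > 0 else math.copysign(math.inf, beta)
    return beta, t_stat


if __name__ == "__main__":
    genotypes = [0, 0, 1, 1, 2, 2]  # toy data, not from any study
    expr = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
    print(eqtl_test(genotypes, expr))
```

In the toy data each extra alternate allele raises expression by about one unit, so the slope is close to 1 with a large t statistic; a p-value would come from a t distribution with n - 2 degrees of freedom.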
This is a space where, during the past 10 to 15 years, we've created amazingly rich data sets, especially of expression QTLs and splicing QTLs, for example in projects like GTEx, where we have now identified tens of thousands of common regulatory variants affecting basically every single gene in the human genome across multiple different tissues. This is very rich data that has given us a lot of insight into the regulatory architecture of genetic variation. But more recently there's been quite a bit of discussion in the community that, despite the richness of these catalogs, they have some limitations in capturing the GWAS effects that many of us are actually interested in, and many people have been doing these analyses. Just to put a couple of numbers there: eQTLs have been estimated to account for 11% of GWAS heritability, and even when matching autoimmune GWAS traits with relevant immune cell types, eQTLs confidently colocalized with just 25% of GWAS loci. So these numbers are not nothing, but they also don't exactly solve the entire question of how we should interpret the regulatory effects of non-coding GWAS loci. There is currently a lot of debate in the community about what conclusions to draw from this: whether we should just pursue other approaches, or whether we should double down on the eQTL approach. And there are indeed still multiple very important gaps in the current eQTL, or molecular QTL, data that we have. One big caveat is the lack of cell-type and cell-state specificity of the associations that we have discovered, thus far primarily from bulk samples, with single-cell data now arising as a very important novel data source. There are important cell types and cell states, especially along developmental trajectories, that bulk eQTLs don't capture very well.
Whether the steady-state transcriptome is always the right readout is also a question. And probably the most fundamental concern is that molecular QTLs discovered in typical sample sizes of a few hundred individuals are depleted in constrained, disease-relevant genes because of power: the architecture of the genetic variants that we find in eQTL studies versus GWAS ends up being different. However, there is some very interesting recent work indicating that when we increase the power of eQTL studies, in this example with a few thousand samples of adipose data, we discover additional independent eQTLs, increasingly in the enhancers where GWAS signals are as well, and there are indications that this improves informative colocalization results for GWAS by quite a bit. Think back to the early history of GWAS, which didn't actually work that well in the beginning, when sample sizes and power were limited; people doubled down, actually got sufficient power, and really amazing insights and applications emerged. That type of experiment hasn't really been done for molecular QTLs yet: we don't yet have very well-powered QTL data with cell-state resolution, and I think that this is a study design that we may want to pursue as a field. But there is more than eQTLs in the field. Today many labs, including my own, have been pursuing CRISPR tools to understand the effects of GWAS loci. For example, in this paper from last spring with Neville Sanjana, John Morris and others, we used CRISPRi to silence GWAS enhancers and then used single-cell RNA sequencing as a readout to detect the effects of these GWAS loci in cis and in trans, and we found quite a high yield of loci where we detected at least one significant gene in cis.
I think that this approach and other similar approaches are very fruitful and provide very rich data that seems to be very informative about what is going on in these loci. However, to apply these approaches successfully, in a way that is actually informative, particularly for GWAS, there are still many outstanding questions that we should figure out as a field. One of the interesting things that we glossed over a little in the paper is that the effects we detect with our STING-seq CRISPR approach do overlap with eQTLs, but the overlap is relatively modest. We're currently digging into why that is the case, and what kinds of effects are best captured by, let's say, eQTLs versus CRISPR. We see some signs that some of the lack of overlap is due to cell context specificity. That is, of course, a bigger question for the CRISPR tools, which can often be applied at scale only in a relatively limited set of cell types that we can actually culture in vitro. Then there's also the question of the best combination of enhancer silencing and other cruder genome perturbations versus actual nucleotide-level editing, which often has very subtle effect sizes and is difficult to do at scale. So there are many questions to answer there, but also a lot of excitement and, I believe, a lot of potential for discovery. And indeed, when we think about natural versus engineered variation as two complementary approaches to go after the functional effects of genetic variants, they have many benefits and several caveats that are, interestingly, quite orthogonal.
And I think that thinking about how to use these tools jointly may be a way to cancel out some of the caveats and distill the shared biology in the best possible way. When it comes to cis-regulatory effects, just to highlight some of the lessons learned: I do think that we need a diverse toolkit, and a user manual for that toolkit, to know which tools give which types of insight and are best suited for the particular question that you have. And while there is a lot of experimental and computational work to do, and we're not done with the cis-regulatory challenge at all, we have a relatively good understanding of what the right questions are, and of some of the approaches and methods with which we can actually answer them, and overall the challenge seems relatively tractable. I think the bigger challenge actually lies in the question of "then what?" Even with the current approaches and the work that has been done thus far, we likely know the causal genes for some thousands of GWAS loci. But for many of these genes, we actually have no idea, or very little idea, what they do. And this comes down to the valley of death that I pointed out before, where we have relatively limited insight into what genes and variants do at the level of cells and at the level of tissues. Here we can think in terms of the fundamental approaches that people have used to try to address this question. We can associate genetic variation with molecular phenotypes in a trans-eQTL type of setting, which has proven to be quite challenging but potentially still fruitful, especially with very large single-cell data. We can also associate genetic variation with in vitro cellular phenotypes from large numbers of human donors, either individually or in cell villages.
And one can use cellular phenotypes that are inferred from molecular data, such as pathway activity or transcription factor activity. And then, of course, there are perturbation approaches: CRISPR screens, work in model organisms, etc. So we have some questions here regarding the best approaches and the basic biology. One of the basic biological questions that I've been thinking about quite a bit is to what extent we need to characterize variant function versus gene function. And I think that there is a way to think about how we put these two things together, and how variant function relates to gene function. So we've identified some variants that associate with diseases or other higher-level traits, and then we've identified how those variants affect the target gene in cis. What mediates the relationship of those effects in cis to the downstream effects in trans should, in most cases, be some sort of function between functional gene dosage and the downstream function of the gene. In this way, gene dosage can act as a point of convergence of cis-regulatory effects, and then further on to downstream function. My lab has been trying to bark up this tree for 10 years now. It's tough. More recently, for the past couple of years, we've been using CRISPR tools to gradually modulate gene expression up and down and linking that to cellular phenotypes; that will be a story for another day. I'll also mention that, of course, these relationships are multidimensional and context dependent. But I think there are interesting things to discover here, and the rationale for thinking about gene dosage as a joint framework is not just my invention.
But it's actually supported by what we know about genetic architecture, where rare Mendelian diseases and other types of rare disease are quite often dominated by coding loss-of-function variants that lead to a loss of functional gene dosage, while common complex disease is, of course, driven by regulatory variation. And we know that there is very strong overlap between the two, even at the level of individual genes. That may suggest that gene dosage can function as a joint framework to bring these things together around the same biological rationale. There are applications from precision medicine to drug development and also to basic molecular biology: much of what we know about gene function these days is based on very drastic gene knockouts, and whether that is actually interpretable in terms of the much more subtle perturbations that we see in nature is, I think, an interesting question to address. This is my last slide. If gene dosage indeed ends up being a point of convergence for cis-regulatory effects, and for how cis-regulation and different variants translate into gene function, are there then analogous points of convergence at other levels? For example, cellular programs as an outcome of perturbations of multiple different genes, tissue phenotypes as an outcome of perturbations of different cellular programs across multiple cell types, and then disease as the result of more system-level perturbation. I think there is a tremendous amount of work to be done just to map those components, the different things that might be impacting higher-level functional layers, and a lot of work to be done in thinking about the architecture of these in terms of nonlinearities and interactions, and whether these convergence points even exist.
And while some of these specifics are likely to be really specific to the systems or diseases that you're focused on, I would hope that there are some architectures and approaches that actually generalize and help us address this question more broadly. There is really a lot of work to do here, and I think that to be successful, we as geneticists really need to join forces with people in molecular systems biology and other fields that are approaching this question from a slightly different angle. I'll stop there. Thanks for listening.
I can't hear anyone in the room.
So now I can thank you with the microphone on. We have a few minutes for questions. I do have mine, but first let me see if there are questions in the audience. Yes, please.
I'm going to need you to repeat the question; I can't really hear without the microphone.
There is the microphone.
Sorry. Michel Georges. This is to follow up on Jonathan's and Mostafavi's paper on the mismatch between eQTLs and disease loci. Do you envisage that what we have missed so far is due to genes for which, at the present time, no eQTLs at all have been detected, maybe because their expression level is very low or because they resist perturbation in most cell types? Or could it be specific effects in genes that are subject to eQTLs that have already been detected?
Yeah, so the way I think about it is that we do have quite comprehensive eQTL catalogs, but those eQTL studies very easily pick up the very strong, often promoter-driven effects that are the lowest-hanging fruit in the full spectrum of regulatory variants that a gene may carry.
Those variants then easily mask other independent regulatory loci that may exist further away, for example in enhancers. And the power of many eQTL studies is not really sufficient to cover all of these effects, and it's even trickier if they're highly context specific. So basically, the study design of eQTL studies is not very well suited to finding those GWAS-relevant effects.
And at this point I'd love to use my power as session chair to follow up with a question, and I think you know what I'm going to ask. There are multiple papers, including Hakhamanesh Mostafavi's paper, and people think that this is a power issue. To me, the major question is not where all these remaining eQTLs are. The major issue is that we do have large-effect eQTLs, well powered to find, at relevant genes where we know the gene is involved and that the amount of message causes the phenotype. These are mostly in promoters, and they do not have any phenotypic effect; in my lab we call them red herring eQTLs. So why do these exist? We can increase power and find more eQTLs, but if most of those are red herrings, where do they come from, why do they exist, and how do we get rid of them?
Yeah, good question. I don't really have good data to support this, but I think it must come down to some sort of cell-type specificity of downstream gene function: even when you have a disease-relevant gene, you perturb it in a different context and it doesn't actually have a phenotypic effect. We have actually seen this, for example, with PAX8 or some other terrible Mendelian disease gene that causes a horrible, horrible phenotype. I think it was a liver-specific disease that this gene was driving.
And then when we looked at GTEx eQTL data, we saw this ginormously big eQTL affecting this gene, with a huge effect size, and we were like, this eQTL should not exist. And actually, when we looked across the different tissues, it was not active in liver, because in liver you cannot perturb that gene without very, very bad consequences; in other tissues, you could. So probably this eQTL variant was sitting in some context-specific enhancer that just didn't affect the relevant tissue where this gene had a functional role, and that's what allowed it to exist.
Okay, context is the answer. So, Magnus, the last question here, please.
Yeah. Magnus Nordborg. Isn't the simplest explanation to the question you asked that they're just neutral, and that we see them precisely because they don't matter? When we have done this in Arabidopsis, it's clear that the eQTLs are heavily biased toward things that are peripheral in all kinds of interaction networks, protein-protein interaction networks, regulatory networks and so on, and they're not conserved between species. Everything fits with it just being noise; that's why you see them.
I go to liver, I look at the LDL receptor, and there is a massive eQTL in liver for the LDL receptor. I know there is a non-coding GWAS hit in the LDL receptor, and this eQTL does nothing to cholesterol levels. That's my observation, and I'm absolutely puzzled. I think I'm with Tuuli that there is a subtle context; I believe it's probably subtler than a tissue type. And I'm very interested in talking to Michel Georges and you about what happens in other organisms, because I don't know if it's a human-specific story or not. I'm sorry, we have to move on. It's my great pleasure to introduce Nasa Sinnott-Armstrong. They hail from the Fred Hutch in Seattle, and this will continue our discussion of context, I guess, and effects.
Thanks so much for being here, and thank you for inviting me; this is my first time at the NIH, so I'm very excited. I'm going to be talking today about connecting environmental exposures to molecular traits using genetics. There we go. Okay. I'm going to give an example today about cancer, in particular uterine or endometrial cancer. I think cancer is an example of a complex trait that we consider a lot. Specifically, we expect there to be some contribution of genetic variants to these traits, in this case germline genetic variants, and in addition to that, environmental exposures. We think about the ways in which genetics and environment might complement one another or interact with one another; there are a lot of different relationships that we could expect. And I think the question that I'm excited about, and that many of you are excited about, is understanding better the ways in which both genetic and environmental risk is conferred: what is it about genetic and environmental factors that results in disease burden? If you think about this in the context of cancer and look historically, people discovered a lot of exposures that are associated with cancer. Over the last 100 years we've identified a number of molecular biomarkers that contribute to this. By biomarkers in this context I mean things, with many different structures, that carry an effect from the environment which then results in increased or decreased burden of disease. Examples include things like blood pressure, which has a very small but potentially significant association with uterine cancer. We can understand this in the context of disease liability, where there's some threshold, and above that threshold people are considered to have the disease.
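The liability-threshold picture described above can be made concrete with a small sketch. It assumes a standard normal liability, a hypothetical prevalence of 3% and hypothetical shifts in mean liability for a group with a higher biomarker level; none of these numbers come from the talk, and the function names are illustrative only. It uses the Python standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Liability on a standardized scale: mean 0, standard deviation 1 (assumption).
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def liability_threshold(prevalence: float) -> float:
    """Return the threshold T such that P(liability > T) equals the prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return STD_NORMAL.inv_cdf(1.0 - prevalence)


def risk_after_shift(prevalence: float, shift_sd: float) -> float:
    """Disease risk for a group whose mean liability is shifted by `shift_sd`
    liability standard deviations, keeping unit variance within the group."""
    threshold = liability_threshold(prevalence)
    # If liability ~ N(shift_sd, 1), then P(liability > T) = 1 - Phi(T - shift_sd).
    return 1.0 - STD_NORMAL.cdf(threshold - shift_sd)


if __name__ == "__main__":
    baseline = 0.03  # hypothetical prevalence, for illustration only
    for shift in (0.0, 0.1, 0.5):  # hypothetical shifts in mean liability
        print(f"shift={shift:.1f} SD -> risk={risk_after_shift(baseline, shift):.4f}")
```

The design choice here is deliberately narrow: the biomarker acts only by moving the mean of the liability distribution, so interactions and changes in variance are ignored.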
As a result, we can look at individuals based on their blood pressure risk and understand the ways in which that might be associated with other traits, environmental exposures, or genetics. Here we can study genetics by looking at genetic variants, developing something like a polygenic risk score for blood pressure, or using Mendelian randomization to understand better the ways in which individual genetic variants contribute to blood pressure and then further to cancer. And of course this is true for many different biomarkers, not just blood pressure; there are others as well, including things like progesterone and inflammation, which contribute to differential risk of this disease and many other diseases. I think this model is great: it helps us understand the ways in which genetic and environmental variation interact through these intermediate phenotypes, and hopefully gives us a better understanding of the mechanism underlying the disease. But I think it's also somewhat limited, and I'm going to give you an example of how it might be limited through a very particular type of exposure. The environment that I'm looking at is a pathogen, Lyme disease, and the ways in which Lyme disease might contribute to risk of this disease, both indirectly and directly. Lyme disease is caused by the bacterium Borrelia burgdorferi, which is carried by ticks, and it's very common. In particular, it's rising in frequency: this is a map from the USDA of where cases of Lyme disease were reported in the northeastern United States in 1996, compared to 2018. We see a much larger range and many more individuals being affected. This trend has been continuing as a result of a variety of different things that I'm happy to talk about in questions.
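The two genetic tools mentioned above, polygenic risk scores and Mendelian randomization, both reduce to simple weighted arithmetic once per-variant effect sizes are in hand. The sketch below is a hypothetical illustration, not the speaker's pipeline: the score is a weighted sum of allele dosages, and the MR estimate uses single-variant Wald ratios combined by fixed-effect inverse-variance weighting (one standard estimator; the talk doesn't say which was used). All inputs are assumed to come from external GWAS summary statistics.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1, or 2) times per-variant effect sizes."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def wald_ratio(beta_exposure: float, beta_outcome: float) -> float:
    """Single-variant MR estimate: outcome effect per unit change in the exposure."""
    if beta_exposure == 0:
        raise ValueError("the variant must be associated with the exposure")
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure: Sequence[float], beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> float:
    """Fixed-effect inverse-variance-weighted combination of per-variant Wald ratios.

    Uses first-order weights, i.e. it ignores uncertainty in beta_exposure.
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den
```

Here `beta_exposure` would be variant effects on, say, blood pressure, and `beta_outcome` the same variants' effects on cancer risk; if every variant implies the same causal ratio, the IVW estimate recovers that ratio.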
This increase in Lyme disease risk is potentially driven by differences in exposure to ticks. We want to understand better whether we could use genetics and other tools to look at the mechanism underlying the development of Lyme disease, and use that to develop interventions that help prevent it. So is there some sort of genetic predisposition? Hanna Ollila and Satu Strausz ran a genome-wide association study and discovered a strong association with Lyme disease on chromosome 11. Individuals who have Lyme disease had a much higher burden of variants in this particular gene, SCGB1D2. This is an unusual example because it's a common variant, and it's a coding variant that affects the structure of the protein by changing an alpha helix, which alters the conformation of this small secreted protein. As a result, individuals carrying it have higher rates of Lyme disease. This is really interesting because we don't really know what secretoglobins do. There are 11 of them in humans, and their mechanisms, how they act, and what their purpose is are very poorly understood; all we know is that they're secreted. Some of them bind to hormones, or maybe also to PCBs, but there isn't a really good understanding of what's going on. But we found this association with Lyme disease, so what we can do is take this protein and see whether it has an effect on Borrelia burgdorferi itself, and that's exactly what we did. We can start by looking at where this gene is expressed. When we look in GTEx, SCGB1D2 is very highly expressed in the skin, which suggests that maybe it's acting directly at the point of exposure to Lyme disease.
And maybe not just in skin generally: a single-cell RNA-seq dataset published recently shows that SCGB1D2 is actually a marker gene of sweat gland cells, both the clear cells and the dark cells. So we thought that maybe it could be acting as an antibacterial defense barrier for the skin, but we didn't really know, and we wanted to test that. We collaborated with a number of people in Mickey Tal's lab, Paige, Sarah, and Grace, who helped run these experiments, where we looked at Borrelia burgdorferi growth in vitro. We take a control sample, which is just Borrelia growing in normal conditions, in this anaerobic condition with a very specific medium, and we look at GFP signal over time, and we can see that it grows pretty nicely. If we expose Borrelia to the wild-type SCGB1D2 protein, it grows substantially slower and almost gets completely wiped out in the first day of the experiment. If we compare that to the variant version associated in the GWAS, there's definitely a little bit of growth loss, but Borrelia actually grows pretty well in the presence of this modified SCGB1D2. So individuals with that allele likely can't clear Borrelia as quickly, and we think that probably reduces the efficacy of the immune response, or means it takes longer to mount one. As a result, these individuals are more likely to be affected by the disease and to have more severe disease. We thought this was really cool, because basically we're taking this gene-environment association in humans and turning it into a gene-environment association in Borrelia. From Borrelia's perspective, they're living fine, and then all of a sudden they get exposed to this weird human protein and can't survive anymore.
So complex interactions between these host-pathogen systems let us discover ways in which the environment and exposures matter differently for different organisms, and I think this is super cool. One thing we want to do is understand better why or how this might be going on. So let's go back to the expression data from before. We want to understand how SCGB1D2 might be acting in other tissues, because we know that chronic Lyme disease affects multiple tissues, and there may be something going on there. We had also been talking to community members who have been affected by Lyme disease, and many of them reported high rates of miscarriage and many other diseases of the female reproductive tract. And we noticed that, in addition to being highly expressed in skin, SCGB1D2 is also very highly expressed in the uterus and cervix. We thought this might be related mechanistically, because if you look at the Human Protein Atlas for expression of SCGB1D2 in the uterus, it's primarily expressed in the glandular cells, which are also secretory within that organ. That supports the idea that SCGB1D2, the secreted protein, acts in similar ways across tissues, providing some sort of protection against Borrelia and potentially other spirochetes or other bacteria. What do we do with that? Well, is this potentially related to other diseases? Paige started looking at this more and found that if we map, geospatially across the United States, where Lyme disease cases are occurring, and map over the same time period where uterine cancer cases are occurring, the two weirdly overlap in a way that we definitely did not expect.
Of course, this is kind of hand-wavy, and we're not super secure about these findings. If we run this as a covariate-adjusted model across the different counties within the US, we do see a significant association. But all of that is done at a level where we don't really know exactly what's going on, so let's look at this within individuals. We turned to the All of Us program, and together with Monica Perez, a student of mine at UW, we started looking at odds ratios for individual-level associations between people who have been exposed to or reported Lyme disease in their electronic health record or survey data, and whether they had also reported endometrial cancer. We see a nonsignificant association with endometrial cancer in the surveys, but there is an association with a number of other outcomes of the female reproductive tract, including uterine fibroids, PCOS, and endometriosis. These are pretty large effects; we're talking about 1.5-fold or so. When we turn to the EHR, we see even stronger associations, almost two- to three-fold, with higher rates of menorrhagia, miscarriage, and uterine fibroids, and a slightly stronger association with endometrial cancer. We're trying to see whether this holds up, and we're expanding this analysis now: we just got a collaboration going with the Truveta program, which has electronic health record data from 90 million Americans, so we'll be looking at this at a much larger scale to see whether we find even more associations. But at this point, I think we have pretty good evidence that even if there's not necessarily a strong association with uterine cancer, there are a lot of other traits of the female reproductive tract that are associated with Lyme disease.
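The "1.5-fold" and "two- to three-fold" figures above are odds ratios. As a rough illustration of where such numbers come from, here is a crude (unadjusted) odds ratio with a Woolf log-scale confidence interval computed from a 2x2 table. The counts in the usage note are hypothetical, and the talk's own estimates may well have been covariate-adjusted, which this sketch does not do.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a: int, b: int, c: int, d: int, level: float = 0.95):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    2x2 layout (e.g. exposure = Lyme disease, outcome = uterine fibroids):
                    outcome   no outcome
      exposed          a          b
      unexposed        c          d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (consider a continuity correction)")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

With made-up counts `odds_ratio_ci(20, 80, 10, 90)`, the point estimate is 2.25; the interval widens quickly when the exposed-case count is small, which is one reason the speaker wants the much larger Truveta dataset.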
That's really cool, but even with this association we don't really know what's going on, so how do we understand it better? Well, we can look at other datasets. For instance, in TCGA, individuals with high SCGB1D2 expression have much higher survival rates than individuals with low SCGB1D2 expression. The p-value here is around 0.01, but the individuals survive for much longer. We can also look at endometrial carcinoma and observe again that SCGB1D2 is primarily expressed in the glandular cells, so it's acting through some secretory mechanism and potentially protects against cancer or other traits of the endometrium. Okay, that tells us a little about what's going on, but we want to look more at mechanism, and I think understanding what's happening in tissues is super important, which is easier to do with individual experiments and model organisms. So again we worked with Mickey's lab on this, and Paige infected Black 6 mice, which are thought to be not very responsive to Lyme disease; they don't have many symptoms. Mice are also really interesting because they don't have SCGB1D2 at all; there's no homolog, so they're a pretty good model system for testing this. Compared to uninfected mice, which have a nice endometrial structure and fairly standard uterine morphology, the infected Black 6 mice have an extreme cyst phenotype, with a large number of cysts throughout the uterus. There are a lot of gross morphological changes, and there seems to be a huge amount of inflammation within the uterus. We also found that Borrelia itself is present within the endometrium. What we're doing now is following this up in human experiments to see whether we can observe the same thing.
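The TCGA result compares survival between high and low SCGB1D2 expressers. The talk doesn't specify the method, but survival comparisons like this are commonly drawn as Kaplan-Meier curves with a log-rank p-value. Below is a minimal Kaplan-Meier estimator for illustration only, assuming right-censored follow-up times; it does not implement the log-rank test, and the inputs in the usage note are made up.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if death was observed at times[i], 0 if follow-up was censored.
    Individuals censored at time t are still counted as at risk at t.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all individuals (deaths and censorings) sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve
```

Running this separately on the high- and low-expression groups gives two step curves; for example, `kaplan_meier([1, 2, 3], [1, 0, 1])` drops to about 0.667 at time 1, skips the censored time 2, and reaches 0 at time 3.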
Overall, I think this supports the idea that Borrelia might be acting directly on the uterus, and that could have an effect on the environmental side from the human perspective. So effectively, we took the model we already had, of genetic variants and environment contributing to biomarkers that then result in cancer risk, and broke it slightly: instead of having genotype and environment separate from one another, we can have genetic variants like SCGB1D2 that affect the environment that cells are exposed to. Then the environmental difference, in this case a change in the rate of Lyme disease, contributes to differences in the biomarkers of endometrial cancer and results in differential health outcomes. So overall, what this tells us is that Lyme disease risk is modulated by these alleles of SCGB1D2. It's an unusual example of a high-frequency coding variant, but it supports the same model we've been talking about throughout these talks, and that I'm sure everyone else will discuss later today and tomorrow. We also learned that Lyme disease increases the rate of uterine cancer and endometrial cancer. And finally, we learned that genetic variants can confer these differences in environment. So by understanding genetic variants in the context of environment, and environment in the context of genetic variants, we can learn much more about the structure of these traits and hopefully better understand how these different mechanisms vary across contexts. With that, I'm happy to take any questions. Thank you. Well, thank you very much. This is amazing: this talk brings context to the floor and highlights complexity. So, who has the first question? I'm Aravinda Chakravarti from NYU. That's a really cool example. This is an ignorant question, not knowing all the details about the bug.
So, do you know whether it's really the bug that travels to other sites, or is it some secreted protein? In your mouse experiments you just used the bug itself, but it may be something else, because if that's the case, what you see at other sites could simply be, you know, an SCGB1D2 effect. Yeah, that's a great question. Lyme disease, particularly chronic Lyme disease, is extremely multifactorial and includes many different symptoms across many different organ systems, so I think it's likely that some of those are driven by different sorts of effects. But in the mouse model we can see that there is direct infection: we have both a luminescent and a fluorescent Borrelia strain, and both are active and present in the uterus, which suggests that at least in the uterus, direct infection is pretty relevant. Somebody previously reported this in 1996, and it's just been kind of ignored for 30 years, which might tell you something, but at the same time there's been a lot of evidence that Borrelia infects the bladder, so there's potentially some crosstalk as a result of that. Okay. Next question. What wasn't clear to me: the P53L proline-to-leucine variant, is it a GWAS hit in the cancer GWAS or not? It's a subthreshold association for uterine polyps. It's not at high frequency, and the problem is that Lyme disease is not super common, so you're probably capturing no more than one to two percent: my estimate from the county-level data is that about 1 to 2% of uterine cancer cases might be driven by Lyme disease. But of course, finding cases of uterine cancer from areas with high rates of Lyme disease is something for a future study, so if anybody happens to have large numbers of genotyped individuals from areas with high Lyme disease incidence, I'm very interested in talking more. Okay, Magnus Nordborg.
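The "1 to 2% of uterine cancer cases" figure is a population attributable fraction. How it was derived from the county-level data isn't stated, so the sketch below simply shows the textbook calculation (Levin's formula) with hypothetical inputs, to illustrate why a modest exposure prevalence keeps the attributable share small even when the per-person risk increase is sizeable. It assumes no confounding; for a rare outcome, an odds ratio can stand in for the relative risk.

```python
def attributable_fraction(exposure_prevalence: float, relative_risk: float) -> float:
    """Levin's population attributable fraction: p(RR - 1) / (1 + p(RR - 1)).

    exposure_prevalence: fraction of the population exposed (e.g. ever had Lyme disease).
    relative_risk: risk of the outcome in exposed vs unexposed individuals.
    """
    if not 0.0 <= exposure_prevalence <= 1.0:
        raise ValueError("exposure_prevalence must be between 0 and 1")
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)
```

With purely hypothetical values of 2% exposure prevalence and a relative risk of 2, `attributable_fraction(0.02, 2.0)` is 0.02 / 1.02, just under 2%.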
Okay, so what do we know about the biology of Borrelia? I mean, is there genetic variation? What's the main host? Have you sequenced Borrelia from lizards, mice, deer, and does that affect the whole thing? Yeah, totally. Mice and deer are thought to often not be very affected by Borrelia. With the mouse and deer hosts and tick biology, there's a whole ecosystem going on there. One thing we've been talking about collaborating on is a more population-level survey of this somewhere like Martha's Vineyard, where around 40% of all ticks carry Borrelia. That will give us a better sense of how those are related to one another, but right now that's not done at a huge scale. A few projects are trying to scale that up right now, and of course that's a growing area of concern as Lyme disease becomes more common. So you're sequencing the Borrelia as well, right? Yeah, exactly. There's a kind of citizen-science approach right now, where people who find ticks can send them in for testing, and then we can sequence the Borrelia. I think scaling up that sort of thing and involving more of the community in this project would be really helpful, because we want to capture a large geographic range, including large rural areas, and it's just really hard to do that at scale. We have a question from Neil Risch online: in your epidemiological studies, can you include antibiotic treatment effects? Ah, so this is just looking at an association of Lyme disease versus not. Individuals who have been diagnosed with Lyme disease are frequently given doxycycline.
So this isn't necessarily looking at chronic Lyme disease cases. For chronic Lyme disease, there's post-treatment Lyme disease syndrome, PTLDS, which applies if you've received doxycycline, and there are also a lot of individuals who never receive doxycycline because they don't know they have Lyme disease. At this point those two aren't separated, but that's certainly an active area of research in the lab, and I would love to talk more about it. Well, thank you very much, Nasa. Thank you. It is now my pleasure to invite Francesca Luca from Wayne State University. This is the last talk of the session, so I think we'll learn more about the same set of questions and then transition to discussion. All right. Good morning, everybody. It is really an honor to be here today and to contribute to this exciting conversation. As we just heard, we started talking about context early this morning. It doesn't advance the slides; let's see if we can do it from here. All right, it's fine. This is a brief outline of what I want to share with you today. First, we'll define context, which may seem obvious, but hopefully it's not. Then I'll go over how we analyze genetic effects on molecular function across contexts, how we translate these findings from the molecular to the organismal level, and then some conclusions and perspectives. I want to start with the idea that to predict phenotypes from genotypes, we cannot ignore environmental and non-genetic effects. In my opinion this has already become obvious from the conversation we started this morning, but it's also based on a significant body of literature from several groups present here, as well as on our own work. All right. So how do we define context?
If we think about the organism, the first things that come to mind are chemical exposures, pathogens, diet, psychosocial factors, age, development, and sex. But as we go down the biological cascade and look at the organ, tissue, cell, and even subcellular level, we need to consider other contexts in addition to these organismal-level ones. Tuuli was mentioning, for example, cell type and tissue type. I like to think of a broader definition of context that also includes the metabolic state of the cells and their differentiation state, and, at the subcellular level, things like the intracellular environment and the epigenetic landscape. So take a context such as exposure to stress, which might come from neighborhood violence or low socioeconomic status, something we're studying in our group. This is perceived at the organismal level, but when we go to the organ, cell, or subcellular level, there's additional context to keep in mind, and that exposure is combined with the endogenous context. So the point I'm trying to make is that the response of the organism reflects both exogenous and endogenous context, and that we can capture this endogenous context with molecular studies. Another point I'd like to make is that this endogenous context can also reflect past exposures. So while the organism is exposed in this moment, or is chronically exposed, when we look at the sub-organismal level we can also capture these past exposures. So how do we go about studying genetic effects on molecular function across contexts? This goes deeper into some of the things Tuuli was introducing earlier.
I'm using the terminology of in vivo, ex vivo, and in vitro here. Some of you may be familiar with these concepts, but they aren't always used in the same way, so I want to bring everybody onto the same page about what I mean. In all cases, we have a cohort we're interested in, and we have genotype information from this cohort. With an in vivo study design, we are able to measure the organismal-level exposure, and then we isolate organs, tissues, or primary cells and measure endogenous or intermediate phenotypes at that point. With an ex vivo study design, we isolate the cells and then perturb or modify the context in the lab. And finally, in vitro studies, as I'm defining them here, use immortalized cell lines or iPSC-derived cells, and the exposure happens in the lab. Each of these approaches has advantages and disadvantages. With an in vivo approach, we can certainly measure complex contexts. We preserve what I'm calling the epigenetic landscape, by which I mean whatever modifications are due to the environment or context and remain in the tissue or cells of the biological sample we're analyzing. We can preserve tissue architecture, and with spatial transcriptomics or other spatial genomic approaches we can study it. The disadvantage is that we can only study these samples if they come from accessible tissues, such as blood, or post mortem.
Here's an example of how we're using this approach in my lab. We've been working with a cohort of over 250 children with asthma in Detroit, and in addition to collecting genotype data, we are collecting a wide range of psychosocial, developmental, and immunological data, as well as clinical data. We then look for gene-environment interactions, and by interaction I essentially mean genetic effects that differ across contexts. Here's an example of some of our results and how they can inform our understanding of complex trait variation. GAS8 is a gene where we know from other studies that low expression is associated with an increased risk of asthma. What we also find in our data is that the allele at this SNP, which is an interaction eQTL, increases the expression of this gene, but only in individuals with high perceived socioeconomic status. So there's a combined effect of the social environment and genetic risk. With ex vivo approaches, the advantage is that we can expose the cells to a specific context in the lab, so we can better study the cellular context, and they can also be used for experimental validation of the context-specific effects we might have detected in in vivo studies. The limitations are that, again, we can only work with accessible tissues, and there's limited ability to study tissue architecture. There's a little we can do with organoid systems, but not as much as when we have access to the actual organs and tissues. Here's an example of a setting where this approach might work well: we're interested, for example, in studying the effect of plastic components on cardiovascular disease.
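An interaction eQTL, as defined above, is a genotype effect on expression that differs between contexts. A minimal way to see this is to fit the genotype slope separately in each context. The sketch below assumes a binary context (hypothetically, high vs low perceived socioeconomic status) and uses plain least squares; with a binary context, the slope difference equals the GxE coefficient of the full model expression ~ g + e + g:e. Real analyses add covariates, handle relatedness, and test significance, none of which is done here, and the function names are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (simple linear regression)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("no genotype variation in this context")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def interaction_eqtl(genotypes: Sequence[int], context: Sequence[int],
                     expression: Sequence[float]) -> Tuple[float, float, float]:
    """Genotype effect on expression in each of two contexts coded 0/1.

    Returns (slope in context 0, slope in context 1, slope difference).
    """
    groups: Dict[int, Tuple[List[float], List[float]]] = {0: ([], []), 1: ([], [])}
    for g, e, y in zip(genotypes, context, expression):
        if e not in groups:
            raise ValueError("context must be coded 0 or 1")
        groups[e][0].append(g)
        groups[e][1].append(y)
    slope0 = ols_slope(*groups[0])
    slope1 = ols_slope(*groups[1])
    return slope0, slope1, slope1 - slope0
```

In a toy dataset where the allele has no effect in context 0 and raises expression by one unit per copy in context 1, the function returns slopes of 0 and 1 and an interaction of 1, the same qualitative pattern as the "only in high perceived SES" result described above.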
It would be really hard to perform this type of study in vitro, because we would need both the exposure and the relevant cell or tissue type. So here we're working with endothelial cells: we have a biobank of primary endothelial cells, and we can expose them in the lab to plastic components. That lets us look at the molecular function of genetic variants and how they interact with this exposure, at both the chromatin accessibility and gene expression levels, and connect it to complex trait variation. Another example uses single-cell genomic approaches, again in the asthmatic children cohort we're working with. With single-cell genomics, we've been able to look at PBMC responses to immunomodulation in the context of prior exposure to psychosocial environments, as well as genetic background. The key points here are that when we treat PBMCs, we preserve the cell-cell microenvironment, and that these technologies let us look beyond mean gene expression as a molecular phenotype, at genetic and environmental, or context, effects on the variability of gene expression. Finally, with in vitro approaches, I think it's quite obvious that we lose the memory of past exposures. What we gain is that we can study virtually any cell type, because we can differentiate cells in vitro into a variety of cell types. And again, we have limited tissue architecture that we can study.
However, one advantage is that we can now study context-specific effects of genetic variants in different cell types from the same individual at the same time. In this study we exposed LCLs (immortalized B cells), induced pluripotent stem cells, and iPSC-derived cardiomyocytes to 28 different treatments. What we found is that if we consider multiple cellular contexts, we can identify complex GxE effects. I'm sorry the pointer doesn't work, but the QQ plot shows the types of GxE effects we can identify: cell type-by-genotype effects, treatment-by-genotype effects, and also cell type-by-treatment-by-genotype effects. So even with in vitro treatments, there's complexity that can be analyzed at the cellular level. The other major point I'd like to make about this work is that the genes with gene-environment interactions we find in our study are less likely to be found in GTEx, or in prior studies of gene regulation where the environmental context is not measured or considered, so it's ignored. That points to the discussion we were having earlier, following Tuuli's talk: it's not just a matter of finding the same genes, or of finding eQTLs or noncoding regulatory variants for genes that don't have them yet, but really of finding effects that are context specific. Briefly, then, how do we think about translating genetics to function? I want to share two types of approaches we're thinking about. One is partitioning heritability for complex traits using molecular GxE annotations.
This is one of the first studies we did in my group: we looked at 250 cellular contexts, that is, 50 treatments applied to five different cell types. What we found is that for certain traits, the GxE contribution to heritability is higher than the genetic contribution. So one important point is that we can use molecular GxE annotations to characterize complex traits, and a second is that we have to keep in mind that clinical phenotypes, or complex traits, are complex, but the environment is complex as well. So depending on the specific environments we're looking at, sometimes the genetic effect is stronger and other times the GxE effect is stronger. The second approach uses what we've learned so far from GWAS, and here specifically TWAS, to understand these context-specific effects. In TWAS, which we're all familiar with, we connect genetic variation to complex traits through genetically predicted gene expression; in other words, we find genes where variation in expression is associated with the trait. One thing to keep in mind is that once we have these genes, their expression also varies as a function of context: the environmental exposure, as well as gene-environment interactions. So we can leverage what we learned from TWAS to understand the environmental and GxE risk factors for complex traits. Here's an example, again from our asthmatic children cohort, where we're focusing on health disparities in urban environments. We know that urban minorities are more likely to be exposed to psychosocial and physical stressors.
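The TWAS step described above (linking genetic variation to a trait through genetically predicted expression) can be sketched at the individual level as a weighted sum of cis-variant dosages followed by an association test with the trait. This is a simplified illustration in the spirit of PrediXcan-style TWAS, not the speaker's code: the weights would in practice come from a model trained on a reference panel such as GTEx, the values here are hypothetical, and the association test is a plain correlation with a Fisher-transform z-score rather than a regression with covariates.

```python
import math
from typing import List, Sequence


def predict_expression(dosages: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Genetically predicted expression: per-person weighted sum of cis-variant dosages."""
    return [sum(d * w for d, w in zip(row, weights)) for row in dosages]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def twas_z(predicted: Sequence[float], trait: Sequence[float]) -> float:
    """Association z-score via Fisher's transform of r (requires n > 3 and |r| < 1)."""
    n = len(trait)
    r = pearson_r(predicted, trait)
    return math.atanh(r) * math.sqrt(n - 3)
```

Once a gene passes this test, the talk's point is that its measured expression can also be related to environmental variables (or GxE eQTLs), which is what links an exposure to the trait through that gene.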
This image shows, on the left, a column with the psychosocial variables we are considering; in the middle, the genes; and on the right, the complex traits, with lines connecting the three columns. The way to interpret it is that the expression of these genes is associated with the complex traits based on transcriptome-wide association study data, and the traits we're looking at are asthma and asthma-related traits. On the left, the connection between the psychosocial variables and gene expression comes from GxE eQTL mapping, but it could also come from other settings where gene expression is measured. So essentially, once we've identified the genes and we know that the mechanism connecting each gene to the complex trait is variation in gene expression, we can connect which environmental risk factors also contribute to that disease. So in conclusion, I hope I've convinced you today, following up on what the other speakers have said, that genetic effects on human molecular and complex phenotypes vary across contexts. I've also tried to make the point that molecular studies allow us to dissect the mechanisms underlying organismal responses, and also to consider cellular and subcellular contexts. A broader implication of this area of research is that even if genetic risk is not directly actionable, we can modify individual risk by reducing environmental risk, and also genetic risk through gene-environment interactions. And if we think about where we are now, I think we're in that little star there in the middle. There are three major areas of work, if we also go beyond human genetics, that have a tiny little overlap.
One is the GWAS field, where we have large cohorts with clinical data and genotypes. There's the molecular QTL field, where sample sizes are generally smaller; we have genotypes together with molecular phenotypes, and we have the biological samples. Sometimes we look at context, but usually not very extensively. And then there's what I call the exposome field. I think about the exposome a little more broadly than the usual definition: not just chemical exposures, but the broader environment. In that field, organismal phenotypes are collected, and many, many contexts are collected and nicely characterized. Where I think we want to be is this other figure, where the star becomes much bigger because the overlap of these three fields becomes bigger, so that we have genetic data, phenotypic data, and context and environmental data at the same time, as well as the ability to do molecular studies. And, sorry, my very last and very important slide: you can't read it, but it says that it takes a team to do this work, and most importantly an interdisciplinary team. This work can really be done because I work with amazing colleagues; Roger Pique-Regi is one of them, as well as sociologists and psychologists. We do this interdisciplinary work that allows us to go beyond genetics and beyond molecular mechanisms and look at these complex environments. Thank you very much. Thank you very much, Francesca. We have time for a couple of questions. Aravinda first. Thank you. A fairly ignorant question: at least for genes, we now have a fairly good, full catalog. So when you speak of context or environments.
what's the current thinking on how complete we might be? You're finding widespread interactions with almost everything you study, and I'm not saying that's not real. But how do we evaluate that? I think one way would be to at least have an indication of how broad the environment is; there are some ways to figure that out. So I think the question is actually two questions, if I understood what you were asking: one is how broad the environment is, and the other is which environment matters. How broad is the environment? It is very broad, and from a human genetics perspective we have only scratched the surface of analyzing environmental effects on human phenotypes, but there are other fields that I think are more advanced in that area. Which environment matters? One way to think about it is what the environment does to the human body and how we respond. I cannot imagine that every environmental stimulus corresponds to a different molecular response; some molecular responses are going to be shared. On the other hand, in gene regulatory network studies we've done, when you expose cells to a variety of different contexts you don't always hit the same genes. So there is environmental specificity, or context specificity, in the response. But that doesn't mean that every single environment is going to activate a different pathway. Does that make sense? So we need to look at diverse environments, but potentially not all of them. The problem is that we don't know which environments are similar to each other until we actually study the response, so we need to be able to classify and group environments.
For some we can use biochemical data to do that, but for complex environments like psychosocial environments, we need to know which ones activate or modify the same biological processes in the body; then maybe I don't need to look at all of them, and I can look at two or three. I think collaborating with experts on the environmental side, whether psychologists, sociologists, or toxicologists, is really going to help, because they've been thinking about this from an environment-phenotype perspective, and we can contribute the molecular aspect. It's a very important question, and strategically it makes sense to think about it first: deciding how many different environments we need to characterize. Francesca, I think this is very important, and so far this is all conceptual. My understanding of at least the subtext of the question was that in genetics we have a number of variants and a number of genes. This is a clear statistical process: we have a finite catalog, we have multiple-testing correction, 5 × 10⁻⁸ or whatever FDR. We know we're dealing with a finite set of hypotheses, like in an Agatha Christie novel: the culprit is among this set of people in this castle. And there is a very clear statistical framework for this. I think what the question implied is that here you have an open field of possible hypotheses, so it's not only conceptual but partly about methodology: how do you address this statistically in an open field? That's a very good question. The way we've addressed it is the same way we address multiple-testing correction in genomic data, for example.
We don't look at one environment at a time. In the big studies we've done, we had a variety of different environments, and at that point you apply multiple-testing correction the way you would for a variety of traits or for the number of SNPs you're testing. Now, because of limited resources and because of where the field is at this point, was that an extensive survey of environments? No, it was not; it was a study of psychosocial factors, the ones that make sense to study from a psychosocial point of view. When I test their effects on human phenotypes or on gene expression, I apply a multiple-testing correction approach, for example. But it's a huge field if you think about it, and I don't necessarily have an answer. I think it's a very stimulating question and something we should be discussing. Thank you. I guess one last question, if there is any. Yes, just a second. Greg Gibson. Thanks a lot, Francesca, it's beautiful. I'm wondering if you could share some thoughts on causality. You see G by E, and then you see it interacting with the trait. When do we know that it's causing that response, as opposed to being a response to the disease status itself? That's a very good question. There are two answers I'd like to give you; they're not necessarily the right answers, more like brainstorming. One is that being able to actually perturb the environment in the lab at least lets you say: okay, I see this phenotype, like an endophenotype in vivo, and now I'm perturbing the cells. Say I see an endophenotype that has to do with a pathogen; I perturb the cells with that pathogen and see the same effect.
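As a minimal sketch of what "correcting jointly across environments" can look like (an illustration only, not the speakers' actual pipeline): pool the p-values from every environment-by-gene interaction test into one family of hypotheses and apply either a Bonferroni (family-wise error) or a Benjamini-Hochberg (false discovery rate) correction. The environment labels, gene labels, and p-values below are hypothetical.

```python
# Minimal sketch: correct G-by-E test p-values jointly across all
# environment-by-gene tests, treating them as one family of hypotheses.


def bonferroni(pvals, alpha=0.05):
    """Indices of tests rejected while controlling the family-wise error rate at alpha."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / m]


def benjamini_hochberg(pvals, q=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure at FDR q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices sorted by p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank k whose p-value satisfies p_(k) <= (k / m) * q.
        if pvals[i] <= rank / m * q:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical p-values, keyed by (environment, gene).
tests = {
    ("env_A", "gene_1"): 0.0001,
    ("env_A", "gene_2"): 0.04,
    ("env_B", "gene_1"): 0.003,
    ("env_B", "gene_2"): 0.2,
    ("env_C", "gene_1"): 0.01,
    ("env_C", "gene_2"): 0.8,
}
keys = list(tests)
pvals = [tests[k] for k in keys]

print("Bonferroni:", [keys[i] for i in bonferroni(pvals)])
print("BH (FDR 5%):", [keys[i] for i in benjamini_hochberg(pvals)])
```

With these six made-up tests, Bonferroni keeps two hits and Benjamini-Hochberg keeps three; every environment added to the study enlarges the family size m and tightens both thresholds.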
So I'm establishing causality; that's one. The other is when you go to the organismal level, and that's where things get a lot more complex, and the conversation could go on for a long time. But that's a place where thinking about smart ways to establish causality is really important. Specific study designs help; one that comes to mind, which we've been discussing with my colleague, is using something like the COVID pandemic as a natural intervention. If you have samples from before and after COVID, you can consider COVID, correct for its immunological aspect, and focus on its psychological or social aspect: can you actually see the changes you would predict based on association studies we've done in the past? Thank you very much. Let's thank Francesca, and let's thank all the speakers of the session. With great pleasure I pass the baton to Barbara Stranger, who will lead the discussion. All right, can we have all the panelists come up here, please? I have one announcement from the organizers: lunch will now be at 12:45. We have at least one speaker, I believe, who's going to be virtual, so we're just getting that set up. I can use this one. All right, in the previous talks today (this is addressed to everybody here) we heard a lot about trying to connect trait-associated variation to function and to identify causal genes and pathways. We heard that some of the challenges are perhaps related to the context specificity of genetic effects. So I'd like to ask each of the panelists what they see as the key questions we need to address in this area. Is it more environments? Is it finer scale? We heard about different levels of organization here.
So first, what are the scientific questions, and then maybe we can talk about how best to go about answering them. Welcome, Brandon and Alkes; we can see you now. Yeah, I don't know if we're going in any order, but I'm sitting next to you, so I'm happy to start. Can you not hear me through this mic? Okay. I can start by thinking about some of the key questions. I think the presenters argued pretty clearly that context specificity, which includes cell type, environment, developmental trajectory, and things along those lines, is key to finding some of the trait-associated genetic factors for which we currently see no mechanistic explanation. Studies ranging from infection response to single-cell and cell-type specificity have demonstrated that we do indeed identify GWAS-colocalizing variants whose effects are specific to diverse contexts and diverse cell types. One of the big questions facing us is how we actually begin to measure that, and which environments and contexts we should actually measure. I think there are a few guiding principles, some coming just from what we know affects gene regulation. We know gene regulation changes dramatically during development and differentiation. We know gene expression and gene regulation vary greatly during certain environmental responses; we can see this even in small-scale experiments on gene expression. And both computational and experimental studies have demonstrated that the big contexts that change gene expression tend to correspond to the big contexts that modulate genetic effects on gene expression. So we can use that as a guiding principle: run small-scale studies, then decide where to look in large-scale studies.
We can also use what's available in terms of technology as a guide. We now have single-cell techniques, spatial transcriptomics, and others at better and better resolution, so we have the opportunity to look at cell-type and spatial data, and at interactions between cell types, much more than we could previously. I have many more comments, but maybe I'll let some of the other panelists jump in first. Maybe we just go down in order around the table. I agree with everything Alexis said. One thing I kept thinking about this morning, trying to prepare for this, was what Tuuli called points of convergence. In a way, she also called this the counting of variants: how do cells count how many risk variants there are in a genome? In some sense I think those are the same thing. There has to be some integration of dozens or hundreds of risk variants pulling at processes, but ultimately it's the processes that go ahead and change cell biology and cell function in ways that matter, so there has to be some sort of integration. Tuuli called this sort of the valley of death, which maybe we don't know enough about at this point. So I agree with that, and it's going to be interesting to see how one can go about doing that. One open question in my mind is: to what extent can one back out cellular activities from molecular abundances? If I told you the abundance of all the genes that make up the ubiquitin-proteasome system in a cell, does that tell you how active that system is? Or are there other things you have to measure; do you need actual reporters of that system's activity? Selfishly, I just said the ubiquitin-proteasome system because we work on it quite a bit.
We decided to go into the genetics recently, and it turns out to be genetically quite complex. But again, the question is: does one need dedicated activity reporters, or can one look at abundances and figure out what the cell is up to? So we decided to go in order around the table. There are two copies of Brandon and Alkes: one is to my left on one screen and one is to my right on the other screen. So, Brandon and Alkes, would you like to go with the screen to my left or to my right? Okay, I will ask Brandon to start us off. Great, thank you. Really tremendous talks in this panel, raising all of the relevant issues. What I'll briefly say is that, from what I learned from the talks, there seem to be two competing perspectives on how to think about complex traits. One of them is how we were all trained: the path dependence of trying to predict these complex phenotypes from genotypes. That's how we've all been raised, how we continue to think about it, and how we all work on it. But as I think about this last talk, in particular some of the vagaries of how G by E operates, we might be running into a point where the science of the molecular underpinnings of the pathophysiology of a lot of these diseases is a very different science, with very different answers that are maybe not predictive at all at the population scale. And what I'm grappling with from the talks this morning is: are we comfortable with a science where we're learning a lot about things at the individual cellular level (I study protein folding), but that tells us virtually nothing about the way the disease operates at the population scale?
It's like a wave-particle duality type of split in the way we think about complex phenotypes of disease. Are we comfortable with those two types of understanding being completely different, with the hope that eventually we'll be able to bring them together? It's a little bit nihilistic, but in some ways it's also realistic based on what I've heard. All right, I'll jump in with a few points. I agree with the statement that learning about molecular traits is not exactly the same thing as learning about disease, and of course we all want to learn about disease. This is something Tuuli referred to as missing regulatory heritability: we have methods that can tell us how much of disease heritability, or SNP heritability, is explained by molecular trait QTLs, and the answer right now is maybe only a fraction. We could view the glass as half full and say it's great that we're explaining a lot of disease heritability with that type of molecular data, or we could view it as half empty, or more than half empty, and say we're thirsty for understanding more of what's going on in disease GWAS. I also want to repeat the point Tuuli made about the distinction between different types of functional data as we get more granular into cell types, cell states, and cell contexts. On one hand, we have what Tuuli called ENCODE-type data, a very rich compendium of different types of functional data with very low sample size; on the other hand, molecular QTL data, where we're going to need 100 or maybe hundreds of samples to gain QTL traction. That's obviously going to limit the dimensionality of the types of data we can collect.
With those points in mind, let me stake out a few opinions about what the route forward might be. Probably the one statement we can all agree on is that we collectively, the statistical genetics and functional genomics community, are definitely going to make the assays more fine-grained across cell types, cell states, and cell contexts, and in doing so we're going to learn more about disease; specifically, we're going to explain more disease SNP heritability. I don't know how much more, but definitely some more. Something we could do besides focusing on gene expression is look at other molecular traits that have not been as well studied. That might be a function of the technologies, but certainly chromatin QTLs, and there are other types of molecular traits. The paper of Dong et al. 2022 (Nature Genetics) looked at enhancer expression QTLs, and recent work published in Nature Genetics in 2023 has looked at chromatin QTLs and histone QTLs in GTEx samples, ramping up to larger sample sizes. And of course we have the trio of single-cell QTL papers published in Nature and Science last year, which is exciting for bringing single cell into the QTL world. So I think looking at QTLs other than expression, other than bulk expression, and even other than single-cell expression, is potentially a rich route forward, both because other molecular traits may provide information complementary to expression, and because we know that some of them, especially chromatin QTLs, may be more cell-type-specific than expression QTLs. The Blueprint consortium paper, going back to Chen et al. 2016 (Cell), has a nice figure illustrating that point.
The final point I want to make is that even though a ton of great statistical methods work has been done in this space, dating back at least 13 years, when it comes to small sample sizes, if we have barely powered molecular QTL studies, I think there is more to do in the statistical methods space to tie them to disease GWAS as best we can. Did I reach the end of the table? When I think about what's being said here, and about the history of the field as we learned it today, the architecture of complex traits has been elusive. It's coding; done. Oh, it's not coding; then it's all bulk homeostatic expression. No, it's not. Our thinking always goes where there's no data. Now the slogan of the day is context, and I'm part of this; I also believe it's context, because what else could it be? And we're suddenly facing this infinite-dimensional space of contexts, and we're thinking about collecting data including molecular traits, environmental variables, and cell-intrinsic and cell-extrinsic factors. The problem is that in all the previous steps there was an idea: this is our model. There was a simulation, there was an understanding of statistical power and of what data we were going to collect. So here at NHGRI, I think that is what's needed, because in an infinite space of contexts you can collect an infinite amount of data, create a multitude of underpowered studies, and learn nothing. So what we need, in addition to the idea that we're going to pursue context, and following what Francesca said, that it's complex and multidimensional and we don't know what's relevant,
is some sort of understanding of what our model is, how much data we should collect and on how many contexts, whether we have the statistical power to find the effects, and so forth. I agree with everything that's been said, but to me this element has been missing from the discussion. Well, I wanted to make one comment about the very high dimensionality of the spatiotemporal context space we need to explore: we need to lean on both experimental and computational methods that can make it more efficient. Experimental methods meaning village-in-a-dish designs, multi-cell-type cultures including organoids, embryoid bodies, and others; ways to explore multiple perturbations very efficiently. And we also need computational methods that can handle deconvolving these complex data from single-cell, multi-cell-type systems, where in some cases we'll have to infer the context from the data itself while we're asking how cell type or context modulates genetic effects. So we need development on both the experimental and computational sides to make this feasible. Just to follow up on that, there was a question in the chat that addresses this a little, or is just an interesting question: are we going to be entering the realm of candidate environments here? Maybe, for tractability, this is something we would have to do, so we can learn patterns as to where these context-specific effects arise. But I think that's an interesting question. So, Barbara, I see Michel Georges with his hand up. I know this is a panel; is it a closed panel, or do we invite questions? Let's have some questions. Yes, please. Maybe one context that could be brought into the discussion.
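To make the power concern concrete, here is a rough sketch using a normal approximation for a single additive QTL test with Bonferroni correction over all tests. Every number is a hypothetical assumption, not something stated in the discussion: a fixed budget of 500 samples split evenly across contexts, 10,000 genes tested per context, and a QTL explaining 5% of expression variance.

```python
from statistics import NormalDist


def qtl_power(n, r2, n_tests, alpha=0.05):
    """Approximate power of a two-sided test for one additive QTL that explains a
    fraction r2 of trait variance, with Bonferroni correction over n_tests tests.
    Uses a normal approximation to the regression t-statistic."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / (2 * n_tests))  # per-test two-sided critical value
    ncp = (n * r2 / (1 - r2)) ** 0.5  # expected test statistic under the alternative
    return z.cdf(ncp - crit) + z.cdf(-ncp - crit)


# Hypothetical design: 500 samples in total, split evenly across k contexts,
# 10,000 genes tested in every context, QTL explaining 5% of expression variance.
for k in (1, 5, 20):
    n_per_context = 500 // k
    power = qtl_power(n_per_context, r2=0.05, n_tests=10_000 * k)
    print(f"{k:>2} contexts, n = {n_per_context:>3} per context: power ~ {power:.2f}")
```

Under these made-up numbers, power drops steeply as the same budget is spread over more contexts, which is the kind of up-front calculation the speaker argues should precede collecting context data.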
What I think we all do is GWAS for disease predisposition: we map genes or loci where we think there's a perturbation affecting the chance that we develop the disease. And when we look at functional correlates through QTL analysis, for a number of reasons we study them in healthy individuals, which is nice because it's generic and you probably have less noise from the disease process. As a parenthesis, it's kind of interesting that we all say in grants that we do this because these will be interesting targets for the pharmaceutical industry. It's an interesting logical leap to say that the mechanisms that determine whether you develop the disease can be targeted to reverse the disease process once you have it. But I think it's possible that part of the effects we see actually only come into play in the context of, let's say, early disease. What we map may not only be what I initially thought I was mapping, things that are really predisposition; it could also be things that determine the course of the disease once it is initiated. So one of the contexts we should probably bring to the table when we look at these functional follow-ups is the disease process itself. Yeah, I would agree with that. Panelists, does anybody want to follow up on that? Just with agreement: when we think about temporal context, the two I always think about are development and differentiation, and the disease process. So yes, definitely. I wanted to come back to a few things that were brought up by the other panelists and by Brandon. When we think about age as a context, there's been beautiful work showing the importance of early developmental cell types that we don't see again later, for example the huge role that some of the autism genes play in early development.
You can really capture that only if you're looking at genes expressed in fetal development. But a lot of our data come from diseases of later life, like cardiovascular disease and cancer, where the context may not be about cell type but about big biological processes like inflammatory biology and the declining immune system: as immune surveillance fails, cancers grow. The ways that inflammatory biology and brain biology talk to each other are also tricky. And on one of Tuuli's questions, whether we need to understand gene function or variant function: our ignorance of gene function is illustrated in so many ways. Over time, the same gene families get different names when a neurologist discovers them and when an immunologist discovers them. So it's really the complexity of the system; cell-cell communication is clearly part of this, and we don't capture it well. I think we have to think of context as both really important cellular contexts and really important life contexts, about exposures and about the whole set of disease processes that we have. And to come back to one of the things said earlier: we clean up debris from disease in complicated ways, proteins turn over, and that whole process probably contributes more to lots of diseases than we've appreciated. So we probably do have some good context candidates that we can probe better than we could before, because we are learning more about those things. But we all got burned with candidate gene studies, so we probably need to think through the modeling of candidate exposures and candidate contexts going forward.
Yes, I would just reiterate what I said, following Nancy's comment: we need a guideline or approach that says this is how we approach context. This is how many contexts you should do in a single experiment, this is how you analyze them jointly, this is how you figure out what is statistically significant, this is how you collect the data, these are our requirements. I think this is completely missing in the field at this point, and we need it so that we don't end up in the same candidate gene situation again. I see Alkes has his hand up. Alkes, would you like to add to that? Yeah, as we think about this infinite-dimensional space of possibly relevant fine-grained contexts and also environments, I don't have all the answers, but here are a couple of points. When it comes to contexts, we can probably assay them at very small sample size, what Tuuli called the ENCODE-style approach, and try to get a sense of what is important when we integrate that with disease GWAS, and then invest more resources in molecular QTL-type analysis of those that seem to be important, or important for a particular disease or trait. I might be stating the obvious here. And in the world of environments, we do have these amazing large biobanks, or perhaps even larger EHR cohorts, where we can certainly get a sense of which environmental variables are important for disease; perhaps not definitely, but probably, those might be the ones where we want to start looking into G by E for disease, or maybe G by E for expression as in Francesca's talk. Thanks, Alkes. Brandon, I see your hand up.
Yeah, briefly, and really in agreement with those comments, but I want to challenge myself and all of us on the notion that the space of environmental possibility is infinite and cannot be understood. I hear that a lot. Take a given disease, for example the uterine cancer story that was so beautifully told in this session: we have many, many decades of contextual and environmental knowledge that actually does inform disease progression in that disease, and that goes for many diseases. I think part of the problem with the way I was trained is that I was not even trained to bother thinking about that on the way into the problem. I was only taught to think about it at the opposite end of the problem, after I have uncertainty around my inability to find a main effect that I'm happy with. There are different ways to answer the same sort of questions that I think shrink the space of environmental possibility into a tangible set of things, which could get after some of these questions much more responsibly. A question in the back, I see. Peter Visscher. So if we have an infinite number of environments and contexts, and context specificity is so important, why is the correlation between relatives for traits so high? For identical twins, the correlation in liability for schizophrenia is about 0.7 or so. This complexity suggests to me a lot of opportunities for noise and stochasticity, and yet correlations between relatives, on the liability scale, are actually quite high for a lot of common diseases as well. A question for the panel. Thank you. Can I answer this real quick? I'm very sympathetic to the idea; we know about H² and h², and we can explain a lot of that with epidemiological studies and now with GWAS.
I think there are two separate questions. There is the question of environment in the classical sense: pollution, smoking, exercise, and things like that. I think the reason this community is so obsessed with context in this session is not that; it's molecular traits, our inability to explain what we're seeing in GWAS, to explain h², with the molecular data in hand. That's a different question, and I don't know whether context is the right answer. But if I think about the context of the cell and the context of specific molecular interactions, it differs from cell to cell; it doesn't differ from individual to individual. So the familial studies you mentioned only answer the question about the broad environmental component for organisms, but don't pinpoint the molecular mechanism. That's the issue, and that's why context comes forward, not only the broad environment. Otherwise, I fully agree with what you said. To reiterate one thing I've been thinking: there are many contexts and environments that we are all exposed to. We all go through development and differentiation, we all have multiple cell types, we are all exposed to infection. These are still potentially relevant to explaining heritability. To quickly jump in: in some sense the environment also has to be integrated into cell biology, so maybe it's actually not so distinct from thinking about how 1,000 variants in 500 genes pull at cell biology. At the end of the day, maybe the environment ends up doing the same thing, just through signaling cascades. So at the level of the processes that matter, maybe it's not so distinct. Nancy brought up diseases of aging versus developmental disorders.
Obviously it's going to be very different for different diseases, but I wouldn't be surprised if sometimes the way in which things break down at the cellular level is actually the same. The difference is which cell it affects and when it happens, preventing that cell from performing its function, which in development can of course lead to all kinds of downstream effects. So maybe there's some hope there: if we know what to integrate and measure, maybe that's where we need to be. And that can maybe also inform the kinds of environments we want to test, because we know how to stress certain cellular pathways to the exclusion of others, or preferentially compared to others. Thank you. Alkes, would you like to weigh in? If I understand correctly, Peter's question was: if narrow-sense heritability estimates from twin studies are so high, how can G by E play such a major role in the context of disease and complex trait GWAS? My answer would be that we have to be a little careful with these twin-based estimates. The paper of Zuk et al. 2012 (PNAS) showed that G by G interaction can inflate twin-based estimates, even though twin-based estimates are intended to be estimates of narrow-sense heritability. And I would hypothesize that, in the same way, G by E could inflate twin-based estimates. If you look at height, a trait we all know and love, twin-based estimates came in at 0.8 for a long time, and then we have more recent estimates, Young et al. 2018 (Nature Genetics), using a polygenic TDT-type test (you don't have to agree with it), where the estimate came in at 0.55 with a tight error bar. Why is that so different from the twin-based estimates? One possible explanation is G by E. Okay, we've got some questions in the back. Okay, thank you.
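The inflation argument can be illustrated with textbook variance components. This is the classical Falconer setup, not the model used by Zuk et al., and the variance values are hypothetical. Under random mating and no dominance, Falconer's estimate 2(r_MZ − r_DZ) picks up 1.5 × V_AA from additive-by-additive epistasis, and all of V_AC from an additive-by-shared-environment interaction when twins share that environment, so it overstates the true narrow-sense heritability V_A / V_P.

```python
def twin_correlations(v_a, v_aa=0.0, v_ac=0.0, v_c=0.0, v_e=0.0):
    """Expected MZ and DZ twin correlations under a classical variance-components
    model (random mating, no dominance). v_a: additive genetic; v_aa: additive-by-
    additive epistasis; v_ac: additive-by-shared-environment interaction;
    v_c: shared environment; v_e: unique environment."""
    total = v_a + v_aa + v_ac + v_c + v_e
    cov_mz = v_a + v_aa + v_ac + v_c  # MZ: share all of A, AxA, AxC and C
    cov_dz = 0.5 * v_a + 0.25 * v_aa + 0.5 * v_ac + v_c  # DZ: half of A and AxC, quarter of AxA
    return cov_mz / total, cov_dz / total


def falconer_h2(r_mz, r_dz):
    """Falconer's twin-based estimate of narrow-sense heritability."""
    return 2 * (r_mz - r_dz)


# Hypothetical components; the true narrow-sense heritability is 0.5 in each case.
print(round(falconer_h2(*twin_correlations(v_a=0.5, v_c=0.1, v_e=0.4)), 3))  # 0.5
print(round(falconer_h2(*twin_correlations(v_a=0.5, v_aa=0.1, v_e=0.4)), 3))  # 0.65
print(round(falconer_h2(*twin_correlations(v_a=0.5, v_ac=0.1, v_e=0.4)), 3))  # 0.6
```

The first case has no interaction and the estimate is unbiased; the second and third add only an interaction term, and the twin-based estimate rises above 0.5 even though V_A is unchanged.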
So, I think this possibility was raised in a couple of the talks, but I wonder if the panelists or others could comment on it. There was this notion that maybe we have a certain hourglass-type behavior, where many genetic effects of different kinds are funneled into specific mechanisms and pathways leading to specific diseases, categories of disease, traits, and so on. On the other hand, there was a lot of talk about context in the sense of environments, and I'm wondering what people think about whether environmental effects go through the same hourglass.

That sounds a little bit like what you were just talking about. Right. So, asking whether environmental effects also converge through a set of pathways or particular cellular readouts, something along those lines: I think that's very likely if the disease processes are actually mediated at the cellular level. There's a saying that the only environment the genome sees is what it sees in the nucleus, and to some extent, if the disease is mediated at the cellular level, I think the environment is passing through the same pathways, gene regulatory networks, and key changes that the genetic effects are passing through as well. There is something we haven't talked about very much, which is the systems level. When I say systems, I mean organ-system-level effects, hormones, etc. We haven't discussed that very much, but at the cellular level I would say yes.

Question. We've got a question.

Two points, if I may. First, the answer you just gave and the earlier comments that the environmental effects and the genetic effects should pass through the same pathways: it's positively embarrassing that you don't know the answer. People have been studying how environmental effects affect physiology and disease. Surely you should have enough data to answer that question. Do they go through the same pathways or not?
Sorry, I don't want to answer this; there are many things that are embarrassing. But I was just thinking of two examples. I think there are many examples where we know environmental effects act through the same thing. A classical example is the effect of thalidomide in embryopathies. The mechanism wasn't known; a few years ago somebody showed that the drug acts on the same transcription factor whose mutations create genetic phenocopies. So that's exactly the same pathway, and it explains a lot of other related facts. But the most common example is oxygen metabolism, which affects the same network. There are genetic variants: somebody, whose name I forget, found common variants that lead to high-altitude adaptation, and now the biochemistry has been figured out with HIF-1α. They go through exactly the same pathways, and this is one of the broadest environments. So, to my earlier question, I think it would be of some value to think about what we mean by environment and what we mean by context, and to try to classify them, not as a museum collector, but so that we know how much of the space we cover. Maybe some things we cover in great detail and others not. It would be useful to take very specific environmental effects and see whether they go through the same pathways: are the successes we know exceptions, or are they really the rule?

I'd actually like someone to do a systematic analysis: here's something that's been well studied, here are all the mechanisms we know from environmental studies, here are the ones we know from genetics; see how well they marry up. But if I could move on to a second point.
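One possible starting point for the systematic comparison suggested above is a simple over-representation test: given a list of genes implicated by environmental studies and a list implicated by genetics, ask whether their overlap is larger than expected by chance. The sketch below uses the hypergeometric upper tail for this. All function names, the gene-universe size, the list sizes, and the overlap in the example are hypothetical; a real analysis would also need to handle pathway definitions, annotation and gene-length biases, and multiple testing, none of which this sketch addresses.

```python
from fractions import Fraction
from math import comb


def expected_overlap(universe: int, env_genes: int, genetic_genes: int) -> float:
    """Overlap expected if the two gene lists were drawn independently at random."""
    return env_genes * genetic_genes / universe


def overlap_pvalue(universe: int, env_genes: int, genetic_genes: int, observed: int) -> float:
    """Hypergeometric P(overlap >= observed), computed exactly with integers."""
    if genetic_genes > universe or env_genes > universe:
        raise ValueError("gene lists cannot be larger than the gene universe")
    max_overlap = min(env_genes, genetic_genes)
    if not 0 <= observed <= max_overlap:
        raise ValueError("observed overlap must lie between 0 and the smaller list size")
    # Ways to draw the genetic list so that it shares at least `observed` genes
    # with the environmental list, divided by all ways to draw the genetic list.
    tail = sum(
        comb(env_genes, i) * comb(universe - env_genes, genetic_genes - i)
        for i in range(observed, max_overlap + 1)
    )
    return float(Fraction(tail, comb(universe, genetic_genes)))


def overlap_from_sets(universe: set[str], env: set[str], genetic: set[str]) -> tuple[int, float]:
    """Convenience wrapper for gene-name sets; both lists are restricted to the universe."""
    env, genetic = env & universe, genetic & universe
    observed = len(env & genetic)
    return observed, overlap_pvalue(len(universe), len(env), len(genetic), observed)


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 genes, 200 environment-linked, 300 genetics-linked, 15 shared.
    print(expected_overlap(20_000, 200, 300))  # 3.0
    print(overlap_pvalue(20_000, 200, 300, 15))
```

Exact integer arithmetic via `math.comb` and `Fraction` is used here to avoid underflow in the tail sum; a library routine such as a hypergeometric survival function would do the same job for larger analyses.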
Julie's last slide had a diagram showing a cascade of things from the molecular and cellular levels upward, and the diagram could have been in a book on artificial intelligence, showing how deep neural networks propagate a signal from one layer to another. Has anybody tried properly to use artificial intelligence to integrate these different layers of phenotype, from gene expression through biochemistry to outward phenotype?

I want to comment first on your earlier question, because my natural intelligence is not great enough to comment on the artificial intelligence question. The hourglass analogy, which I assume was brought up (I couldn't see it from here), is this idea that there are disease-related or phenotype-related pathways: environment affects them, genetics affects them, there is some convergence on some mechanism, and if we target this mechanism, we can cure disease. But I think what Nancy's talk demonstrated today is that reality is very complex. If we think about cancer, take lung cancer: we want to do GWAS, figure out the pathway that causes cancer, and target that pathway. The biggest hit in GWAS for lung cancer is the nicotinic receptor, which relates to how many cigarettes people smoke. That's environment; that's a genetic predisposition to the environment. The effect of this environment is on mutagenesis, not on cancer progression, and you cannot target smoking behavior or even mutagenesis to cure individuals of cancer. So these mental models, I think, are the problem. We work with mental models in the space of ideas, in a situation where all these effects are much more complex and outside the box, and as soon as you look at a real genetics example, like the disease example today, it doesn't fit these simplistic mental models. I think we just have to accept that. I'll let the others respond on artificial intelligence.
Yeah, I think we're just about out of time. Did you want to say something about artificial intelligence?

I'll just say, your question was whether people have tried properly. I would say people have tried; you could argue about whether it's been done properly. There have been a lot of artificial intelligence applications on the genetic sequence, and a lot of applications modeling gene expression, but the genetic variation and heritability part is, I think, less well developed. I'd be happy to talk more about this when we have time.

We're being cut off. I believe there was one more question.

Yeah, well, sorry. This is Loïc Yengo, University of Queensland. I'm just going to make two quick comments. One is about the earlier point, which I loved as a very provocative statement at the end. I guess your point is very related to some of the discussions we had about polygenicity, where people think about individuals achieving the same polygenic score with different combinations of variants. In a way, we're comfortable thinking that somehow those variants will converge on similar pathways, and I don't think it's a big leap to think that the environment will do the same, but I agree that we should be able to test that empirically. Second, on my point about the dimensionality of the space of contexts and environments: I think it sounds very daunting, but I was thinking that we've managed to answer a similar problem using a concept like SNP heritability. We have been able to quantify how much information there is without first identifying the specific variants, and I think we can achieve the same thing for that large environment or context space: if we sample it broadly enough, we can search it later.

Okay, with that, we will stop. Thank you very much for your questions, and thank you to all the panelists for their insights here. Thank you.