Hello, everybody. My name is Carolyn Hutter. I'm the Director of the Division of Genome Sciences at NHGRI. And I'm really honored and excited today to be moderating the second session in our Bold Predictions for Human Genomics by 2030 seminar series. I want to thank everyone who's currently watching us live on Zoom. We had well over 100 participants, with the numbers going up quite a bit when I started talking, and I think they're continuing to climb. We know we had over 400 people registered, and I also want to welcome those of you watching the archived version of this in the near or perhaps distant 2030 future. If you haven't already, I do want to encourage everybody to go back and watch the first session. Not only will you get to see excellent talks by Karen Miga and Evan Eichler, but there was also a great introduction from our NHGRI Director, Eric Green, that provides details on the inspiration and context for this seminar series. But for today, I just want to make the key point that this seminar series is one of a number of activities that we're doing in relation to last fall's publication of the 2020 NHGRI Strategic Vision. As shown here on this slide, this publication, which was in Nature, outlines a series of elements that really define the cutting edge of human genomics over the next decade, with an emphasis on human health applications. And that's really what we see and are defining as the forefront of genomics. A final section of the Strategic Vision document presented a set of 10 bold predictions for human genomics that might be true by 2030. These predictions came out of our multi-year Strategic Vision planning process and were intentionally crafted to be inspirational and aspirational, with the hope of provoking discussions about what might next be possible at the forefront of genomics over the coming decade. And it was really that spirit of provoking discussions that motivated us to have this Bold Predictions seminar series.
As shown on this slide here, we already have the dates and speakers lined up for the first five sessions, and in April we'll be releasing the names for the second half of the series, bold predictions six through 10. All sessions will be like today's: held by Zoom, open to the public, and video recorded for later viewing on the GenomeTV channel on YouTube. You can find more information at genome.gov/bold-predictions (that's "bold", hyphen, "predictions"). And really, we hope that people not just watch today, but come and join us for the full series, because we're pretty excited about it. The format for each session, including today's, is what is shown on this slide. We have two speakers who will give 25- to 30-minute talks, followed by a moderated discussion and a Q&A. To that end, I really do encourage everybody to submit their questions and comments via the Zoom Q&A box. You can do so ahead of the discussion or during it. We will be holding questions till the end, but we do welcome you to write your questions in real time as our speakers are speaking. So today's session will be diving into the second bold prediction. And what this bold prediction states is that the biological function(s) of every human gene will be known, and for non-coding elements of the human genome, such knowledge will be the rule rather than the exception. As is true for all 10 of our predictions, we really feel there's a lot to unpack to think about, understand, and talk about what these predictions mean and how we may get there. And our goal is to do so for the second prediction over the next 90-ish minutes. To that end, we have two fantastic speakers here to help us tackle this prediction today. They're listed on the slide in alphabetical order, but I'm going to be introducing them to you in the order they'll be presenting. So our first speaker is Dr. Neville Sanjana.
And although I had updated my slide to say that he's at the New York Genome Center and NYU, that somehow doesn't seem to have come through. But he is a core faculty member at the New York Genome Center and an assistant professor in both the Department of Biology at New York University and in Neuroscience and Physiology at the New York University School of Medicine. He did his undergraduate work at Stanford, has a PhD in Brain and Cognitive Sciences from MIT, and was a Simons Postdoctoral Fellow at the Broad Institute of Harvard and MIT. Neville has a number of awards and accomplishments. Being that this is an NHGRI seminar series, I decided I'm just going to highlight that he was an NHGRI nominee and, we were excited to see, ultimately a recipient of the Presidential Early Career Award for Scientists and Engineers, or PECASE, in 2019. For those of you who don't know, PECASE is the highest honor bestowed by the U.S. government on outstanding scientists and engineers at the beginning of their independent careers. And it was really exciting for us that Neville was recognized with that award. Our second speaker will be Dr. Nancy Cox. Nancy, I have to say, has been a role model for me throughout my professional career. She's currently the Mary Phillips Edmonds Gray Professor of Genetics at Vanderbilt, as well as being the Director of the Division of Genetic Medicine and Professor of Medicine at VUMC and Director of the Vanderbilt Genetics Institute. Nancy studied biology at Notre Dame as an undergrad (my aunts, uncles, and cousins would all want me to say "Go Fighting Irish") and then received her PhD in Human Genetics at Yale University and did postdocs at both Washington University and the University of Pennsylvania.
Nancy is a highly accomplished and renowned quantitative human geneticist and, similar to Neville, I thought I'd just pull out one thing for the audience today and specifically highlight that she was the 2017 President of the American Society of Human Genetics and winner of the International Genetic Epidemiology Leadership Award in 2010. I really can't think of two better people that we could have gotten to discuss this bold prediction with us today. And so without any further ado, I will turn it over to Neville.
Great. So thank you so much for that introduction. I really appreciate that. And thank you for the invitation to be here, both from NHGRI and also from Carolyn. So today I'm going to try and place some of our work in the context of this extremely bold prediction. And hopefully it'll help us think about what 2030 might look like for functional genomics. So as Carolyn mentioned, I'm both at the New York Genome Center and also at NYU. And just to get to the heart of it, Carolyn just read this bold prediction. It says the biological functions of every human gene will be known, and for non-coding elements, things outside of genes in the human genome, such knowledge will be the rule rather than the exception. And this was part of the strategic vision paper that was published late last year, "Strategic vision for improving human health at The Forefront of Genomics". In my reading of this, there are really three things that form the core of this prediction. One is every human gene, so that's very bold, and it's very complete in the number of genes. The second is non-coding elements: so not just every gene, but for other things in the genome, knowledge will be the rule rather than the exception.
And the third comes from where this sits within the larger paper. I think you can see from the title of this perspective that this is not just about basic science. That's an important part of it, to understand how the human genome works, but it's within the context of improving human health. So to me, these are the three key points. And I'm going to try and explain how some of our research might inform this and think about how these three points are going to look in 2030. That's the goal here over the next 20, 25 minutes. So in 2030, we're going to be at this amazing state where we can know the effect of any sequence variant. But before rushing into the future, I think we have to think about where we are now, and what's different about where we are now compared with where human genetics and genomics has been. And I think one of the clearest differences, and certainly a clear difference from where I sit and what I work on, has been the recent advances in genome editing technologies. Due to these recent advances, we can now perturb all genes in the human genome, and at least some large proportion of non-coding elements. And after changing genes and non-coding elements in human genomes, we can measure different phenotypes that are related to biological traits and diseases, important diseases like cancer and neurodevelopmental disorders. And that's really been enabled by what happened over the past decade. The 2010s were a transformative decade for human genomics and very specifically genome engineering. And I don't think I really need to introduce CRISPR to this crowd, because I'm sure virtually everybody has heard of CRISPR at this point. It really has changed what we're able to do with DNA in living cells. And it has started to permeate out into other areas like medicine and diagnostics.
Just to get us all on the same page, I think the most succinct explanation I've seen of what CRISPR is and its impact is this quote from Hank Greely at Stanford, who said the Model T wasn't the first car, but it changed the way that we drive, work, and live. And similarly, CRISPR has made a difficult process cheap and reliable. And I think the reason for that has to do with programmability. So if you look at CRISPR versus the kinds of gene editing tools that came before it, like zinc finger nucleases or TALE nucleases, these are also programmable genome editing tools. The specifics of how they work don't really matter so much for this talk. But the key difference is that with those earlier tools, every time you want to go to a different place in the human genome or any genome, you have to create a new protein. And that's challenging. That's difficult to scale up. And the quantum leap forward of CRISPR-Cas9 is that the protein component is generic; it's reusable. That's this Cas9, this yellow blob. But it's programmable: it can be programmed by a short piece of RNA, and that's very easy for us to make. So the RNA specifies where it goes. And I've used this word a few times, programmability. Why is programmability so important in genomics? Well, we can look at an example from information technology, from computers. If we look back 50, 60 years, the only way to really get at a lot of information was to walk into a place like this, a library, and access some books. This was the central place to get information. And of course, today, we're all walking around with these supercomputers in our pockets. We can access almost any information; we have these encyclopedias in our pockets too. And for anything that we can't access, we can program: we can create some sort of new application and find a way to get that into our pocket as well.
And I think there have also been tremendous developments in the programmability of genetics and genomics. And that stretches back a few decades. But even though there have been great advances, things like RNA vaccines and genome browsers, we're really still at the beginning of programmability with genomes. So why is programmability so important? It's important because the human genome is a big place. It has three billion bases. And amongst those three billion bases, a small fraction contain the 20,000 genes that encode proteins. And so if you think about some of the problems of precision medicine: how do we efficiently identify which regions of the genome have a particular function, or drive a particular disease, or make a particular disease worse or better? The tools that we've started to develop in the last decade take advantage of the programmability of these CRISPR systems to send them all around the human genome. So as I mentioned, Cas9 can be programmed by this single guide RNA, this small piece of RNA. And we have existing synthesis technologies like oligonucleotide array synthesizers that can print out thousands, up to millions, of these guide RNAs to guide CRISPR-Cas9 to different regions in the genome. If that guide takes Cas9 to, say, an exon of a protein-coding gene, cutting at that location can result in gene knockout. But if that guide happens to take Cas9 to a non-coding part of the genome, that cut can result in a small indel mutation that mutagenizes those bases and lets us learn something, perhaps, about the function of that non-coding region. So about seven or eight years ago, when I was a postdoc in Feng Zhang's group, along with Ophir Shalem, who's now at UPenn, we started to print out these CRISPR libraries targeting CRISPR to all protein-coding genes in the genome. We call these genome-scale CRISPR knockout, or GeCKO, screens.
And the key idea is just to target Cas9 to different genes to create loss-of-function mutations, then take that large pool of human cells with different mutations and let something about the disease that we're interested in select which particular mutations survive from that pool. And that kind of phenotypic selection lets us ask big questions with many, many genes and then zero in on the right hypotheses. And so that's kind of an abstract thing to say. I'm talking about 20,000 genes; there are three billion letters. So I want to try and make this a little bit more concrete with a high-level example. So instead of A's, T's, C's and G's, imagine that every gene in the human genome is a word in a very long book. "When on board H.M.S. Beagle, I was struck with certain facts in the distribution of the inhabitants of South America": this sentence, which is the first sentence of On the Origin of Species, goes on for an entire paragraph, so I'm just showing you part of it. So imagine each word here is a different gene. What these programmable scissors let us do is go after a certain gene and pull it out. So now we have a new human genome in this cell that's missing this one gene. And the programmability lets us do it not just for a certain gene, but in parallel for all genes in the genome, and then apply some sort of selective pressure that narrows this large space of genetic hypotheses down to one or a few hypotheses that are relevant to the disease, phenotype, or trait whose genetic basis we're interested in understanding. This is fundamentally how pooled CRISPR screens work. And in the years since, there have actually been many, many groups that have applied this kind of technology to many different diseases and traits. And here I'm actually showing you a very biased sample. These are just projects that I've been lucky enough to be a part of.
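The pooled-screen logic described here (knock out one gene per cell, apply a selective pressure, see which knockouts are enriched among survivors) can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the analysis pipeline used in the work discussed: the gene names, the `simulate_pooled_screen` function, the survival probabilities, and the simple log2 enrichment score with a pseudocount are all hypothetical choices made for illustration.

```python
import math
import random


def simulate_pooled_screen(genes, resistance_genes, cells_per_gene=200,
                           survival_rate=0.05, seed=0):
    """Toy pooled knockout screen; returns (genes ranked by enrichment, scores)."""
    rng = random.Random(seed)
    # Before selection: every knockout is equally represented in the pool.
    before = {g: cells_per_gene for g in genes}
    after = {}
    for g in genes:
        # Assumption: knocking out a "resistance" gene lets every cell survive;
        # all other knockouts survive only at a low background rate.
        p = 1.0 if g in resistance_genes else survival_rate
        after[g] = sum(1 for _ in range(cells_per_gene) if rng.random() < p)
    total_before = sum(before.values())
    total_after = sum(after.values())
    scores = {}
    for g in genes:
        # A pseudocount of 1 avoids taking the log of zero for lost knockouts.
        f_before = (before[g] + 1) / (total_before + len(genes))
        f_after = (after[g] + 1) / (total_after + len(genes))
        scores[g] = math.log2(f_after / f_before)
    ranked = sorted(genes, key=lambda g: scores[g], reverse=True)
    return ranked, scores


# Hypothetical library of 50 genes, two of which confer survival when knocked out.
genes = [f"GENE{i}" for i in range(50)]
resistance = {"GENE7", "GENE21"}
ranked, scores = simulate_pooled_screen(genes, resistance)
print(ranked[:5])
```

In a real screen the counts would come from sequencing guide RNAs before and after selection, with several guides per gene; the sketch collapses that to one count per gene to keep the selection-then-ranking idea visible.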
And some of the first work that we did with this technology was looking at drug resistance in cancer. What particular mutations determine whether somebody's cancer will be well treated by a drug or maybe resistant to a targeted therapy or a chemotherapy? And with cancers, can we also try and figure out new therapies? Can we look for specific genetic vulnerabilities of tumor cells that don't exist in normal cells? And these are not just things that we can do with cells in a dish, phenotypes in a dish. We can also look at organismal phenotypes. What are the genes that are important for metastasis, for the movement of cells from a primary tumor to different organs in the body, like the lung or the brain? And we can look at multicellular interactions. What genes are important for this immune synapse so that a T cell can kill a tumor cell? How does a tumor cell evade T cells when immunotherapy doesn't work? And all of these that I've been talking about so far are genes, but this technology, these pooled CRISPR screens, this programmable enzyme, works just as well when we're not targeting genes but non-coding regions of the genome that may control or regulate genes. You can think of them as knobs and dials on the expression of genes. How do we find where these knobs and dials are hidden in the non-coding genome? And I'll show you a little bit more about this later on in the talk. But today, I just wanted to address these two points, coding and non-coding areas of the genome, and how CRISPR-based functional genomics has been able to get at them in the last few years. And so I'm going to use two recent examples from our work. The first one is about genetically informed drug discovery for COVID-19 using a genome-wide CRISPR screen, the technology I just described to you.
And the second part is going to be about taking advantage of a different kind of CRISPR enzyme that doesn't target DNA, but instead targets RNA, to start to learn more about non-coding regions of the genome. So the first work, this COVID-19 functional genomics, was really led by a talented postdoc in my lab, Zharko Daniloski, and, in Ben tenOever's lab at Mount Sinai, another really amazing postdoc, Tristan Jordan. So Tristan and Zharko, in what was a very difficult year, really did some amazing science to get at COVID therapies very, very quickly. And so this has affected all of our lives, but there's been a lot of recent cause for optimism with COVID-19. There are now several vaccines that have emergency use authorization, but even with a vaccine, we still need more effective therapies for COVID-19. And so when we set about this project, we really had three goals in mind. The first one is that viruses like SARS-CoV-2 are not free-living organisms; they have to have a host. And so we wanted to know, what are the key host genes that are required for this virus to go through its life cycle, to infect cells, and to spread disease? And then could we take those key host genes and translate them to actionable therapeutics? Could we block those genes once we know they're essential? Could we block them with perhaps existing drugs? And for genes where there aren't drugs that already exist, is there something else we can do to use this information to develop new therapies for COVID-19? So those were the three major goals. And using the technology I just described to you, we took a second-generation CRISPR knockout library, GeCKOv2, and knocked out every gene in the genome in these human lung cells, so a relevant cell type for SARS-CoV-2 infection. And after creating this diverse pool of knockouts, where you have all the 20,000 genes in the genome individually knocked out, we went ahead and exposed these cells to this virus, SARS-CoV-2.
This is an isolate taken from a patient in Washington state, and this is full-on replicating virus. And what we want to see is which genes in the human genome, when you take them out using CRISPR, when you edit them out, enable the cells to survive in the face of SARS-CoV-2 virus. That's this yellow gene here. And from the beginning of the pandemic, we already knew some things about this virus. We knew, for instance, that it requires ACE2. That's the entry receptor for SARS-CoV-2 to get into cells. And only when you have ACE2 expression do you get to see this staining for the nucleocapsid protein of SARS-CoV-2. And so when we do this genome-wide CRISPR screen, we get a list of 20,000 genes, all genes in the genome, ranked from most essential to least essential for the virus to get into cells. And so here I'm showing you some of this list, some of the members of the top 50 genes, so the most essential genes. And one thing you can see right off the bat is that in those top 50 genes, in fact at rank number eight, is ACE2, the receptor that we know the virus needs to enter cells. I'm not going to go over the specific gene hits, but you can start to notice that many of the genes have very similar-looking names. And that's because we found many genes that cluster into protein families, suggesting that those particular functional groups, those protein complexes, are really essential for either entry, replication, or exit of the virus during its life cycle. And from amongst the top 200 hits, we selected a group of 30 genes and individually validated them; we wanted to make sure that this pooled screen really worked. And so we went ahead and knocked out these genes using new CRISPR guide RNAs, and then we looked to see whether the virus infection is reduced by staining, again, for this nucleocapsid protein of the virus.
And you can see that for these top 30 hits, all of these genes, when we ablate them, reduce the ability of SARS-CoV-2 to get into cells relative to the control. And we did other validation using RNA interference approaches, too. But focusing on the goal, we really wanted to ask: of these top essential genes for viral infection, which ones might make good targets for therapeutic discovery? So Zharko just crossed this list of genes with an existing database of druggable targets, where we know that there are already inhibitors that exist, that we can buy, that chemically inhibit these genes, copying the activity of CRISPR but without doing gene editing. And what we found is that, of the quite modest number of drugs that we tested, nearly 50% show a greater than 100-fold reduction of SARS-CoV-2 in these cells, as measured by a qPCR RNA test. And what you might notice is that these are not as effective as remdesivir, one of the approved drugs for treating SARS-CoV-2, but remdesivir also has pretty significant toxicity. So, thinking about the model for something like HIV, where we have a cocktail of multiple drugs that target multiple different mechanisms of the virus, here we can use these different compounds that target different essential genes for the virus and perhaps combine them into a similar cocktail, where we can lower the dose of any particular drug and also prevent escape mutations from really affecting the performance of a single drug treatment. Okay, but what about genes where we can't find a drug, where we don't have an inhibitor ready to go off the shelf? So here we used a technology called ECCITE-seq, something we developed a few years ago with the New York Genome Center Technology Innovation Group. And what ECCITE-seq does is couple CRISPR screens with transcriptomics, so looking at gene expression within single cells.
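The step of crossing the ranked screen hits with a database of druggable targets amounts to a simple lookup over the top of the ranked list. The sketch below is a hypothetical illustration only: the `druggable_hits` function, the gene names, and the inhibitor names are placeholders and do not come from the screen or from any real drug-target database.

```python
def druggable_hits(ranked_genes, drug_table, top_n=200):
    """Return (rank, gene, inhibitors) for top-ranked genes with a known inhibitor."""
    hits = []
    for rank, gene in enumerate(ranked_genes[:top_n], start=1):
        inhibitors = drug_table.get(gene)
        if inhibitors:
            # Sort inhibitor names so the output is deterministic.
            hits.append((rank, gene, sorted(inhibitors)))
    return hits


# Placeholder data: a ranked hit list (most essential first) and a
# gene -> set-of-inhibitors table standing in for a druggable-target database.
ranked_hits = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
drug_table = {
    "GENE_B": {"inhibitor_2", "inhibitor_1"},
    "GENE_D": {"inhibitor_3"},
    "GENE_Z": {"inhibitor_9"},  # druggable, but not a screen hit
}

candidates = druggable_hits(ranked_hits, drug_table, top_n=3)
print(candidates)  # [(2, 'GENE_B', ['inhibitor_1', 'inhibitor_2'])]
```

Keeping the rank alongside each gene makes it easy to prioritize which inhibitors to test first; `GENE_D` is excluded here only because it falls outside the chosen `top_n` cutoff.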
And so here in this very dense matrix, every column is a different single cell with a particular CRISPR gene knockout, which is labeled here on the top; the control is on the left. And every row is the expression of a particular gene. And you can see that there are these chunks where you see differential gene expression. To make sense of this, we actually used a pathway-based analysis to collapse this, to look at pathways instead of individual genes. And you can see that for these six genes, which we didn't really associate together before, the knockouts all result in an up-regulation of cholesterol biosynthesis. Different aspects of the cholesterol biosynthesis pathway are up-regulated after we knock out these genes, according to this transcriptomic assay. So we went ahead and validated that. We knocked out these genes and just measured cholesterol. Indeed, they all increase cholesterol after the knockout. So we thought, is there something we can use, a drug that might phenocopy this, that also increases cholesterol? And of course, there are many cholesterol-altering drugs that are commonly used for heart disease and hypertension. And so amlodipine is a drug that's been approved for 30 years by the FDA, commonly used, taken over long periods. It's an antihypertensive that reduces circulating cholesterol but increases intracellular cholesterol. So we put this drug on our human lung cells, and you can see it increases cholesterol, and it also decreases SARS-CoV-2 viral entry. Whether you test by qPCR or plaque assay or RNA sequencing, you can see an orders-of-magnitude decrease in the virus's ability to infect, showing that by finding this common mechanism, even though we don't have an inhibitor for one of these genes in particular, we can use the mechanism to guide us to another drug target. And so these are complementary approaches that we've been led to by doing this genome-wide CRISPR screen.
One thing is we can directly take the hits from the screen and inhibit the top-ranked genes. And the other thing is we can do this kind of multidimensional phenotyping that's enabled by ECCITE-seq, where we look at different omics readouts to look for common mechanisms and common cell state alterations and then target those mechanistic bases. So that's a lot about the coding genome. The other part of this bold prediction is about non-coding elements in the human genome. And I think this is just much more of a frontier. Here, over the last few years, we've been using DNA-targeting CRISPRs to do things like look at cis-regulatory elements that are adjacent to genes of interest. So doing saturation mutagenesis across hundreds of kb near genes that we already know are involved in a certain disease phenotype, like cancer drug resistance. Are there knobs and dials near these genes that can modulate their activity? And this has been very helpful, but there have been some limitations. For example, when looking at non-coding RNAs, it's unclear whether a single mutation might actually inhibit a non-coding RNA. This is some work we did a few years ago in collaboration with Julia and Feng. And so here you have a non-coding RNA that lies in close proximity to a protein-coding gene. And it's not clear that an individual mutation can really disrupt a non-coding RNA, nor, if you do something like CRISPR inhibition or activation, whether it can really shut down this non-coding RNA without affecting a nearby protein-coding gene. So the question we wanted to ask is, can we instead target non-coding RNAs directly? Can we do this by targeting at the RNA level, not the DNA level? And so for this, two postdocs in our lab, Hermann and Alex, turned to a different CRISPR enzyme, Cas13, to do something similar to what I just showed you, but at the RNA level.
And I won't get into the details of Cas13, but just know that the difference between it and Cas9 is that it's a programmable device that targets RNA. And the setup here is very similar to what I just showed you. So here we have Cas13-expressing human cells, and we can try to learn the rules of Cas13 guide design by tiling genes that are easy for us to read out, like GFP, and looking for which particular Cas13 guide RNAs result in efficient knockdown of transcripts like GFP and endogenous cell surface proteins. Without getting too much into the weeds, I'll say that we found some critical regions where even a one-base mismatch in the guide can alter the ability of Cas13 to knock down its target. We were also able to make a model to predict which are better guides and worse guides amongst guides that perfectly match their targets. And this is important, as we showed, because with these predicted good guides, you can see that when you target essential genes, it causes depletion, dropout of cells from the library. But if you use guides that are low scoring with this kind of classification, you don't see the same phenotypic effect. So it's important to know the rules of Cas13 guide design. And this is important not just for our intended use, which is to map things like non-coding RNAs, but also for many other applications of Cas13: virus detection, A-to-I RNA editing, epigenetics, alternative polyadenylation, and splicing. All of these depend crucially on knowing what makes for an efficient Cas13 guide. So this is how we're trying to develop this newer kind of CRISPR tool. Okay. So that's a taste of high-throughput coding and non-coding genomics. What might functional genomics in 2030 look like? Well, in 2030, we want to know the effect of any sequence variant. So here are three bolder predictions for 2030 that I had. One is just knowing the impact of any variant on any phenotype. What does that look like?
In my mind, this looks like a large atlas, a multidimensional atlas connecting genes with phenotypes and non-coding elements with genes, and not just looking at disease phenotypes, but also proximal phenotypes like transcriptomics and proteomics. And no matter how large these atlases are, will they ever really be complete? What's every phenotype? What's every non-coding element? So I think we also need to develop predictive models. And it could be something simple like matrix multiplication to go from non-coding elements to phenotypes, or it could be something more complicated like deep learning approaches. And I think NHGRI is already moving in this direction with things like the new IGVF consortium, which is looking at the impact of genomic variation on function. Another thing that I think we need is a multi-scale understanding of variant effects. So we can't just look at how a mutation affects a disease. We need to know how it affects very proximal phenotypes of transcription and protein production, cellular phenotypes in a single cell, multicellular and organoid phenotypes, phenotypes across an organism, and then across an organism maybe over different time periods and different environments, and of course at population scales. So I think there are people working at all these different levels, but one really tough question, I think, is how do you integrate this information? How do you bring it together? And the final bolder prediction that I wanted to offer is about how we take this variant function map and actually make it useful for human health and disease. And this is something that's tricky. You might remember that a decade ago, a lot of folks questioned, what's the value of the human genome? It really hasn't changed medicine. We don't see the doctor in a very different way. And I think the promise of the human genome is really arriving now.
It's something that has just taken some time and is not an overnight transformation. But I think with functional genomics, we're already doing it. For instance, from our own work, we can see screens where not only do we identify a region of the genome that's important, but we also get hints at what a therapy might look like for something like sickle cell disease or beta thalassemia. And this is some work we did a few years ago with Dan Bauer and Stu Orkin, where we looked at non-coding regions that can turn on fetal hemoglobin. And fetal hemoglobin can be a therapy for diseases like sickle cell and beta thalassemia. And so this is a picture of Victoria Gray, who was one of the first people to get a CRISPR-based therapy that came out of this library that I designed several years ago. And you can see that her fetal hemoglobin, after this genome editing procedure, is much higher. So we're starting to see not just the development of therapies based on CRISPR screens, but that the CRISPR screen itself can get you to the therapeutic target. And you don't have to take my advice or my opinion on this. Folks like Francis Collins have also said that the progress that we've seen for sickle cell disease, including Victoria Gray, has made it clear that we need to get busy and figure out how to take this to the next level. And I just wanted to add, since one theme of this has really been about programmability and what that gives us for making these kinds of large-scale atlases, that if you look back at what happened with the personal computer revolution and at people like Alan Kay, who pioneered PCs, graphical user interfaces, and programming, he made this very interesting statement in 1971: the best way to predict the future is to invent it. Really smart people with reasonable funding can do just about anything that doesn't violate too many of Newton's laws. And I think that's really it.
This is the challenge for us: with reasonable funding, we really need to invent it. That's really the way forward. So with that, I just want to thank the folks in my lab and our collaborators who did some of the work in the CRISPR screens that I showed you today. And of my mentors, I just want to particularly single out Mike Pazin, who's at NHGRI, who's not only been a great mentor to me now in my current position, but even starting from the very earliest stages of my postdoc has just been a fantastic advisor and shaper of ideas. So thank you so much. And I think there's not going to be a question and answer right now, but it'll be later. So I think the next talk is just going to be Nancy.
Yeah, so if we can put the spotlight on Nancy and she's pulling up her slides. And again, reminding people to put questions as you'd like in the Q&A box. And we will go from there. So Nancy, over to you.
Great. Thank you all. And thank Neville especially. That was just a perfect introduction for some of the things that I want to talk about. And I'm very excited to really think about these bold predictions and think about the future. And what I'm going to try to sell you on is that we're doing high throughput human phenome screens now. And we need to think of them that way, as really high throughput phenome screens. So when we consider this bold prediction, that the biological function of every gene will be known, and that for non-coding elements in the human genome such knowledge will be the rule rather than the exception, that is indeed a bold prediction. I think the second half is particularly bold because I think we're still in the process of discovering all of the RNA-based world. And so thinking that it will be the rule rather than the exception to actually understand how that works is very bold.
But I also think, because of transcriptomics and now the ability to do transcriptomics in single cells and in a very systematized way with knockouts, I actually totally buy that that will be true, even though it is an extremely bold prediction to make right now. But I think the other part that's very bold is that the biological function, with the S in the brackets, of every human gene will be known. Function comes at many levels, as Neville noted. And I think getting to all levels of function also makes this a very bold prediction. And we really need to strive for that at all levels of endophenotype, those intermediate phenotypes that we think of as omics, but also for a good understanding of how the omics can help us then predict the human phenotypic consequences, and for having the data at scale to actually see those human phenotypic consequences. All of that combines to make this one of those peeling-back-the-layers-of-the-onion kinds of prediction. There's a lot of biology in this one sentence. And Neville talked about high throughput, high technology, CRISPR systematized screens of molecular function that I think will absolutely be critical to those early layers, all of those endophenotypes. We need to understand what every functional element is doing first at those levels. But I think the CRISPR systematized screens will be supplemented by natural variation. He's got cars, I've got flowers. That's the difference between the technology and using natural variation. And of course, I'm seeing this in the context of biobanks, where we use the electronic health record as a way to get the human phenome. And we have genome variation and can measure or impute more and more of these omics that get done at scale, in enough samples that we can actually impute them into biobanks, at least for the genetically determined part of how those things work.
And this concept of biobanks being a high throughput human phenome screen really comes from the longitudinal nature of data in the EHR, which now is accumulating in more and more places, and from the fact that genetics is coming to medicine way faster than anybody is really ready for, which actually was part of the bold predictions in the last strategic plan. And now instead of biobanks, you really have to be thinking about every hospital becoming effectively a biobank because of the existence of electronic health record data, and that over the next 10 years genetics will be part of medicine, just part and parcel of medicine. And to the extent that we can begin to see federated queries across this vast volume of human data, we can really think about high throughput human phenome screens in a completely different way. To just give you some foretaste of what I think is coming down the pike, I'm going to use examples from BioVU, the biobank at Vanderbilt University, with about 285,000 DNA samples. We have on average 10 to 15 years of electronic health records, but we have electronic health records going back 30 years on some patients. And we have imputed transcriptomes, are imputing metabolomes, and expect to be imputing various kinds of proteomic measurements and additional omics, our efficiency, et cetera, as they become available. And we have that in, as I say, 285,000, but another 2.8 million with just the phenomic data, just the EHR data. And I think we're also just beginning to scratch the surface of how we can iterate from genetic discoveries to their implications in the large-scale phenomics, or from observations in the large-scale phenomics, connecting those to the genomics and the other omics through samples with some level of interrogation at the level of the genome, and more and more at the level of the transcriptome, et cetera.
So this virtuous cycle is something that we will learn better and better how to exploit to get at the multi-layered meaning of function with respect to human health and disease. So we do this now at the transcriptomic level. GTEx was one of the early investments in getting to critical endophenotypes between the clinical traits and diseases that we want to understand and the omics biology that allows us to understand these intermediate key traits. So we use GTEx to build out predictive models and then apply those predictive models in the biobank genetic data. So even though we haven't measured transcriptomics on people, we can impute the genetically determined expression of these genes. And that's a kind of Mendelian randomization experiment. It certainly facilitates Mendelian randomization. And our latest models, a recent publication in Nature Genetics, formalize that with a direct Mendelian randomization test. But this allows us to create a kind of gene by medical phenome catalog that allows us to see, for each gene, what medical phenome is associated to the natural expression of that gene. And we have a query, and we have a database summarizing all this, and it is publicly available. But the constant barrage of attempted hacks of hospitals is causing us a lot of difficulty. They keep trying to put this behind a firewall. So we're migrating the results database for this to a public cloud so that everybody will continue to be able to use it unhampered by the hacks on hospital systems. So apologies if you've been trying to get at it recently. Recently you've had to be behind the VU firewall in order to access it. But we should have everything moved shortly. So anybody can query this catalog and see what amounts to a gene-based phenome-wide association study for the human genes that we get high quality prediction performance on. So the question we're asking here is, what does this gene do?
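As a rough illustration of the imputation step described above (a sketch only, not the actual models used in the talk), genetically determined expression can be thought of as a weighted sum of a person's allele dosages, with the weights learned in a reference panel such as GTEx. The SNP names, weights, and genotypes below are invented for illustration, and treating a missing SNP as dosage 0 is a simplifying assumption.

```python
# Hypothetical per-SNP weights for one gene, as if learned in a reference panel.
# The SNP identifiers and values are made up for illustration.
weights = {"rs_a": 0.40, "rs_b": -0.25, "rs_c": 0.10}


def impute_expression(dosages, weights):
    """Return the weighted sum of allele dosages (0, 1 or 2) over the model's SNPs.

    SNPs missing from a person's genotype are treated as dosage 0 here,
    which is a simplification for the sake of the sketch.
    """
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


# Hypothetical biobank genotypes: person -> {SNP: dosage}.
biobank = {
    "person_1": {"rs_a": 2, "rs_b": 0, "rs_c": 1},
    "person_2": {"rs_a": 0, "rs_b": 2, "rs_c": 0},
}

# Genetically predicted expression of the gene for each person, without ever
# measuring a transcriptome in the biobank itself.
predicted = {pid: impute_expression(d, weights) for pid, d in biobank.items()}
```

The predicted values can then be tested for association against each diagnosis in the health record, which is what gives the gene by phenome catalog its shape.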
Or more precisely, what is the natural variation in the expression of this gene associated with across the medical phenome? And so for the purposes of this talk, I looked through the whole set of our latest results in about 70,000 samples of recent European ancestries for interesting and significant results for a gene that we don't really know the function of. And this gene is a particularly intriguing example. So ANAPC4 is part of the anaphase promoting complex, subunit 4. But in all of the descriptions of the gene, it is explicitly noted that nobody's really sure what this gene does within this complex, which is, as it says, promoting the transition from metaphase to anaphase. And what we see is what happens with too much of this gene. We have really high quality prediction performance, so nearly half of the measured variation in gene expression can be predicted from cis SNPs. And this gene is expressed in all 49 GTEx tissues. And so basically, we see this association at essentially the same level of significance across all 49 GTEx tissues. It's not surprising that something involved in the transition from metaphase to anaphase would have to be expressed everywhere. And too much of it is associated with abnormal Pap smears. So you see the top two associations here. But then just down the list, you see cervical cancer and dysplasia. So perhaps it may be rushing through this metaphase to anaphase transition, if you have much more of this enzyme, that leads to what's clearly detected as abnormal cells. Now the sample size for these is similar to what we see for cervical cancer and dysplasia. And the fact that this is so much more significant than the cervical cancer and dysplasia suggests to me that the biology is really here, and that some subset of those then go on to develop the cervical cancer and dysplasia. And so part of what gets you to this stage is this biology driven by apparently unusually high expression of this gene.
But what's really intriguing is to see that reduced expression is associated with chromosomal anomalies and genetic disorders. And these include aneuploidies. And in particular, this association to perinatal conditions of fetus or newborn is related to fetuses or newborns with chromosomal anomalies. So that's another intriguing opportunity to think about the biological function of this gene. Not having enough of it seems to be associated with chromosomal anomalies and genetic disorders, so it is possibly also acting in meiosis. But it's just a great example of an opportunity to take what we can learn through high throughput human phenome screens and go back now and use that to understand better the functionality of this gene. But I also want to show you how you can enrich your understanding of function just in follow-up. So I'm just going to show you the other phenome associated to the two top signals for alcoholic liver disease. So one of them is PNPLA3, which is a well-known gene for liver disease, non-alcoholic fatty liver disease, but it also certainly helps to precipitate alcoholic liver disease. And you see all of the consequences of the liver damage. So this gene really seems to be about the tendency of the liver to undergo damage in the presence of precipitating factors such as alcoholism, hypercholesterolemia, fatty liver disease, any and all of those conditions. And this gene puts you at risk for that liver degradation. Whereas for the second top association with alcoholic liver damage, you see the strong association to alcoholic liver damage, but you also see alcoholism, tobacco use disorder, alcohol related disorders, mood disorders, depression, persistent mental disorders due to other conditions. This may be more about the reward system biology that gets you to liver damage, although again, it's a totally testable hypothesis.
So those are examples of how high throughput human phenome screens can help us delineate finer functions of genes. And I want to go through a quick story to illustrate that biobanks and the kinds of data that will be available through hospitals can really help our understanding even of more Mendelian type conditions. So the story has been published and it's the intro to just one takeoff on it. But the idea was that colleagues from the zebrafish core came to us with a couple of genes where they knew sort of the proximal mechanism, that the genes move fibrillar collagen out of cells. They had cloned the RIC1 gene out of a zebrafish model for craniofacial abnormalities. And they just wanted to know what other phenome they should look for in the zebrafish. And, so this is the publication in Nature Medicine last January, we saw a number of neurological and neuropsychiatric conditions, ADHD, convulsions and epilepsy, associated with the brain. And they were able to show in the zebrafish a really pronounced decline in the quality of brain extracellular matrix. And fibrillar collagens are really important in strengthening and crosslinking extracellular matrix. And you can see much less in the way of connectivity in the brains. So they were able to take information that we gave them on the human phenome and then find correlates of that phenome in zebrafish. We saw gait disorders, amblyopia and strabismus. They saw consequences for muscle detachment that could contribute to those phenotypes, for example. And then there were colleagues in Saudi Arabia who had reported a recessive Mendelian disease in a large consanguineous family due to mutations at the same RIC1 gene.
We actually asked them if they'd be willing to do a guided reevaluation of the patients, now knowing the phenome that we had seen associated in the biobank to just reduced predicted expression of the gene, not to mutations in this gene, just to the reduced genetically predicted expression of the gene in tens of thousands of subjects. And so these were the original things reported. These are all the new things that they found by reevaluating the patients based on the phenome that we had seen in zebrafish and in the humans in the biobank. So we've now gone ahead and used this concept of phenome risk scores to ask another kind of question. So phenome risk scores were the brainchild of Lisa Bastarache, working with Josh Denny when he was still at Vanderbilt. Now you guys at NIH have co-opted him. But the idea is very similar to a genetic risk score. You basically sum the components of a phenotype defined from OMIM, a mapping of OMIM to OMOP right into the phecodes, with some weighting that reflects how common or rare that phenotype is. But it's a really nice way to encapsulate a quantitative similarity of the patient's observed phenome to what's described for a Mendelian condition, applied in biobanks. You can see a strong relationship of having high scores with having the disease, except with phenylketonuria, where we screen at birth for that and put people at risk on special diets so that they don't have these phenotypes. But what we wanted to know is how many other genes have this property that we saw with RIC1, where the predicted expression of the gene would be associated to so many of the cardinal features of the Mendelian condition. And so my postdoc, Tyne Miller, has taken that on and made a systematic study of this. So we're using just a best model sort of approach in modeling the predicted expression of genes, from the original PrediXcan, Hongyu Zhao's UTMOST, and our joint-tissue imputation approach recently published in Nature Genetics.
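The phenome risk score idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the published implementation: the weight for each feature is assumed here to be the log of its inverse prevalence in the cohort (one common way to make rarer features count more), and the patients and feature labels are hypothetical placeholders rather than real phecodes.

```python
import math


def feature_weights(records):
    """Weight each feature by log(1 / prevalence) in the cohort (an assumption).

    records maps a patient ID to the set of features seen in that patient's EHR.
    """
    n_patients = len(records)
    counts = {}
    for features in records.values():
        for feature in features:
            counts[feature] = counts.get(feature, 0) + 1
    return {f: math.log(n_patients / c) for f, c in counts.items()}


def phenome_risk_score(patient_features, disease_features, weights):
    """Sum the weights of the disease's features that the patient actually has."""
    return sum(
        weights.get(f, 0.0) for f in disease_features if f in patient_features
    )


# Hypothetical cohort; "A", "B", "C" stand in for phecode-like features.
records = {
    "p1": {"A", "B"},
    "p2": {"A"},
    "p3": {"C"},
    "p4": set(),
}
# Hypothetical Mendelian disease described by features A and B.
disease = {"A", "B"}

weights = feature_weights(records)
scores = {pid: phenome_risk_score(f, disease, weights) for pid, f in records.items()}
```

In this toy cohort, the patient with both features gets the highest score, and the rarer feature contributes more than the common one, which is the behavior the weighting is meant to capture.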
We take the best one because these make somewhat different assumptions about the truth of the model for how this variation predicts gene expression. And so we just let the best athlete model win, because it'll be different. The biology is different for different genes. We screened 2,300 unique genes, but that's 3,100 disease gene pairs because some Mendelian genes are associated to more than one Mendelian disease. We tested just over 66,000 subjects of European ancestries and about 14,000 subjects of African ancestries, separately, with a simple model that had the predicted expression, principal components for genetic ancestries, age and sex, and then a covariate that helps us control for the number of visits. And basically, we see a substantial number of genes, so almost 10% of Mendelian genes, showing significant associations to the phenome risk score that describes the Mendelian disease. And that was really robust, so it did not depend on dominant versus recessive. It was robust to whether there were cases or not: you could control using a covariate for affection status with the Mendelian disease, and results were virtually unchanged. We're still learning about what kinds of genes this may work for. You see, it's not really enriched for loss of function intolerant genes in the Europeans, but the pattern's a little different in the subjects of African ancestries. And of course, that's a much longer lineage, and some of the Mendelian genes there really have been subject to recent natural selection. So there may be some interesting differences here that we're still going to be following up. But about 10% of unique Mendelian genes are significantly associated to the phenome risk scores for those Mendelian diseases. That's robust to conditioning on affection status. And frankly, OMIM based phenome risk scores definitely underperform EHR based phenome risk scores, based on our studies.
And so the opportunity to develop even more refined scores may increase the number of such genes. Certainly we can use this to help find modifier loci for Mendelian diseases. But I want to draw attention to the fact that it really reinforces the value of developing drugs from Mendelian conditions, because it is not just people with Mendelian diseases that would benefit from drugs targeting these very important genes in human health and disease. And as I say, 10% is really the very minimum, given the power issues for these and the fact that I think we'll have much better metrics with any EHR based phenome risk score. And that brings me to the final bits of speculation. Because I think we will make so much progress on this part, and I think, as Neville documented with really high precision, we are making this kind of progress for non-coding elements, there will be substantial progress in drug-based modulation of the functions of every human gene. And I'm probably impinging on other important bold predictions, but I think it follows very logically from this sentence that we will make substantial progress in drug-based modulation of gene function that will completely change how we think about medicine and the opportunities for therapies. And with that, I will thank our funders and the group in my lab that's worked on these studies, and I'm happy to take questions.
Wonderful. Thank you so much, Nancy. And Neville, can you come back on? Wonderful. And so again, folks could be putting their questions into the Q&A box in Zoom. And in just a few minutes, I'll be turning it over to my colleague, Chris Gunter, who has joined us on screen and really is the person who came up with the idea of the seminar series to start with, and who will be walking us through some of that Q&A.
But before we get into that, I just wanted to have a little bit of a discussion with the two of you at a higher level in terms of what this prediction is really getting us to think about. And so, one of the things that I, oh, sorry, I just got a weird notice on my computer. Sorry about that. One of the things that I really think about is that increasing our understanding of the biological function of genes and non-coding elements is interesting in and of itself from a genomics perspective, and sort of aspirational. But it's also worth thinking about what that would enable. As this prediction comes true, what will we be able to do in 10 years, in 2030, that we're not doing now? And when you guys think about that, what is it that you're most excited about? Okay, this prediction comes true, it's 2030, this is what we're going to be able to do that we're not able to do now. So I don't know, Neville, if you want to start with some thoughts on that.
Sure, sure. I think that's great because that focuses again on really, what is this good for, right? Is this just for the sake of knowledge, which is an important thing? Or is there something beyond that? And I think that's what's so fun about entering into this field, that both aspects are just so incredibly, equally important. Take the example that I gave about sickle cell anemia, which I think is a really nice example of how non-coding genomics, non-coding functional genomics, is having an impact already on human health. If you think about that, that's kind of an interesting example. Even though the regulation of fetal hemoglobin is this very complex thing, we're talking about a transcription factor regulating another transcription factor regulating fetal hemoglobin is the full story.
But if you think about a beta-globin, it's the first genetic disease. We've known about the basis of this genetic disease since, I don't know, the late 40s, I think, early 50s. It's really contemporaneous with Watson and Crick. And so in some sense, it's like, oh, wow, finally, you know, this, we have something here. So I think what's going to be more provocative or more, that's not the right word, surprising about the functional genomics of 2030 is not about these kinds of diseases where we know it's a genetic disease or like this is a disease of the genome, here's the mutation, you know, let's figure out how to fix it, that kind of thing. I think what's going to be different about 2030 is the unsolved mysteries. So the folks who come in, who meet with these clinicians, these fantastic clinical geneticists and other folks, and that, you know, it's really just, you have some constellation of symptoms, it looks like a genetic disease, but really, you know, hard to pinpoint what exactly is going on. And I think, you know, this kind of atlas having it is going to, you know, it's just going to inform decision making across the medical spectrum. And so for, you know, for people of sickle cell disease, we know exactly what's going on. But I don't think we really have a good conception of what other kinds of genetic diseases are there. And I saw there's one of the questions is about like, in the, in the chat or in the Q&A is about, you know, can you do combinatorial CRISPR screens and things. And I think all of that is going to make for richer atlases where maybe, you know, just like, synthetic lethality or two hits in cancer, right, where one, one mutation doesn't do much, but when you can see both mutations, then it gives you some real strong clinical phenotype. 
And so I guess just to put it in a compact way, I think it's the unsolved mysteries, not the silver bullet beta globin kind of mutations, but really the unsolved mysteries that are going to blow our minds in 2030.
Nancy.
So I'll elaborate first on some of the things that Neville mentioned, because I think the sickle cell story and fetal hemoglobin is a great example of how more knowledge gives us more insights into different possibilities for treating things than we would ever have imagined. You don't have to fix the gene that's broken. You can maybe fix things that are downstream or upstream of that, and manipulate them in a way that evades the consequences of what you're concerned about. So the more biology we learn, the more opportunities we have for those kinds of alternative therapies. But I also would never have thought of chromosomal aneuploidies, for example, as even a complex trait. I think of it as one of those accidents of nature. But no, we've always known there's some level of heritability to that. We know a little bit about that biology. But just imagine, by 2030, if we're able to recognize those people most at risk of pregnancies with chromosomal aneuploidies and have a simple therapy, or multiple appropriate therapies, that much reduce that risk. It's so much easier to imagine that level of prevention than some of the other kinds of things. So I think we'll think differently even about the parts of biology that we can be preventive in. But I'd also say, on the sort of unknown unknowns, I think the process of doing this will teach us a lot more basic biology that we didn't know that we didn't know, didn't know was there to learn. And some of that will be in the RNA space, some of that will be in the RNA to protein space.
And some of it will be in the management of proteins, which still are a bit mysterious when it comes right down to it, the regulation of protein at the level of protein. And so I'm very excited about the next 10 years because of what we will learn about very basic biology that will then further change how we do the next strategic plan.
Yeah. And Chris and I are super excited for the next strategic plan. No, I'm just joking, we actually are, but it's always good to start by coming through with this one. So one of the things we spent a fair amount of time thinking about when crafting this particular prediction, and Nancy, you talked about this a little bit at the beginning of your talk, is that S in biological functions. In other words, what does it really mean, or what are we saying, when we say the function or functions of genes or non-coding elements would be known? So I guess my question to both of you is, what would your response be to someone who asked, well, what is meant by known biological function, and how do you see that answer changing over the next decade as we hopefully make progress on this? So Neville, you went first last time. So maybe Nancy, I'll put it to you. And again, I do think you touched on this a little bit in your talk, but maybe to extend a little bit more.
Yeah. So I think understanding the function at the level, for example, of the transcriptome. So if we lose this gene, what are the compensatory changes in other gene expressions? And learning that whole piece of it: when this gene has increased expression, what are the downstream consequences of that in the whole cascade of biology, the transcriptome, the proteome, anything that we can learn about the metabolome? That's part of the S in functions.
But I also think we're learning about how the developmental ontologies of different cells play out in really interesting ways. And I think about the fact that microglial cells in the brain, which are so important in Alzheimer's, are macrophages. And there are lots of other macrophages. And in fact, one of the known Alzheimer's genes is also a gene for Nasu-Hakola disease, which involves both early onset bone fractures, but also bone osteonecrosis, bone infections, and then, shortly after that, a dementia. That is telling us something important about the shared role of osteoclasts, which are macrophages, and microglia. And when you monkey around with the biology of that whole cell lineage, there are pleiotropic consequences. And as we learn more about those pleiotropic consequences, it's going to open up a whole new world of thinking about potential earlier onset phenotypes that become markers for better drug development in the context of something like Alzheimer's. It may be too late to start treating by the time people have symptoms of Alzheimer's. But if there's earlier onset pleiotropic phenome attributable to some of the same genetic variation, we could be in a much better space for knowing who's at most risk and for doing better designed clinical trials.
Neville, thoughts on your end with that question?
Yeah, I think the S is very perceptive. And I think it's actually kind of counterintuitive, because I think today we're still in a state where we say, oh, that's a cancer gene, p53, that's a cancer gene, or, oh, that's an autism gene, or that's a schizophrenia gene, right? Whatever is the first entry on OMIM or something is kind of the lingo for it. And I think the writing is already on the wall for what it's going to look like. I still like this idea of a large, huge matrix of genes by functions.
And I don't even know what's in the cells. Is it going to be a single number? Is it going to be another matrix in each cell or something? But I think that kind of representation, that kind of data structure, even just as a shorthand, is very useful because it doesn't say schizophrenia gene or osteoporosis gene. It says this gene has these impacts on these phenotypes, and that list can be literally endless. And that's where I think a lot of the prediction comes in, right? Because we can't measure everything, everything you could possibly do to a genome for every phenotype at every level of analysis. But there will be a state, and I'm just very impressed with the field and the speed, where you can start to reasonably impute or predict what most of that matrix looks like. And it's like AI now versus AI, I guess, 10 or 15 years ago, where it was just, oh, voice recognition, that's all it does, right? And now it's like, oh my gosh, we can't trust videos anymore. Everything could be a deep fake. So I think it just sneaks up on you, and I can't wait till functional genomics sneaks up on us.
So Chris, I would say that's the acid test of when we know that we've actually learned quite a bit, when we can make those kinds of more accurate predictions from the basic data that get generated.
So Chris, I'm going to turn it over to you to bring in a couple of the audience questions. I may try to sneak in one or two more questions at the end, but I do want to make sure we get some of the audience questions in. So go ahead.
Yeah, sounds great. Thank you so much again to Neville and Nancy, and of course to Carolyn for moderating and setting us up here. So you each got a number of very specific questions on your talk. So I'm going to try and maybe combine a few of them and get you to give a combined answer when they're related.
So Neville, I'll start with you. Very early on, you got the question about, as you mentioned, testing combinations of genes in your first CRISPR screen, not the later Cas13: would you be able to see combinations of screened genes? And then you were also asked, did you think about the background effects? So the effects that the gene that you knocked out would have on cell fitness, when you were taking into account what you saw in your results?
Yeah, those are two great questions. So the first one, about combinations, I think is a really active area both on the technology development side and the functional genomics side. So you're seeing a lot of that, actually, where people are looking at two genes and even higher order combinations. And the great thing is that CRISPR enzymes are naturally very multiplexable, because, I didn't mention it, but folks might know that these are stolen bacterial immune systems. That's what CRISPR originally is. And the bacteria's immune system doesn't have to just remember one phage or one previous infection. It tries to remember several of them. So because of that, the programmability of these enzymes, whether it's Cas13 or Cas9, naturally makes it easy to target multiple genes. The problem with this kind of screen is that these genome scale screens are already stretching in terms of the physical implementation, for at least one person, stretching to a lot of human cells to be cultured. And if you imagine doing every gene by every gene, 20,000 by 20,000 genes, you just have this combinatorial explosion. So one thing that I think is an exciting area is to think, how do we constrain the space of these genetic hypotheses appropriately? What other datasets can we use, like some of the ones that Nancy mentioned, to more intelligently design these instead of saying, I'm just going genome scale or every gene, right?
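To put rough numbers on the combinatorial explosion mentioned above, here is a small back-of-the-envelope sketch. The figure of 20,000 genes comes from the discussion; the number of cells cultured per construct is a purely hypothetical placeholder, and real screens also use several guides per gene, which this sketch ignores.

```python
import math

N_GENES = 20_000  # roughly the number of human genes, as in the discussion

# Number of distinct unordered gene pairs and triples.
pairs = math.comb(N_GENES, 2)
triples = math.comb(N_GENES, 3)

# Hypothetical coverage: cells cultured per perturbation construct.
# This value is an assumption for illustration, not a figure from the talk.
CELLS_PER_CONSTRUCT = 500

cells_single = N_GENES * CELLS_PER_CONSTRUCT
cells_pairs = pairs * CELLS_PER_CONSTRUCT

print(f"single-gene constructs: {N_GENES:,} -> {cells_single:,} cells")
print(f"pairwise constructs:    {pairs:,} -> {cells_pairs:,} cells")
print(f"three-way combinations: {triples:,}")
```

Going from single genes to all pairs multiplies the number of constructs by about ten thousand, which is why constraining the hypothesis space with prior data matters so much.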
Are there ways that we can integrate previous genomics knowledge into this? The other question I think is something that actually is pretty well covered, you know, people are very aware of it. So the idea of essential genes, you know, what is the interaction of essential genes with, say, resistance-to-COVID genes, right? Or host genes for COVID. And, you know, we do measure this very carefully in our screens. After we introduce the CRISPR constructs, we actually take a snapshot of the cells at an early time point, right after we introduce them. And there, what you'd expect to see is that all of the different CRISPRs are pretty equally represented, even if they target essential genes, because even if you knock out a gene at the DNA level, because of the central dogma, it takes a while for the effect to propagate to RNA and then to protein. And so we can measure this depletion over time. And so actually, for a lot of our noncoding screens, we're really interested in seeing that because, you know, there are people who have very provocative views: is any of the noncoding genome even essential? And, you know, certainly our data indicate the answer is yes. And so that's a very important measurement to make. And it's also important to take into account when you're looking at cells surviving a virus, right? Maybe they didn't die because of the virus infection; they died because, you know, you got rid of RNA polymerase. Well, that's essential. Something like that. So hopefully that covers what that person was asking, or they'll let us know if not. Thank you so much. And then Nancy, I'm going to try and combine what I think are two related questions for you as well. So you first were asked, how do you envision we address racial health disparities through your high-throughput phenome screens? 
And then an anonymous attendee asked a longer version, but boiling it down, they asked, what would be the best way to do this without falling into scientific colonialism and helicopter research, sampling in other places and countries and then bringing those samples back for sequencing in the US? So one of the things that I'm particularly passionate about now is really looking at the way we use lab values. We are institutionalizing some health disparities in the current way that we use lab values. People are sometimes astounded to learn that two thirds of lab values are significantly heritable, and half of those have significant differences in means and/or variances across EHR-defined race and ethnicity categories. And sometimes people have known about those differences and tried in various ways to accommodate them, but most of the time those differences have been ignored. And what that's done is to create real health disparities in repeat testing that falls disproportionately on minority populations, but also downstream over- and under-utilization of the healthcare system. And this is something that we need to address yesterday. We can't have institutionalized health disparities. I think we need to tackle this head on and really get some insight into, first, the scope of the problem, because as I say, many physicians are shocked to learn that tests that they think of as being dynamic measures of disease and disease risk and progression are heritable, and not trivially heritable: 60% heritable, 80% heritable. It makes them think differently about what all this means. And so I do think at this stage we have some really unique opportunities to address health disparities in more comprehensive ways and first undo damage that's already been done, but then go forward with eyes wide open in not having genetics create more health disparities. 
And on that, I think one of the current challenges I see is that there's a lot of tendency to try to protect especially rare variant data from US minority populations because of the concern that there might be some greater risk in reidentifiability because they are minority populations. And yet we're already creating health disparities, in a sense, for understanding rare variation in minority populations because of a lack of data. And if we don't get more data, we're going to perpetuate those health disparities further. African Americans and women of Asian descent are much more likely to go home with only VUSs after a breast cancer genome screen. And that needs to change. But if we can't get the data, if we can't get the information on rare variants, it can't change. So we need to find an equitable way forward to make sure that doing genetics doesn't create more entrenched health disparities. And the helicopter thing, I think everybody is much more sensitized to that. But clearly investing in and partnering with people in other places is a good way forward. And I think it's one that NIH embraces and certainly many universities are trying to work out, to see that more of the benefits of genome science get into all parts of the world. Yeah, thank you. And I would add that one of our bold predictions as well is that we want everyone to benefit equitably from advances in genomics. So we'll be coming to that later in the series. So let me ask a question to you both together. So when we talked about prepping the seminar, I told you I thought that we might have debate over the word function. But instead, we have a question over the word gene. So Daniel Stryner asked: before functions can be identified for every human gene, a complete list of all human genes is required, which means we need a definition of a gene. What is a gene? You want to start, Nancy? 
Yeah, honestly, I like a functional unit because I don't want to just look at genes. I want to look at every element in the genome that we think has function and understand the function of those elements. And I think in this context, the word gene was a shorthand way of saying functional element because it was allowing for the possibility of long noncoding RNA genes. But we could just say functional units and essentially mean the same thing. I think it's hard because a lot of the functional units are not genes. They are things that may regulate genes. And in some contexts, even mobile elements are functional units. But because this was about strategic planning for genomics, I think that's a really good target. And that's a really ambitious target from the genomic perspective. Yeah, I think Nancy said it pretty much perfectly here, which is that from the perspective at least of genome engineering, CRISPR doesn't know what's a gene and what's not a gene when we're making these kinds of edits to genomes. And so I think actually that is a really nice view to have, that we just really consider what things have function and try to as comprehensively as possible find all those things and elucidate their functions. And of course, I think there's going to be so many interesting things that might really test the boundaries of that. I think in the last decade, we've learned a lot about 3D genome structure and how the way the genome's folded might impact its function, or what proteins specific regions are associated with, nuclear proteins that are hanging out there, transcriptional hubs. So I think there's going to be a lot of richness that's going to continue to flow beyond the question of genes. I think the question of genes was perhaps most exciting a few decades ago when it was a wide open question of how many genes code for individual proteins that are floating around the cell. I think at least that one seems to be more conclusively nailed. 
So Chris, I'll give you one more question and then I'm going to move us into wrap-up. Great, sounds good. So, given that this is bold predictions, we have someone who is going out on a limb here and asking, if we anticipate knowing the function of every coding gene by 2030, at what point do we need to begin bioethics discussions on potential embryonic genetic editing techniques? I mean, I would say that obviously those have already started. That's a different talk, I think. Yeah, exactly. That is a different talk, but go ahead. But I'm happy to see that people are really captivated, I think, by the idea of genome editing, but that they also recognize that it brings with it some new things. And I think many new technologies, whether it's Twitter and Instagram or something else, they're not just all positives, right? There's potential for misuse with certain technologies. And what I'd say to the person who asked the question is, I think the genome editing field is so entirely focused right now on somatic cell editing. That's really the focus of the field right now. Things like the example I showed with Victoria Gray. So there's somebody who's had a massive improvement in her quality of life, who suffered for decades really with this disease. And then the editing of somatic cells, meaning cells that are not sperm or egg, so not a heritable change here, but really just something that's going to affect that one person, has been a game changer for her life. And so I think for embryonic kinds of things, for serious genetic disease, we've had screening technologies, pre-implantation genetic diagnosis, PGD, for three, four decades, three decades at least. We have great carrier screening techniques that are already used to prevent serious genetic disease from being passed down. 
I don't think this is a target area right now for these new genome editing technologies. I think where we need to focus is somatic cell gene editing for the people who are suffering with diseases right now that can be cured within that person. And I think that's perhaps the most responsible thing. And there's just so much to do there. It just makes sense to focus there first. So I would only add, I mean, when should we start those discussions? A few years ago, obviously. But in truth, these discussions have been had now for several years and need to continue; this is an area that needs continued discussion. I think, you know, this concept of pleiotropy is one that has to remain front and center as people imagine a time where, gosh, we'll just edit out all of these imperfections and take the human species where we need to go. And you know what? You never want to be in the tails of any distribution. And I mean, I think a lot of this is not going to be fixable. You can't monkey with this without affecting that, and vice versa. And so the notion that when we understand everything, designer babies, I think is just pie in the sky. Because there will be pleiotropic consequences of any of the kinds of changes that people might think they would want to make, and nobody's going to want some of those pleiotropic consequences. So I think part of what we'll learn is how to be amazed at and glad about the diversity that we have in the human species. We probably will need every bit of that diversity going forward for the foreseeable future. Well, on that note, I mean, I think Chris and I could both sit here asking Nancy and Neville questions all day, but we have hit the end of our time. So I said at the beginning that I couldn't think of two better speakers. And I just really feel like that was shown in spades with the talks and discussions that we had today. So thank you again to both Dr. Nancy Cox and Dr. Neville Sanjana. 
We appreciate your time. And I want to thank everybody. Again, we had over 400 people joining us, and thank you to those of you watching live for joining us today. A reminder to all of you to come back on April 12th. Let me make sure I have that date correct. Yes, on April 12th, where we will be discussing the bold prediction that the general features of the epigenetic landscape and transcriptional output will be routinely incorporated into predictive models of the impact of genotype on phenotype. So we look forward to seeing you then. Thank you again and goodbye.