Hello everyone. It is my honor and great excitement to introduce everybody to the NHGRI seminar series. This is issue number seven for those of you who have been watching live. These are also available in recorded format through the website that you see there on your screen now. And as the title says, this is the Bold Predictions for Human Genomics by 2030 seminar series. Next slide. So this idea came about as part of the NHGRI strategic vision, which was published in Nature in 2020. Next slide, please. As part of this paper, we put together the bold predictions for human genomics by 2030. These were a set of predictions that are intentionally supposed to be aspirational and exciting. These were bold, not tepid, predictions: things that could be a little bit controversial, but that we hope could be achievable at some point in the future, and that are really important to the clinical genomics and general genomics communities. Next slide, please. As you can see, we're at number seven here, and I'll introduce our wonderful speakers very shortly. The last one was number six. The next one coming up will be on October 4th. And the format for this session, as with all of the sessions, is that we'll start with paired speakers. We intentionally picked folks that have overlapping interests in genomics, but whose Venn diagrams don't perfectly overlap. We want them to come at this controversial topic through different lenses and from different directions. They'll each give a 25-minute talk. After that, if possible, I'll have a moderated discussion and do some follow-up questions, and then I'll turn it over to my compatriot here at NHGRI, Chris Gunter, who will do a Q&A with the audience. So we encourage all of you while you're watching: you don't have to wait until after the talks. You can put your questions in the Q&A box in Zoom, and Chris and the rest of our team will be helping compile those, and we'll feed them to our speakers. Next slide, please. So to be very specific, the topic today is bold prediction number seven, which is that the clinical relevance of all encountered genomic variants will be readily predictable, rendering the diagnostic designation "variant of uncertain significance," or VUS, obsolete. And just as a little aside, I remember traveling, back when we could travel a number of years back, to a big academic medical center that, like a lot of them, had a clinical lab and a big clinical service and a big research group. And I met with one of the leaders of that group, and she said, "I think about this a lot; we shouldn't fear the VUS." And I admit I'm a little ambivalent about this. I sometimes do fear the VUS. Our speakers are going to explain a little bit about what that means and address how we as a genomics community can help address that issue and related issues. Next slide, please. So I promised my speakers that I wouldn't give their whole 40-page CVs and talk about their high school GPAs and all that. So I'll very briefly mention just a few points about our two wonderful speakers, and they're going to go in the order that you see on the screen. First we have Heidi Rehm. Heidi wears, I think, 7,000 different hats, but among her many titles, she's the chief genomics officer at Mass General Hospital and the medical director of the clinical lab at the Broad. She leads all sorts of other exciting endeavors, from which I think some of her talk, and some of the data in it, will be derived.
And then to follow up, again from a slightly different angle, we have Douglas Fowler, who is associate professor of genome sciences at the University of Washington. And Doug, which I assume you like to be called from our email exchanges and from previous exchanges, is going to address things from, I think, a really important functional perspective, if I'm using the language correctly in the way that he likes to, attacking this problem from a slightly different question. So without further ado, I'm going to turn it over to Heidi, and then I'll follow up if there are any questions before we get to your Q&A. Thank you so much. Excellent. So thank you for the introduction, and it's a real pleasure to be part of this seminar series. It's just really exciting to think about all these bold predictions. So as Ben mentioned, we're going to tackle bold prediction number seven today, really tackling this question of how we interpret variants and whether we can ever get rid of this category of VUSes. So I'm going to start with a case, a real live case, because that's what my life is about all the time, both clinical testing and research testing. We enroll patients, or we test them in the clinical laboratory, and we try to interpret these variants. So to put this whole effort and question into context, I think a case is the place to start, and then I'm going to get into, more broadly, the general resources we use to help interpret variation, what we need to do with those resources over time, and perhaps what other resources we need to add to eventually tackle and get rid of this category of VUSes. So here's the case I'm going to start with. It's a case from our Rare Genomes Project research study, where we're trying to solve the unsolved for rare Mendelian diseases. This is a 28-year-old female with progressive symptoms beginning at age 14 years. She had peripheral sensory neuropathy confirmed by EMG, lower leg spasticity, hyperreflexia with clonus, spastic gait, mild dystonia and bradykinesia, dysarthria, and oculomotor defects. She had a negative family history for these symptoms in her relatives, and she had had some gene testing, a few different panels, that was all negative. So we enrolled her in our study, and we did whole genome sequencing of the trio, both parents and the proband. And what we basically found, and this was a case Gabrielle Lemire in our group worked on, was a de novo missense variant in DNM1L that led to a substitution of methionine for valine at position 41. And you can see here that this was due to a G-to-A variant found only in the proband and therefore was de novo. So the de novo evidence itself is important when we interpret variation, although, as you'll hear from Doug in his talk, we all have de novo variants. So just finding a de novo variant on its own is not unexpected, although we only have about one in our coding sequence, or closer to 50 to 100 in our non-coding sequence. So it actually matters whether the gene matches the phenotype of the patient when we think about the rarity of these occurrences, and we'll get into that in just a second. But one of the first things we do when we look at variants is figure out if it's in anyone else in the population, and we look at our gnomAD database. We also look at data from TOPMed; that's another large data set. And this variant was absent from those databases.
But it's important to point out that there are lots of benign variants that are absent from the population databases we have. So on its own that's not sufficient evidence to call this variant pathogenic. We also use lots of in silico tools to try to interpret variation. These look at conservation across species, and they look at the actual structure of the amino acids in the protein, to try to predict whether a variant might disrupt the structure or the function of that protein. These can be helpful, and the red dots in this list of different algorithms indicate that the program said the variant was likely to be damaging; when it's green, not so much. Although this one green one is actually a splice prediction, so that may not be true, but the missense impact could be. Nonetheless, these aren't perfect programs, and they don't on their own establish pathogenicity. So what else do we do? We can look at the location of the variant across the gene. Is it in a site that's a hotspot for mutation? So here we're looking at our gnomAD database, where we display ClinVar variants that have already been interpreted by others in the community. And you can see our variant here, with this pink arrow I added, is right adjacent to a couple of other pathogenic missense variants. However, there's also, down here on the gnomAD line, other variation in the population in the same location. So although there's some evidence that this might disrupt something important, given the pathogenic variation in this location, the benign variation says you can't rely on that on its own. The other thing I mentioned is de novo occurrence. While we all have de novo variants in our genome, the rarity comes when the variant is in a gene whose disease associations match your patient's phenotype. So that matching is critical to decide if this is a rare event or in fact just one of the de novo variants occurring in all of our genomes randomly. There we look in gene-level databases like OMIM, and OMIM says that this particular gene has been associated with a lethal encephalopathy and optic atrophy. Not really a great match for our particular case. However, we also look in the ClinGen database, where we use expert panels to curate the evidence level for these different associations often found in OMIM, and they viewed the relationship with encephalopathy as definitive, but the one with Leigh syndrome as limited. And then you can look in the GenCC database, where gene curations are submitted by other groups and other resources. However, no one else has interpreted this besides ClinGen and OMIM. So that's the limit of the gene-disease relationships out in the databases, none looking like a perfect match for this particular patient's phenotype. However, if we start digging in the biomedical literature, which Gabrielle did for this case, she found a paper that actually included a couple of cases with a sensory neuropathy, which is similar to our case. And in fact, in one of these cases, the variant occurs only two amino acids away from the location of our variant. Now it's looking a little more interesting. In fact, if you keep digging through the literature, you'll find three more case reports of cases that have peripheral sensory neuropathy as their phenotype. So this shows you the amount of work that's necessary to dig up information on variants as we work to interpret them.
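For concreteness, here is a minimal sketch, in Python, of the hotspot check just described: is the new variant near known pathogenic missense variants, and is there benign population variation at the same spot? The positions and the three-residue window are invented for illustration and are not from the actual case data.

```python
# A toy version of the "hotspot" inspection: compare a new variant's
# position against nearby ClinVar pathogenic positions and against
# positions with gnomAD population variation. All positions are invented.

new_variant_pos = 41  # e.g., the methionine-for-valine substitution at 41

clinvar_pathogenic_missense = {38, 39, 43}   # nearby known pathogenic positions
gnomad_population_positions = {41, 120, 205}  # positions with population variation

near_pathogenic = any(abs(new_variant_pos - p) <= 3
                      for p in clinvar_pathogenic_missense)
benign_var_here = new_variant_pos in gnomad_population_positions

print(f"adjacent to pathogenic missense: {near_pathogenic}")
print(f"population variation at same position: {benign_var_here}")
# As in the talk: proximity to pathogenic variants is suggestive, but
# benign variation at the same location means you can't rely on it alone.
```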
So now if we start looking at the totality of evidence for this variant after looking at all of these different pieces of data, and using a framework which I'll explain in a little more depth in just a second: we have absence from gnomAD, computational algorithm support, and de novo occurrence in a gene for which there are disease associations that match the patient's phenotype. And together that leads us to a likely pathogenic classification. So we can consider this case to have a pretty good answer for the phenotype, and that's how we get rid of the VUSes, by aggregating this evidence together. Unfortunately, that doesn't always work. We don't always find things in the literature that prove the disease association. So how do we go about tackling this question broadly? I'm going to walk through some of the existing resources that we use to support variant interpretation and then comment a little bit on how we will continue to advance this field to actually try to achieve bold prediction number seven. So first I want to start with ClinVar. This is a variant-level database supported by NCBI in partnership with ClinGen, which is the Clinical Genome Resource. And we've been working in partnership with ClinVar to really guide the development of this database and encourage members of the community to submit to it. We also, through ClinGen, approve expert panels that can submit interpretations to ClinVar and help resolve when there are differences between different submitters. These are all voluntary submissions documenting the clinical significance of variants, and these submissions are largely from testing individuals either in a clinical setting or a research setting. Now, most of the data is actually coming from clinical labs: about 89% of the submissions to ClinVar are from clinical labs, about 10% from researchers, and the remaining 1% or so are from clinics, patient registries, or these expert panels. We did work with NCBI to develop this star system that gives the user some sense of the level of review that variants have undergone when there are claims in there, with three- and four-star entries much more reliable than one- or zero-star ones. Two stars is when multiple submitters all agree with each other, and that can be taken as a little more reliable. Now, if we look broadly at ClinVar, there have been submissions from all around the world. It's now become a global resource to support knowledge sharing as we work to interpret variation in humans. There are over 2,000 submitters from 80 countries, with 1.6 million submissions on a million unique variants. So a lot of valuable data here. And as I mentioned, these are all voluntary submissions. Interestingly, 84% of the data in ClinVar is actually from the top 20 submitters. These represent a lot of the major US clinical labs, and you can see them here in this list, voluntarily sharing their data. Fifteen of these top 20 are actually commercial, for-profit laboratories, yet they recognize the importance of data sharing to improve the accuracy of variant interpretation, and so they are really treating this as a pre-competitive space. So I just want to shout out and really thank these groups for sharing their data. But now if we dive a little deeper into what's in ClinVar: this has largely been focused on germline interpretation, although there are somatic interpretations in there as well, and pharmacogenomic ones. We're going to mostly focus on the germline interpretations, mostly for rare, Mendelian disease.
And what you'll see there is that about 17% has been interpreted as pathogenic or likely pathogenic, but 40% of the variants and interpretations submitted to ClinVar are VUSes. That is what we're trying to tackle. It's a big problem. And in fact, I told you that we have 1 million variants represented in ClinVar, but if we look at our gnomAD database, we have already cataloged over 760 million variants. So we still have a lot of work to do as a community. Now, a lot of these variants are in non-coding regions and less likely to be impactful, but we're also missing a lot of pathogenic variation, and some of it might be sitting there in those non-coding regions. So pay attention. I also want to comment on what we see when we're doing clinical testing. This is a figure from our ClinGen marker paper, where I included data from 15,000 probands that we had tested in a clinical laboratory I used to direct. What I'm showing here at the bottom is the number of times each variant was observed among the variants we reported out in patient cases as either causal or potentially causal; this includes the VUSes. What you'll see is that the vast majority of variants we report out in patient cases, we only see once. In fact, there's a hash mark here: this goes up to over 5,000 variants seen only once out of this total set. And you'll see there are very few variants that we see over and over. So most variation that we're tasked with interpreting is incredibly rare. Sometimes we get one shot at it, in one patient. And in fact, if you look at the ClinVar database, and this is a pie chart showing concordance, when we agree with each other on our submissions, the problem is that for 75% we can't assess agreement, because that variant has only been submitted by one lab, further underscoring the rarity of most variation that we have to interpret. That said, we do work very hard to interpret every variant, whether it's been seen once or many times. And the framework that I showed you in that initial case, I'm going to dive a little deeper into here. That framework comes from a standard and guideline released by the American College of Medical Genetics and Genomics in association with AMP, the Association for Molecular Pathology. This was a group of us that worked for about two years to come up with a detailed framework for how individual types of evidence can be used, and with what strength each contributes to pathogenicity assessment. And we came up with combining criteria for how you pick different pieces of evidence and combine them into one of these categories for germline Mendelian variant interpretation. This guideline has now been adopted worldwide. It's incredibly widely used and is the key standard for how we interpret variation. So what I'm going to do is pull apart these different types of evidence and talk about the resources we use to support classification and how these need to evolve to scale and improve this process over time. The first is population data. I mentioned looking in gnomAD. We use absence from these databases, or presence at too high an allele frequency, to contribute to variant classification. And over the years, these population databases have been growing. The largest one is now gnomAD, with over 180,000 exome and genome sequences.
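To make the combining step of that framework concrete, here is a minimal sketch in Python. It encodes only a small, simplified subset of the published combining rules, and the strengths assigned to each code are the guideline defaults without the case-specific strength adjustments that real curation applies; the function names are mine, not from the guideline.

```python
# A minimal sketch of ACMG/AMP-style evidence combination (pathogenic side
# only). The real guideline (Richards et al. 2015) has many more codes,
# rules, and caveats than the simplified subset encoded here.

from collections import Counter

# Map each applied evidence code to its default strength level.
STRENGTH = {
    "PVS1": "very_strong",  # e.g., predicted loss of function
    "PS2": "strong",        # de novo, parentage confirmed, phenotype match
    "PM2": "moderate",      # absent from population databases (gnomAD etc.)
    "PP3": "supporting",    # computational evidence supports a damaging effect
}

def classify(codes):
    """Apply a simplified subset of the ACMG/AMP combining rules."""
    n = Counter(STRENGTH[c] for c in codes)
    vs, s, m, p = n["very_strong"], n["strong"], n["moderate"], n["supporting"]
    # Pathogenic (subset): 2+ strong, or 1 strong plus substantial support.
    if s >= 2 or (s == 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4))):
        return "pathogenic"
    if vs >= 1 and (s >= 1 or m >= 1):  # simplified PVS1-based rules
        return "pathogenic"
    # Likely pathogenic (subset): 1 strong + 1-2 moderate, 1 strong + 2+
    # supporting, 3+ moderate, and so on.
    if (s == 1 and (m >= 1 or p >= 2)) or m >= 3 or (m >= 2 and p >= 2) or (m == 1 and p >= 4):
        return "likely pathogenic"
    return "uncertain significance"

# The DNM1L case: de novo (PS2) + absent from gnomAD/TOPMed (PM2)
# + computational support (PP3) combine to likely pathogenic.
print(classify(["PS2", "PM2", "PP3"]))  # -> likely pathogenic
```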
What I'm showing here is the ancestral diversity in those population databases, and while it has improved over time and is becoming more representative, we still lack sufficient diversity in these databases, and a huge emphasis has to be on gathering global populations, all the diversity out there, to contribute to our knowledge base of population variation. We will be releasing version four of gnomAD soon, probably early next year. We still have a lot of work to do to get a half million exomes and genomes aggregated together to release to the community, and then we'll turn our focus to version five after that. One of the reasons it's important to get more and more is that some people ask, well, how much is enough? We need a lot, because we need to saturate the opportunity to observe variation. If we get a large enough number of genomes aggregated, we can say that absence is actual evidence of pathogenicity. Right now, we can't say that; we don't have enough exomes and genomes to show that. And in fact, this figure from the gnomAD marker paper shows that for CpG transitions, which are one of the most common mutational mechanisms, we're getting closer to saturation, reaching 75% saturation of observing all possible mutations of that type; the types are shown down here. But for non-CpG transitions and for transversions, we are still at the lower limits of the possible variation we could observe. So we have to continue collecting human population data, aggregating it, and releasing it to the community to help us interpret variation. However, we don't just want the genomic data. If we're going to answer questions like whether prevalence in affected individuals is statistically increased over controls, which is evidence supporting pathogenicity, as opposed to just using population data to rule out pathogenicity, or whether observation in controls is inconsistent with disease penetrance, that requires these databases to have phenotype data that can be released and accessible to the community. This is where databases like the UK Biobank and the All of Us research program come in. In fact, there are biobanks all around the world being generated, shown on the IHCC website, and some of this data has now been publicly released and is easily accessible. We just released genebass.org through a collaboration with several pharma companies that funded exome sequencing of the UK Biobank. The All of Us program will be releasing the data that we've been generating in the genome centers; there'll be a beta release of a 90,000 whole-genome data set, all of that with phenotypes from the participants of this program. And there have been major efforts in bringing biobanks together around the world through the GBMI effort, where this data is actually accessible. These are critical projects, and we need all of them to work together to bring more phenotype data connected to genotype data to help interpret variation. Now, I will say that those biobanks are incredibly powerful for association studies, but in the rare disease space they sometimes don't capture enough of these incredibly rare cases, and that's where deliberate recruitment and cataloging of rare disease patients comes in, particularly for allelic data. I use the guilt-by-association allelic observation rule: this is when you have a pathogenic variant on one allele and the presence of a novel variant on the other. That in trans observation, when the phenotype matches the gene, can be used as evidence.
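To make the penetrance-aware frequency check mentioned above concrete, here is a minimal sketch in the spirit of published maximum credible allele frequency approaches (e.g., Whiffin et al.); the disease parameters and the observed frequency below are invented for illustration, not published values.

```python
# A minimal sketch of a "maximum credible allele frequency" filter for a
# hypothetical dominant disease: if a variant is more common in gnomAD than
# the disease's prevalence and penetrance can tolerate, that is evidence
# toward benign (BS1-like). All parameter values are invented.

def max_credible_af(prevalence, allelic_heterogeneity, penetrance):
    """Highest population allele frequency consistent with causing disease.
    prevalence: proportion of the population affected
    allelic_heterogeneity: max share of cases attributable to one variant
    penetrance: minimum credible penetrance of the variant
    The 0.5 converts affected individuals to alleles for a dominant disease.
    """
    return prevalence * allelic_heterogeneity * (1 / penetrance) * 0.5

# Hypothetical disease: prevalence 1 in 5,000; no single variant explains
# more than 10% of cases; penetrance at least 50%.
threshold = max_credible_af(1 / 5000, 0.1, 0.5)
observed_af = 3e-4  # invented allele frequency "observed" in gnomAD

print(f"max credible AF = {threshold:.2e}")  # 2.00e-05
if observed_af > threshold:
    print("Too common to be a fully penetrant cause -> evidence toward benign")
```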
So for those types of cataloging, we really need to bring this rare disease data together. We've brought a number of these databases together through a federated network that we built over five years ago now, and it's been growing over time. These are rare disease databases where patients are cataloged, and we allow querying across this network through a federated approach, supported by APIs. Right now, that's only to query whether you have a case with a candidate gene that matches your case with your candidate gene. It doesn't really tackle variant matching, but over time we need to add more depth to this, to be able to query cases for variants and figure out what their phenotypes are, instead of just this hypothesis-driven matching. To do that, we will need more resources that we can gain access to at the individual variant level. Resources like AnVIL, a cloud-based platform supported by NHGRI that's bringing lots of rare disease and common disease cases together and allowing investigators to apply for access, will be critically important for this work. Now, I showed you that case where de novo occurrence was a critical piece of evidence. Segregation data, looking at the presence of a variant across individuals in a family who are affected and showing co-segregation of the phenotype and the genotype, is also critical. How do we get this data today? It's largely through reading papers and visually looking at pedigrees, or reading text that's been submitted to ClinVar or is present in the text of a paper. Here's that pedigree I showed you at the beginning of this talk. That is not a scalable solution for segregation and for capturing de novo occurrence. We need these data sets with their relatives captured in these databases. And I'll actually show a picture from the UK Biobank major paper, where they are capturing relatives in the UK Biobank. This will allow us to scale these relatedness searches, looking at segregation and de novo occurrence. So we want to capture not just probands in these databases but their familial relationships as well, so that we can use this type of evidence too. We also use a lot of computational data of different types. Some of it is gene level. We want to know the mechanism by which variants in this gene cause disease, so that when we see a novel variant we can try to match it: is the novel one a loss-of-function variant, and is loss of function a known mechanism of the disease? How do we capture that kind of baseline data? Well, gnomAD gene constraint is one source for whether a gene tolerates loss-of-function variation, for example. ClinGen has its dosage sensitivity expert panel that similarly curates the evidence for haploinsufficiency. And in the UK, the TGMI program has the Gene2Phenotype database, where they're capturing mechanisms of pathogenicity. These are all critical to help build this sort of evidence framework to classify novel variation in these genes. Another layer of complexity that I didn't actually show in this initial case is that many genes have different transcripts that are expressed in different tissues and at different time points in development. So when we find a variant, we actually have to go and look at which transcripts contain that exon, to make sure it's expressed in the relevant tissue.
And what I've circled in red here is that the variant I showed you early on is in fact in the primary transcripts expressed in the human body. And in terms of tissue-specific expression, it's expressed in the brain and neuronal tissues that are relevant to the phenotype. So that's consistent. However, a lot of this requires visualization of the data, which is less scalable. So we really need to capture this a little more robustly and facilitate automated checking of transcripts, where they're expressed, and alternative splicing. Also, most of the data we have accessible for tissue-specific expression is in GTEx, from adult tissues. We need developmental timelines. We have to get more data to understand what these genes do early on, when our phenotypes develop. Also, we often use the location of another pathogenic variant at the same site, or a nucleotide change that leads to the same amino acid change, as useful evidence toward pathogenicity. So we need to capture all of these variants and their interpretations, like I described, in the ClinVar database. We need to continue to catalog more and more observations and how we interpret them, so that we can use them as evidence for other variants that may be novel when we see them. Now, I did plot here the data that's been submitted from the U.S., and that's in part because NCBI launched this database in the U.S. and we've had grants in the U.S. to really work with our clinical lab partners. So there's been a lot of focus in the U.S., but we need the rest of the world to start submitting to this database as well, and that will be critical to build our global knowledge base. There are a number of countries working on this now; the U.K. and Australia and Canada have already started submitting and are building up to submit robustly. And I encourage all the countries out there to really work together to share our collective knowledge base. I alluded earlier to the use of in silico algorithms. There have been a lot of these, and they have been steadily improving over time. I mentioned they're imperfect, but they're getting better, and this is an area of a lot of work. I'll highlight one resource, which is PrimateAI, where the sequencing of primate species allows us a wider look at the tolerance to variation, not just across the human population but across primates, which can be very useful in helping us interpret variation as well. And a lot of that data will be released via the gnomAD browser to help share it with the community. So lots of exciting stuff coming. In addition, there's inferred functional impact we can derive from looking across the gene. I showed you that image in gnomAD where we could visually inspect across the gene to look at the location of mutations. We're trying to more robustly look across the gene and define not just gene-level constraint but regions of the gene, because sometimes there's a small region that's constrained but it's overwhelmed by the rest of the gene, which is not constrained. So here's some work from Katherine Chao on the gnomAD team, who has been developing algorithms to look across the gene in segments, working with a number of others in the gnomAD project. And you can see in this particular gene that the 5' end is showing regional constraint that correlates with the ClinVar track of lots of pathogenic variation. So it's making sense.
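Here is a minimal sketch of the observed-versus-expected comparison behind this kind of regional constraint; the window counts are invented, and the real gnomAD method models the expectation per site from a mutational model and searches for change points rather than using fixed windows.

```python
# A toy version of regional missense constraint: compare observed missense
# counts to model-expected counts in windows across a gene. A region with
# observed/expected well below 1 is depleted of missense variation, i.e.,
# constrained, even if the gene overall looks tolerant. Counts invented.

observed = [2, 1, 0, 14, 16, 15, 13, 12]     # missense variants seen per window
expected = [12, 11, 12, 13, 14, 12, 13, 12]  # model-expected counts per window

for i, (obs, exp) in enumerate(zip(observed, expected)):
    oe = obs / exp  # observed/expected ratio; much less than 1 suggests constraint
    flag = "constrained" if oe < 0.4 else ""
    print(f"window {i}: o/e = {oe:.2f} {flag}")

# The first three windows (the 5' end, as in the example on the slide) show
# depleted missense variation; gene-wide o/e here is ~0.74, which on its own
# would hide that constrained region.
```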
And then there are in fact thousands of transcripts that show evidence of regional missense constraint. So we're hoping to release that within gnomAD version 4, so that you have access to this regional missense constraint information. Really critical is actual functional data, not inferred functional impact, and I'm going to leave that for Doug, because he's got an awesome talk coming up shortly that really gets into how we need to scale the analysis of the functional impact of variation; it's some of the strongest data we use in variant classification when it's done well, and he'll discuss that. So in summary, to achieve our bold prediction number seven, we need to really focus on a number of the things that I mentioned. That includes building global individual-level databases that contain genotype, phenotype, and familial relationships. We need those data sets and databases to be interoperable and to have frameworks for robust global data sharing, so that we can enable queries of individual-level data in real time during clinical genetic test interpretation. Enabling that interoperability and the policies around data sharing is actually the Global Alliance for Genomics and Health; I didn't have time to talk about that organization, but it's a really important one. I spend a lot of time with it, and many others, thousands of others, are participating in the development of standards for our genomics community so that we can really share data and have interoperable data sets at scale. That will be critical. Number three relates to Doug's talk on high-throughput functional studies, so I'll let him cover that one, which is critical. And then finally, the robust sharing of assessments of the clinical significance of identified variants and their supporting evidence, a.k.a. submit to ClinVar, is going to be key for our community. If we achieve all of these goals, we will make great headway on bold prediction number seven. That said, I'm not quite convinced that the term VUS is going to be obsolete in 10 years, but I do think we can make major progress if we work together as a community. And with that, I will just acknowledge all of the different projects that I've mentioned in this talk, where I've had the pleasure of working with very large teams, all of whom have been working together very diligently to create resources for the community to assist in understanding the meaning of genomic variation and trying to get rid of all of those VUSes in our genomes, and also acknowledge funding from NHGRI, the Broad Institute, and MGH that helped support these programs. Of course, there are lots of other activities not on this slide that will be critical as well, but I at least wanted to acknowledge the ones that I've mentioned. And with that, I'm going to stop there and let Doug share with you some really exciting work of his on the functional side of variation. Thank you very much. Okay, well, thanks so much, Heidi. I feel like it's so much easier for me to give this talk now that you've explained how variant interpretation, clinical variant interpretation, works. Usually when I talk about this stuff, that's the thorniest part for me to try to introduce. So that's really a pleasure. And it's also a great pleasure to be here and to speak to all of you in this seminar series. I've been following along.
It's been just really exciting to see all the talks, and I'm excited today to tell you about whether and how functional assays, and the data that they generate, can help to rescue clinical genetics from this pressing problem of variants of uncertain significance. So I think it's fair to start by asking the question: how many variants are we going to have to deal with? Heidi already talked about the fact that ClinVar has a large number of variants of uncertain significance, and it's reasonable to ask how many more we're talking about. We can put a reasonable lower bound on this number, because we know that the human mutation rate is about one per 100 million nucleotides per generation. That means that everyone has on the order of 60 de novo variants in their diploid genome, and because there are 8 billion people alive today, that means there are something like 50 instances of every one of the 9 billion possible single-letter changes, or single nucleotide variants, in the human genome out there in the world. Now, of course, that excepts all the ones that are incompatible with life, which we won't see. But the point is that if we're asking how many variants we're going to have to deal with, the answer is, to a first approximation, each and every single one. And as Heidi already pointed out, besides the ones we've already seen, the new ones that we find are going to be initially singletons and only ever rare, and that constrains the types of information that we will have, especially initially, for those variants. That is one of the reasons that many of them end up as variants of uncertain significance when they arise in a clinically relevant portion of the genome. And this is reflected in ClinVar. I'm going to focus my talk just on missense variation, so variation that changes the amino acid sequence of an encoded protein. The reason I'm going to do that is that it's the category of variation that's among the most likely to generate a variant of uncertain significance interpretation, and that's because missense variants can have no effect at all, or they can result in a complete loss of function of a really critical gene product. So I'm showing you here a donut plot, which is, I guess, just a different version of a pie chart; I'm going to show you lots of these, so I'll just explain. This is showing the proportion of variants of different classifications in ClinVar as of some months ago, and among missense variants it's about 70% variants of uncertain significance, with the rest in the more definitive, or less uncertain, categories. And lest you think this problem is one that is growing slowly or going away, I'll just reinforce what Heidi said by showing how these categories have changed over time. What you'll notice is that variants of uncertain significance are accumulating in this sort of exponential fashion, and the less uncertain interpretations are accruing much more slowly. So this is a large and growing problem, and Heidi laid out how clinical geneticists and folks like her will bring the full force of many different types of information to bear to help solve it. But I really want to focus our discussion on one particular type of information, which is variant functional information, which has the advantage that it can, at least in principle, be acquired for more or less any variant.
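As a quick back-of-the-envelope check of the numbers quoted above, here is the arithmetic; all figures are order-of-magnitude approximations.

```python
# Back-of-envelope: de novo variants per person, and how many de novo
# instances of each possible SNV exist worldwide. All values approximate.

mutation_rate = 1e-8          # ~1 mutation per 1e8 nucleotides per generation
diploid_genome = 2 * 3.1e9    # nucleotides in a diploid human genome
de_novo_per_person = mutation_rate * diploid_genome
print(f"~{de_novo_per_person:.0f} de novo SNVs per person")  # ~60

population = 8e9
possible_snvs = 3 * 3.1e9     # three possible substitutions per position, ~9e9
instances_per_snv = de_novo_per_person * population / possible_snvs
print(f"~{instances_per_snv:.0f} de novo instances of each possible SNV")  # ~50
```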
And so when we talk about variant functional information, what we mean is taking a variant that we observe in an individual, taking it into the lab, and testing its effects in a model system that we think will be informative. So that could be testing the variant protein for its biochemical properties, or a variant engineered cell line or animal model, and making a comparison of how that variant behaves back to a wild-type or reference sequence. And then, based on that comparison, making a judgment about whether the variant affects the function of the gene or not. And this is not new at all. Obviously, people have been doing this for many, many decades, and it's already within the ACMG framework, as Heidi pointed out, in terms of being included in clinical variant interpretation. The problem, though, is what I was talking about earlier, which is that traditional assays, where we test one or a handful of variants at a time, have a number of problems. They don't scale; that's one problem immediately. But also, because a small number of variants are tested, often with only one or a few controls, it can be difficult to harmonize results across labs, and that's been a big challenge, as has internally QC'ing the results of any one small-scale assay. And so my lab and many other labs have worked on developing multiplexed assays of variant effect, where thousands or tens of thousands of variants are tested in a pooled format, all at the same time. I'll explain how those assays work in a minute, but if you execute one, what you get is something like this: a variant effect map, and this happens to be a snippet of a variant effect map for Src kinase. Every column here represents a different position in the sequence that encodes Src kinase, and every row is a different possible amino acid substitution. The coloring of this heat map shows you the experimentally measured effect of that particular variant. These maps are really useful because you can use them to look up the effect of all the variants that have been observed so far, the pathogenic ones, the benign ones, the variants of uncertain significance, but additionally they contain all the information you need to look up variants in the future. So that's why this multiplexed functional data can be, in principle, so useful. So the plan for today is: I'm going to tell you about multiplexed assays, what they are and how they work, so we're all on the same page. I'm going to spend the bulk of my time telling you about our attempts to answer the question, now that some of this large-scale functional data is out there, how useful is it actually in resolving variants of uncertain significance? I'm going to tell you about a research study that we just finished to answer that question, and then I'll spend a little bit of time telling you about opportunities and prospects relative to the bold prediction. Okay, multiplexed assays. Well, as you might guess from the name, the first step in executing a multiplexed assay of variant effect is to take a bunch of variants that you're interested in and mix them together in a library. And usually we express these variant libraries in cultured human cells; that's the model that my lab uses primarily, although many models are possible and have been used, including yeast cells and protein display and so on and so forth.
But in any case, each cell expresses a different variant, and to generate this type of library, there are a number of really amazing technologies available, many of which have been funded by NHGRI over the years, ranging from expression to transgenesis to endogenous genome editing. Once you've got your variant library expressed, you can subject it to any one of many functional assays, and perhaps the simplest one conceptually is a growth assay. So here, maybe we've mutagenized a gene whose gene product is essential to support the growth of cells. If you just take this library and grow it out, cells that express wild-type-like functional variants, like these red cells here, will grow, and cells that express non-functional variants, like these blue cells, just won't grow. And so the trick is to leverage the power of high-throughput DNA sequencing to track the frequency of every variant in the library from the beginning, through the outgrowth, to the end. By analyzing the changes in each variant's frequency, we can compute a functional score that expresses how well that variant supports, in this case, cell growth relative to wild type. And again, what's really amazing is that there's been just a plethora of technology development to give us a really solid, version-one, initial toolkit of assays that we can run in a multiplexed format. These include fluorescent reporter assays that we can use to read out protein abundance or stability, or reporter transactivation, or many other features. They include assays for cell morphology and behavior, as well as some of the emerging methods that focus on single-cell readouts of cell state and link those to variant effects at scale. It's a really exciting time in this field, because for the first time we have this initial toolkit and we're in a position to ask what it can do. And I just lastly want to make the point in this section that variant functional data from these types of assays is indeed arriving, and it's arriving increasingly quickly. This is a plot kindly provided by Fritz Roth and Jochen Weile that shows, over the years, how many variants have been covered in a cumulative fashion by these assays, and we're getting more and more information as time goes by; this is only accelerating. So that really motivated the study that I'm going to tell you about, which was a collaboration with my close colleague and good friend Lea Starita, as well as Ambry Genetics. The idea of this project was to basically take large-scale functional data for some key genes and go through the clinical variant interpretation exercise, to try to answer the question: how often does having multiplexed functional data enable you to move a variant from a variant of uncertain significance designation to a less uncertain designation? We focused on three key genes in which the inheritance of a defective, non-functional copy of the gene confers an increased risk for one or more cancers, and the first of those is BRCA1. I'm showing you here a plot that you'll see a few times, so I'll explain it carefully. It's a histogram of the functional scores of all of what I call variants of known effect: variants that are in ClinVar and classified as either pathogenic or likely pathogenic, or likely benign or benign, as they score in the assay.
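To make the score computation concrete, here is a minimal sketch of the frequency-tracking step in the growth assay just described; the variant names and read counts are invented, and real analyses involve replicates, multiple time points, and explicit error models (tools like Enrich2 exist for this).

```python
# A toy functional-score computation for a pooled growth assay: sequence
# the library before and after outgrowth, then score each variant by its
# log2 enrichment relative to wild type. All counts are invented.

import math

before = {"WT": 10000, "p.Ala10Val": 500, "p.Gly20Asp": 500}  # read counts pre-selection
after  = {"WT": 30000, "p.Ala10Val": 1400, "p.Gly20Asp": 60}  # read counts post-selection

def functional_score(variant):
    """log2 enrichment of the variant, normalized to wild type."""
    variant_ratio = after[variant] / before[variant]
    wt_ratio = after["WT"] / before["WT"]
    return math.log2(variant_ratio / wt_ratio)

for v in ("p.Ala10Val", "p.Gly20Asp"):
    print(f"{v}: score = {functional_score(v):+.2f}")
# p.Ala10Val scores near 0 (wild-type-like); p.Gly20Asp scores strongly
# negative (depleted during growth, i.e., functionally abnormal).
```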
And obviously these variants of known effect are only a small subset of the roughly 4,000 variants present in this multiplexed assay, most of which are missense variants that haven't yet been observed. In any case, this was a cell growth assay, and what you can see is that it quite cleanly separates pathogenic variants, which have a low functional score, from benign variants, which have a higher functional score. The other genes that we analyzed in this study are TP53, where two different groups generated functional data, both from a loss-of-function assay and from a dominant-negative assay, and PTEN, where two different groups generated functional data from a variant abundance assay and a phosphatase activity assay. By talking about these different genes and different data sets, we'll expose some of the ways in which this data can be very useful for variant reclassification, as well as some of the factors that restrain its utility, which I hope we'll be able to address. So anyway, here's what we did for each gene. The first thing we did was collect and curate the functional data, in some cases combining the multiple data sets that we had. Then we had to determine the strength of the evidence generated by each assay. I'm thankful that Heidi introduced this concept, but as you know from her talk, there are different types of evidence that can provide different levels of strength, and strong evidence gives you more points to move a variant off of the variant of uncertain significance classification and into a more conclusive one. One way to evaluate a functional assay, and the strength of evidence it can generate, is by examining how well the assay separates variants of known clinical effect into the appropriate functional categories: in this case, separating pathogenic variants into the functionally abnormal category and benign variants into the functionally normal category. This was all codified in a really nice formalism and paper that came out last year from the ClinGen Sequence Variant Interpretation Working Group, and that's the formalism we used. In any case, once we knew which evidence strength we could apply for the assay, we went ahead and reinterpreted variants along with Ambry Genetics. Ambry Genetics provided variants along with all the clinical information that is needed to go through the variant interpretation workflow, and we teamed up with them to reinterpret those variants in light of the functional data. So I'll go through each gene quickly here. Like I said, for BRCA1, this assay nearly perfectly separates the variants of known effect, and I want you to notice that there are many variants of known effect, about a hundred, or over a hundred, actually. Because of this near-perfect separation and the large number of variants of known effect, this data set generates strong evidence for a variant being benign and strong evidence for a variant being pathogenic, based on whether it falls in the pathogenic range or the benign range. And when we went ahead and reinterpreted the variants of uncertain significance, we saw that 50% of them moved from the VUS designation, mostly into the likely benign designation. So this is a really exciting result. Half is a pretty good number to take off the top, and it really motivated us and excited us to continue.
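Here is a minimal sketch of the kind of calibration the ClinGen SVI formalism describes, where an assay's evidence strength is derived from how well it separates known pathogenic and benign controls; the control counts below are invented, and the strength thresholds are the approximate published ones, so consult the paper (Brnich et al. 2019) rather than relying on this sketch.

```python
# A toy OddsPath calculation: how much does a "functionally abnormal"
# readout shift the odds of pathogenicity, relative to the prior among
# the assay's classified controls? Counts are invented.

def odds_path(prior_p, posterior_p):
    """Odds of pathogenicity given the assay result, relative to the prior."""
    prior_odds = prior_p / (1 - prior_p)
    posterior_odds = posterior_p / (1 - posterior_p)
    return posterior_odds / prior_odds

# Invented example: 60 pathogenic + 40 benign controls in the assay;
# 58 pathogenic and 2 benign controls score "functionally abnormal".
n_path, n_ben = 60, 40
abn_path, abn_ben = 58, 2

prior = n_path / (n_path + n_ben)
posterior_abnormal = abn_path / (abn_path + abn_ben)  # (+1 pseudo-counts are
# recommended when the separation is perfect, to avoid infinite odds)

op = odds_path(prior, posterior_abnormal)
print(f"OddsPath = {op:.1f}")
# Approximate SVI strength bins for pathogenic (PS3) evidence:
#   > 350 very strong, > 18.7 strong, > 4.3 moderate, > 2.1 supporting
strength = ("very strong" if op > 350 else "strong" if op > 18.7
            else "moderate" if op > 4.3 else "supporting" if op > 2.1 else "none")
print(f"PS3 evidence strength: {strength}")
```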
So TP53 was a little bit more challenging, because what you see here is that, like I said, there were two different groups and four different assays, and none of the assays perfectly separated the known benign variants from the known pathogenic variants. The assays additionally report on two different mechanisms of P53 pathogenicity: loss of function and dominant-negative activity. So we used machine learning to combine the results of these four assays, using a naive Bayes classifier trained on the variants of known effect in ClinVar. What you can see here is the result of a leave-one-out cross-validation exercise, where we're predicting the functional class of the variants of known effect based on all four functional data sets. And we get all the benign variants right and the vast majority of pathogenic variants right, except some small fraction, which we erroneously predict to be functionally normal. Anyway, at the end of the day, this classifier based on the functional evidence generates strong evidence of pathogenicity and moderate evidence for a variant being benign. And when we apply that classifier to variants of uncertain significance from Ambry, we get about 70% of the variants reclassified, again primarily into the likely benign category. So this again was great, and it demonstrates how you can integrate multiple functional data sets if you have enough variants of known effect. So next we come to PTEN, and what you'll immediately notice is that we have two assays, that's great, but there's no blue in either of these plots. That's because when we executed this study, and still, there really were just a very small number, I think two or three at the time we executed this study, of known benign or likely benign variants. And that really constrained what we could do, right? Because now we can't use machine learning to integrate these two data sets, because we don't have enough benign variants of known effect to train on. And we can't use this formalism for computing evidence strength, because again, we don't have benign variants to work with. Luckily, the expert panel that looks after PTEN had already issued guidelines on how to use this data, and by following the expert panel guidelines we were still able to reclassify about 15% of the variants of uncertain significance. While that's not as impressive as 70% or 50%, it's still a pretty good result in my opinion. But it does highlight some of the limitations that restrain the utility of this data. However, as we'll talk about, I think these limitations are ones that we may be able to overcome. Okay, so in summary, what we learned was that when you have a sufficient number of variants of known effect to benchmark your functional assay, and you have good functional assay data, as we did here, you can reclassify somewhere between half and the majority of variants of uncertain significance. And if you don't have those things, then the data is still useful, but it doesn't solve the problem to the same extent. Okay, so I want to spend the last few minutes just talking about some of the opportunities and prospects as I see them, relative to the bold prediction.
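To make the TP53-style integration concrete, here is a minimal sketch of combining several assay scores with a naive Bayes classifier evaluated by leave-one-out cross-validation; the assay scores and labels below are tiny invented stand-ins, not the real data, and the real analysis involves many more variants and careful calibration.

```python
# A toy version of the integration step: train a naive Bayes classifier on
# variants of known effect across four assay readouts, evaluate it with
# leave-one-out cross-validation, then score a VUS. All values invented.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Rows = variants of known effect; columns = scores from four assays
# (e.g., two loss-of-function readouts and two dominant-negative readouts).
X = np.array([
    [0.1, 0.2, 0.9, 0.8],   # pathogenic-like profile
    [0.2, 0.1, 0.8, 0.9],
    [0.2, 0.3, 0.7, 0.9],
    [0.9, 0.8, 0.1, 0.2],   # benign-like profile
    [0.8, 0.9, 0.2, 0.1],
    [0.7, 0.9, 0.1, 0.1],
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = pathogenic, 0 = benign

# Leave-one-out: predict each known variant from a model trained on the rest.
pred = cross_val_predict(GaussianNB(), X, y, cv=LeaveOneOut())
print(f"LOO concordance with known classifications: {(pred == y).mean():.0%}")

# The fitted model then assigns each VUS a probability of being functionally
# abnormal, which feeds back into the evidence-strength rules.
model = GaussianNB().fit(X, y)
vus_scores = np.array([[0.15, 0.25, 0.85, 0.8]])  # a hypothetical VUS
print(model.predict_proba(vus_scores))
```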
So I think that 10 years from now we'll be in a place where, for at least most of the clinically relevant genes in the genome, we will have a functional measurement for every possible single nucleotide variant at every position, at least within the coding sequences and nearby regulatory sequences of those genes. And this will represent a really profound transformation, in my opinion, of how we understand the human genome, because we'll know exactly what variation does in all of these genes. Additionally, I think for at least some genes, we're going to have functional data for every possible variant for multiple layers of phenotype. What I mean by that is that we'll know not only how a variant impacts, say, transcription, but also splicing, RNA stability, and various levels of protein function: protein abundance, protein activity, perhaps localization. And some cellular phenotypes too, where in addition to growth and morphology there may be readouts of cell-internal state like transcriptomes and other features. And I think that while we won't have this type of data for every last gene, and probably never will, it will provide an incredible opportunity not only to make predictions about pathogenicity, and to do better at that task by data integration, but to really understand in a mechanistic way how genetic variation exerts its influence on molecules and cells and ultimately organisms. And I think that creates exciting opportunities to think about how we might target treatments to particular variants, and so on and so forth. So what do we need to get to this vision? Well, obviously, first and foremost we need resources, and some of those are coming and more are probably needed, but we can apply this first set of assays that the field has elaborated, and I think we can deploy them to good effect, as I tried to convince you in this talk. We also need new technologies, and thankfully NHGRI is just fantastic at supporting technology development. Some areas where I think new technologies are needed are to deal with multiple variants within a particular gene, as when you have a rare variant on the background of a common variant. And then, as we move away from the monogenic, highly penetrant, sort of simple cases, which is what all the examples I talked about today are, to diseases where multiple loci are involved and the picture is much more complex, it will be important to be able to model the effects of variation at many loci simultaneously. Additionally, as has come up in this seminar series, there's this amazing diversity of larger-scale genetic variation, insertions, deletions, translocations, and so on, that we know is really important and which at the moment we can only model at very low throughput and in a limited way. I also think it will be important to run assays in more contexts, and in particular in multicellular contexts. We're reasonably good at reading out variant effects on some molecular processes and some cellular processes, but cells come together to make tissues and organisms, and some genes and their effects will be very hard to model at just the molecular and cellular level and still gain the perspective that we want.
And then lastly, and I've given this somewhat short shrift in this talk, I think we'll need to make big investments in modeling to integrate all of the many data types that Heidi talked about, from biobanks and patient phenotype data to sequence conservation and more, to really gain, at the end of the day, a holistic understanding of how genetic variants impact the many layers of phenotype in humans. And then the last thing we'll need is more clinical variants of known effect. I think our study really starkly highlighted how useful functional data can be when you have lots of variants of known effect, and how stuck you are when you don't have those variants. And lest you think that most genes in the genome are like BRCA1, where we have plenty of variants of known effect, in fact that's not the case. Among the ACMG 73, which is sort of shorthand for clinically actionable, very highly studied, very often tested genes, even if you had a perfect functional assay that perfectly separated all the known benign variants from the pathogenic ones, only about half of those genes have enough variants of known effect to generate strong evidence of pathogenicity, and only about a quarter have enough variants to generate strong evidence of benignness. So this is going to be a hard problem to solve. But hopefully, as all the forces that Heidi mentioned come into play and more and more variants are classified in a definitive fashion, that will strengthen the utility of multiplexed functional data. And multiplexed functional data sets have this property that they will contain all of those future variants that get classified, right? So we can re-evaluate and update the strength of evidence that each assay provides as our catalog of variants of known effect grows. Okay, the last thing we need is to work together. NHGRI has recently announced the Impact of Genomic Variation on Function (IGVF) consortium, which is bringing together some of the folks who are really interested in these variant effect mapping technologies with many others, to try to create a sort of initial compendium or catalog of functional elements, and of variation and its effects, in the genome. And then there are many international organizations coming together around this idea as well. Two that I think are particularly interesting are the International Common Disease Alliance and the Atlas of Variant Effects Alliance, both of which, through different lenses, are excited about building this kind of catalog or atlas of variant effects. So relative to the bold prediction, my take, and I thank my colleagues in Genome Sciences for helping me summarize my thoughts in two words, is: mostly yes. We'll be able to render the VUS designation obsolete, but there will be many cases in which we can't. And I personally think that functional data will play a big role in getting to this point. So I just want to lastly thank all of the amazing people that I work with, but especially Shawn Fayer, the student in my lab, and also Lea Starita's lab, who executed the work that I told you about, as well as Jen Dines, who got that work started in my lab, and the fantastic team of folks at Ambry Genetics, who were very generous with their time and their data in investing in the study that we did. And then lastly, NHGRI and others for funding the work. So thanks. Okay, great. Well, thank you so much, Doug and Heidi. That was amazing. My mind is awash with information here.
I think you guys did a wonderful job both showing and describing the issues, and also what you're doing, from, if I might say it, slightly different angles, to address them. I have a couple of questions, but for those of you who can see it, there are literally dozens of questions in the Q&A. You can keep those coming. But I just want to start with a couple of questions that arose in my mind during these talks. And Heidi, if it's okay, I'm going to address the first one to you. This is a very practical, down-to-earth question. So let's say today or tomorrow a patient gets a report that has a VUS, or a doctor has a report where their patient has a VUS. What should the next steps be? What would you recommend that those folks do to try to think more about that variant and what it means for their health, and so on and so forth? Sure. So, you know, it varies. And I should note that many labs classify VUSes into subcategories, and we're actually working on new guidance for what those terms should be. "VUS approaching likely pathogenic," for example, and "VUS approaching likely benign" are probably winning as the preferred terms. The things that are approaching likely benign, you likely just want to kind of ignore, but the things that are equivocal or approaching likely pathogenic, those are ones to pay a little more attention to. And there are several things that can be done. One is asking the clinician: gee, this variant is associated with syndrome, gene, or disease X; are there any clinical tests that might help me understand if I might have that? So that's one thing the clinician-patient partnership can endeavor. The other thing is, a lot of times there's not much to do. But you can go into ClinVar, and on the variant, if the lab has submitted it to ClinVar, you can click follow and be notified if knowledge changes on that variant. Many labs will send updated reports, but if the patient doesn't keep their address up to date with their hospital or their provider, or that provider's not monitoring things, that patient may never get that information. So the patient sometimes needs to be proactive: recontact their physician periodically, ask that they contact the lab and find out if there are updates. But they can also check ClinVar themselves. The other thing is, if there are family members that might contribute to segregation studies, track them down: get your brothers, your sisters, your cousins, bring them in, particularly if they're affected, and they can help contribute to the interpretation of the variant. So patients and participants in research studies have a role to play. And certainly enroll in patient registries and the like, so that you can contribute your knowledge and can be contacted in the future, potentially for enrollment in clinical trials and other things that may be useful to these families. Yeah, thank you. That's very helpful. And I really like the idea, and I don't want to put words in your mouth, of this kind of feedback loop, and the fact that it's an overall process. It's not just "here's the result" and things are done. So that's very helpful. So from today, let me project further into the future, and again, intentionally, these are supposed to be bold predictions, but let's get even more bold, or even more controversial.
If we were to push you to say, okay, we said that VUSes might be obsolete in 10 years, are there any even bolder predictions on the topic that either of you might give? I don't know, Doug or Heidi, which of you wants to address this question, but would you have bolder predictions along these same lines that you might want to offer? Sure, I can give it a shot first. I think one of the predictions I would make is that we'll not only understand the pathogenicity of variants, which I sort of define as "can cause disease," but actually understand aspects of penetrance and expressivity, which is: if I have a pathogenic variant, will it cause disease in me, or my mother, or my sibling, or whoever? And some of the questions I'm seeing coming into the Q&A are alluding to some of the complexities: that it's not just about one variant, or for recessive diseases, two alleles, but it's actually about modifiers and environmental contributions and many other factors. And that's going to take much deeper assessment using lots of patient-level data, and the broader population and phenotypes, so we can understand penetrance. And it's also going to enable us, or require us, to look much more deeply at the genome, so we can not only understand the primary cause of disease but the secondary genetic and non-genetic factors that contribute to penetrance and expressivity. Great. Doug, do you have other things to add? Yeah, well, I think my answer is maybe the functional experimentalist genomics person's side of the coin that Heidi flipped there, which is, you know, I think that at least in many cases we'll have enough functional data, comprehensive functional data, to say why a variant on a molecular and cellular level is actually pathogenic, right? And that shades into perhaps some of the points about expressivity and penetrance that Heidi made, but also, like I said, maybe gives us an opportunity to begin really thinking from a mechanistic perspective about why people have phenotypes. Why are they sick, right? And that, of course, opens the door to maybe thinking about more finely graded genotype-guided treatment, at least in some cases. So I guess that would be my bolder prediction. Okay, and we're not trying to stir up controversy, or maybe we are, but it's great to have experts like you address these burning questions. So I would love to keep asking questions of you, but I think more importantly, what we'll do now, if it's okay with you, is turn to all the questions that have popped up in the Zoom Q&A. So folks who are watching, keep those coming, but I'd like to turn it over to my colleague, Chris Gunter, who along with Eric Green were two of the instrumental folks behind the strategic vision, of course, as well as behind the seminar series. So, Chris, over to you. Thank you so much. Yeah, thank you everyone for giving us your questions. We were warned in advance that this topic was going to be spicy, and everyone has delivered on that. So, as we normally do, I'm going to try to combine a few questions. The first one I'm going to ask is actually two related questions.
How do you propose to reach saturation of human variation in underrepresented populations, which, Heidi, you talked about some? And then someone also asked: are there any plans to include Sub-Saharan African population data in the public databases, because every other variant in that population turns out to be a VUS? Yeah, so I'll tackle the first one first, really about diversity, and this is critical. I mean, the interesting thing is that this is where we can all work together as a global population, because African data doesn't just inform African patients. It's actually often the reverse. Variants that are rare in white individuals, the way we can rule them out is often because they're common in other populations, African and Hispanic populations and others. So we all need each other's data to interpret variants, and that's why diversity is so critical. To get there, we have to be very deliberate about funding the recruitment of populations that we can enroll, sequence, and work to share data from, from every corner of the globe. And this is both in research studies, where in some ways there's more control over the research populations, because it's been very clear as I seek NIH funding that I don't get funded if I don't have a plan for how I'm going to enroll diverse populations. So there's where we have control. Where we don't have as much control is in the clinical realm. And it's been very clear to me, as I've looked at studies we've done in the clinic, that a lot of these patients don't get referred to specialty practices. So we can't even recruit them from the disease-specific specialties. They're not getting referred from primary care, and sometimes they don't even get into primary care. So we really have to think about our healthcare system and equity of access to healthcare as another way that we will enable diversity. And it's going to take a lot of work, and there are efforts underway, but not enough. So really, to build up our databases to reach saturation of variation will require all of the diversity of our human population, and we're just going to have to really work at it. Yeah, no, absolutely. And Doug, let me add on one for you that just came in, which is related: will you be addressing the function of variants in cells that match the ethnicity of the patient with the variant? Yeah, I think that's a great question. One advantage of multiplexed functional assays is that all the variants are in there, so they're sort of agnostic on the front end. But on the back end, what context are they happening in? That's something that at the moment we're still making choices about: well, this cell line or that molecular assay. And I think, particularly for assays in cells, it will be very important to take genetic context into account. Now, I don't know if patient-derived cell lines are the right way to do that. In some cases, we know where genetic modifiers are, and from an experimentalist perspective, it's probably cleaner to look at those specific population-specific modifiers and make sure we account for the diversity of them. But we're trying to dream up ways to intersect these variant libraries with big pools of different cell lines and types. And so maybe there is a world where we would have a sort of diversity panel available to us, and we'd approach contextual diversity in that way, rather than in a "we know the modifiers and we're going after them" kind of way. Right. And along those lines, this is also for you, Doug.
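As an aside, the "common in another population" logic described here is essentially the allele-frequency filtering (ACMG BA1/BS1 criteria) that clinical pipelines apply. A minimal sketch follows; the per-population frequencies are invented, the BS1 cutoff is illustrative (in practice it is disease-specific), and only the gnomAD population labels are real.

```python
# Minimal sketch of population-aware allele-frequency filtering.
# Frequencies below are invented for illustration; real pipelines pull
# them from gnomAD/TOPMed and use disease-specific thresholds.

# Hypothetical per-population allele frequencies for one variant.
allele_freqs = {
    "afr": 0.012,    # African/African-American
    "amr": 0.0004,   # Admixed American
    "eas": 0.0,      # East Asian
    "nfe": 0.00001,  # Non-Finnish European
    "sas": 0.0002,   # South Asian
}

BA1_THRESHOLD = 0.05   # stand-alone benign (ACMG BA1), common default
BS1_THRESHOLD = 0.001  # "too common for this disease" (BS1), illustrative

max_pop, max_af = max(allele_freqs.items(), key=lambda kv: kv[1])
if max_af >= BA1_THRESHOLD:
    verdict = "stand-alone benign evidence (BA1)"
elif max_af >= BS1_THRESHOLD:
    verdict = "strong benign evidence (BS1)"
else:
    verdict = "frequency uninformative; keep as candidate"

print(f"max AF {max_af:.5f} in {max_pop}: {verdict}")
# A variant vanishingly rare in Europeans (nfe) can still be ruled out
# because it is common in another population -- which is why diverse
# reference data benefits everyone.
```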
Someone asked: do I infer correctly that the functional assays you advocate for will obviate or minimize the need for model organisms to define pathogenicity? Okay, I see. I told you. Well, I think there's a real question embedded in that question, which is: is a cell line a model organism? I would sort of say that it is, in the sense that it has many of the same properties of being informative but imperfectly related to, you know, human beings. I mean, I personally am pretty bullish on model organisms. One thing that my lab is thinking a lot about, and I sort of presaged this in my talk, is how to move these assays into multicellular models. And, you know, organoids provide some strengths, but what I would really like to do, and what we're driving hard at, is trying to get these assays into a multicellular model organism that is its own free-living, reproducing thing, which comes with some caveats, but also incredible opportunities to study real biology that we can't study in cell lines or agglomerations of cell lines. That's my two cents. I hope I can still write grants after this, but that's it. No, that's great. That was an answerable question. I think that was good. So, we've gotten a number of questions. We asked you in the bold prediction to talk about single variants, which you did a great job at. Of course, people want to go beyond that, so we've got a number of questions about multiple variants, different types of interactions, et cetera. Here's one of them: the fundamental assumption is that a proband's condition will be caused by a single variant rather than by multiple unlinked variants, i.e., gene sets. Studying those is obviously daunting, but what prospects are there to stop only looking at SNVs when analyzing a patient's genome, and to move on to other types of inheritance? Yeah, I think this is a really important point. When we're doing clinical variant interpretation, or genomic case analysis, we go through all of the variation that could be causal. Now, our filters aren't perfect, we use a lot of assumptions, and we have limited data sets, so we're not going to be able to detect everything or interpret everything. But there are a number of cases where we find more than one variant contributing to the phenotype. I think there was a nice paper from Baylor a number of years ago that showed that 5% of their cases actually had at least two diagnoses going on. We see that as well. So I think it's naive to just stop when you find one thing. And in fact, part of this is an ascertainment bias: a lot of the reason some of the early cases were sent for exome and genome sequencing is that they didn't have a single clear clinical diagnosis. So there may be some ascertainment bias, but nonetheless, the primary phenotype can sometimes be due to multiple things going on. And undoubtedly, as I mentioned earlier, the expressivity and penetrance of these phenotypes are also going to be due to secondary genetic modifiers and other factors. And this is where we need larger data sets, because from one case you can barely get statistical evidence for the primary variant; to try to get enough evidence from one case for the secondary variants, the data is just not there.
But if we have a large cohort of patients with the same phenotype and the same primary genotype, then we can start to look at modifiers. But we do need numbers. So this is where we all have to share data together, because we have no hope of teasing out those secondary factors if we don't have larger data sets. Yeah, absolutely. I think that's where we're all going. So, another question from the audience. One of the best things about this bold prediction seminar is that we get a wide variety of questions. So someone suggests: one way to guarantee the obsolescence of the VUS is to define it out of existence. Is there any popular support emerging among clinical geneticists for a Bayesian approach that assigns a likelihood score to each variant, to be combined with a patient-specific prior? This is a great question. And in fact, Doug alluded to the fact that we, through our ClinGen Sequence Variant Interpretation Working Group, and this is work that Sean Tavtigian really started, have modeled the ACMG guideline that we developed on a Bayesian framework. And it actually maps incredibly well. And the next iteration of the ACMG guidelines, which we have reconvened as a committee and are actively working on, will be predicated on that Bayesian framework. And we're using it so that the types of evidence we have access to have a statistical basis to them. And there was, I think, a question about predictors and whether they will ever get beyond weak evidence. The answer is yes. We have a subgroup within our ClinGen SVI working group that is focused specifically on predictors: really training them well, removing the biases, and validating them appropriately with the right data sets. And those are getting into moderate, and even strong in some cases. So I think we will be able to use the statistical underpinnings to contribute to variant interpretation in much more robust ways than the framework has in the past. However, we will continue to have some types of evidence, and an animal model is a great example, where there's a degree of professional judgment as to whether you think that animal model recapitulates human disease. It's really hard to put a number on that, and sometimes there's judgment involved. So we will continue to try to move everything we can toward a Bayesian framework, but that will take time and evidence, and frameworks for the prior probability. This is a really interesting question. We try to separate the independent evaluation of variant evidence in aggregate from whether we think a particular variant we have found and classified is actually causal for a patient's phenotype. That is where prior probability often comes into play, although there are elements of it that do contribute as we aggregate cases together, and prior probability can have some role. But it is something we are starting to think about in terms of the baseline of any given variant, because in the first ACMG framework we started by assuming everything is a VUS and then moving it up or down. The truth of the matter is, if you look across the whole genome, statistically speaking, most things start out benign. So that's where the prior comes in, and we're starting to think about it; it's a little complex to separate case-level interpretation from variant classification and all the various variables here.
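For readers who want the math behind this: in the published Bayesian model of the ACMG/AMP guidelines (Tavtigian et al. 2018), each evidence strength is an exponent of a single odds-of-pathogenicity constant, and evidence combines multiplicatively (benign evidence enters with negative exponents). Roughly:

$$
\mathrm{OddsPath} \;=\; O^{\,N_{\mathrm{supporting}}/8 \;+\; N_{\mathrm{moderate}}/4 \;+\; N_{\mathrm{strong}}/2 \;+\; N_{\mathrm{very\ strong}}}, \qquad O \approx 350
$$

$$
P(\mathrm{pathogenic}) \;=\; \frac{\mathrm{OddsPath}\cdot\pi}{(\mathrm{OddsPath}-1)\cdot\pi + 1}, \qquad \pi \approx 0.10 \ \text{(prior)}
$$

The constants here are the values fit in that paper; treat this as a summary rather than a restatement of the formal model. The point scale Heidi describes next is essentially these exponents rescaled, with one point per eighth of a very strong unit.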
But those are concepts that we are thinking heavily about; Les Biesecker and I have long discussions about this particular topic and many others in the group. Chris, if I could jump in and just ask Heidi a follow-up on that? Again, I'm thinking with my clinical hat on. With this Bayesian framework, or this future way we might think about things, what would a patient or a doctor get in their hands that might be different from today? I was hoping you could explain a little more concretely what the effect would be. Yeah, so one challenge here is that we can't return reports to patients, or even physicians, that are really in-the-weeds stuff they can't follow. So what we've done, and this builds on some of Sean Tavtigian's work, with Les Biesecker and others on our team, is take the basic Bayesian framework but convert it to a simple point scale, and then take our evidence framework and use it to apply points. And then you can add, and in this case also subtract if you have competing evidence, points. So what we anticipate on a given patient report is, in addition to using the terms, and the terms are still critical, because sometimes professional and clinical guidelines are based on "if you have an LP, likely pathogenic, variant, do this; if you have a pathogenic variant, do that," so we still need these buckets, but we can also, particularly on the VUS scale where there's a big range, additionally give the actual point value. And it may not be right up front and center, it may be a little bit in the details, but we can say your VUS is at plus five, versus minus four, versus zero, and clinicians over time will come to see where things are on that scale and decide how much energy and effort to put into following up and digging into the details, because they actually see that quantitation. So that's what you can anticipate: a totality of points that maps to the Bayesian framework. But we don't want to lose the individual evidence types, which we can track in machine learning algorithms, this type of evidence, that type of evidence. So we'll still use codes so you can know what kinds of evidence contributed to that point score, but we'll use the points, if that makes sense. Yeah, great. As you said, it's very complicated there. So Doug, let me ask you two semi-related questions that came in. The first one was: do you think multiplexed assays could be used for variants with more subtle impacts on disease probability, for example APOE4 in Alzheimer's disease? And the second one was: will these functional assays be available for the rare disease communities? Yeah, I think those are both great questions. So in terms of subtle phenotypes, I mean, I think the answer is yes, right? You can construct an assay; it is possible to multiplex most cell-based assays that one could find in the literature. And for APOE4, I could imagine a couple of different approaches; you could ping me and we can talk about APOE. But I think there are possibilities there, right? For most genes, we already have a type of multiplexed assay that could be applied. And for most of the other ones, if you're willing to put in the work, you can probably come up with one. Will it be perfect?
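Here is a minimal sketch of that point scale, following the published natural-scale point system (Tavtigian et al. 2020), where supporting/moderate/strong/very strong evidence is worth ±1/±2/±4/±8 points. The category boundaries below come from that paper; the example variant's evidence is invented.

```python
# Minimal sketch of points-based variant classification.
# Point values and category boundaries follow the published
# natural-scale point system; the example evidence is invented.

POINTS = {"supporting": 1, "moderate": 2, "strong": 4, "very_strong": 8}

def classify(evidence):
    """evidence: list of (strength, direction) tuples, where
    direction +1 = pathogenic-leaning, -1 = benign-leaning."""
    score = sum(POINTS[s] * d for s, d in evidence)
    if score >= 10:   category = "pathogenic"
    elif score >= 6:  category = "likely pathogenic"
    elif score >= 0:  category = "VUS"
    elif score >= -6: category = "likely benign"
    else:             category = "benign"
    return score, category

# Hypothetical variant: de novo occurrence (moderate, pathogenic-leaning),
# absence from population databases (moderate), a computational predictor
# (supporting), plus one conflicting benign-leaning supporting criterion.
score, category = classify([
    ("moderate", +1), ("moderate", +1), ("supporting", +1), ("supporting", -1),
])
print(f"{score:+d} points -> {category}")  # +4 points -> VUS
# Reporting the score alongside the category shows whether a VUS sits
# at +4 (worth following up) or -4 (probably not), as Heidi describes.
```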
Maybe not, but I think the real challenge in that case does come back to this mapping between, okay, I have this result, and maybe it's a clear result in an in vitro assay, but how does it map to what I should expect in the human phenotype, right? And I'm not an expert on APOE, and I'm really not an expert in clinical genetics either, but it strikes me that we come back to this problem: if there are not that many example clinical variants, then it's very hard to say much. Maybe my assay really perfectly separates, say, nonsense variants from synonymous variants; in an in vitro sense, it's internally consistent, but how do we know how much predictive power it would have for a human? We only know that if we have a lot of example variants. Otherwise we have to default to what Heidi said about, I think the example was a mouse model, but it's a judgment. And I think that's where things get very problematic, and where I'm certainly not qualified, right? If you just have a small number of variants of known effect and you have to make a judgment, that's a tough call. So I guess I'm hopeful that we'll end up in a world where we have many more variants that do something we know about, and then we can calibrate these assays in a systematic way that does not require us to assess, like, "oh, well, it came from yeast, so I don't trust that as much." No, if the yeast assay perfectly separates a bunch of known clinical variants, then that's great, right? And conversely, maybe I cooked up an amazing organoid system that is elaborate and incredible and looks good, but if it doesn't actually separate clinical variants, it could be useful for other things, but you probably wouldn't want to use it as strong evidence if you were a clinical geneticist. And then as far as the rare disease community goes, I think the answer there is also yes, right? As I said, it's possible to devise assays for most genes, and the investment required to generate one of these data sets is now not extreme. I'm just going to throw out a rough number, and other people who do this kind of stuff can disagree in the comments, but maybe it's somewhere between $25,000 and $150,000 to saturate a typical gene with many, not all, but many of the possible variants. And so I don't see why rare disease genes shouldn't be targets, and they should be, right? We'll come back to the same problem of, well, how many example clinical variants do we have, and how do we know if an assay really can be used as strong evidence? Those questions will be harder to answer for rare disease, I imagine. But in terms of getting the functional data, I think it can and will come for these rare disease genes. Yeah, absolutely. That's great. I know people are really looking forward to that. So, we asked you to go bold. This question is asking you to go even bolder: will you be addressing the role of RNA data in VUS interpretation? So on our end, absolutely. A lot of clinical labs are starting to have RNA splicing assays as part of their clinical workflow, particularly when they find variants in the flanking intronic regions, for example.
So I think, in terms of direct follow-up of individual identified variants, there is a lot of data now going into that sort of workflow. In addition, in our rare disease genomic discovery efforts, we use RNA-seq assays on patient tissue. And here, different diseases are more or less amenable to that. So muscle diseases, where we can get muscle biopsies, are easier: we can do RNA-seq and look at the entire transcriptome, where we can look for changes in transcription that may be due to variants that never would have come out of our filters or been obvious, because they're in intronic regions, or even regulatory mutations that change levels of transcription. And those data sets in the beginning were very dirty. They're getting cleaner over time, and it's getting a little easier to detect changes. So yes, the answer is absolutely: RNA data, from my standpoint, is being used both in the research realm and in the clinical testing realm, with increasing frequency. And Doug, anything you want to add to that about the role of RNA? I mean, I think there are lots of assays that people have published, and many that people are working on, that rely on endogenous genome editing so they can capture things like transcription and RNA stability. And I think the answer is yes, in some cases we're already there, and we'll get there. Yeah. So we have a more general question. We've been talking a lot about databases, and Heidi showed us many, many different databases. But how can we address the issue of user accessibility of these databases? Many of them have high-level information and require you to be familiar with the interface in order to use them. How do we make them more accessible to everybody? Yeah, it's a great question. And, you know, there's a balance here, because, for instance, in gnomAD, people want more and more features and things. The more we add, the more complex it gets, and the harder it is to use. Sometimes I even go into gnomAD and think, I didn't know we had that in there. Like, when did Nick put that in? So there is a challenge there. And actually, ClinVar is a good example, because I answered Ben's question and said that the patient could go in and follow their variant. Well, that's not straightforward. Even finding their variant is actually not straightforward. Do you know the HGVS nomenclature? What was on that clinical report, and what piece of it do you go type into the database? Not actually straightforward. So through ClinGen, we have the GenomeConnect patient registry. We take patients' reports and we capture that data, manually, unfortunately, and we submit it for them into ClinVar and make entries for them. We alert them when knowledge about their variant changes. And we also now have a little tutorial on our website that teaches a patient how to use ClinVar. So I think it's a combination of things like that, where we assist and give user guidance, but also creating interfaces that are catered to patients and their types of questions, that will make this much easier.
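For the programmatically inclined, ClinVar is also queryable through NCBI's E-utilities, which is one way a tool built for patients could look up a variant behind the scenes. A minimal sketch follows, assuming a placeholder HGVS expression (NM_000000.0:c.100A>G is not a real variant) and response field names that should be checked against the current API documentation.

```python
# Minimal sketch: look up a variant in ClinVar via NCBI E-utilities.
# The HGVS expression below is a placeholder, not a real variant, and
# the response fields should be verified against the current API docs.
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
hgvs = "NM_000000.0:c.100A>G"  # hypothetical; take this from the lab report

# Step 1: search ClinVar for records matching the HGVS expression.
search_url = (f"{EUTILS}/esearch.fcgi?db=clinvar"
              f"&term={urllib.parse.quote(hgvs)}&retmode=json")
with urllib.request.urlopen(search_url) as resp:
    ids = json.load(resp)["esearchresult"]["idlist"]

# Step 2: fetch summaries, which include the reported classification.
if ids:
    summary_url = (f"{EUTILS}/esummary.fcgi?db=clinvar"
                   f"&id={','.join(ids)}&retmode=json")
    with urllib.request.urlopen(summary_url) as resp:
        result = json.load(resp)["result"]
    for uid in ids:
        record = result[uid]
        print(record.get("title"),
              record.get("germline_classification", {}).get("description"))
else:
    print("No ClinVar record found for", hgvs)
```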
And those haven't been built yet; we're still just trying to build the basic databases for the community that mostly uses them. But these are going to be important things that we need to support in order to engage the community and teach them the utility of what we're trying to do, because that is actually how we'll get them to participate as research subjects, or to seek out their clinical care in the realm of genomics: if they can engage themselves, understand, and realize the value of it. Yeah, absolutely. And Doug, any thoughts about your databases? I think we're in a bit of a different situation, right? I mean, the MAVE data, or, sorry, multiplexed assay data, you know, there's a set of people who really want to dig into it. There are questions about machine learning in the chat, and there's a database: you can go download many of these data sets and play with them, and you will enjoy that, perhaps. So there are data generators and really hardcore data analysts, and we have a database that I think serves them reasonably well already, although there's lots of work to do. But then, for us, we have this really broad set of possible consumers of the data, right? And that ranges from protein biochemists to clinical geneticists and everybody in between. And so my opinion on where we should go, and what I'm advocating for in our field, is a model where we really try to federate the database that we have in sensible ways. So we're working with ClinGen to ask the question, not "how can we make a clinical version of our database?" but "how can we make sure that the metadata that needs to be in our database is in there, for you all to use the data and show it to your constituents, your sort of cohort of folks, in a way that makes sense?" And we anticipate doing the same thing with other databases like UniProt, and, you know, there are specialized resources that already have these channels, that already have users that we don't. So I think it's a very different type of challenge, where our field is more at the "we're doing this, and now we need standards and dissemination mechanisms that are not specialized" stage. That's, I think, the moment we're at. Great. Chris, if it's okay, I want to follow up with just a couple of very quick questions. I will not commit the cardinal sin of keeping us over. So maybe this is going to be a hard one, and we knew this was going to be hard, but I want to turn to you both, and you've touched on this already: in just maybe a few sentences at most, if you had a magic wand, what would you want? And you can say money, I guess that's fair, but maybe more specifically, what would you want to be able to solve this problem, knowing what you know of the field? And then, after each of you responds in turn, I'll conclude with our final remarks. You want to go first? You want me to go first, Doug? You can go first. I'm contemplating. So, from my standpoint, what I want really comes down to data sharing, which is going to be completely critical to learning the things we need to learn. And so it's really about whether we can set up the standards and the frameworks and the collaborations that allow us all to share with each other.
And it's hard enough within our own countries, but crossing country lines, the legality, and all sorts of other issues are really challenging, even when somebody wants to share their data. So this is going to require a combination of really close engagement, so that we all treat each other as colleagues and recognize how we can share and solve problems more quickly together. It's also going to require creativity around getting past legal and other barriers so that we can actively share data. And some of that, in the Global Alliance, we're working on: a lot of federated approaches, so the data doesn't have to leave the country, but you can still query it, get answers back, and then aggregate your multiple queries into a common data set. Those approaches are more challenging than if all the data is sitting in one nice pile, but it's going to be those approaches, the ones that allow us to bring these data sets together, that are critical to answering the questions we have to answer. So for me, it is that we all can work together as a community, and that we build the environment to do so. I'll just keep it brief. Besides money, I think the two things we need are new technologies that scale and give us the sort of context I talked about, and then community, right? There are a lot of people generating this type of data. Right now we're loosely connected, but I think if we come together around the goal that I articulated, we can get there. And I'll say money helps, too. Yeah, we can all do with money. Well, that can be the last word on that. I want to thank you both so much for your wonderful talks and the wonderful Q&A. I also want to thank Chris, of course, and Susan, William, Alvaro, and Gerald for all the behind-the-scenes work that went on to make this happen. I have to say, listening to this, I'm so excited to follow what you and all the folks you work with do. I think you both really underlined that this is team science, right? This takes government, academia, industry, the public; this takes everybody to get us to a better place. I think you made that point very, very strongly. The very last thing I want to say is to make sure our audience looks out for the next in our seminar series, which is bold prediction number eight, and it's really a bold one: a person's complete genome sequence, along with informative annotations, can be securely and readily accessible on their smartphone. You'll see the schedule here. Again, all of these talks are recorded and will be available through the website. So please disseminate this information. Again, thanks so much to everybody who joined us, special thanks to our speakers, and we'll see you next time in about a month or so. Bye.