Hello everyone. Thank you all so much for joining us today for our second seminar in the Genomic Innovator Series here at the National Human Genome Research Institute. Our vision for the series is to pair an early-career researcher funded under NHGRI's Genomic Innovator Award program with a more senior researcher whose work relates to the same research area. We believe this pairing will give us insight into the creative endeavors that are accelerating work at the forefront of genomics, and I'm really excited about today because I think both of these speakers have taken that assignment seriously, and I'm eager to see what they've done. My name is Chris Gunter. I'm the NHGRI Senior Advisor to the Director for Genomics Engagement, and I'll be moderating the introductions today; my colleague Lisa Chadwick, who is a wonderful program director in our Division of Genome Sciences, will moderate the Q&A portion of our seminar after our two speakers. So please feel free to put your questions into the Q&A via Zoom at any point in the seminar, and at the end Lisa will lead us through them. Now that we have the mechanics down, I am very pleased to introduce our two speakers for today's topic, which is data-driven approaches to define rare genetic diseases. They're going to alternate speaking, but Dr. Melissa Haendel will start us off. She is the Chief Research Informatics Officer and Marsico Endowed Chair in Data Science at the University of Colorado Anschutz Medical Campus. She's also the Director of the Center for Data to Health and the Principal Investigator of the National COVID Cohort Collaborative, or N3C. Her vision is to weave together healthcare systems, basic science research, and patient-generated data through the development of data integration technologies and innovative data capture strategies.
Our second speaker, who was the recipient of our Genomic Innovator Award, is Dr. Jessica Chong. She is an assistant professor in the Department of Pediatrics at the University of Washington School of Medicine. She's also a co-investigator in the Genomics Research to Elucidate the Genetics of Rare Diseases consortium, or GREGoR for short. In addition, she is highly committed to encouraging open data sharing in science, especially in rare disease genetics. So I'm going to turn it over to Melissa to start us off.

Thanks very much. Dr. Chong is going to share the slides, and first I'd just like to thank Chris and Lisa and the whole team for putting Dr. Chong and myself together; we had an absolutely wonderful time with this idea of partnering a more senior investigator with one of the Genomic Innovator awardees, and we're really pleased to present this shared presentation to you all today. These slides will be available at the Bitly link, as well as the recording on the NHGRI website. And if you want to tweet about us, we'd love to see what you think. Next slide please. I'd like to start by thinking about how we define diseases. This is really important from a conceptual perspective, so that we may develop computational approaches for diagnosis, for prognosis, and for treatment discovery and selection. If we think about a disease state, every person fundamentally is an N-of-1 disease; this is really the promise of what we mean by precision medicine. We are the collective outcomes of our genetic endowment and our environmental influences over time, with the phenotypic outcomes being the readout of that over time.
And so the question about how we define diseases, given this notion of every person having their own N-of-1 disease state, is: what are the most meaningful groupings of patients? By that I mean, what are the meaningful groupings of patients that we can use for those same functions: for diagnosing patients, for developing prognostic tools, and for mechanism discovery and treatment selection. A lot of what we're going to talk about today is really about how we think about this in the context of rare diseases and in particular Mendelian diseases, which are not the same thing, and about taking a computational approach to defining diseases. Next slide please. Approximately 1 in 10 Americans, or 400 million people globally, are afflicted with a rare disease. This does not mean that all rare diseases are necessarily life-threatening, but they are all debilitating in one way or another. The zebra is often used as a symbol for rare diseases, because each patient's characteristics are akin to the variation in zebra stripes. And so it's often said that a clinician will say of the patient sitting in front of them who may have a rare disease, "I have a zebra." But this is potentially misleading, because if 1 in 10 Americans actually have a rare disease, then we should be thinking about horses and not zebras; the fact is that rare diseases collectively are quite common. And that's a different perspective that allows us to develop more holistic approaches to defining disease and to diagnostics and mechanism discovery. The other thing I would say is that back in 1983, as part of the Orphan Drug Act, it was declared that there were, I believe, 7,500 rare diseases. That number has been quoted ever since 1983.
Well, we fundamentally know, with many thanks to NHGRI for funding GREGoR and a whole variety of disease gene discovery initiatives, that this number is demonstrably wrong. We find new rare diseases every day or every week; OMIM is constantly updating its records. So we really need to think about what that number means in that context, and it does a disservice to patients with rare diseases to have that number never change. There's an article that was written in collaboration with the NLM, "The case for open science: rare diseases," and I'd urge folks to read it. It focuses on how we as a global community can share data and create open science initiatives that allow us to understand who the patients with rare diseases are, and how we can leverage each other's work to better diagnose them and better care for them. Next slide. So why is the number wrong? As I already mentioned, the criteria for "rare" vary around the world. Under the Orphan Drug Act, a rare disease affects fewer than 200,000 people in the United States. In 2000, the European Union considered a disease to be rare when it affects fewer than 1 in 2,000 people. And also just to note, as I mentioned: just because a disease is rare does not mean that it's a Mendelian or even a genetic disease. We have worked hard to try to reveal environmental causes of rare diseases, for example, as well as Mendelian or genetic ones. So new diseases are discovered all the time, but the number is never updated, as I mentioned. Another thing I really want to highlight is that for as many patients as get diagnosed with a rare disease, there are just as many patients who don't.
Those patients are all N-of-1s, unmatched, and they get logged, if they're lucky, in systems such as the Matchmaker Exchange, where we can potentially find the second patient in the world, which would be a diagnosis for that patient. Those are all rare diseases as well, and they're essentially never counted. There are also different disease definitions around the world for rare diseases: there are dozens of terminologies and disease registries. These terminologies are often not included in clinical terminologies such as the International Classification of Diseases, which are commonly used in electronic health records. And fundamentally, the definition of a rare disease, and how to model it computationally, has remained more of an art than a science. Next slide please. The prevailing clinical diagnostic pipelines tend to leverage only a tiny fraction of the data. If you think about a patient or a family that comes into a clinical genetics clinic, they might have a whole exome or whole genome ordered, and that would be compared, in a variety of different pipelines, to a genomic reference sequence. If you hit the forward button please. But remember that our N-of-1 patient is the culmination of their genetic endowment and their environmental influences over time, with the phenotypic characteristics as the readout. There's a lot of clinical phenotype information, multi-omics phenotype information, socioeconomic factors, and environmental factors that are not generally leveraged in the diagnostic context, and the reference resources for those individual elements about the patient are also not very robust. So how do we define diseases leveraging phenotypic characterizations? How do we correlate this with population frequency?
A rare disease such as sickle cell anemia might be rare in the United States, but it can be quite common in regions of Africa, so these kinds of disease definitions and incidence figures matter. How do we perform population-based statistics, assess disease risk factors, and understand how to surveil for rare disease patients in general? There's a lot of data, both at a personal level and at a reference level, that is simply underutilized in the diagnostic context of a rare disease patient. Next slide. The challenges start with the basics: phenotyping. Unfortunately, computational phenotyping is a still-emergent area of research in computational biology and clinical informatics. A lot of what we do in the electronic health record still looks a lot like this lovely old clinical note, where you can see things like "always examine ears" and "scarlet fever." Clinical notes remain a place where a lot of clinicians record phenotypic information that is not otherwise easily captured in the electronic health record. So we really have to think about how we improve the electronic health record environment to support documentation of the patient's characteristics as a biological subject, in addition to the other uses the electronic health record serves. Next slide. The Human Phenotype Ontology is an ontology developed by the Monarch Initiative and led by Dr. Peter Robinson. It's a clinical terminology, but it differs from standard terminologies such as the International Classification of Diseases in that it is a logical graph. It consists of about 14,500 terms or so, and terms are represented as nodes in that graph, such as hyposmia or deeply set eyes. The phenotypic features captured in this graph structure are associated with diseases: we have around 190,000 associations of these terms to disease entities. And I'll get into this in a little bit.
So we're characterizing diseases as a representation of their phenotypic features drawn from this ontology graph. One of the most important features of the Human Phenotype Ontology, or HPO for short, is that it's also integrated with a variety of other data sources and terminologies, such as the Gene Ontology. Here, for example, we have the term hyposmia, which is represented logically as an abnormality (a decrease or absence) of the sensory perception of smell, and which at the time of this query happened to have 34,000 annotations across 22 species. The logic that undergirds this ontology connects it to basic research data in a way that standard clinical terminologies really do not, and allows us to do much, much more computational assessment and integration of data, to hopefully reveal mechanisms of disease. Next slide. This is the HPO in action. Here we have curated, as part of our initiative, the Wiedemann-Steiner syndrome disease entity. These are the blue phenotype terms in the middle, and this is just a subset of the list of terms associated with Wiedemann-Steiner: terms such as short toe, short middle phalanx of the finger, delayed speech and language development, intellectual disability, microcephaly, and thin upper lip vermilion. These are biological features of the patient's characteristics, but they're not managed conditions for the most part, and therefore they're largely not found in the International Classification of Diseases or any other part of the EHR. We simply don't bill for having short toes, for example. So by characterizing the patient as a biological subject, we can compare those phenotypic features against other patients and known gold standards, as well as across species.
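This kind of term-to-term comparison, matching a patient's HPO terms against a disease's gold-standard profile, is often done with an information-content (Resnik-style) semantic similarity. Here is a minimal toy sketch of that idea: the term names, ontology edges, and annotation counts below are all made up for illustration, and this is not the HPO's real data or the exact algorithm used in production tools.

```python
import math

# Toy fragment of an HPO-like is-a graph (child -> parents). All terms,
# edges, and counts here are illustrative, not real HPO content.
PARENTS = {
    "short_middle_phalanx": ["abnormal_phalanx"],
    "cone_shaped_epiphyses": ["abnormal_phalanx"],
    "abnormal_phalanx": ["abnormal_digit"],
    "long_toe": ["abnormal_digit"],
    "abnormal_digit": ["root"],
    "root": [],
}
# How many diseases in the toy corpus are annotated at or below each term;
# rarer terms carry more information.
ANNOT = {"root": 100, "abnormal_digit": 40, "abnormal_phalanx": 10,
         "short_middle_phalanx": 4, "cone_shaped_epiphyses": 3, "long_toe": 6}

def ancestors(term):
    """All ancestors of a term, including the term itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def ic(term):
    # Information content: -log of the term's annotation frequency.
    return -math.log(ANNOT[term] / ANNOT["root"])

def resnik(t1, t2):
    # Similarity = information content of the best common ancestor.
    common = ancestors(t1) & ancestors(t2)
    return max(ic(t) for t in common)

def profile_similarity(patient_terms, disease_terms):
    # Best-match average: each patient term takes its best match in the
    # disease profile, then we average over the patient's terms.
    return sum(max(resnik(p, d) for d in disease_terms)
               for p in patient_terms) / len(patient_terms)
```

With this toy graph, cone-shaped epiphyses and short middle phalanx score higher than long toe versus short middle phalanx, because their closest shared ancestor is rarer, which is exactly the intuition a clinician applies by eye.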
We're not going to talk about the cross-species part today, but suffice it to say that it's a relevant component of how we build our diagnostic tools. Here we can see two patients who came into my colleague's clinic within a few weeks of each other: a three-year-old girl, in green on the left, and a 14-year-old boy on the right. Their features do not exactly match our gold-standard disease definition's phenotypic features. For example, the three-year-old girl had cone-shaped epiphyses of the phalanges of the hand, and a human would be able to tell that that's related to the short middle phalanx of the finger. But what we want is for the computer to tell us that, and the computer can calculate the degree of similarity between those two terms. We don't see that term represented for our 14-year-old boy, and in fact we even see terms that are the opposite of the standard profile: long toes versus short toes. So the algorithm takes into account the overall similarity of the set of features for each individual patient compared to our whole corpus of more than 7,000 gold-standard disease annotation profiles. If you click the next button please. This is basically what we're after in building these computational tools: for diagnostics to work like this, the reference knowledge needs to be systematically queryable. And this is where it's really exciting to see Dr. Chong's work, because that is in fact one of the major contributions that will be useful in our diagnostic tools, as is our work with the Global Alliance for Genomics and Health, which I'll talk about at the end, to build standards for individual phenotypic characterizations. Next slide please. So this is just a quick overview of the diagnostic tool Exomiser, which is led by Dr.
Damian Smedley in the UK. Essentially, our patient comes in and, as I mentioned, the standard practice is to get a whole genome or whole exome sequence; that's there in the middle. What we have added here is the HPO annotations, as I mentioned, and also the cross-species comparisons to disease references, as well as protein-protein interactions. To make a very long story short, we essentially leverage standard genomics methods to remove off-target variants, common variants, benign variants, and so on, but then combine that with the phenotypic similarity I showed on the prior slide to come up with a prioritized list of variants that is, hopefully, much shorter than the one you would get using the genomic side alone. We've had great success leveraging the Exomiser tool in the Genomics England project and in a variety of other settings, and it's open source and available to anyone who wants to incorporate it into their pipelines. Next slide. So what is the most clinically useful way to define and group diseases? One of the challenges we came across in developing Exomiser and the phenotype similarity tools is that we had a lot of different disease entities from across the world, in different databases and different registries. These different disease concepts span multiple categories, and we needed a systematic way of relating them all, so that we could have a unified Wiedemann-Steiner syndrome entity, for example, that would collate the information coming from different sources that might each have a Wiedemann-Steiner entry. But there are many terminologies and ontologies and lists, as well as many, many mappings between them. These are both a good thing and a bad thing: they can be used to crosswalk, but the problem, and I'm sure there's almost no one in our audience today who hasn't,
for example, mapped OMIM to Orphanet or vice versa, is that the mappings are often mutually inconsistent. There is an n²−n set of possible pairwise mappings, and the mappings often don't tell us what is truly a one-to-one equivalence. This makes it very computationally challenging to leverage all these different resources as the disease entity handle on which we would hang those phenotypic profiles, as I showed for Wiedemann-Steiner syndrome. Next slide please. This is just an example of some of the reconciliation we've done, and some of the challenges we face. Here we have three different sources: Orphanet, the National Cancer Institute Thesaurus (NCIT), and the Disease Ontology. In Orphanet, for example, peroxisome biogenesis disorder is a subtype of rare hereditary metabolic disease with peripheral neuropathy, whereas in the NCIT it's a subtype of leukodystrophy. And over on the right, we see that Zellweger syndrome itself is a subtype of peroxisomal disease. So if we're going to try to reconcile the knowledge or data associated with these entities across these different resources, we have a really big problem computationally, because there are different labels, different parents, different children, different synonyms, different text definitions, and even different clinical usages. How do we determine whether these entities are equal, and clean up the associations between these very complex and not very well aligned disease resources? Next slide please. In comes Mondo, which means "world": an evidence-based, curated merger of equivalent disease concepts from as many different resources as we have that are relevant to rare and Mendelian diseases in particular, although, as I mentioned, we are curating additional environmental diseases and infectious diseases as well.
Some of the sources are shown on the left: OMIM, Orphanet, NCIT, GARD from NCATS, EBI's Experimental Factor Ontology, and many others. An algorithm developed by Dr. Chris Mungall, called k-BOOM, the Bayesian OWL ontology merging algorithm, was used to initially seed the reconciliations. It essentially creates equivalence cliques of disease entities from across the different resources, basically examining, for example, whether Zellweger syndrome in OMIM is the same as the Zellweger syndrome in the NCIT. We can use a variety of algorithmic approaches to determine that, and then a curator looks at whether that grouping of equivalent classes from across the different resources is actually truly equivalent, and resolves the cases that the algorithm cannot. Next slide please. Here is an example of what that looks like. We found that in MeSH there was a set of duplicated disease entities that had both Roman-numeral and Arabic-numeral representations: for example, diabetes type I with the Roman numeral versus diabetes type 1 with the number. MeSH was able to correct that, and so that helps everybody who's using MeSH, as well as our work within Mondo. Next slide please. This is what that looks like with a real example. In the middle we have exact matches for adult Refsum disease. There's a Mondo identifier, and in the column with all the identifiers that have been determined to be equivalent, you can see what the term IDs for those sources are, also referencing the full provenance of the synonyms. So if we get synonyms from OMIM or UMLS, those synonyms over on the left, we track the provenance of every synonym.
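The equivalence-clique seeding just described can be thought of as a connected-components problem over pairwise cross-references, which a curator then reviews. Here is a minimal sketch of that first step using union-find; the identifiers and links below are hypothetical illustrations, not asserted Mondo mappings, and the real k-BOOM algorithm additionally weighs probabilistic evidence rather than blindly chaining xrefs.

```python
# Group IDs connected by any chain of cross-references into candidate
# equivalence cliques (a sketch of the seeding step only).
def build_cliques(mappings):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in mappings:
        union(a, b)
    cliques = {}
    for x in parent:
        cliques.setdefault(find(x), set()).add(x)
    return sorted(map(sorted, cliques.values()))

# Hypothetical cross-references between sources (illustrative IDs only).
xrefs = [
    ("OMIM:214100", "Orphanet:79189"),
    ("Orphanet:79189", "NCIT:C84468"),
    ("DOID:0050773", "MESH:D010481"),
]
# build_cliques(xrefs) yields two candidate cliques, each of which a
# curator would inspect before asserting a merged Mondo concept.
```

The payoff of this framing is that inconsistent pairwise mappings become visible: if chaining xrefs pulls two clinically distinct diseases into one clique, a human reviewer catches it before the merge.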
We determine this equivalency through the Mondo reconciliation process I mentioned earlier, but then we also provide the provenance of the information coming from each of the individual sources, both for rigor and reproducibility and for attribution. We also look at non-equivalent term IDs for cross-references. For example, here adult Refsum disease is really not equivalent to phytanic acid storage disease, but some of the sources contain that cross-reference, and so we track that as well. This really aids in cleaning up our computational challenge of all these different disease resources not agreeing with each other: trying to understand what the equivalencies are and what is not equivalent, and keeping track of the provenance over time. Next slide. So, to test out this concept, we looked at how we can update the number of rare diseases. It's a fundamental premise that if rare diseases are not counted, then rare disease patients will not count either. We took five commonly utilized sources of rare diseases, the NCIT, the Disease Ontology, GARD, Orphanet, and OMIM, and we looked to see how many diseases were in one source versus multiple sources. Interestingly enough, from just these five sources (and granted, we have many more in Mondo at this time), we found over 10,000 unique rare disease concepts. That's a pretty big difference from the original 1983 Orphan Drug Act number of 7,500. Even more interestingly, we found only 333 diseases that were shared across all five of these resources, and many, many disease entities that were in only one resource. What that means is that by reconciling this disease knowledge computationally, and through clinician- and curator-assisted mediation of the equivalence determinations,
we have a much better opportunity to create improved diagnostic tools that leverage all of the world's disease knowledge in those diagnostic profiles that I showed you for Wiedemann-Steiner syndrome, and we can reconcile the knowledge that is associated with each one of these disease entities. Next slide please. Defining diseases is really the subject of Dr. Chong's work, so just to give you some perspective on why her work is so critically important at this time, I'm going to give you a little overview of some of the work we have been participating in, in collaboration with ClinGen and OMIM in particular. This is really about how we define and name diseases. We worked closely with ClinGen for a number of years on what we call the lumping and splitting group, which created disease-curation guidelines. If you're a birdwatcher, you're probably a splitter, so you can have more counts on your bird list; but for diseases, it's really important for diagnostic tools and for mechanism discovery to think through what it means to be an individual disease entity. Does the disease entity have a distinct molecular mechanism, or a shared one that would tend to lead toward lumping those disease entities? Is there a reputable assertion of difference, distinct clinical management, or a distinct phenotypic profile? This is all about ClinGen's efforts to curate gene-disease associations. These are essentially yes/no decisions for lumping and splitting, and it's a little bit subjective in some cases; sometimes some of the characterizations are in conflict, where some would lend toward lumping and some toward splitting, and the curators work very hard to document those decisions and the rationale.
This takes a lot of effort; there are an enormous number of people on the ClinGen team who go through this process in great detail so that our diagnostic tools can be accurate, and we have enormous gratitude to them for doing that. That work has provided a lot of fodder for the kind of work Dr. Chong is going to present next. I just want to highlight again that one of our biggest challenges is that different diseases are defined in different ways by different groups with different attributes. We have this shared work going on in collaboration with ClinGen and OMIM, but there are also many other resources that are similarly curating different components of disease-gene associations: we have ClinVar, we have the GWAS Catalog, we have the Comparative Toxicogenomics Database, and there are many others. Fundamentally, we need computational methods to predict new Mendelian disease-gene associations, since you can have more than one disease per gene and more than one gene per disease. In order to do that, we really need both case-level phenotypic data and high-throughput gene-disease association discovery. And with that, I'll hand it over to Dr. Chong to tell you about her work in this area.

Thank you. I'm just going to hop right into it. Because, as Melissa said, these disease entities are defined in different ways, and different groups use different criteria to assess gene-disease validity, that is, whether a gene is established to underlie a Mendelian condition, no one really agrees on how many Mendelian conditions are known to exist, let alone how many genes are currently known to underlie Mendelian conditions. You would think that it should be easy to count genes, and it's actually not.
These varying counts mean that we have a very incomplete understanding of the genomic landscape of Mendelian conditions, and we think this is a problem, because studying Mendelian conditions is really the most straightforward opportunity to understand the relationship between genotype, gene, and phenotype. That's simply because, for the vast majority of affected individuals, once you find the causal genotype you've necessarily implicated the target locus; you don't have to do fine-mapping, for example. So each new Mendelian gene-disease relationship reveals variant effects, as well as the effects of gene disruption on human phenotype. Moreover, this wide variation in counts makes it harder to study whether there are systematic relationships between genic or genomic properties and whether a gene underlies a Mendelian condition. And we particularly do not yet understand the latter relationship: how perturbation of the genome governs the variation in the nature of Mendelian conditions. The summary is that we're better at identifying pathogenic variants than at predicting their phenotypic effects, and the end result is that even after we find a causal variant, the clinical diagnosis and health implications for an affected individual can remain unclear. Nevertheless, even with these flaws, and there are obviously lots of people working on improving our systems, can we still use the data that's available right now to predict how many genes will eventually be shown to underlie a Mendelian condition, which I'll sometimes just call an MC for short, and identify which genomic features are most predictive? To do that, you actually need to separate three groups: the genes already known to underlie a Mendelian condition; the genes that underlie a not-yet-known Mendelian condition, which we'll call future gene discoveries; and genes that will never (never say never, but never) underlie a Mendelian condition. That distinction matters because, even though the pace of gene discovery, shown here in red along with the pace of syndrome delineation, is down from its peak in 2015, the field is still reporting about 150 new gene-disease relationships per year for Mendelian conditions. So we don't want to train a model that treats all genes not currently known, to our knowledge at this time, to cause a Mendelian condition as genes that will never be implicated, because those groups are really different. This analysis is complicated by potential biases and circularity in a lot of different ways. Just to give you some idea of what I'm talking about: genes that are already implicated in Mendelian conditions tend to underlie conditions that are more clinically recognizable than the conditions we haven't yet delineated or are only now identifying. We know that there's a bias in the literature, so there are more annotations available for known genes; that's because, once we've established that there's a gene-disease relationship, that's the starting gun for all the people doing model organism work and all sorts of functional experiments. And finally, genes underlying Mendelian conditions are more likely to have a mutant, for example a mutant mouse or zebrafish, or even fly or sometimes yeast, generated and phenotyped, and that leads to more annotations, as well as a phenotype for the model organism.
So we created a model to do this. We put in a fairly sparse set of features, looking at gene-level features such as the number of protein-coding transcripts per gene, the number of paralogs and the identity with those paralogs, and metrics like population constraint and conservation of genes, as well as protein-level metrics like the number and type of domains of the protein encoded by the gene. And we used something called AutoGluon, which is a machine learning library that combines many different machine learning techniques into a single ensemble model. To feed into this analysis, since as I said we have to split our genes, we trained on genes that are known to underlie Mendelian conditions, meaning they meet the GenCC gene-disease relationship criteria of having definitive, strong, or moderate evidence for the relationship. We classified our "never" genes, the genes that (never say never) we think will never be shown to underlie a Mendelian condition, as all human orthologs of genes in MGI with no abnormal phenotype detected in mice, where at least one knockout mouse has been phenotyped. I'm showing this in the table here: the known genes, our future discoveries, which is what we're interested in estimating, and our never genes. In total, our model ends up predicting that about 17,000 genes will one day be shown to underlie a Mendelian condition, and that's actually 85% of all protein-coding genes. We've only implicated 20% of those genes so far, so we have a long way to go; at the current pace of gene discovery, it's maybe about 100 more years.
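The three-way split described above comes down to plain set operations over gene lists. This is a toy sketch: the GenCC genes named are real Mendelian disease genes, but the "GENE_*" placeholders and the tiny gene universe are made up for illustration, not the actual training data.

```python
# Toy version of the training-label split: known Mendelian genes (positives),
# "never" genes (negatives), and the remaining candidates the model scores.
gencc_known = {"FBN1", "DMD", "KMT2A"}      # definitive/strong/moderate in GenCC
mgi_never = {"GENE_X", "GENE_Y"}            # knockout mouse phenotyped, no abnormality
all_protein_coding = gencc_known | mgi_never | {"GENE_Z", "GENE_W"}

known = all_protein_coding & gencc_known         # positive training labels
never = all_protein_coding & mgi_never           # negative training labels
candidates = all_protein_coding - known - never  # possible future discoveries
```

Keeping the candidates out of both label sets is the whole point: training on "not yet known" genes as negatives would teach the model that future discoveries look like never-genes.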
This 17,000-gene figure is actually somewhat of an underestimate, for a few reasons. We know that some of those never genes were only tested as a knockout mouse, and the gene might only lead to an abnormal phenotype through gain of function, for example. We also know that sometimes mouse knockouts have normal phenotypes even though humans with similar variants are affected by a Mendelian condition. I've bolded the features with the highest importance here. If you're in this area, these make sense: the number of transcripts, the number of paralogs, how long the transcript is; these are all going to be correlated with whether a gene is likely to underlie a Mendelian condition. We're actually still working to add new metrics that are unbiased by existing knowledge of known genes, and that's really one of the challenges, because, as I mentioned, once you've established that a gene underlies a Mendelian condition, a lot more research gets done on that gene, so figuring out which metrics are unbiased by that knowledge is a challenge. But we're hoping we can use this model to provide both a per-gene likelihood of underlying a Mendelian condition and a binary yes/no prediction of whether we think a gene will one day be shown to underlie a Mendelian condition. We have an overall AUC of 0.76, which is actually pretty good. We also plan to explore whether we can develop a model that makes these predictions from first principles, that is, from genomic properties alone: right now we're taking advantage of conservation metrics and population constraint, but those aren't really properties of the gene or genome itself.
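The two outputs mentioned above, a per-gene likelihood and a binary call, and the AUC used to evaluate them, can be illustrated with a small sketch. The scores and labels below are toy values, not our model's, and the threshold of 0.5 is just an example choice.

```python
# AUC via the rank (Mann-Whitney) formulation: the probability that a
# randomly chosen positive gene outscores a randomly chosen negative one.
def auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 1, 0, 0]                 # 1 = known Mendelian gene (toy)
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]     # per-gene likelihoods (toy)
calls = [s >= 0.5 for s in scores]          # binary prediction at 0.5
# auc(labels, scores) == 8/9 here: one positive (0.4) is outranked
# by one negative (0.6), costing 1 of the 9 positive-negative pairs.
```

An AUC of 0.76, as reported for the real model, means a randomly chosen Mendelian gene outranks a randomly chosen never-gene about three times out of four.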
And, you know, this is probably a pie-in-the-sky vision, but ideally I'd love to be able to say that we can do this just from looking at the genome and the gene itself. All right, so we can do reasonably well at predicting which genes are likely to one day be implicated in a Mendelian condition, but how well do we currently do at predicting the number of Mendelian conditions caused by pathogenic variants in each gene? Mike Bamshad and I coined the term "phenotropy" a while back to describe the phenomenon where one gene underlies multiple distinct Mendelian conditions. It's a play on pleiotropy, which is when you observe that perturbing a single gene leads to one condition that has multiple clinical findings or affected tissues or body systems. So the phenotropy count would be the number of distinct Mendelian conditions due to variants in a gene. Phenotropy has historically sometimes been referred to as "allelic affinity," but we think that term is really confusing and not very specific as it's been used; it's currently also used to describe unrelated phenomena, like the binding strength of a transcription factor to different alleles. So the question is, can we come up with a genome-wide map, or prediction, of the phenotropy count per gene? Approximately 25% of all known genes reportedly underlie two or more conditions, and so we applied the same modeling approach we used previously, except now we're trying to predict the number of distinct disease entities meeting the GenCC criteria. When we do this, we predict a total of approximately 26,000 distinct Mendelian conditions, with a mean phenotropy count per disease gene of 1.5. This means, again, that we've only discovered about 20% of all the Mendelian conditions that will one day be discovered; so once again, as Melissa said, there are not 7,000 rare diseases, there are far, far more than that.
I've shown the features that have the highest importance for our model here. But the important takeaway is that our model actually performs pretty badly; the R-squared is only 0.1. So why is that? We did pretty well at predicting which genes underlie Mendelian conditions, so why are we doing so badly at predicting the phenotropy count for those genes? It's really because there's no objective way right now to determine when the phenotypic distributions of collections of affected individuals overlap enough that they represent a single Mendelian condition versus two or more. For example, the criteria used to assert the existence of multiple conditions for a single gene can differ between phenotypic classes, say intellectual disability or developmental delay versus a skin disorder, and they differ between genes as well. This means that current counts of Mendelian conditions are not based on a uniform assessment of phenotropy. Here's just a concrete example of a gene that's been reported to underlie, depending on what database you use and who's reporting, between 13 and 20 different Mendelian conditions. And the question is not just whether 13 or 20 is right, but whether all of these different disorders that are listed are actually equally distinct from each other, or whether some of them are really the same thing. So one of the goals of our work is to try a different approach to syndrome delineation, essentially defining what a Mendelian condition is, in order to increase the objectivity and precision of this process, to establish quantitative criteria for distinguishing between conditions, and ultimately, we hope, to improve our understanding of the distribution of phenotypic effects per genotype and gene.
So we are developing this approach for post hoc use, after the gene has been discovered; it's not for diagnosis and it's not for gene discovery. We're doing this post hoc delineation of what we're calling a quantitative Mendelian condition, or QMC. We're using machine learning techniques, particularly something called topic modeling, which is the family of machine learning techniques used to analyze, say, a newspaper and conclude that this article is mostly about politics and 20% about sports. We're also using techniques from phenomics to calculate similarity between phenotypes, or HPO terms; this is called semantic similarity. Our model is, I think, a little different from other things that have been done, because we're using both genotype and phenotype information simultaneously. Schematically, it works like this: we take each individual with a pathogenic variant in a gene of interest, and we encode their genotype and their phenotype, their clinical findings, in a matrix where each column represents one clinical finding, with a count of one if an individual has that clinical finding, and each row represents a unique genotype. If you have multiple individuals who share the same genotype, you can sum up their information, aggregating across individuals into one row for that genotype. Once we've created this matrix of one row per genotype and one column per clinical finding, we perform a technique called non-negative matrix factorization, or NMF, which decomposes this matrix into two factors: a genotype-by-QMC matrix and a QMC-by-clinical-findings matrix, so that each QMC is defined by a set of genotypes and a set of clinical findings.
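The matrix-building step just described can be sketched in a few lines of Python. The example records below are invented (the HPO IDs are real terms: HP:0001249 intellectual disability, HP:0000252 microcephaly, HP:0000508 ptosis), and this omits the term weighting described later; it only shows the aggregation of individuals into one row per genotype.

```python
# Sketch of the genotype-by-clinical-finding matrix: one row per unique
# genotype, one column per HPO-coded finding; cells count how many
# individuals with that genotype have that finding.

from collections import defaultdict

individuals = [
    {"genotype": "p.Arg100*",   "findings": ["HP:0001249", "HP:0000252"]},
    {"genotype": "p.Arg100*",   "findings": ["HP:0001249"]},
    {"genotype": "p.Gly200Asp", "findings": ["HP:0000508", "HP:0001249"]},
]

counts = defaultdict(lambda: defaultdict(int))
for ind in individuals:
    for hpo in ind["findings"]:
        counts[ind["genotype"]][hpo] += 1

genotypes = sorted(counts)
findings = sorted({h for row in counts.values() for h in row})
matrix = [[counts[g][h] for h in findings] for g in genotypes]
# Two p.Arg100* individuals collapse into a single row with summed counts.
```

This matrix, one row per genotype and one column per finding, is the input that NMF then factors into the two QMC matrices.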
So here's one possible outcome when we use our model, where you have this matrix of genotypes and clinical findings. Once you perform NMF, you can identify that there are, say, two different QMCs here, with no overlap in the genotypes and no overlap in the clinical findings between the two. Here's another example, where you might identify two QMCs whose clinical findings do overlap, but with no overlapping genotypes between the QMCs. And then finally, one last example where we really can't make heads or tails of these two groups; the model says there's such significant overlap between genotypes and clinical findings that we only identify one QMC. To develop our model, we used two different sets of cases: cases that have been extracted by manual curation from the literature, and a set of individuals who received clinical diagnostic testing from our collaborators at GeneDx. We've identified individuals who have pathogenic variants in the gene of interest we're analyzing, and their phenotype must be explained by the variant or variants in that gene and that gene alone, with no additional variants considered to be contributing to the phenotype. The literature cases and the clinical cases have different biases and challenges. The literature cases, obviously, have been selectively curated and reported by the authors of the papers, and we might be missing contributing variants that were not reported in the paper at all. The clinical cases have their findings identified by natural language processing from clinic notes or test requisition forms, and so we might end up with completely unrelated findings, like this person had a fever, or they have a food allergy.
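For readers who want to see the NMF step itself, here is a self-contained toy sketch using the classic multiplicative-update rules (this is a generic NMF implementation, not the authors' code). The 4x4 matrix is contrived so that two genotype groups with disjoint findings, like the first outcome described above, separate into two QMC-like components.

```python
# Toy NMF: factor a genotype-by-finding matrix V into W (genotype x QMC)
# and H (QMC x finding) with k = 2 components, via Lee-Seung
# multiplicative updates. All data below are invented for illustration.

import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, iters=1000, seed=0, eps=1e-9):
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        WH = matmul(W, H)                      # current reconstruction
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, WH)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]                # update H
        WH = matmul(W, H)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(WH, Ht)
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]                # update W
    return W, H

V = [[2, 1, 0, 0],   # genotypes 1-2 share findings 1-2  -> one QMC
     [4, 2, 0, 0],
     [0, 0, 1, 3],   # genotypes 3-4 share findings 3-4  -> another QMC
     [0, 0, 2, 6]]
W, H = nmf(V, k=2)
```

Because the two blocks are non-overlapping, the factorization recovers one component per block: each QMC ends up defined by a set of genotypes (rows of W) and a set of clinical findings (rows of H), exactly the structure described above.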
So our model has to be able to exclude these unrelated findings. And in both case sets we may just be dealing with missing data, as well as noise in the choice someone makes of which HPO term to use to describe a particular finding. So, for example, three people might see the same kid: one says the child has intellectual disability, another calls it neurodevelopmental delay, and another calls it mild developmental delay, and they all actually mean exactly the same thing. Okay. So here's a broad overview of our process and our model for delineating these QMCs. We do some pre-filtering of the data to exclude HPO terms if the term or its parent terms are found in only a single individual, and we also exclude terms that are too broad to be informative. Then we create that genotype-by-clinical-finding matrix I mentioned, with each finding weighted by the proportion of conditions in the HPO ontology that are annotated with that clinical finding and the proportion of variants in the dataset found in an individual with that clinical finding. We calculate phenotypic similarity using a pairwise semantic similarity approach. Then we define our QMCs as, as I mentioned before, the collections of clinical findings and variants that are grouped together. To evaluate our model performance, we use precision and recall metrics, both variant-level precision and recall and clinical-finding-level recall, and these are summarized by an F-score, where 100% is great. While we're developing the model, we manually label what's considered true or not. We've been testing on both the literature cases and the GeneDx cases, and I'm just going to show some results from analysis of one gene in the GeneDx data.
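The pairwise semantic similarity step mentioned above can be illustrated with the common information-content formulation used in phenomics: the similarity of two terms is the information content (IC) of their most informative common ancestor in the ontology. The mini-ontology and annotation frequencies below are invented; real analyses use the full HPO and its disease annotations.

```python
# Semantic similarity sketch: sim(a, b) = max IC over common ancestors,
# where IC(t) = -log(fraction of annotated diseases carrying term t or
# a descendant). Toy HPO-like DAG; child -> list of parents.

import math

parents = {
    "abnormality_of_nervous_system": [],
    "neurodevelopmental_abnormality": ["abnormality_of_nervous_system"],
    "intellectual_disability": ["neurodevelopmental_abnormality"],
    "global_developmental_delay": ["neurodevelopmental_abnormality"],
    "seizure": ["abnormality_of_nervous_system"],
}

# Invented annotation frequencies (monotone non-decreasing upward).
freq = {
    "abnormality_of_nervous_system": 1.0,
    "neurodevelopmental_abnormality": 0.4,
    "intellectual_disability": 0.2,
    "global_developmental_delay": 0.25,
    "seizure": 0.3,
}

def ancestors(term):
    seen, stack = {term}, [term]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def similarity(a, b):
    common = ancestors(a) & ancestors(b)
    return max(-math.log(freq[t]) for t in common)

sim_close = similarity("intellectual_disability", "global_developmental_delay")
sim_far = similarity("intellectual_disability", "seizure")
```

This is exactly why the "same kid, three labels" problem above is survivable: intellectual disability and developmental delay share an informative ancestor, so they score as highly similar even though the HPO terms differ, whereas an unrelated finding like seizure scores near zero.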
One of the first gene discoveries made by exome sequencing, in 2010, was that variants in KMT2D cause Kabuki syndrome; some of the diagnostic criteria are listed here. Interestingly, in recent years there's been a putative new syndrome, currently unnamed, which for brevity I'm going to call "CHARGE-like," that was reported in individuals who also have pathogenic variants in KMT2D. These are the most common clinical findings reported so far for this CHARGE-like syndrome. We applied our model to 224 individuals who received testing by GeneDx and had a likely pathogenic or pathogenic variant identified in KMT2D, after quite a few iterations of our model in which we tried 14 different combinations of approaches for assessing phenotypic similarity and weighting clinical findings. We were able to see pretty good performance, in the 90s, and especially for identifying that CHARGE-like syndrome, a 70 to 80% F-score, despite the really small number of CHARGE-like cases in the dataset. We have a word cloud with the most common HPO terms in each of the QMCs that our model identified, and you can really see that some of the terms reported in the literature as distinctive findings for each of these two respective conditions are the same findings popping up commonly in our word clouds. So we're pretty excited about this. We still have quite a lot of work to do: we're working on ways to test co-occurrence frequencies; we're planning next to expand the model to incorporate different variant annotations, essentially an ontology for variants, and to handle biallelic variants for the recessive case; and we want to understand how to apply our model across multiple genes and in different contexts, literature versus clinical cases.
The goals are ultimately to improve the precision of clinical diagnoses, to rapidly identify genotype-phenotype relationships, or the lack thereof, and hopefully one day to achieve this objective, less biased, next-generation morbid map across the genome, which will facilitate understanding the distribution of phenotypic effects from gene perturbations. I wanted to emphasize that we need far more individual-level variant and phenotype data to have the power to analyze all genes; we're limited by what is available in our dataset. We can curate cases from the literature, but that's a very manual process that's really hard to scale, and those cases are also not representative of real-world data. So data sharing is really critical for us to expand this project. That's just amazing, and as you might imagine, I'm incredibly excited about Dr. Chong's work, because it really builds upon a lot of the work our community has built over the years in computational phenotyping. And so I just wanted to wrap up with almost a call to action for the future, which is really about how we do better at sharing phenotypic information so that investigators like Dr. Chong can build the computational tools we need to improve our diagnostic characterization of patients and our mechanism-discovery initiatives. As you all know, standard exchange formats have existed for genome sequences, such as your FASTA files or VCF files, but we really didn't have a comparable exchange standard for phenotypic information. As part of the Global Alliance for Genomics and Health and the Monarch Initiative, we set out to create a standard called a phenopacket. Next slide, please. Phenopackets really came about when I asked my colleague if we could just hand over a packet of phenotypic information in the same way that we have these exchange formats for genomic information.
So, the idea is really just to improve the overall individual, case-level phenotypic descriptions for all of the use cases I've presented before: specifically, whether or not a phenotypic feature was observed; how the phenotypic information is linked to the patient; what about the patient's parents and siblings, do they also share the same phenotypic features, and associating that information with a pedigree file for trio analysis; how severe the phenotypic features are, and whether some are more severe than others; and when the features were first observed and whether they were ameliorated at some point, in response to treatment, for example. How can we document all of this phenotypic characterization, not dissimilar from the metadata you might create for a biosample, for example, if you have genomic information? This is really the analogous standard. Next slide, please. And this is just thinking about all the different ways we could use case-level information. As Dr. Chong mentioned, one of the big challenges we have is the literature: in the literature, where we have rare disease descriptions, we might have a small cohort, you know, several dozen patients, and you usually see tables in those journal articles that might say this many patients had this phenotypic feature and that many patients had that one, but not what each individual patient actually had, nor all the characteristics I mentioned on the prior slide. We really need that information for the kinds of algorithms Dr. Chong was describing, as well as for our diagnostic tools, to really understand the trajectory of phenotypic characteristics as they change over time, and also the penetrance and expressivity of those features across a large swath of the world. This is really challenging for rare diseases when the literature is not giving us the information we need.
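A minimal example of what such a case-level record looks like, capturing the properties listed above: observed versus explicitly excluded findings, onset, and severity. The field names below follow the GA4GH Phenopacket schema (v2) as best I can reproduce it here; treat this as a sketch and verify the exact schema at phenopackets.org before relying on it.

```python
# Sketch of a v2-style phenopacket as plain JSON. The case, patient,
# and findings are invented; the HPO IDs are real terms.

import json

phenopacket = {
    "id": "case-001",
    "subject": {"id": "patient-1", "sex": "FEMALE"},
    "phenotypicFeatures": [
        {   # an observed finding, with onset and severity
            "type": {"id": "HP:0001249", "label": "Intellectual disability"},
            "onset": {"age": {"iso8601duration": "P3Y"}},
            "severity": {"id": "HP:0012825", "label": "Mild"},
        },
        {   # recording that a finding was looked for and NOT observed
            "type": {"id": "HP:0001250", "label": "Seizure"},
            "excluded": True,
        },
    ],
    "metaData": {"phenopacketSchemaVersion": "2.0"},
}

serialized = json.dumps(phenopacket, indent=2)
```

Note how the second feature does real work: "seizure was excluded" is information a per-cohort summary table simply cannot carry, and it is exactly what the downstream algorithms need.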
We have a few sources of that information, coming from the wonderful collaboration that Dr. Chong has with GeneDx, and from ClinVar, but there really is a great need to exchange phenotypic information at the case level. Another use case is when a clinic sends a case out to a clinical lab for the diagnostic work. They send a PDF of the relevant portions of the electronic health record, or a candidate diagnosis, but not necessarily the phenotypic characterization of the patient that would be used in diagnostic tools like the Exomiser tool I showed you earlier. So there are many different contexts and many different providers of phenotypic information that could leverage phenopackets, and many different data sources, including clinical notes, really extracting information from the notes, since as we know much of the phenotypic characterization is documented there, or in imaging metadata and the like. We need to be able to combine these data sources with things like genome sequences, pedigree information, and mobile health data, and have mechanisms for ingesting this case-level phenotypic information into the EHR, into research models like OMOP for exchange, and into FHIR. We have a Vulcan accelerator project focusing on converting EHR data and exchanging it out of the EHR using FHIR for those clinical labs. There are application-based entries, such as the PhenoTips tool, which is a very commonly utilized, wonderful tool for capturing HPO terms for rare disease patient cases, and also sequencing and testing outputs. There are many other use cases and users that I don't need to go through, but suffice it to say that our goal is to create a whole ecosystem of phenotypic data exchange to support these computational needs and bring phenotypes up to par with the standards in the genomics context.
So I'm really excited to announce that the phenopacket standard is on its second version; it's also now an approved ISO standard and is being used around the world. We're really excited to have biobanking systems now generating phenopackets for their biobank samples, and the EBI BioSamples repository is also sharing phenopackets now as well; there are many other use cases documented on the Global Alliance for Genomics and Health website and on phenopackets.org. Next slide. Last but not least, I wanted to talk about one of the challenges of doing this kind of computational phenotyping, and one of the things that Dr. Chong's work will really complement and synergize with: our work on public health surveillance of rare diseases using the electronic health record, if we had more phenotypic information and better tools that we could integrate, using the genotype-phenotype association data that would come out of her work, as well as our existing annotations that I mentioned earlier. This is some work we did in collaboration with NCATS, called the IDeaS initiative, where we compared 14 rare diseases across different electronic health record systems to look at the trajectory of care for those individual patients: how did the care, and the expenses of that care, differ across health systems, as well as the incidence of each rare disease in those different health systems. This is just an example of what it looks like for an initial diagnosis of Batten disease at 14 years of age, and you can see some of the phenotypic characterizations, laboratories, treatments, and hospitalizations that occur. I put this in here to remind everyone that describing the phenotypic features of an individual patient, whether for diagnostics or for the gene-disease associations in Dr. Chong's work, is only the tip of the iceberg.
We really need to incorporate many other characteristics of the patient's journey into supporting public health surveillance and diagnostic tools; that might include things like the types of care received, or specific treatments, as part of the computable phenotypes for the electronic health record surveillance work. Next slide, please. All right, so some of the takeaways we're hoping you get from this: computational methods can reveal novel disease entities as well as mechanisms. We really need more and better algorithms to aid prediction of Mendelian disease genes and the distribution of phenotypic effects. And I want to emphasize that this work would not be possible without the prior investment from NHGRI and others in resources like HPO and Mondo and phenopackets, and in the overall development of phenomics as a field. Thanks, Dr. Chong. I think we've also hopefully convinced everyone that we really, really need individual-level genotype and phenotype sharing across research and clinical contexts, and that we need to model diseases and their attributes in the same way regardless of domain or geography around the world. And finally, diagnostics and translational science will be greatly aided by all of the above. And with that, we look forward to hearing your questions and our discussion. Okay. So, as Chris mentioned, I'm Lisa Chadwick. I'm a program director in the Division of Genome Sciences at NHGRI, and I'm the program director for GREGoR; that's how I know Jessica. I see that there are questions coming through in the Q&A; don't forget to put them in there. I don't think you even have access to the chat, but don't put questions there, because we won't see them. While I wait for questions to keep coming in, I'm going to ask one of my own.
So, you know, both of these talks really showed us how important it is to have comprehensive phenotype information about an individual that you can mine, right? But I think we all know there are a lot of challenges with actually getting that. At the end of the day, you're still dependent on what a person is recording, and a clinician is busy, trying to put stuff in their notes; and maybe they're a specialist who's really only looking at one part of the phenotype and not capturing the whole phenotype. So how do you address that, so that tools like the ones you talked about here can be their most successful? Maybe either of you. Yeah, maybe I can start, because I think we probably have complementary ideas and expertise in trying to address that problem. It is a really huge problem. As I mentioned, with the zebra phenomenon, you can't expect a general practitioner in a rural healthcare setting to really know what to do with an extremely rare disease that they'll likely see only once in their whole life. So we need to build better clinical decision support tools. At the same time, those clinicians are incredibly overburdened, and there's this tension that needs to be overcome, I think. We really aim to develop tools that help clinicians in any clinical setting support moving a potential rare disease patient along. That's why that public health surveillance activity with electronic health record data is so important: if a clinician is notified that, hey, your patient has characteristics that make them a potential candidate for further diagnostic workup for a given rare disease, then they can get moved along faster. We call this, very technical term, the "zebra button" in the electronic health record, which would basically be a flag saying, hey, your patient may have a rare condition.
And then we suggest the next labs or other evaluations you might do to help determine that, or otherwise refer the patient on, and the computational phenotyping and semantic similarity algorithms can really give us that. It's very similar to Amazon, where users who shop for one thing shop for another; we can do the same thing with these phenotyping comparisons. So if a certain set of lab values comes back abnormal, and it's characteristic of a set of components of a known rare disease, we can say, oh, you need to do this one last lab, and once you do that, you'll know whether they're a candidate for further rare disease workup or not. So we're really bootstrapping what knowledge we do have in the EHR to get clinicians to request the additional labs or additional information that would then help them move that patient along. That's some of what we've been working on. And from my side, this is why it would be really useful to have much, much more data, right, because, for example, we had about 230 cases from GeneDx, but that's just one lab; if we really had all the cases in one database, then you can start doing things essentially like imputation. Right now you can do imputation for genotypes, where if one person is genotyped on chip X and another person on chip Y, you can use reference panels to fill in the missing information; you could do a similar thing for phenotypes if you have enough data. I'm looking at the questions in the chat, and I think there's another one that's in line with this. They mention that they often look at clinical tables, and that having some sort of standard for clinical tables, capturing what was assessed in each patient, might be helpful: providing a tool that says, here are all the things we want you to know about this condition, so make sure you record whether each of them is present or not.
Other questions: there are a couple of questions that relate to what you were talking about, Jessica, about predicting how many genes might be involved in phenotypes. One question is about how you deal with genes that at the moment have no known function in any organism but might be conserved across all organisms; is the fact that they're conserved enough to suggest they might be important enough to lead to a phenotype someday? Also, some phenotypes are not going to be observable in every condition, so how do you really know that it's a "never" case with no phenotype; maybe it just doesn't have a phenotype in that specific condition. How do you approach that? Right, so for the "never" case, yes, I completely agree there are all sorts of caveats, and that's why I say "never" with quotes, because you never actually say never in biology. But we need to have something to serve as the negatives. It could be true that every single gene might underlie a Mendelian condition; that's one way to think about it. Or we can say that these are the least likely, or the least easy, for us to show, based on not having a current phenotype in model organisms. That's why we actually built our model not to use any information or annotations about phenotypes in model organisms: we didn't want to have that bias, and so we're trying to stay as broad and as close to genome-level annotations as possible. And so it shouldn't matter that there's no phenotype in other organisms. So, the next questions: I think one of them really gets to the crux of the whole topic, which is how do you define a single disease and distinguish it from another disease?
I think I would also put in the same category the concept of a phenotype expansion: how do you know what's a phenotype expansion, and you might want to define that for the audience, versus a totally different phenotype? How do you approach that? Right, so a phenotype expansion, for the audience, is this idea that everyone accepts that this condition, this disease entity, is described by a particular set of phenotypes or clinical findings, and then someone finds an individual who has a very distinct, additional finding, for example. Then we can say that that is expanding what is known about the phenotype due to mutations in that gene. It can also be the absence of a phenotype, which would technically be a phenotype contraction, but everyone kind of uses "expansion" as an umbrella for both. So, for what we're trying to do: there are all different ways for clinicians to manually define what is considered to be one disease versus another distinct disease, and we're saying, can we not do that, and instead just give it to the computer and do it in this agnostic way, and see what we get out. It's not saying this is necessarily always the right answer, or the one every clinician will agree with; but if we use the same model and the same criteria across every single gene, how many conditions, and which conditions, are we going to get out? You know, one thing I do want to go back to is the ClinGen collaboration around lumping and splitting. One of the considerations from a clinical perspective is that you may have a phenotype expansion in one patient compared to a standard definition, or compared to another patient, but fundamentally it may involve different clinical decision making.
There's a good reason then to computationally define it as a separate entity, because then our tools can better support the clinical decision making the next time a patient with that rare disease comes in. So it's really important that we combine these clinical needs and granular curation approaches, with that goal in mind, together with these computational approaches, because at the end of the day, the best way to make sure this gets delivered to clinical care is to make sure the decision-making process is as easy and accurate as possible, and when those differences exist, it's really important that they're made front and center. So, another question gets at one of the other things that complicates this whole process: you've got phenotypes, and you're trying to figure out which phenotypes are associated with which gene and which disease, but there are also many times when an individual might have two different genes that are actually contributing to the phenotype. How do you work that into this whole model? We've not gotten that far yet, I'll be very honest. Actually, the next thing we're working on is trying to figure out how to handle the recessive case, which would be two different variants in the same gene. It would be easy if we had 10,000 cases for every gene, because then you could expect to see many different combinations of two variants, but without that, we haven't really figured it out yet; it might end up being that we group by the type of mutation or something like that. And digenic is a whole other ballgame.
There has been some nice work, though, which I think will be a great input to Dr. Chong's future work, using phenotype similarity alone to disentangle multiple molecular lesions in the same patient or the same pedigree. I think it's a really wonderful idea, and again, that gets at this tension between what's a phenotype expansion, or just variability of phenotypic features from a single molecular lesion, versus how those phenotypic features differ and interact when you have multiple molecular lesions. So, another question that I had: another kind of phenotypic data a researcher might be generating from a patient sample is molecular phenotype data. How do you incorporate molecular phenotypes into this overall framework of defining a disease and associating it with a gene? I'd say for us, we will rely on this being built into an ontology; so that can be my answer to that part. I mean, I think we have, in a very early way, I should say these are very early efforts, tried to do that. We have a collaboration with the LOINC project, which is the ontology for laboratory measurements, to create interoperability with the Human Phenotype Ontology, and some of those cases might have things like an expression assay outcome that could be coded with a unique value that can then be converted into a Human Phenotype Ontology term. So that's one approach. Fundamentally, though, I think we need to do better at thinking about specific diagnostic omics signatures that are associated with individual disease entities.
So, for example, maybe the indication of the future might be that for a given disease, you need an expression analysis examining comparisons for a specific set of biomarkers, in a specific tissue, for certain age groups. These are essentially diagnostic guidelines, and that information really needs to be associated with these same disease entities. We can have signatures in the ontology as well, things like an expression comparison between X and Y, or "is high in such-and-such tissue." So we can certainly encode that in the Human Phenotype Ontology, but fundamentally there are going to be thousands and thousands of these specific multi-omics approaches associated with individual diagnostic areas, and we need to come up with a strategy for associating those indications with individual disease entities. We have not figured out how to do that yet.

So, another question I see in the chat is about the genes that you defined as future discovery genes, Jessica. How do you think we could think about this in terms of medical genetics? If you see a patient and don't have anything in an already known gene-disease association, but you have a VUS in one of these genes that is predicted to probably be involved, how can we take advantage of that kind of information to help inform variant prioritization and interpretation?

Oh gosh, I really hope so; that would be the dream, right?
So I think right now, at least on the gene discovery side, if we have a case with an individual who has a protein-truncating variant, like a frameshift, in a gene, and we look up the gene and it's highly constrained for loss-of-function variation, so you don't see nearly the loss-of-function variants you'd expect to see, people already are saying, okay, that's a hot hit. So what we hope is that you can do this for all genes, not just the loss-of-function mechanism but everything, and be able to say, yes, this patient has a hot VUS. We've been talking about this a bit in the GREGoR consortium: what criteria could be used to define levels of VUSes, essentially. My favorite classification of a VUS is the ice-cold VUS, which really makes me think of "Ice Ice Baby."

Yeah, that's interesting. I also see another interesting question here. We talked earlier about the challenges associated with even gathering this phenotype information from clinicians. But of course patients and their families are really invested in this as well, so how can patients help in contributing phenotype information that's going to help define what their phenotype really is?
So, I'll say that six or seven or eight years ago now, we actually got an Open Science Prize from NIH to develop a website we call MyGene2 that's meant for families; it's supposed to be family-friendly and patient-friendly. A family can essentially create a profile saying, here are my symptoms, and here's my variant, if they have one variant of interest. We then use some of these HPO tools to extract and recognize the HPO terms from whatever the family types in, so they don't need to know the technical terms; it can kind of magically recognize them from their data, and then make that information available publicly. And I think that's something families can do: share their information. A lot of families are very busy, obviously, if they have a kid who's affected; they may even be overwhelmed. But I can't emphasize enough that families have information that even the clinicians don't have, because they're living it every single day. Right now the standard in the field is that either the family are the only ones who know that information, or it's in their specific disease group's registry, which is pretty much closed access to everybody. I'm a firm believer in open data sharing: if it's closed access, no one else knows about it, we don't know it exists, and we can't use it. It really needs to be public for it to be used.

And of course, as you know, in GREGoR we're thinking a lot about how we can share these data as openly as we can, because you're exactly right: the real key to understanding these phenotypes is finding other people who have the same kind of phenotype and using that information all together.
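The "magic recognition" of HPO terms from family-typed text can be sketched very simply. This is an illustrative toy, not MyGene2's actual pipeline: real tools use the HPO's curated layperson synonym layer and proper text mining, and the tiny dictionary below is a hand-made assumption (IDs shown are real HPO identifiers for the listed labels, but the lay phrases are invented for the example).

```python
# Sketch: map lay phrases in free text typed by a family to HPO terms
# via a small hand-made synonym dictionary. Illustrative only; real
# recognizers use the HPO layperson synonyms and NLP, not substring
# matching against a four-entry dictionary.

LAY_TO_HPO = {  # lay phrase -> (HPO id, official label)
    "seizures": ("HP:0001250", "Seizure"),
    "small head": ("HP:0000252", "Microcephaly"),
    "trouble hearing": ("HP:0000365", "Hearing impairment"),
    "crossed eyes": ("HP:0000486", "Strabismus"),
}

def recognize_hpo(free_text):
    """Return (HPO id, label) pairs whose lay phrase appears in the text."""
    text = free_text.lower()
    return [hpo for phrase, hpo in LAY_TO_HPO.items() if phrase in text]

profile = recognize_hpo(
    "Our daughter has seizures and a small head, and she has crossed eyes."
)
```

The resulting coded profile is what makes family-entered descriptions computable and matchable against other cases, which is the point made above about open sharing.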
And I also know that a lot of our programs have patient-driven ways to register to be part of the project, so I think there are a lot of places where patients and their families can help participate in this process. Melissa, you said you were going to answer that, and then I think we'll wrap this up.

Yeah, I just wanted to mention also that we've been able to translate the Human Phenotype Ontology into what we call "layperson-ese" for use by patients directly. We have a project, funded by Corey, that we're just about to push out, so there'll be a manuscript or preprint any day now, where we basically created synthetic profiles for all of our rare disease gold-standard annotations using only the layperson terms, and then looked at the diagnostic efficacy of those profiles if patients were to provide those terms. It turns out that patients provide information that, as Dr. Chong said, is often missed in the clinical setting. For example, if a baby is inconsolable, or someone snores, these are not things that are always clinically evaluated. So we actually know that it's incredibly important to have a computational representation of the patient's features, from the patients and families, put together with the clinical evaluation; that's going to be maximally diagnostic, and it really should be a collaboration anyway. So we're really excited to help build tools for the patient community.

Yeah, that's a great point. A lot of the terms you had up on your slide earlier were things that I would have had to Google, so I would have had a hard time reporting to you whether I had any of those phenotypes. So I think that brings us to the end of this seminar. I want to really thank both of our speakers, Jessica Chong and Melissa Haendel. This was a great talk; we really enjoyed it. I know that this will also be available online, so if you'd like to watch it again, you can.
And Melissa and Jessica also noted that their slides are going to be available at a bit.ly link; they said what the link was earlier, and it's probably in the chat as well. Yeah, maybe. Okay, so I hope that you'll take advantage of that, but otherwise, thank you all for joining us, and we hope we'll see you at the next Genomic Innovator seminar. Thank you for inviting us.