So, I am Jennifer Johnson. I work in Les' lab as a staff scientist, and I'm going to talk to you today about secondary variants. In some ways it's not the exciting part of the presentation, but in some ways I actually think it can be really exciting. And I have two points that I hope you'll take home today from my talk. One is that if you're doing exome or genome sequencing, these variants are in your data, and I hope to convince you of the second point: that you really need to look at them and at least think about returning some of them to your patients. So I'm going to start my talk by going through some reasons why we might think about returning these variants, but a lot of my talk is going to be the very detailed how-do-you-go-about-analyzing-them, and I think it's going to be a little more in depth than some of the other talks. So hopefully those of you out here who are actually doing exome sequencing will appreciate this, and hopefully I won't lose the rest of you. So what we've been talking about for the most part today is primary variants. And primary variants are, for most of you, the reason that you're doing this research: the one or two variants that are responsible for the disease in your probands. We're taking probands that have really interesting diseases, and we're doing whole exomes or whole genomes. As you've heard, we end up with tens or hundreds of thousands of variants. Usually we get a handful of variants of interest, maybe up to 100, and then we hope to have a gene discovery, a really cool paper, maybe some functional studies. Unfortunately, what I'm going to tell you about today is everything else. So that's the tens to hundreds of thousands of variants minus the one or two variants that were your primary variants. So what do we do with these variants? The reality is, I think many of us ignore them.
We are very excited about our primary variants. We're moving on, we're doing functional studies, we're writing papers, and these variants are living on our computer in a file somewhere. I hope to convince you that you need to take them off the shelf, analyze them, and at least look at them and make sure there's nothing there that you need to be returning to your patients. Why would we do this? Well, in our lab it's actually a secondary line of research. We want to understand which of these variants are helpful to our patients, which ones they want to get back, and which ones, when we return them, they actually use for their healthcare. But I would also suggest to you that at least for a handful of these variants, you have an ethical obligation to return them to your participants, and hopefully I'll convince you of that by the time we finish the presentation. I don't think I'm the only one who feels this way. The National Heart, Lung, and Blood Institute convened a panel in 2010 that set forth guidelines for research results that should be returned to participants. They actually split results into three groups: those that should be returned, those that could be returned, and then, of course, the unmentioned group that's going to be most of your variants, those that don't need to be returned. The criteria they put forward for results that should be returned included important health implications for your participant, meaning that there's both an established and a substantial risk to that individual's health. The finding should be actionable: there should be a therapy or prevention that could change the course of the disease. I think of newborn screening here, but it's the same kind of thing; if you're going to give these results back to patients when they're not expecting them, there should be something they can do about these variants. The tests obviously have to be analytically valid.
You have to know that the results are true. The disclosure should comply with laws, and the participant has to have opted to receive the results. That means that at some point there has to have been a discussion with the participant about the return of secondary findings and what they would want to hear, or what they will hear, in the future. The NHLBI panel also split out this "could return," as I call it, or "may return," category, and the main thing here is that the benefit should outweigh the risk, and they said from the participant's perspective. Now this is really interesting, because unless you've had that conversation with your proband, you're never going to know what their perspective is. So this is going to be something that I think is going to be hard for scientists to do, but this is what NHLBI suggested: that we should think about the perspective of our participants when we're deciding which results to return that don't fall into that should-return category. Comply with laws: what do we mean? I think the main thing that we as scientists have to think about is CLIA, the Clinical Laboratory Improvement Amendments of 1988. Basically what this says is that we as scientists in research labs cannot return these results to individuals unless they've actually been validated in a clinical, CLIA-certified laboratory. You can CLIA-certify your own laboratory or you can outsource this, but if you're going to return these results you need to be aware that there are laws that say they really should be CLIA-validated. I'm also going to mention that in 1995, well before the age of exomes and whole genomes, the American Society of Human Genetics suggested that we should not be testing minors for disease mutations where that knowledge wasn't going to help them until they were adults.
Of course, if you're running exomes and genomes, even if you're not trying to test for BRCA1 mutations, you are, by the nature of what you're doing, testing for BRCA1 or BRCA2 mutations. So you need to think about this before you actually do these tests: what are you going to return to these minor patients, and how are you going to do that? Are you going to hold the results, or are you going to give them back now? I'm not going to tell you the right thing to do, but it's something you need to think about. So I'm going to assume at this point that you've decided you're going to return your secondary variants. Assuming you agree with me, what do you return? I think about this on three different levels. I think about diseases that are important to my patients, I think about the genes that I know for sure cause those diseases, and then I think about variants that I feel have enough evidence of causation that I'm going to return them. We already talked about the fact that the diseases should pose an obvious and severe threat, and they should be actionable and treatable. You can also think about whether your patient is going to be diagnosed with this disorder in another way: if he's going to walk into his doctor's office and his doctor is going to say, you know, you have albinism, for example, then you don't need to return a variant telling him he has something that's obvious. But if it's something that's not obvious, maybe a susceptibility variant, that's something you should consider returning to him. Proband versus descendant risk: like NHLBI, I look at it as meaning that if it affects the proband, it's more in the should-return category, and if it affects descendants, maybe it's more in the could-return category. These are some of the diseases that we actually consider with our ClinSeq cohort. I know we've mentioned ClinSeq; it's a clinical sequencing study that our lab is involved in.
We are aiming for over 1,000 exomes. Right now, the dataset that I'm working with, which I'm going to show you today, is 572 exomes. We are taking an iterative approach to secondary variants. We have already annotated cancer predisposition syndromes in this dataset. We've looked at hypertrophic cardiomyopathy, long QT syndrome, and malignant hyperthermia; obviously, the list goes on. We've already done these annotations in our dataset, and I should say they're ongoing; we are working on them. Other things you could think about would be thrombophilia, hemochromatosis, and pharmacogenetics, which is something we haven't even really touched, as well as adult-onset neurological disorders and, obviously, carrier variants, for which we have done some annotation in this patient set. So I mentioned the three levels. At the gene level, you can get information on diseases in the Human Gene Mutation Database, Online Mendelian Inheritance in Man, or GeneTests. There are a lot of other places, but you basically want to make sure, because this isn't research, that you're looking at genes that you know really have a pathogenic link to the disease that you're interested in. As far as the variants go, you want to return variants that are known to be causative. You can return novel variants if they look very much like things you know are causative, and are therefore highly likely to be causative, say, stop mutations in a gene where you know stops cause disease. You might want to return those. And then you should consider the effect of telling versus not telling. If you have variants where you're not 100% certain that they cause disease, or if it's a carrier variant, that's one set of things that your program is going to have to think about.
If it's a variant in CDH1 that causes stomach cancer, and the only thing they can do is have their stomach removed to protect them, you had better be 100% sure that that variant is causative before you return it to your proband. So now we have to start filtering variants, and this is where it gets kind of heavy. For those of you who aren't doing whole exome and whole genome sequencing, I'll apologize ahead of time. I'm going to talk about the tools that we use in our lab; many of you, I know, are using different tools. The other thing is that I'm going to talk about this in a set sequence. You can do this in many different ways, and it is iterative, but this is what we do. So we use VarSifter as our annotation source. Once we do our initial filtering, we go out to databases; in our lab, we use the Human Gene Mutation Database and locus-specific databases to gather information about our variants. We analyze support for causation, which means going both to the databases and to the primary literature, and then we decide whether to return these variants. So what does this look like? Well, luckily for most of you, I think you'll be working on trios; that's a much smaller number. We're working on ClinSeq right now, and for demonstration purposes I thought I'd walk you through this dataset. Right now we have 572 probands that we're looking at, with two million unique variants. However, in order to use VarSifter on our computers and have it work at a reasonable speed, we've already filtered the dataset to non-synonymous, stop, frameshift, and splice variants and gotten it down to a reasonable 182,000. So these are our secondary variants that we need to look at and ask ourselves: do we return them or not? Well, I already told you that the first thing we do is an iterative process.
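The first-pass reduction just described (two million variants down to 182,000 by consequence type) amounts to a simple filter. Here's a minimal sketch; the dictionary key and type labels are illustrative placeholders, not VarSifter's actual annotation schema:

```python
# Keep only variant classes that can change the protein sequence.
# The 'type' strings here are hypothetical annotation labels.
KEEP_TYPES = {"non-synonymous", "stop", "frameshift", "splice"}

def first_pass(variants):
    """Filter an iterable of variant dicts down to protein-altering classes."""
    return [v for v in variants if v["type"] in KEEP_TYPES]
```

Anything synonymous or intergenic drops out at this stage, before any disease-specific reasoning begins.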
We look at certain diseases, and the first thing we did was look at genes known to cause high-susceptibility cancer syndromes. This is a list of 37 genes, taken from Lindor et al., that are all known to cause cancer syndromes in adults. You have to think about who your patients are: our cohort is 45 to 65 years of age. They're unlikely to be getting childhood cancers that they don't know about at this point, so we focused on adult-onset cancers. As Jamie and others have said, you can take a list of genes, feed it into VarSifter, and filter based on that, and that gets us down to 455 variants, which is still a big number if you're going to look at each and every variant, but it's much more manageable than roughly 182,000. Once you have this smaller list of variants, you have to start filtering. We talked about filtering based on quality. VarSifter already filters based on the most probable genotype (MPG) score: if the genotype score isn't 10 or greater, the variant isn't even included in VarSifter. The next thing we do is compare the MPG score to the coverage, and empirically, a ratio of 0.5 has been determined to be what you need for a quality score. You can see here on the lower left of the panel that you can break out individuals by genotype, and it's actually very easy. Let me see if I can use this pointer. So here's the genotype for the different individuals, with the genotype score shown here and the coverage here, and the highlighting is for an individual that did not pass the MPG-to-coverage quality filter. It's actually the only individual in this dataset with this variant, so this variant can be thrown out and doesn't have to be considered further. The next thing we do is filter based on frequency. This is where you're really going to have to think about what disease you're looking at.
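The MPG quality check just described can be sketched in a few lines, assuming each call carries an MPG score and a read depth; the function and parameter names are my own, not VarSifter's:

```python
def passes_quality(mpg, coverage, min_mpg=10, min_ratio=0.5):
    """A variant call passes if its most probable genotype (MPG) score is
    at least 10 and the MPG-to-coverage ratio is at least 0.5 -- the
    empirical cutoffs described in the talk."""
    if coverage <= 0:
        return False
    return mpg >= min_mpg and mpg / coverage >= min_ratio
```

A well-supported call like MPG 45 at 60x coverage passes, while MPG 12 at 80x fails the ratio test even though it clears the absolute score threshold.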
Are you looking at a rare adult-onset cancer syndrome? Are you looking at recessive variants that might have a carrier frequency of one in 30 in your population? You can use ClinSeq: if you're using VarSifter, you can use ClinSeq within VarSifter to filter your variants. You can use dbSNP, though dbSNP can include pathogenic variants, so you need to consider that. Hopefully you can use 1000 Genomes; it's not in VarSifter yet, but if you're using VarSifter, hopefully it'll be incorporated someday soon, because that would be helpful as well. And this is just to show you that the ClinSeq information can be included in VarSifter. For those of you who aren't using VarSifter, you'll have to find other ways to incorporate this information into your datasets. So for our cancer set of genes, we set 1% as a cutoff for ClinSeq and 0.015 as a minor allele frequency cutoff for dbSNP. We set these because the most common disease on our gene list was hereditary breast and ovarian cancer, which is found at a frequency of about one in 500; 1% is one in 100, so we thought that was a safe cutoff. I should tell you that if you're using ClinSeq to filter your variants, there's a very high proportion of Ashkenazi Jewish individuals in our ClinSeq cohort, at 17%. For things that have founder mutations in that population, this can trip you up. So if you know you're filtering for hereditary breast and ovarian cancer against a control set that includes Ashkenazi Jewish individuals carrying founder mutations, maybe you want to do a positive screen for those founder mutations so that you don't miss them in your individuals. So we've taken out a third of our variants by doing this filter. I actually had 455 on the last slide; there were a couple of variants that did not change the protein sequence, so we took them out initially. That's how we got to 451, and now we're down to 334.
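The frequency filter for the cancer gene set, including a positive screen so that founder mutations common in the control cohort aren't accidentally discarded, might look like this sketch. The variant IDs in the whitelist are examples of well-known Ashkenazi founder alleles, and all field names are placeholders:

```python
MAX_CLINSEQ_FREQ = 0.01   # 1% cohort frequency cutoff
MAX_DBSNP_MAF = 0.015     # dbSNP minor allele frequency cutoff

# Known founder mutations should survive filtering even if they are
# relatively common in the control cohort (example IDs, not a real schema).
FOUNDER_WHITELIST = {"BRCA1:185delAG", "BRCA1:5382insC", "BRCA2:6174delT"}

def keep_variant(variant_id, clinseq_freq, dbsnp_maf):
    """Return True if the variant survives frequency filtering.
    A frequency of None means 'not observed in that resource'."""
    if variant_id in FOUNDER_WHITELIST:
        return True
    if clinseq_freq is not None and clinseq_freq > MAX_CLINSEQ_FREQ:
        return False
    if dbsnp_maf is not None and dbsnp_maf > MAX_DBSNP_MAF:
        return False
    return True
```

The whitelist is the point: a pure frequency cutoff computed on a cohort that is 17% Ashkenazi Jewish would silently drop exactly the founder alleles you most need to report.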
So how do we evaluate these candidates? Well, unfortunately, for these 334 you actually have to go out to the databases. For each and every one, you have to go out and look. Whether you're using the Human Gene Mutation Database or locus-specific databases, you need to ask the questions: Was this variant found in controls? Are there multiple reports of this variant, or is it only seen in one paper? Is there functional data out there? Do I believe the functional data? Is it present with other causative mutations? All of these things add a little bit of evidence for or against the variant being causative. The last thing is segregation. A lot of people use segregation as the gold standard, but you need to be aware, especially when you're looking at secondary variants, that your individuals don't have disease. This is very different from the research that we've done before, where you have a family, they have a disease, you know that it's a cancer syndrome, and now you have a gene that looks like it should cause a cancer syndrome. Here you have a healthy individual standing in front of you, and they have a variant. You need to really be sure. If there's segregation data in a paper out there, you have to think about the fact that there could be linkage disequilibrium with another causative mutation that they just didn't discover. So when you're looking at your individual who has this missense that you look at and think doesn't really change the characteristics of that amino acid, always consider the fact that segregation analysis in publications isn't 100%. If you're using HGMD, you can use some shortcuts here. HGMD is actually included in VarSifter; that's why I have it up here. VarSifter includes both the HGMD ID as well as the disease that is thought to be caused by the variant.
There's actually a question mark included at the end of the disease name if the curator feels that the evidence isn't 100%, and you can use that as a filter. The HGMD tag shown here is included in VarSifter. DM stands for disease-causing. What that means is that the curator's interpretation of the primary literature is that the author said the variant was disease-causing. That's not always the case, which I'll show you in a minute. I want to make sure you all know that there's a difference between the HGMD that you can get out there on the web for free and HGMD Professional. If you're doing this, you need to make sure you're using HGMD Professional. It's available through the library; you have to go and get a login through the library, and I've included the link here. The free HGMD on the website does not have all the variants and does not have all the usable features that HGMD Professional does. If you go into HGMD Professional, you can search by gene. If you're lucky enough to be using VarSifter, it includes this information, and you can search based on mutation. The search fields are over here. So here I'm searching for APC, but if I was using VarSifter, I would actually plug my mutation IDs directly into this. You can drill down on your mutation. And this is another thing I want to point out, especially if you're not using VarSifter: in HGMD, as in any database, the amino acid nomenclature is based on a specific reference transcript. There are many transcripts for a lot of genes, and so the amino acid nomenclature, if you're searching based on that, will vary based on your transcript. So here, where's the transcript? I think it's right here. Here's the transcript information. You need to make sure that that transcript is the same transcript you're using to get the amino acid nomenclature that you're comparing back to HGMD.
So here's primary literature; HGMD links you out to this directly. And I wanted to show you this BRCA2 mutation, 8182G>A, which we have in our cohort. It's listed as disease-causing in HGMD; there's no question mark after breast and ovarian cancer. But as you can see, the paper says all three heterozygous variants were observed in two healthy women with a history of breast cancer. Okay, up here, sorry. In fact, the risk is uncertain. And this is the pedigree. If you can see this, there's one individual with the missense mutation. She does not have breast cancer, although she has a family history of breast cancer. So I think most of us would look at this pedigree and say, yes, she has a family history of breast cancer, but there's no segregation data here; she's not affected; this is not obviously disease-causing. HGMD said it was, but you need to be really careful, and at least before you return something, you have to go out to the primary literature. In addition to HGMD, you can go out to locus-specific databases. There are actually two websites that I use, listed at the bottom of the screen and on the slides, that keep lists of all the different locus-specific databases. The nice thing about these is that they are, for the most part, curated by experts, and so in our lab we trust the data in these databases maybe a little more than we trust what HGMD says. You can go to whatever gene you want. There are often many databases for a gene; these are all for APC. We typically only go out to one database, not all of them, and we look for our mutations. If we find our mutation in a database and it has sufficient information, then, because of the number of variants we're dealing with, we don't go out to every possible database. Again, when you drill down, they're using a specific transcript. This is the main thing to realize.
Even when HGMD information is included in VarSifter, you have to make sure you're using the same transcript, or you're going to miss things that are there but where the amino acid nomenclature just doesn't match up because you're not using the same transcript. This is a list of all the variants. Cut off on the left-hand side of the screen here, when you go out to these databases, are the pathogenicity scores. The numerator is what the paper says, whether the authors believe the variant is pathogenic or not, and the denominator is what the curator believes. Another word of warning here: at least for this database, the curator has put a question mark for all of the variants. So just because this curator says they're variants of uncertain significance, it doesn't actually mean he looked and determined that; it's basically no information. You have to be aware that the curator may or may not have actually looked at and interpreted each and every variant. So the numerator is from the paper; the denominator, if the curator actually looked at things, is what he thinks, and many of them are just question marks. The important thing here is that you have amino acid information that allows you to compare your variants, and you have DNA information, but if you scrolled over to the right, which I'm not going to do, there's a lot more. It varies by database, but it often includes things like: is the variant found in controls? Is it found in affecteds? Is it found multiple times? We often look at independent reports of a variant as evidence of causation. I just want to show you here: this is the same variant reported three times, but if you look to the right, it's all in the same paper, and if you go out to the paper, it's one family. So really it's only one independent report of that variant, not three.
So that's just another thing you have to think about when you're looking at this data. Really quickly: if you're trying to compare amino acid nomenclature, the easiest way to take what you have and convert it to what's in your databases is by taking the genomic positions that you're given by whatever your annotation source is. You can go out to this website, Mutalyzer, and put in the genomic position. It will give you a list of transcript variants that cover that position; you pick the correct one for whatever database you're trying to compare to, and then it gives you the correct amino acid designation (it's probably cut off below the table here). You can do this as a batch file. Typically I take my VarSifter files and run the entire list of genomic coordinates through Mutalyzer as a batch file, pick out the correct transcripts for the diseases I'm looking at, and then convert them to the correct amino acids. It actually saves a lot of time. So what did we find, I guess, is the next question, in our 334 variants. Well, just really quickly, it's hard to keep track of 334 variants; you need to break them out some way. So we've adapted a pathogenicity scale that was initially put forth by the International Agency for Research on Cancer. They were using it to grade the pathogenicity of variants in cancer syndromes, and that's what we started with, but we now use this scale for all of our variants. Basically, it allows you to break things out from definitely pathogenic to benign, with probabilities of 1% at the top and bottom, then 5% at the top and bottom right below that for likely pathogenic or likely benign, and then this huge range from 5% to 95%, which is uncertain, which of course is where many of the variants are going to fall. We've also added a score of zero, just because it was helpful for us with our variants.
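The five-tier scale can be written down as a simple mapping from an estimated probability of pathogenicity to a class. This is a sketch using the boundaries as described in the talk (1% and 5% at each end), not the official IARC class definitions:

```python
def iarc_class(p):
    """Map an estimated probability of pathogenicity (0..1) onto the
    IARC-style five-tier scale described in the talk."""
    if p > 0.99:
        return 5, "definitely pathogenic"
    if p >= 0.95:
        return 4, "likely pathogenic"
    if p > 0.05:
        return 3, "uncertain significance"
    if p > 0.01:
        return 2, "likely benign"
    return 1, "benign"
```

Note how wide class 3 is: anything from a 5% to a 95% chance of being pathogenic lands there, which is why most variants end up as VUS.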
I think these slides are available online if you can't see this, but this is just a subset of our analysis: a subset of our cancer genes with their genomic coordinates, and over here are the reasons that we scored these things pathogenic or not. We have a BRCA2 variant that co-segregates in the primary literature and is a frameshift. We have things that have been found in unaffected relatives, found with causative mutations, or present in controls, so we've moved those down to a two, and we did this for all 334 variants. The bottom line is you have to read the primary literature. You can start with HGMD and LSDBs, but there's conflicting information, and it's not all correct. Sometimes, if you have enough variants, you're going to have to filter based on HGMD and the LSDBs, but certainly before you return things, you have to go out to the primary literature. This is how our 334 variants broke out. The really interesting thing is that we had seven variants out of the total that we thought were definitely pathogenic and needed to be returned, and four that we thought were likely to be pathogenic. Remember, this is only 37 genes, so at least for this subset of genes, this is what we ended up thinking we needed to CLIA-validate. This is a pie chart just to emphasize the fact that most of your variants are either going to be of unknown significance or benign, and these are the ones that we filtered out based on frequency, so that's a large percentage of our variants. So what did we find? There were five pathogenic variants in BRCA1 or BRCA2; they're actually in seven different participants, but only five different variants.
We found two variants that were fives in MUTYH, which causes a recessive phenotype of colon cancer, and then there were two variants that we scored four: one in SDHC, for paraganglioma. You can see the locus-specific database; the paper had said they weren't sure, but the curator actually thought it was pathogenic, probably because it's a stop early in the protein, but we didn't think this was enough evidence to give it a score of five. And then one in folliculin was questioned by the curator as to whether it was truly pathogenic, though the paper thought it was, and there was no indication of the disease in our pedigree, which is another tool you can use, so we left it at a four. So seven is a big number, at least we think so; it's more than one in a hundred. I understand you may be sitting out there thinking, well, I'm just doing a trio, I'm not doing 572 exomes, but one out of a hundred individuals may have a BRCA1 or BRCA2 variant. All of these variants were previously described; they were clearly causative in other families, and they were all associated with familial high-penetrance cancer syndromes. The most important thing is that only four of these seven individuals actually had pedigrees that we would have looked at and said, yes, there's a high-penetrance cancer variant hiding in this pedigree. The other three pedigrees did not suggest that we were going to find this at all, and so we think that for these three pedigrees at least, these are potentially lifesaving results that we're giving back. This is just one of our pedigrees. This is our proband here, who came in with a BRCA2 frameshift. You can see his mother did have breast cancer in her 50s, but she's the only one in the pedigree who is reported to have had cancer. You can also see that his siblings are all brothers.
He has two nieces, but they're still in their 30s, and so these two individuals may certainly be at risk for breast and ovarian cancer, and these individuals may be at risk for prostate cancer. This could be information that's going to save somebody's life, and I believe this one has been returned at this point. I just wanted to show you this briefly; again, it's in the handouts, and you can look at it. We're trying to make this a little more scientific, so it's not just, yeah, it looks like we have two pieces of evidence. If you're going to do this and break out whether things are pathogenic or not, you need to take what the database says, so whether things are novel and not found in the databases, or pathogenic according to the databases, or variants of uncertain significance, or benign, and what type of mutation it is, and then add in what you know: is my pedigree showing me that this variant should be there? Is it found in controls? All the different things that I talked about. Then you can use this to grade your variants. At least in our lab, we're returning fives. For the most part, for these syndromes, we were not returning fours, things that are likely to be pathogenic, although, like I said before, depending on the disease and the gene, you may decide to return things that are likely pathogenic. A cautionary tale: CDH1, which I mentioned earlier in my talk. We identified a mutation in one of our ClinSeq individuals, alanine-298-threonine. If you go out to the literature, it was found in an individual who was 36 years old. So he has the mutation, and he had two family members who got gastric cancer in their 30s. They were not tested for the variant, but the paper said all three mutations conferred loss of E-cadherin function in an in vitro assay.
So we had three family members and a functional assay saying that this actually changes the function of the protein. This looks like it's causative, and this really concerned us. We didn't, I don't think, really expect to find something in CDH1 in a healthy 60-year-old individual. This is the pedigree, and we questioned what to do, but the reality is, we have a healthy 61-year-old individual. She has relatives who did not have gastric cancer. She has siblings. This individual died really too young to have been tested for the gastric cancer variant, but this individual is in his 40s or 50s, and these six individuals are in their 40s to 50s, so it's highly unlikely that a high-penetrance gastric cancer disease variant is segregating in this family. We sat on this for a while, and then we had a second individual in ClinSeq with the same kind of pedigree and the same variant, and obviously at this point we've said this really can't be a high-penetrance gastric cancer variant, and we've downgraded it. But had we told her, her only option would have been to have her stomach removed to prevent stomach cancer. So it was really something we struggled with, at least initially. So, let's see how I'm doing for time. I think I'm fine. Carrier variants: I think carrier variants definitely fall into the could-return rather than the should-return classification as far as NHLBI goes, but you have to think about the same kinds of things that I talked about before, and you may also think about a threshold set for disease incidence. A really common carrier variant, maybe like CF, you'd be more tempted to return, versus something that's one in a million, which maybe you'd be less likely to want to return. You have to think about this framework again: what are the frequency cutoffs that you're going to use? We use a 15% frequency cutoff for ClinSeq.
The reality is, for our ClinSeq cohort, when we were looking at carrier variants, we did not use a dbSNP cutoff, because we felt that 572 exomes was really enough of a filter. However, you could use the minor allele frequencies in dbSNP or 1000 Genomes. And then the list of recessive disorders is huge; it's close to 2,000 if you go to OMIM and do a search. That's too many genes and too many disorders for us to look at right now in ClinSeq; it would just return too many variants. So one thing you can do, again, is limit your set, and we've limited it to 78 genes that are offered in a prenatal panel by Ambry Genetics, which is a company that offers prenatal testing. Bell et al. is listed here on the slide, and they actually suggested 448 severe recessive childhood disorders that they looked at. But the bottom line here is that this is all iterative. Hopefully someday there'll be a magic program where we can put all of our variants in and get a little list out at the other end, maybe 10 variants, and you know that they're important and you can return them. We don't have that right now, so we're just trying to start someplace and keep going back to this data set to look for other variants. I just wanted to mention that I was one of the first to say let's not return things that are like one in a million in the population, because if it's only one in a million, how important is this going to be to our probands anyway? That assumes our probands are still having children, which ours probably aren't, but at least some of yours may be if they're trios for rare diseases. So if you have a common disorder, let's just take CF, I rounded it up to a one in 30 carrier frequency. That's going to give you a one in 3,600 frequency in the population. If your proband is a known carrier for a CF mutation, you've taken them out of the equation, so to speak.
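The two-part carrier-variant filter described a moment ago, restrict to a curated recessive-disease gene list, then drop anything too common using the cohort itself as the frequency filter, can be sketched in a few lines. The gene names, record fields, and panel contents here are hypothetical stand-ins; only the 15% cutoff and the 572-exome cohort size come from the talk.

```python
# Minimal sketch of the carrier-variant filter: keep only variants in a
# curated panel of recessive-disease genes, then drop anything too
# common, estimating frequency from the cohort itself (572 exomes).
# Gene names and record fields are hypothetical.

PANEL_GENES = {"CFTR", "BBS10", "HEXA"}   # stand-in for the 78-gene panel
MAX_COHORT_FREQ = 0.15                    # the ClinSeq 15% cutoff

def carrier_candidates(variants, cohort_size):
    """Keep panel-gene variants below the cohort frequency cutoff."""
    kept = []
    for v in variants:
        if v["gene"] not in PANEL_GENES:
            continue
        freq = v["allele_count"] / (2 * cohort_size)  # alleles, not people
        if freq <= MAX_COHORT_FREQ:
            kept.append(v)
    return kept

variants = [
    {"id": "CFTR:deltaF508", "gene": "CFTR", "allele_count": 7},
    {"id": "TTN:rare",       "gene": "TTN",  "allele_count": 3},    # off panel
    {"id": "HEXA:common",    "gene": "HEXA", "allele_count": 400},  # too common
]
hits = carrier_candidates(variants, cohort_size=572)  # only the CFTR variant
```

Swapping the cohort-derived frequency for a dbSNP or 1000 Genomes minor allele frequency, as mentioned above, would change only the `freq` line.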
And now the risk to a pregnancy of that individual is one in 120, which is 30 times the population risk. I think most of you would agree with me that one in 120 is pretty high for a risk of having a child with CF, and it's probably something that we might want to return. But let's look at something that's one in a million. What does one in a million mean? Well, one in a million means a one in 500 carrier frequency, right? And then if one of our probands is a known carrier and we take him out of the equation, we're down to a risk of one in 2,000. That's 500 times the population risk. You might argue with me that one in 2,000 is something that you're willing not to return to your patients. I don't know where the cutoff should be, but just realize that something that sounds like one in a million, which sounds like a huge number, works out to one in 2,000, which isn't quite as huge. You need to think about this when you're thinking about carrier variants. So what did we find in ClinSeq? The Bell paper suggested you're going to get about three variants per individual for rare recessive childhood disorders. Just looking at those 78 genes in our 572 individuals, we got 10 stops that were in the Human Gene Mutation Database and 216 non-synonymous variants. We also got novel stops. We got frameshifts, which, because of the way VarSifter does this, are not really linked out to HGMD. We got in-frame deletions and splice sites that were not in HGMD. And then 623 non-synonymous changes that we really haven't followed up on. Hopefully we'll keep going back to this dataset and more of those will become annotated, but today they have not been. If you look at the individual variants in CFTR, you would expect to find delta F508. We did, in seven individuals, which I think is actually maybe a little low. We also found BBS10 mutations, which is a gene I work on, so I was interested in that too, in 401.
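The carrier-risk arithmetic above follows from Hardy-Weinberg reasoning: for a recessive disorder, the population risk is roughly the carrier frequency squared divided by four, and once the proband is a known carrier, the risk to a pregnancy is the partner's carrier frequency divided by four. A few lines with exact fractions confirm both worked examples from the talk.

```python
# The carrier-risk arithmetic from the talk, written out exactly.
# Population risk ~ (carrier frequency)^2 / 4: both parents must be
# carriers, and the child has a 1-in-4 chance of being affected.
# With a known carrier, only the partner's carrier status is uncertain.
from fractions import Fraction

def population_risk(carrier_freq):
    """Chance a random couple has an affected child."""
    return carrier_freq ** 2 / 4

def known_carrier_risk(carrier_freq):
    """Risk to a pregnancy when one parent is a known carrier."""
    return carrier_freq / 4

cf = Fraction(1, 30)                 # CF, rounded to 1-in-30 carriers
print(1 / population_risk(cf))       # 3600  -> 1 in 3,600 in the population
print(1 / known_carrier_risk(cf))    # 120   -> 1 in 120, 30x population risk

rare = Fraction(1, 500)              # a "one in a million" disorder
print(1 / population_risk(rare))     # 1000000
print(1 / known_carrier_risk(rare))  # 2000  -> 500x population risk
```

Using `Fraction` keeps the reciprocals exact, so the 1-in-3,600, 1-in-120, and 1-in-2,000 figures fall out directly rather than as rounded floats.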
The bottom number is the number of individuals that actually gave back a genotype that could be interpreted and was included in the VarSifter file, so it's not always 572. Because we have a high proportion of Ashkenazi Jewish individuals, we see Ashkenazi Jewish founder mutations for rare diseases, and you can see other rare diseases up here on the screen. If you were doing a trio, which I think is what many of you are going to be doing, it's going to be a little bit different, in that you're going to start with maybe 90,000 variants. I said 10s to 100s of thousands of variants; for just one of ours, we started with 90,000. And then, let me just take this down to the next slide. Instead of taking it straight down to the non-synonymous, stop, frameshift, and splice variants, what we did here is filter a little bit differently. We said, how about if we look at every variant that's in a gene known to cause disease in the Human Gene Mutation Database? You can do this within VarSifter, and that got us down to 9,264 variants. If you look at the ones that are annotated in HGMD, you get down to 362, and then disease-causing was 113. And again, that's a pretty reasonable dataset, and it's not even filtered for frequencies yet. At 113 it's reasonable, and if you filter for frequencies, you can get down to 65 variants that you have to look at. In addition, you have to consider novel variants, and there were 19 novel stop, splice, and frameshift variants in this one trio. So that leaves us with about 80 variants to look into. The paradox here is that we're generating a ton of information, and a lot of us are leaving it on computers. We say this information can be life-changing for our participants, but the only way that they're going to change their lives based on this information is if we give it back to them.
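The trio funnel just described, roughly 90,000 variants down to 9,264 in HGMD genes, 362 HGMD-annotated, 113 disease-causing, 65 after a frequency filter, plus 19 novel loss-of-function variants, can be sketched as successive filters over one variant list. Field names and the "DM" (disease-causing mutation) class label are illustrative stand-ins for however your annotation pipeline encodes these.

```python
# Sketch of the trio filtering funnel as successive VarSifter-style
# filters. Field names and the "DM" class label are illustrative; the
# counts in the talk went roughly 90,000 -> 9,264 -> 362 -> 113 -> 65,
# plus 19 novel loss-of-function variants, ~80 to review by hand.

def funnel(variants, max_freq=0.05):
    """Return rare disease-causing variants plus novel loss-of-function."""
    in_hgmd_gene = [v for v in variants if v["hgmd_gene"]]       # gene level
    annotated = [v for v in in_hgmd_gene if v["hgmd_class"]]     # variant level
    disease = [v for v in annotated if v["hgmd_class"] == "DM"]  # disease-causing
    rare = [v for v in disease if v["freq"] <= max_freq]         # frequency filter
    novel_lof = [v for v in variants                             # not in HGMD
                 if not v["hgmd_class"]
                 and v["type"] in {"stop", "splice", "frameshift"}]
    return rare + novel_lof

variants = [
    {"hgmd_gene": True,  "hgmd_class": "DM", "freq": 0.01, "type": "missense"},
    {"hgmd_gene": True,  "hgmd_class": "DM", "freq": 0.20, "type": "missense"},
    {"hgmd_gene": True,  "hgmd_class": None, "freq": 0.00, "type": "stop"},
    {"hgmd_gene": False, "hgmd_class": None, "freq": 0.00, "type": "missense"},
]
to_review = funnel(variants)  # keeps the rare DM variant and the novel stop
```

The order matters less than the principle: each filter is cheap, auditable, and leaves a list small enough for manual review.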
And in order to do that, I think, as you've seen, it's going to take a lot of time and effort for each of the researchers here, or maybe for a handful of researchers to design a great tool for us, so that we can get down to the 10 or so variants that we need to return per individual. But we need to think about this as a field. How are we going to solve this problem of secondary variants, and what are we going to return to our participants so that they can actually make healthcare decisions based on the information we give them? So this slide says the same thing: secondary variant analysis is a burden, and it takes a lot of time, but it is important. I hope I've convinced you of that, and we really need to improve the tools that we have available. And that's it.