Great. We'll let Douglas move back to his seat. We'll open this up. I'm going to impose the rodent rule and ask the first question. And then after I've asked it, if you get my attention, I'll add you to my list. So this is for Dan, I know that you're very early in the genome business. But would it be called GAC, then, I guess, would be the... But the question is, have you thought about how you might apply some of the depletion approaches that you discussed to non-coding regions of the genome, since there's clearly some important information there that we need to be able to begin to think about how do we crack that? So, firstly, I can reassure you, it won't be called GAC or GEC or any of those things. The tentative name for it is gnomAD, G-N-O-M-A-D. So for the whole genome version, we're obviously extremely interested in looking at constraint in non-coding regions, but it's worth being realistic about the challenges here. In coding sequence space, we have a couple of key advantages. We already know the chunks that are important in these regions. And secondly, and I think most importantly, we can divide up variation within those regions into different functional classes very easily. We know synonymous and protein truncating, for instance. In non-coding regions, both of those go away. And we're left actually with some real challenges in defining these regions. I think there's a good chance that we will be able to define a small fraction of the non-coding region as being under constraint using these approaches. But the regions will generally have to be quite large and very constrained for us to be able to pick them up using this approach. And I think it's going to take massive sample sizes before we can really start zooming in on, for instance, particular types of variants within particular enhancer regions where we can really say, this variant looks like it's likely to be pathogenic. Ted, do you have an estimate of what sample size would be needed to do that? So it continues to get better as you get bigger. But here I think we're easily talking the high hundreds of thousands to low millions of samples to really get very tight resolution in some of these regions. I mean, just bear in mind, even for protein-coding genes, we'll still be getting gains as we start approaching a million samples, which would have sounded stupid if I'd said it five years ago. But now a million samples is actually not terribly far off. We will reach that. And I think the resolution then for non-coding regions will require us going even larger. Yeah, another good action item to come out of the meeting would be a million samples funded by NHGRI to do just that. So thank you for that. Thank you for that. So other questions for this session? If you got a million samples, Dan, would you have to put them all through that recalling pipeline again? Ideally. So the challenge here is that there are various ways in which this can be done. Anything that deviates away from the goal of having a fully harmonized variant call set will result in at least some level of batch effects and issues across the data set. And depending on how much that's confounded with things like population, that can cause all kinds of chaos downstream. There are ways of making this a little bit easier. I didn't go into the details here, but the variant calling process works in two stages. The first is a very computationally expensive stage where we do single sample variant discovery.
So we basically say, for each of those samples, which sites are potentially variable, and that results in an intermediate file, which is quite compact. It's possible to do that so long as we, and we're actually working on this within the common disease centers, the NHGRI-funded ones, standardize at least some of the early parts of that process, so that it's possible to then share processed versions of the data with at least one central location that can then do the final step of joint calling across all of those samples. Ultimately, though, at least right now, I think we still really do need to do some kind of centralized joint calling. That's my bias, at least. So, Danny, I have a question around, how many whole genomes do you need before you can start having value to the data? I get that we're going to need millions and millions to really figure out what every variant does, but how many do you need to start having information that's informative today? So, right now we have five and a half thousand, and that's super useful. It depends on which part of the analyses that I discussed today we're talking about. For frequency filtering, with 5,000 genomes, that means you can say, here is a non-coding variant that is above 0.1% frequency, and we can throw that away for some dominant disease, for instance. So by itself, that's definitely useful. And I want to be very clear: in talking about hundreds of thousands or millions of genomes, that's really focused on the question of how we zoom in on constrained regions of the non-coding genome, and that does require much bigger sample sizes. There's a continuous spectrum of information. Even, you know, a thousand genomes was not a waste of money; that was an enormously informative project. But then as we get bigger and bigger, we'll be able to do more and more things with that data. Yes, a technical question, Daniel. For the genomes, are you going to go back in and look at the algorithms for the variant calling, since they were developed for exomes, sequences that may not be perfect for the genomes? Of course, that means you'd have to do all the exomes the genome way and all the genomes the exome way, or both the perfect way, although which options are a bit complex, but still may be useful. That's a good question. I guess I hadn't really, I don't necessarily think that we're going to see much of a difference between the genomes and the exomes in terms of variant calling. There's one way in which the variant calling will clearly be much better with the genomes, and that is because we have PCR-free whole genomes, which are glorious for things like indel calling, for instance. We're not currently using PCR-free approaches for exome sequencing, so that's not quite as nice. That will probably require some recalibration of the model, but that will be done within VQSR, within the standard model development approach. I don't think we're going to need to do anything fancy, but I could be proven wrong, in which case that's going to be a pretty horrible thing to try to fix. I'm looking for you to do that work, to see whether you need to be proven wrong or not, since you have the data, and you're already down the road.
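To make the two-stage calling process described at the top of this exchange a little more concrete, here is a minimal, illustrative Python sketch: an expensive per-sample discovery step that emits a compact intermediate for each sample, and a centralized joint-calling step over all of those intermediates. Everything here (function names, genotype thresholds, data structures) is a made-up toy, not the actual ExAC/GATK pipeline.

```python
# Toy illustration of two-stage calling: per-sample discovery of candidate
# sites (compact intermediate per sample), then central joint calling.
from collections import defaultdict
from typing import Dict, Tuple

Site = Tuple[str, int]          # (chromosome, position)
SampleCalls = Dict[Site, str]   # site -> genotype string, e.g. "0/1"

def discover_sample(reads_by_site: Dict[Site, Tuple[int, int]]) -> SampleCalls:
    """Stage 1 (run once per sample, the expensive part in real life):
    keep only sites that look potentially variable in this sample.
    The result is a compact per-sample intermediate, gVCF-like in spirit."""
    calls: SampleCalls = {}
    for site, (ref_reads, alt_reads) in reads_by_site.items():
        depth = ref_reads + alt_reads
        if depth == 0:
            continue
        alt_fraction = alt_reads / depth
        if alt_fraction >= 0.9:
            calls[site] = "1/1"
        elif alt_fraction >= 0.2:   # crude heterozygous threshold
            calls[site] = "0/1"
        # sites with little or no alt evidence are dropped, keeping the file small
    return calls

def joint_call(per_sample: Dict[str, SampleCalls]) -> Dict[Site, Dict[str, str]]:
    """Stage 2 (run centrally, once over all samples): take the union of sites
    seen in any sample and emit one harmonized genotype matrix, filling in
    homozygous-reference calls for samples with no record at a site."""
    all_sites = {site for calls in per_sample.values() for site in calls}
    matrix: Dict[Site, Dict[str, str]] = defaultdict(dict)
    for site in sorted(all_sites):
        for sample, calls in per_sample.items():
            matrix[site][sample] = calls.get(site, "0/0")
    return matrix

if __name__ == "__main__":
    s1 = discover_sample({("1", 1000): (12, 11), ("1", 2000): (30, 0)})
    s2 = discover_sample({("1", 1000): (25, 1), ("1", 2000): (0, 28)})
    print(joint_call({"sampleA": s1, "sampleB": s2}))
```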
So let me ask another question, since I'm not seeing any other hands here. I'm sorry? Oh, Cricket, okay. Just a short comment. Yep, go ahead. So we've done some exome-negative genomes. It's a small n, it's only 50. There's another 300 coming next week, but we've seen about a 15% pickup of missed SNVs from exomes, and a lot of that reflects the date of when the exome was done, for one, and the depth of coverage is perhaps the most impressive thing, because it's obviously far less uniform. So what we're going to see in broader numbers, I don't know, but we were pretty impressed with 15%. Yeah, I think there's an ongoing conversation about the relative merits of exomes and genomes, and that was actually a big topic of discussion, and Howard is laughing at that, a big topic of discussion at the centre meeting last week, both for the Mendelian centres and for the Common Disease centres. I think there was some consensus among the Mendelian centres that, comparing a really good recent exome to a genome, the increased yield is maybe closer to 5% to 10%, but it will depend, as you said, on how good the initial exome was. Most of that will come from structural variants. We've certainly got some cases in our hands where there are structural variants that are balanced and are completely invisible to exome sequencing, there's just no way we could ever pick them up, and we do see them with genome. But that 15% was not any structural variants, that was SNVs, single nucleotide variants, which is pretty impressive to me. So that's definitely very different from what we've seen, but I guess, again, it'll depend on the relative, I mean, what the coverage was of the original exomes. Yeah, I think that that reflects to some degree what the Nijmegen group has reported, although they would admit that a lot of theirs were on pretty poor exomes, and so their yield is much higher. In our experience, where we've done probably now 50 exomes and genomes, we have not as yet come up with anywhere we've identified confidently a variant in a genome that we hadn't already seen in the exome, but we anticipate we will see some. I want to come back to your wish list, because that actually was a question I had asked before you answered it. From your perspective, what could a group like this contribute in terms of the things that you are specifically looking for to enhance your resource? What would be the role that we could play either in terms of generation of data, contribution of data, the safe harbor issues that you mentioned in terms of tying phenotype? What would that look like? So there's lots of different answers to that question. One is about the generation of the sheer volume of data. So big sample sizes matter enormously here. These to some extent will come around through the natural process of the common disease centers. I was reassured actually to hear out of some of the discussions from the common disease centers that there has been a move away from a wholesale focus on whole genome sequencing towards exome sequencing, although I think there's definitely value to both. There will be substantially more value, I think, at this stage to having much larger numbers of exomes than there would be to having a smaller number of genomes. So that's a great outcome. So simple data generation is a big win.
The second thing that I think is critical, and this has really been a challenge for us, is a whole series of issues on the regulatory front where it would be fantastic to get guidance and clarity about how we can actually proceed with things like, you mentioned Safe Harbor, but for instance, the use of European samples and how that's gonna change moving forward as European regulations start to shift. We have a lot of European samples that we need to figure out what to do with. What ethics are actually required for us to be able to aggregate samples together and share that aggregate data? And are we okay to keep doing this in the way that we're doing, and what else do we need to do to make that easier? And then finally, how do we actually do this really difficult thing of being able to link variant data with phenotype data? Everyone who uses ExAC completely understandably wants to ask the question: that variant that I've seen in my disease patient, that you have five people in ExAC who carry it, what phenotype do those people have? And at the moment, we have almost no way of answering that question. We can sometimes do it in a very ad hoc way. What we'd love to do would be to have some database that allowed us to actually push out systematically, for suitably consented samples, phenotype data that was linked to those variants, but in a way that didn't identify the samples and was not considered to be violating any of the ethical issues around that. And right now, we have absolutely no idea how to do that. And I think some guidance from this group would be fantastic to help enable that. Let's do Les first and then Dan. Yeah, so we've had some experience with this, having consented all of our sequenced individuals for post hoc phenotyping out of the gate. We have a little bit of an advantage there. So I think it's gonna be a tough sell and probably highly heterogeneous in the responses you're going to get from IRBs and other agencies regarding that question. So good luck with that. Thanks for that. But it's actually worse because I think for a lot of these variants, the phenotype data that you're going to want may not even yet exist. So what you're going to want to do is a post hoc, customized phenotyping and actually bring these people in and do phenotyping based on the genotype. And that is asking more. And it will probably, mostly you would assume, be unrelated to the reason why they were sequenced in the first place, and people, as I have learned, get very nervous about anything that's unrelated to the primary indication. So there's your whole challenge there. But it can be done with the right cohorts, and actually the main problem we have is that people are complaining that we don't ask them to do enough things. So it can be done in the right patients with the right consent. Great. Is there a way to add another, I mean, I mentioned at the break, I wanted a column that said, what phenotypes have been associated? And I understand the difficulties, but is there a way to add his data or other kinds of predicted functional data to that set? Do you envision that? Or do you think this is a sort of stand-alone, and then you sort of take your variant and then go off to Doug or somebody else and figure out what the predicted function is? Is there one giant place we should be aggregating these things?
So I was actually really pleased to have both of these talks in the same session because I think the two data types are really complementary; both of them provide lots of information that really meshes together quite nicely. I mean, my vision for how this would work in a situation where you have a patient who turns up with a VUS is that you need to be able to look up three things very rapidly. The first thing is, has this ever been seen before in a collection of reference individuals like ExAC? If so, how common is it? Is it associated with some phenotype there? That's piece one. Piece two is, has it been seen before in disease patients? Disease patients like mine, and that's where programs like Matchmaker Exchange come into play and those types of issues. And then if it's a completely novel variant, as Doug mentioned, many of the variants we see in our disease patients, in fact, most of them still right now, are completely novel. They've never been seen in another patient or in ExAC. You then need to be able to go immediately to these large high-throughput functional assays and say, does it have some functional impact in this amazing preprepared table of variant impacts that Doug's generated for every gene in the genome? And I think having the ability to have all three of those types of information laid out in a way that is accessible to people who are actually doing clinical interpretation is where the future has to be in 10 years' time. The question is, how do we make the steps to get to those three resources all put together? Yeah, I think that that raises a really interesting question about the user interface, user interoperability, the role of ClinGen, things of that nature. And we've touched on this on several occasions during the discussions today, the perspective of the end user. At some point, we'll need to think about how do we engage with those end users with all the great things that we have and say, it's not just about what information we have that we could give you, but what information is gonna be the most useful to you and how can we represent it in a way that at least we can raise the odds that it's going to be used in a proper way. And I think we default sometimes to the idea, well, it'll be clinical decision support, which I think if we think about that writ large is probably true, but the details in terms of how you would deliver that and when you would deliver it and how you would get the data to operate in such a way that you could make it work at the point of care, just in time, is a daunting informatics exercise. One that we could potentially build on some of the work that was done at GM7 on genomic clinical decision support and the ClinGen work. But ultimately, I think we do need to think about this from a beginning-to-end type of scenario. And without that end user feedback, it's going to be very challenging. We will likely design something that will once again not be utilized very well.
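As a rough illustration of the three lookups Daniel describes above, here is a small, hypothetical Python sketch of how a triage step might combine reference-population frequency, disease-case matching, and a pre-computed functional score for a new VUS. The data fields, thresholds, and labels are all invented placeholders, not any real resource's schema.

```python
# Hypothetical triage combining the three lookups described above:
# (1) frequency in a reference collection like ExAC, (2) hits in disease-case
# databases (the Matchmaker Exchange idea), (3) a pre-computed functional score.
from dataclasses import dataclass
from typing import Optional

@dataclass
class VariantReport:
    variant_id: str
    reference_af: Optional[float]      # allele frequency in the reference panel, None if unseen
    disease_case_hits: int             # matching affected individuals in disease databases
    functional_score: Optional[float]  # e.g. fraction of wild-type activity, None if unassayed

def interpret(report: VariantReport, dominant_af_cutoff: float = 0.001) -> str:
    """Combine the three evidence sources into a coarse triage label."""
    # Lookup 1: too common in reference individuals for a rare, fully penetrant dominant disease.
    if report.reference_af is not None and report.reference_af > dominant_af_cutoff:
        return "likely benign under a rare dominant model (frequency filter)"
    # Lookup 2: recurrent in other affected individuals.
    if report.disease_case_hits > 0:
        return "seen in other disease cases; prioritize for follow-up"
    # Lookup 3: fall back on the pre-generated functional assay table.
    if report.functional_score is not None and report.functional_score < 0.5:
        return "novel, but functionally damaging in the high-throughput assay"
    return "remains a variant of uncertain significance"

print(interpret(VariantReport("1-12345-G-A", reference_af=None,
                              disease_case_hits=0, functional_score=0.2)))
```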
Yes, Carol. It strikes me also, if we had a system that allows you to seamlessly get at those three areas that you just outlined, it also would help better align basic research efforts with driving clinical need, because it would give you all those variants where we're just drawing a blank right now, right? And where we need sort of the basic model organism work to jump in to try to functionalize them, to move them into something where the clinical relevance can be more obvious. So that whole, again, we're talking about data again, which I think is good for me. But I think that this vision is kind of what we need to get that alignment of the basic and the clinically relevant work. Yeah, that's a good point. We think about pushing the information out to the clinicians, but clearly it needs to be bi-directional. So thinking about how we could then lower the bar to contributing the information that is needed by our basic science. I mean, that's really the intent of this entire meeting, to try and sort that thing out. But as was pointed out, at the present time, even if there was a button for me to push in the EHR that would allow me to send that data into some sort of useful repository, it would have to be routed through 55 lawyers and HIPAA specialists and compliance folks. And so figuring out how to do that. Now, whether PMI, under its rubric, is going to create some policy solutions to move this information is, I think, unclear at this point, but it seems obvious to me that this is another group that will need to be engaged, because there they're talking about sample sizes of the size that we're dealing with, along with at least some set of phenotypic information that will be collected, although we don't know what it looks like yet. Howard. So I wondered if either of you have looked into your data sets around allelic balance, and I specifically say balance, not imbalance, because that's one thing that's currently missing, but with the kind of data you have, we could start getting that level of information. This is a bit broken, but right now in the cancer space, we're doing a lot of clinical sequencing. It's almost all, both locally and as a send-out, tumor sequencing with no normal, mainly for cost reasons; it's almost twice the cost. And so we have a situation where there's these variants of unknown significance, these variants of almost known significance, they're in a domain or whatever it is, we call those VAX, but then there's also these variants that are likely pathogenic in a gene, that we hit against ClinVar and it might tell us something. But knowing that there's a variant that was seen in your database, but it was only 3%, not 50%, would be really helpful as we try to interpret whether one of your patients had some residual cancer floating around and you picked it up, or whether there's something else. And again, that's a very nuanced question and I'm sorry for that, but I think there's a type of data that we're not really looking at. That could be quite informative as we're trying to apply this stuff. So yes, indirectly. So there's information actually within the VCF that you can download; you can actually have a look at the average allele balance across the individuals within the dataset. You can also, for most of the variants in ExAC, there's raw read data available. So if there's only one to five individuals present with that particular variant, you can actually see the reads yourself in the browser and use that to decide whether the allele balance is correct. I think it's a fair point. And I very superficially touched on this, but we have actually identified a number of previously reported severe dominant mutations that are in ExAC individuals, but are clearly somatic mosaics because the allele balance is way off. There may be only 10% allele balance.
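The allele-balance point above lends itself to a small worked example. The sketch below, in Python, checks how far the alternate-read fraction at a site departs from the roughly 50% you would expect for a germline heterozygote; the thresholds and the binomial cutoff are illustrative assumptions, not anyone's production QC rule.

```python
# Toy allele-balance check: a germline heterozygote should have ~50% of reads
# supporting the alternate allele, so a call at ~3-10% may instead reflect
# somatic mosaicism or tumor-derived DNA. Thresholds here are illustrative.
from math import comb

def allele_balance(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the alternate allele."""
    depth = ref_reads + alt_reads
    return alt_reads / depth if depth else 0.0

def binom_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p): how surprising a low alternate-read
    count would be if the variant were a true 50/50 germline heterozygote."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def classify(ref_reads: int, alt_reads: int) -> str:
    ab = allele_balance(ref_reads, alt_reads)
    p_low = binom_tail(alt_reads, ref_reads + alt_reads)
    if 0.3 <= ab <= 0.7:
        return f"consistent with a germline heterozygote (AB={ab:.2f})"
    if ab < 0.3 and p_low < 0.01:
        return f"allele balance way off (AB={ab:.2f}); possible mosaic or somatic variant"
    return f"ambiguous (AB={ab:.2f}); inspect the raw reads"

# Example: 90 reference reads and 10 alternate reads, i.e. ~10% allele balance.
print(classify(ref_reads=90, alt_reads=10))
```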
So I would say that the type of data that I talked about isn't gonna speak to that, because we're starting with a library that we define that's coding sequence. Although we and others are thinking about assays for splicing and other things that could be useful in that context. I did wanna just comment on a thread that was going earlier, which is this idea of how to talk to a basic scientist. And I mean, in my own experience, it's been hugely helpful to hear from clinical people where the actual need is. Because it's kind of hard to figure that out. So if there was some sort of index of clinical need over all genes, that would be really nice. I got hooked up with the PGRN, which has been hugely helpful in clarifying my thinking about how to apply the kinds of methods that we've developed. Like I told you about Src, well, we started there because I thought a few years ago, Src is cool and there are inhibitors in the pipeline and let's see if we can build a resistance map for Src, which is what that project's all about. But it turns out when you talk to clinicians about that, they're like, no, no, you should do something else. So I mean, I don't know, that's a Chinese wall problem, I guess, but I don't know. Well, I mean, you could imagine ways to tackle that in the sense of we do maps all the time for disease impact, whether it's based on cost or days in the hospital or whatever, and then within that subset, you could say, well, what are the known genes that could impact things like renal disease or cardiac disease or that sort of thing. And so if you did that type of an association with genes, you could at least have one measure of prioritization where you could say, well, if we solve something in this realm, the impact across the healthcare system could be enormous as opposed to this little piece over here, although that little piece over there may be much easier to actually do. So I've got Howard on this strand and yeah, I know, but are you guys, now we've got 20 people, okay. Good, we finally hit the topic. So Cecilia and Stephen, let me ask you, because you were first in the queue. Are you speaking to the same issue? Okay, so I've got you on my list. Let me just hold on to you for just a second. I've got Howard, I've got Les, I've got Deborah on this string, not on this string. Deborah will go in the other queue. Okay, this is the conflict, so Howard, please. So I think one of the discussions around this, maybe to frame this slightly differently, is that, and I had mentioned this to Terry, I think the variant of uncertain significance is not a problem if you know what the gene does, right? And so you mentioned this and you're trying to model this, right, so for example, cystic fibrosis. Once you know that that gene plays a role in the disease process, having a variant inside that, you can actually model what that would be, right? So one of the challenges is what we call a GUS, a gene of uncertain significance. So the problem is you have a variant that looks like it doesn't work, has a problem in the gene, but you have no idea where it goes, and so back to the basic biology you were saying, you know, is there a way that you could index, yeah, all the genes whose function we have no idea about? I mean, and I'm not being flippant. I mean, there's a huge number of genes and we really should be thinking about how to prioritize that.
Now KOMP and some of these other projects are working on that, but I'm not sure how far we've taken that to a level that we can now start saying, what do we know about all of these genes that are unclear, because I think that would help so much in knowing where we want to prioritize and think about some of those things. Yeah, just a quick follow up to what you said, Mark. I think one of the things that ClinGen is doing that is potentially relevant, as you and I are both aware, is actionability. And so the question of being able to know what to do with a variant is at least to some degree tied to how medically actionable it is, if you want to know how much clinical traction there is to your data point. And so I think that's a starting point, and as well it makes me think that potentially ClinGen ought to consider for actionability the concept here of how much we understand about mapping a variant to a function of the gene in order to be able to do something for a patient. I like that title. Perfect. And I was just gonna follow up. So yes, definitely I think that's worthwhile for the actionability group to consider. We should take that back to them. And then as far as assembling sort of this list of what we know about GUSes, I mentioned earlier sort of our work in curating the clinical validity. So at least starting there, trying to really understand what evidence exists to authoritatively say that this gene is linked to a disease. At least that sets the stage then for doing the variant analysis. But that is also a place where we really could use some work. So the ClinGen group that's doing that started, as far as integrating the functional data, from the paper that came out in 2014. So that's our starting place for how we're evaluating evidence in the literature. But I think we really could use input from sort of the basic scientists on whether we're doing that correctly, what other evidence should be incorporated into that matrix to make sure we're not missing something big that's not gonna be useful back to the community. Great. And Dan, I saw your hand. Is this related to that string as well? I wanted to ask Doug a question, but wait, that can come later. Okay, I'll add you to the queue. So Cecilia, are you... Yeah, I have a comment and question, maybe a combination. The ExAC database is clearly enormously useful. But one thing that's frustrating for me is the fact that the data is aggregated. So if you're interested in more complex genetic models and you wanna test certain hypotheses, there's just no way to do that. And I understand the reason why it has to be this way. But I'm just wondering, is there any other alternative? Can there be an honest broker? Or just go to Dan and get his help to analyze some of the data to ask these more complicated questions. But to me, an enormous amount of genetic information is lost when we aggregate that data. And if we're moving towards trying to understand more complex genetic models of disease, we need that information. And so I'm not saying that I have an answer, but I'm just thinking that this is something that we need to somehow face and figure out a workable solution for, so that all of us who will eventually go down the path of looking at more complex models of disease have some way to actually get more useful information out of that huge amount of data. Consent. Yeah, consent is the problem.
So just to highlight the challenge here, we can release data aggregated by variants because there's no way of re-identifying someone's exome from that process. As soon as we start releasing data that allows you to say there is an individual who carries these two particular variants, and then index that across the whole exome, it becomes possible to reconstruct someone's entire genetic sequence, or release a big chunk of it. So that means that publicly releasing that type of data isn't possible. The honest broker model, I think, is the right one. The problem at the moment is that ExAC, and this won't actually change, is composed of cohorts with a very complicated set of consent and data use permissions, where at the moment the only entity that actually has permission to store all of the individual level data is the Broad. There's no way to actually transfer that to any other party. So we can certainly help with analysis wherever possible, and on that we're bandwidth and resource limited. But I think as other cohorts come together, PMI and other cohorts where perhaps there is thought put up front to address these kinds of issues and it's not done in that opportunistic way, then I think there could be opportunities to be able to ask these questions that rely on individual level data. Great. Cricket, then Stephen. I had a quick question about the use of trios, either at the exome or the genome level, because there have been a lot of RFAs about those and it would be a shame to lose that information. And I realize that also raises, again, the level of confidentiality. And my question to Doug, but maybe I should wait, are we going back to Doug later on? I'll throw a question to Doug as well, which is that when you think about the proteins that you described for us in the functional assays, I was very struck that all of them are loss of function. And yet that's not really true in biology. We see some variants have gain of function, and I was wondering how you think about that. Well, so we can see gain of function in some assays, right? So we found about 1,000 gain of function variants across the seven proteins that I showed. The gain of function variants are rare compared to loss, but we try to set up our assays to detect both. The kinds of general assays that I was talking about at the very end, those are probably only going to be sensitive to loss of function. To measure gain of function, you really have to have an assay specific for the function of the protein. But we and others have done those on the scale of tens, and I think even just if the rate doesn't change, we'll have hundreds of such data sets in a few years. But yeah, you can see them and they're there and they're interesting. Like I mentioned in Src, we can see the canonical Src gain of function mutations and they do exactly what they should do. They activate the kinase function, yeah. So I have Stephen, Deborah and Dan and that will probably bring us to time. So Stephen. Yes, I think I would agree with other people who have spoken of the value of the ExAC effort as something that is quite remarkable. I also have questions like Cecilia. My particular one is more about recomputing the variant call file, since no single aligner is going to do it all, you know, there's value in having several aligners and variant callers. And so the direct question would be whether that's possible, since that would, I think, add a lot of value.
As you think about a future version of ExAC, and you spoke about 120,000 individuals, I certainly hope that there'll be a focus on minority groups, since that's really clinically where we're stuck. You know, with Caucasians I think we do have really good allele frequencies and so we can be highly predictive. When it gets to Hispanics, African Americans, Asian subgroups, we really often, our allele frequency information is pretty much hopeless. And then lastly, just in thinking about what's next, I'd be really interested in other people's opinions on this, you know, something I'm struggling with myself. Is the next gold standard going to be a Complete Genomics exome or a Complete Genomics genome? These are very interesting questions, since none of us yet have structural variant information of any type of quality that's going to be clinically meaningful yet. And so that's a whole piece of our diagnostic universe that really we're missing these days. But by Complete Genomics, you actually mean the BGI version, or? I'm sorry, I always get those mixed up. 10x Genomics Chromium. Yeah, cool. Yeah, okay, go right here. I can, sorry. So there's three questions there. The first one was about testing different alignment and calling methods. We definitely think that there is clearly some orthogonal value to different aligners and callers. The challenge is that the computing requirements for ExAC to do that even once were pretty monstrous. And to be honest, I just don't think I have it in me to do it again with another aligner and another caller. So I think for the foreseeable future, our goal is to do the best possible job we can with the alignment and variant calling pipeline that we have, and just accept that that's going to miss some things, that people who use different pipelines will get different data. And I guess with additional resources, it might be possible to go back and do that. But really, I mean, this would be an unbelievably expensive exercise, particularly going back to the alignment stage to do that on 200,000 exomes or 20,000 genomes. That's a lot of cost. What if it was free? Then it would still be incredibly painful, but doable, but doable. So I think in the magical world in which compute was free, then that would be pretty cool, yeah. The second question was about other populations. I didn't touch on this at all. Again, in ExAC V2, we remain completely opportunistic in terms of looking for the data that's available. And just to emphasise, again, ExAC doesn't do any of its own sequencing. We rely on other people to sequence populations and then be willing to share the data with us. There's big holes in our data set right now, and those are glossed over in the continental level breakdown that I show, but we're missing, for instance, the Middle East, which at the moment is a complete black hole in terms of reference populations. We have basically zero Middle Eastern samples in ExAC, and yet, of course, Middle Eastern samples are overrepresented in our rare disease cases due to consanguinity. That's an enormous problem that at the moment, I'm not sure we have a clear path to solving, in the sense that I don't know of any other large-scale common disease sequencing studies currently underway in Middle Eastern populations that are likely to have data shareable with us. If those do come about, we'd be delighted to work on including those samples. We're, of course, missing big chunks of Africa and many other populations around the world.
Again, to the extent that NHGRI's focus continues to be on increasing minority representation, which I think it sensibly is, these data will come along in time, but we're dependent on other people doing those experiments. The final question was about 10x Genomics. I think other people should comment as well. I only have my own experience to go from here. We've looked at 10x exomes as well as genomes. For those of you who don't know, 10x allows us to look at linked read data, so it basically allows us to look at long haplotypes within the data, both for exomes and genomes. I'm very excited about it. I think it adds power, not only in terms of haplotypic phase, but also copy number: it's much more sensitive to copy number variation, incredibly so, and this is true even for exomes. So I think there's a lot more experimentation to do. We have a pilot project underway at the moment. Once that comes out, I think we can see how that looks, but at the moment I think it really does add a lot of value. Great, Deborah. So I just have a residual comment that it is very difficult clinically to know where the truth is in the accuracy of variant calls, and that's something that I think all of us who do this testing struggle with. I don't know which system is right for doing the calling, but my question is, it seems we're paying attention to a single variant at a time, and the reality is that patients, all of us, have multiple variants either in the same gene or in multiple genes, and there's pathways that are affected that cumulatively result in diseases, and how do we begin to approach that reality, which is moving away from more the single gene, medical genetics approach to genomics, to the reality of what each of our genomes is doing to create the phenotypes that we have, either medically relevant or otherwise. Is there any way to even begin to approach that? I have two answers, one of which is that in my world, at least, that will require enormously large sample sizes, so as soon as we start relaxing our genetic models to go from monogenic to digenic or more complicated models, that means that our power drops precipitously and it becomes incredibly important to build up sample sizes. My second answer is to point at Doug and say that he has to try and figure out how to solve this, because I think the most likely approach here will be through some kind of functional system where it's possible to introduce pairs of mutations that are candidates within a particular patient and see if they have some non-additive effect in a model system. Could I just ask whether or not there's another approach to this other than power, because each one of our genomes is fairly unique, and so will you ever get the power to say that this combination of mutations is resulting in this disease because you see it enough times? Or is there some other approach that we have to start taking? Not that I'm aware of. I think, apart from the experimental approach, digenic and more complicated models are gonna be extraordinarily difficult to solve statistically, using statistical genetics, and require massive sample sizes. So that's why I think the functional approaches will be more important. And I think maybe there's some hope there. I mean, we've started, so we've developed some methods for installing variants at multiple loci in an efficient enough way to actually screen large libraries.
So that's coming down the pike in our lab and then there's been some really beautiful work using at least genome-wide gene knockouts in human cells to sort of build a two-by-two map of knockouts, right? And so I mean, I think that progress will be made but it's sort of a reflection of the same problem. It's just gonna take development and more every time you add a degree of interaction you're increasing geometrically the number of interactions. So a question that I have that I don't know that we have the answer to yet is exactly what degree is gonna be important, right? I mean, is measuring all, you know, all pairwise interactions enough or do we really need, as you suggest, to sort of be able to make and test each individual genome separately? Like that's a tall order. I don't know if, you know, I mean, I guess that sends you down the road of for those phenotypes looking at IPS cells, you know, patient-derived IPS cells individually, but that's hard to multiplex that kind of approach. So, you know, we're looking at pairwise interactions at the moment. That's as far as we've gotten. So we've been gifted a bit more times. So Dan had a comment and then Gail. So I actually had a question for Doug which is about, I wanted to push a little bit harder on the generalizability question because the key thing here is for that three-pronged vision that I laid out where, you know, a clinician can look up each of those three different things. The key thing is that for their gene of interest that they actually do have some high-throughput functional assay available, it's already been generated where that particular variant has actually already been put through an assay. So you mentioned, so obviously the assays that you showed were dependent on selectability as a marker. You mentioned that you could use flow sortability as another approach. What, I guess I'm just trying to get a sense as to what fundamental limitations that places on the number of genes you can actually, the number of genes and number of functions you can actually test using this system. And maybe even if a very hand-waving number, what fraction of total known disease genes as of right now do you think could fundamentally be assayed using these types of approaches and which ones are really totally out of reach? So those are all good questions. You know, I think starting from the last question, you know, probably the capacity to implement the technology, the sort of technologies as they currently exist would be, you know, on the scale of maybe hundreds of genes that could be, you know, over the next few years. I mean, obviously in the limit of resources they could all be done, but, you know, hopefully as we sort of get better and better, you know, thousands would be doable. I think in terms of phenotypes, using the type of approach that I told you about with flow sorting, then I think, you know, you're probably limited to we're thinking in the neighborhood of 10 phenotypes or so that we want to flow sort for. You know, we can test a bunch of them, pick the most maximally informative set of somewhere around 10 and then, you know, see what that gets us. You know, there are other ideas that people have talked about that, you know, if they work out, could, you know, be better. But they're not, I mean, in terms of what's practicable now or in the immediate future, I guess that's what I'd say, maybe 10 phenotypes per variant in a general sense and maybe a few hundred, you know, functional assays for the most critical genes. 
But I think an open question is, how many do we have to scan before we get good enough at making predictions? Like, is it going to turn out to be the case that every protein is idiosyncratic, or, once we have a really nice dataset for a protein in a family, do we then have the power to make fairly accurate predictions for all the family members, right? And if you think about it that way, then the number that we need to examine drops, you know, quite a bit. So, Terry. Great, so before Mark calls on Gail, let me just ask my panel to converge up there, so that would be Les, Greg, Kat, Erin, Liz, and me, if you would just head up that way, thanks. Great, and just to respond to Doug's comment, certainly some of the work that we did at University of Utah suggested that you could do pretty reasonable calling within families after you have a pretty well phenotyped set of genes within there, even working off of one or two well-phenotyped genes. So, I think that is likely to reduce some work. Gail, last word for this. Let's have a question, and maybe Carol can help with this. So, I know in the mouse community, people spend eight to 10 years developing this outbred cross and trying to put more defined diversity into the mouse to look at complex traits or modifiers. And so, I wonder for some of these issues, you're not gonna get necessarily the specific variant, it's not always gonna be perfect. But again, if you've got secondary variants and you're trying to sort out which gene may be playing a role, whether for some of these models it's worth going back and using that cross to help with some of these questions. Yeah, it goes back to this issue of being able to perturb the system sort of genetically and then look at the phenotypes that come out of that. And so, as a way of getting at principles of these things and modifiers that might be related to human disease, that certainly is the intent, and there is work going on to do that now that leverages a lot of the genome editing technologies that are coming on board, so that we can make specific variants and then put them in the context of neighborhoods of other sorts of variants and see what the phenotype outcomes are. So that information, again, I can see feeding in to the type of resources that people are talking about here, being able to address these things specifically in an experimental context. Great, well, I've sufficiently perturbed the group here. So thank you for the good discussion in this group, and we'll turn it over to Terry and the panel. Great, thank you very much. I know it's late in the day; we thought a panel would be something a little bit different to kind of stimulate folks a bit. And we had actually identified a few questions in case there weren't enough topics for discussion. But we probably don't need to use those; we can kind of draw from what's happened today. One thing I would like to take advantage of that we haven't taken advantage of quite yet is that we have a number of NIH people around the room, both from our basic science division, who've been very quiet throughout the day, and also from some of the other institutes, and would like to encourage them to speak up. I am horribly myopic and undercorrected, which should be really scary when I get behind the wheel. But it may also be scary here, so please wave if I'm missing you. But one comment that was made earlier was that we need some uniformly ascertained cases, at least a large number of them, for a wide variety of diseases. I think Callum, you made that point.
Which seems to me to sound a lot like NIH institutes' disease-specific studies. And I wonder if those around the room, either at NIH or Callum or others, can mention some. That would be studies that perhaps have done really exquisite phenotyping, but don't have the genomic sequencing, that might be useful for us to reach out to. So these congenital heart disease groups, Cricket left the room unfortunately. But there's a congenital heart disease consortium, as I understand it. There's an ARDS consortium. I'm thinking of all the heart, lung and blood ones that I knew from my previous life. So I think there are many cohorts, but many of them have been phenotyped in a limited number of axes. Certainly that's the case, for example, for the congenital heart disease cohort. I know Cricket in retrospect was mentioning some of it. Very few of them have been systematically phenotyped, for example, in a neurodevelopmental axis at the same time. So those are the types of things that I think would prospectively be very straightforward. As I said earlier, I think the scale of what's required suggests that you might need to do that outside a disease-centered cohort. And even when you're thinking about some of the work that Daniel talked about, where you're trying to work out what the marginal risk for a particular allele is, if you ascertain your entire cohort on the basis of the disease that you're interested in predicting risk for, you're already in a sort of vicious cycle of prior probabilities that probably doesn't bode well for how you would look at it in the general population, which is ultimately what the individual practitioner is going to require. Quick question on that, Callum. What percentage of those patients do you think are consented for recontact to come back in and do that? Because it seems like a great one to start with. If they've already got a good cardiovascular workup, then the other workup would be. I honestly don't know. I mean, I will say that in our last cohort, an atrial fibrillation cohort, we did use the same consent-type strategies, but in a very small number of those, partly because the IRBs were quite resistant to that type of clause when these studies were being designed. So I think that's actually one of the things that is an important issue here, is that almost all the cohorts that are mature enough, where they actually have now been collected or are being sequenced, were cohorts that were consented under a very different set of perspectives than are now relevant to understanding the genome. Greg, what about the IDD cohort, the CSER cohort? Can you go back? Do you know, were they excluded from having cardiovascular symptoms? Yeah, there's no exclusion. It's relatively small numbers that we're talking about, hundreds to low thousands. And we have permission for recontact, but we'd probably need a separate IRB for sort of unrestricted medical record trolling, yeah. I suppose you could treat being recontactable as a phenotype in its own right. Although, you know, a separate IRB is a barrier, but it's not an insurmountable one. And if there's enough value, I mean, I think one could expect that most institutes would have supported primarily their disease phenotype and perhaps not related phenotypes. Although in many cases they do, you know, expand beyond theirs, and certainly, you know, some diseases, you look at inflammation and it's systemic and that. So there may be possibilities there. I don't know, Mark, do you want to comment?
Yeah, I mean, I think also the recognition that there are a lot of cohorts that are now being created under new consent, with full consent for use of sequence data, medical records and recontact, that could be done. I mean, we have 100,000 patients that are under consent with recontact, all of whom are going to have exome sequences. And so within that, there could be, you know, disease-specific cohorts that could be done under some type of arrangement, either with an institute or center, or you could say, well, maybe it would be of more value to do extremely deep and comprehensive phenotyping on folks that don't have any sort of preexisting condition. But I think that, you know, we're not the only example of that type of a cohort that could be leveraged to do this type of work. Great, so, oh, I'm sorry, go ahead, Susan. Yeah, I just want to mention, in terms of congenital heart disease cohorts, obviously PCGC has a very large collection and they are moving ahead with sequencing, but I think, I think it was already mentioned, the phenotyping data may be not as deep. But I think that there are clinical centers at various different places where there are smaller collections of patients where there's deep phenotype data. So where I am, we actually have around 600 or so patients, and also patients in Western Pennsylvania don't move. So we have a lot of longitudinal data. A lot of these are patients that were operated on, so we also have outcome data. So I think that there are clinical centers at various places where that exists, and we actually have DNA and cell lines on everybody. So I think that it's just a matter of finding where these places are and finding opportunities. And so we're trying to compete for the Kids First funding to see if we can get some of them sequenced for trios, and they've all been consented for recontact. So I think that if there are opportunities announced, I think that there are, I mean, I know of a couple other places where there are collections of congenital heart disease patients where it would be possible, where there's deep phenotype data, and you can really move forward in a different way. And we also have brain imaging data, and our patients, about 150 of them, have brain MRI and neurocognitive assessments. So I think that it just needs to be a context to really get these things fleshed out and move forward. So in terms of the deep phenotyping, obviously one of the use cases, if you sequence people who are well phenotyped, you have more power to determine whether that individual is part of your case group or part of your control group. So I think those data sets are very useful, but one of the problems is going to be, once you do the deep phenotyping and the sequencing, it's back to what Daniel said, what's the path to make that data available to everybody who might want to use it? And I think that's something that we all know has to be worked out, but it's not easy. So as a big thing to tackle, that might be the most important thing that would increase our ability to make more diagnoses. And I think that also goes back to the problem. If you have to aggregate the data, you really just lose so much information. So in the end, not only making it available, but if you have to aggregate it, it's a big loss in terms of the actual genetic information you're after. And I mean available: all the variants and detailed phenotypes for each of those individuals and the ability to go back and ask them questions. No more.
Well, I might ask Adam or others around the room, we have our common disease centers that are doing sequencing of reasonably well-phenotyped people, I believe, with a variety of diseases. Adam, is that a group that could be used for this, that might be recontactable for additional phenotyping? Yeah, I would have spoken up before, but I was trying to think about how much detail I remembered, and the devil's in the details. I'm sure there are many samples that are gonna be like that, but without having the list in front of me, it's hard for me to tell how useful it's gonna be. So that might be a place to look where we could at least get started. That's already underway and already working. How good the phenotypes are there is, I think, of some question. If you could kill your microphone. So then a follow-up question, it's not on. Yeah, use this one. So a follow-up question is, assuming that you were able to get that data in one place and you wanna be able to use individual level data and not necessarily aggregate it, but then we've heard throughout the day that the data is captured using a thousand different protocols. Is it necessary to layer on some kind of ontology to that data to make it useful, or can we rely on NLP and other avenues for processing the data? So I'm not sure if that's a question for me, but having looked at phenotype data as put together by multiple different cohorts, at least in our hands, we found it extremely difficult to do any kind of phenotype comparison across groups without at least some attempt to do harmonization, obviously, of the quantitative traits, and then structured ontologies for the ones that are more qualitative. Definitely in rare disease research, we've now moved to the Human Phenotype Ontology for all of our cases, and I think that's made it, it's limited and it's not perfect and it definitely doesn't capture all the scope of things that a clinician would wanna put into a record, but it does help us do comparisons between cases in a way that we wouldn't be able to do otherwise. So I have a question about the idea about consent and what you're consenting for. So we focused a lot around deep phenotyping, variant analysis, diagnosis, but a lot of times we get past that point, right? And now we get to the point where we have a diagnosis and then we're incredibly data deprived, so to speak. We don't have the deep phenotyping on the patients. If we were to make model organisms, it becomes very difficult to figure out which variant we would wanna model. And then of course, if you're going into therapeutics, biomarker analysis, what's happening at the level of the transcriptome, if you're talking to a protein person, they don't care about the DNA, they're looking at post-translational modifications, and so from a model organism standpoint and then moving towards therapeutics, having the mutation and the diagnosis is great, but then we have this incredible void of all of the rest of the information. So the long-winded question is, is there some level of a comprehensive consent form that you could get where you have access to not only re-contacting the patient, but the level of the transcriptome, tissue banks, repositories, all of those things, so that you're dealing with one comprehensive consent, or is that, am I being naive? Is that too much to ask? An NHGRI-sanctioned one? So, that's not a new thing. I mean, there's plenty of models for that. Most of our institutions have that sort of thing.
The reality is, it's not that often we get a piece of flesh from someone, even at a cancer center. You know, our patients love for us to take tissue from them because they don't want their cancer, but yet we still don't have that many pieces to put into banks. So I think the challenge is not the consent part. We have comprehensive longitudinal follow-up consent in our Total Cancer Care study at Moffitt, for example, but the problem is how do we get people to want to actually be phenotyped that often and that deeply? So, Cricket and then Mark. So, you know, I would agree that biobanks in a lot of academic institutions are doing this. At Partners, there are 10,000 people who are consented to have anything you want from them that comes out naturally or in the context of disease. But remember that that is the joy and beauty and potential of iPS cells, that you could. And I limited my comments to say we're not there today, but we will have organoids that could be patient-specific, and though that's not gonna be mass-scale phenotyping, it certainly is going to allow for really rich phenotypes to be evolving, especially in particular, you know, candidate disease genes, so. Yeah, I mean, again, I think this is an area where we could, you know, do some searching for phenotypes of this type. In our obesity institute, all of our patients that undergo bariatric surgery, if they're consented into our project, in addition to having exome sequencing, we will get liver and adipose tissue at the time of surgery. So we have those tissues across now probably over 1,000 patients where that would be available. And so again, I'm sure there are other collections like that that could potentially be sought to say, can we use these for other purposes? And under our consent, that would be possible. Great. Yes, Mike. I was just gonna say that the PMI consent, as far as I know, has not been drawn up yet, but it would be a good opportunity for people from this group to weigh in on the kind of things you would want in that as it's designed. Yeah, I think that's being worked on now and might be something worth keeping in mind. I can't imagine that it won't be as broad as they can possibly make it, but there are constraints on that, as you can imagine. I was more speaking from the reference point, maybe, of the ALS community, where we had all of these tissue banks and repositories. And then we get to the point where we want to sequence and those patients had never consented for that. And that becomes relatively frustrating. Answer ALS also has these iPS cells. And while I agree, that's great, every patient will have their own iPS cell and then have that, do that phenotyping, but it's still a cell in a dish. And it doesn't necessarily give us the insight to the databases. And I'm thinking now about Parkinson's and Alzheimer's, where they're starting to follow patient populations, even large populations, even if they don't know that they're going to develop Alzheimer's or Parkinson's, to see if they can be more predictive. And so it seems like that global approach is pretty ideal, but I don't necessarily see a lot of people focusing on it pre-symptomatically. It seems to be once a disease diagnosis is made, but I might not have all that information. Again, I think that's where healthcare centers are collecting every patient. I mean, Geisinger is doing that. As best I know, Vanderbilt is doing it.
A lot of places are doing it, every patient, irrespective of disease. And for patients who begin in childhood at an institution, that's lifelong. Great. Okay, maybe we can shift to the question of quantifying evidence. It's something that I think we've all struggled with for quite some time. We heard a couple of suggestions of things that could be quantified, one being the, let's call it the inferential distance of a phenotype from the actual clinical characterization, another being what Daniel showed us in terms of comparing the frequency in controls to say how impenetrant a variant would have to be to still be a causal variant. What do you folks think, and I'll ask this group first, in terms of other ways that we could quantify evidence and the needs for that in order to try to put it into some kind of an algorithm that then we can assess. Greg? Yeah, so I think this is a case where Daniel's and Doug's talks really, for me at least, are the path forward: thinking very big. It's empirical null models, right? You do lots and lots of people, you do lots and lots of variants spread across lots and lots of genes. And it's very easy then to ask: this variant, variant X, had this, you know, some functional effect, some unit of measurement, and you ask questions like, how rare is that? Well, I've tested a million mutations or 10 million mutations or eight and a half billion mutation events and this is where it ranks. This is the standard deviation. You get a very empirical sense for what is the probability of that observation. And that is the sort of thing that lends itself to building these probabilistic interpretations of genotype-phenotype correlates. Now the other thing I'll say is that, you know, and like I said, I really like that Doug's talk is thinking very big on the functional side. Especially, you know, I think when the ENCODE pilot phase, for example, was sort of conceived and started, the whole genome was very daunting, right? So it was, let's do 1%, where we're doing arrays and other things. We don't know what assays are gonna scale. We don't know what information they're gonna give us, but it's worth trying 1%. You know, there really are eight and a half billion possible SNVs of the reference genome, which is a big but very finite number. You could envision a lot of assays that actually are meaningfully scalable. You know, things like, do I up- or down-regulate transcription? Do I affect protein structure? Things like that where I would, you know, be very happy to have a generic, pre-generated resource, even if it's not specific to my gene or my locus or the mechanism that I care about; something that's generally available and quantifiable can turn out to be very powerful. And I think, like I said, Doug's data showed that. And you'll find new things too. When you look at not just the 10 variants that you care about as good candidates, you're looking at millions of things. You're gonna find new collections of variants, new annotations that predict feature X. So even if your assay is going after one property, you'll probably be able to learn clusters of things that predict other properties as well. I guess I would just say thinking big on that is the important thing; getting lots of variants would be a major priority. Les, do you wanna come over and talk to me? Yeah, I was struck by the same thing that you were, Greg.
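Greg's empirical-null idea reduces to a rank statistic: score the variant of interest in the same assay as a very large background of mutation events and report where it falls. A minimal sketch, with simulated background scores standing in for real assay data (all numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical background: functional-assay scores for a very large set of mutation
# events, measured on the same scale as the variant being interpreted.
background_scores = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

variant_score = -3.2  # hypothetical score for the variant of interest

# Empirical tail probability: fraction of background events at least as extreme (low) as this variant.
empirical_p = (background_scores <= variant_score).mean()

# Z-score against the empirical background distribution.
z = (variant_score - background_scores.mean()) / background_scores.std()

print(f"empirical P(score <= observed) = {empirical_p:.2e}, z = {z:.1f}")
```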
And it came across a couple of talks, both the cardiac disease functional talks that we heard, and then Daniel's and Doug's talks. But the one I wanna go back to is actually Stephen's talk, it looks like he had to leave, to drive home point number one. And we tend to do this a lot as a field, which is a lot of self-flagellation. And it's important to go back and remember that even though we don't have all the variants and we don't have all the pathogenicity assessments, and we don't, we don't, we don't, it actually works pretty well already, which is great to keep in mind. And what that means to me is that genomics has taught us that a lot of good data is better than a little bit of perfect data. And as a field, we need to keep moving in that direction. And that means we need to understand underlying principles of how things work, which I think is the key thing that I often feel like I need from my basic science colleagues. And then if we can transform that into high-throughput functional assays, so that we can get a reasonable estimate of how well the protein is working with and without this variant, then we can use those data, imperfect as they are and crude as they are, to sensibly modify, again in that sort of Bayesian inferential network, what the pathogenicity of the variants is. And I think there are some good efforts in this area. You mentioned BRCA, so you have your DSB assay; the MSI assay folks, I think, are making huge progress, where we're going to have a huge catalog of decent estimates of how much each variant affects MSI. And then we can use that. And I think that that is the path forward. So I think we wanna start to build a model where we have the basic science labs tell us what the principles are, connect something in the laboratory to the phenotype, and then design that high-throughput assay so that we understand the function of as many variants as we possibly can, reasonably well, much as we do high-throughput drug screens, and answer a lot of these questions efficiently, going back to Stephen's talk, because the decisions that need to be made, that he is making every day, cannot wait two to five years for the lab to work out everything and make the mouse and do everything you would like to do. We need the answer between one and five days, not between one and 10 years. So Les, I really, really like that. But I think one of the things that we also do in this field is that we throw out a lot of statements that sound like what we're doing is inaccurate and that we're not good enough yet to do a diagnostic. And I think in some senses, by competing, and I don't mean that in a negative sense, we're trying to figure out how to do the best science, but I think outside the field that gets heard as: this is not ready, this is not clinically ready, it's not useful, we can't do anything with it. So I think the question is how do we continue to drive forward, improving it, making it better, but not shooting ourselves in the foot every time we turn around by saying it's not perfect and it's not gonna be perfect for a long time, but it's usable today. So how do we balance that? And I'm asking you now with your physician hat on, because when I talk to physicians, they point to: it's inaccurate, we don't know how to deal with it, there's all this various stuff with the data. And so what you just said, we really need to be using this now, which I agree with, but how do we balance that with the community? There's a thing called the nirvana fallacy.
What you said must have been important because we're hearing it again. The nirvana fallacy, which is comparing something that is actual with an idealized objective of what could be. And I think that's what our field is doing, and when people yell and scream that you don't know the penetrance of all these variants when you ascertain them this way, that is true, we don't know, but we have estimates of what the penetrance will be. And to break the nirvana fallacy, what you have to do is force people to say, okay, what's the clinical reality? And someone said this earlier, I can't remember who: what's the clinical reality of the decision that you're going to make in the absence of this information, and what's the reality of the decision you'll make with it? And again, in Bayesian reasoning, every piece of incremental data that is valid, even if it's not perfect, improves your decision. And so that's the way we have to start thinking about this, and unfortunately I think we and our colleagues are often more tied to tradition, that's the way we've always done it, than we are to logically reasoning through this and using the good data that we do have. So I think that's what we have to pop, that nirvana fallacy balloon. So I think to echo that, we're really lucky, we live in a time where there are really good ways to analyze and store disparate large data sets that are sometimes poorly structured. And so the key is, what's the use case? What's the question you're trying to ask? Let's go and gather these questions, and then people on the informatics side, we can work it out, store the data and map the data. But if we're doing that without having a good set of requirements, a good set of use cases, we're going to run into trouble. I just want to play off what the better Howard was saying a couple of minutes ago, that we've learned a lot. I think Stephen's talk was a great one for highlighting the best-is-the-enemy-of-the-good type of thing we get stuck into. And one of my favorite sayings is, in the land of the blind, the one-eyed man is king. And we have one eye; it's not a great eye, but it does see something, and we need to be moving forward. But we need to be better prepared to learn from our actions. And we don't have this iteration that goes beyond what happens in an individual institution, or even sometimes an individual lab, for that matter. And so that's part of it. And then we haven't really applied it to our more basic science colleagues, which is part of the point of this meeting for these two days. But that sort of interaction is rare. In my institution where I am now, the clinical buildings and the research buildings are separated by one of the three valet parking areas. It's called gold valet. And we jokingly refer to gold valet as the Grand Canyon, because the only time the scientists go across it is to go to the cafeteria, and no science comes across the gold valet; it's just for eating purposes that we walk there. We need to be interacting in better ways. And part of it has been, we've started including our basic folks, some of them, in some of our clinical tumor boards, because they can help us get from a variant of unknown significance towards a clinical action. And they're terrified that we're gonna mistake them for a doctor, that kind of doctor. But they're willing to help us know what happens in a cell. And that's really valuable when you have to make a decision like you pointed to for our patient and you have nothing else to go on.
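The Bayesian point above, that every valid piece of incremental evidence improves the decision even when it is imperfect, is easiest to see in odds form: each independent line of evidence contributes a likelihood ratio, and weak evidence simply contributes a ratio close to one. A minimal sketch with purely hypothetical numbers:

```python
def update_odds(prior_prob: float, likelihood_ratios: list[float]) -> float:
    """Combine a prior probability with independent likelihood ratios (odds-form Bayes)."""
    odds = prior_prob / (1.0 - prior_prob)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)

# Hypothetical: prior that this variant explains the phenotype, then three imperfect
# lines of evidence (segregation, a crude functional assay, population frequency),
# each expressed as a likelihood ratio for pathogenic versus benign.
prior = 0.10
evidence = [4.0, 3.0, 2.5]  # none individually decisive

posterior = update_odds(prior, evidence)
print(f"posterior probability of pathogenicity: {posterior:.2f}")
```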
And so I think there's opportunity here to really push this forward. And if we're gonna make genomic medicine great again, we need to really be pulling this together and moving forward. And I think that's our opportunity. Yeah, I'm sure Howard is a very nice guy. Sorry, Les is gonna preempt you for just a second. No, it was totally off color. So I would posit that ever since Lejeune published the Down syndrome chromosome, this has been the dilemma that genetics has lived in, Mendelian genetics. Every new technology that's been introduced has led to this dilemma of, how good is it? What resolution of bands is resolution enough? What resolution of microarray is resolution enough? How do we interpret this? What does this inversion mean? Oh, it really doesn't mean anything. Well, we maybe shouldn't have counseled people about this. But the difference, as I see it, is that we've been able to, up until relatively recently, do this as sort of a craft. In other words, it's just our little group of craftsmen that can talk back and forth and kind of make these decisions and use literature and traditional methods to be able to learn incrementally and apply that knowledge incrementally and then subsequently build on more. We're now talking about moving something from this craft into scale, in which case we're talking about much larger data sets. Many more people involved, a vastly larger phenotypic landscape. And so we have to think about innovative ways to be doing this. It's not that the job is different, but the scale of the job is tremendously different. And so that's where innovation around how do we create the methods of sharing data back and forth, to rapidly and incrementally improve our knowledge, is important. And frankly, it's healthy for us to be talking about things like the fact that we're calling the same variant different things in different laboratories. That's exactly the sort of thing that we need to be doing in order to improve what we're trying to do. Yes, there's the risk that that could be interpreted as saying, well, you guys don't understand anything, but we have to come back and say, well, actually, here's all the ones that are in there that we all agree on. This is what they are. We're trying to move these into that space and then find more and keep doing it. So that part is a messaging issue: the fact that we're publishing this type of information is so that we ourselves can commit to continuing to improve, not to say we're terrible, we don't know anything. Let me interrupt the sequence, because I think Erin has a comment directly to this and then we'll go on. It's really just to echo what Mark said. For example, ClinVar is becoming a really valuable resource; Melissa might even comment on how many variants are in ClinVar at this point, roughly. I think it's 140-some thousand. Yeah, 140-some thousand, but you hear a lot in the press and from the public that, well, there are so many conflicting reports in ClinVar. But you're right, we do need to make that message to the community, but that's where we have to start. At least now we know six labs have analyzed this variant for this disease, and three of them said it's pathogenic and three said it's a VUS. So that's an important point, Mark. Great, thank you. So Callum and then Gil.
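The six-labs example just described, three pathogenic calls and three VUS calls on the same variant, is the sort of per-variant summary that can be surfaced explicitly rather than hidden. A minimal sketch with made-up submissions (the labels and counts are illustrative, not the actual ClinVar schema):

```python
from collections import Counter

# Hypothetical submitted classifications for a single variant (not real ClinVar records).
submissions = ["Pathogenic", "Pathogenic", "Pathogenic",
               "Uncertain significance", "Uncertain significance", "Uncertain significance"]

counts = Counter(submissions)
top_call, top_n = counts.most_common(1)[0]
status = "consensus" if top_n == len(submissions) else "conflicting interpretations"

print(f"{len(submissions)} submissions: {dict(counts)} -> {status}")
```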
I was really just going to echo what Les and Howard had said, but to point out that I think we, and I said it earlier today, we tend to fixate on the diagnostic point, and until we're sure the diagnosis is perfect, we're never moving forward into an intervention arena. And I think if this were another biomarker, I mean, we would already be in clinical practice routinely. If you imagine the things that people are using, and what the pre-test probability and the post-hoc reinterpretation are like, and you compare it to what we're trying to do in genomics, we're already way ahead of many assays that are in very common clinical use at the moment, because they were built into a therapeutic trial and as a result are part of the co-development of that therapy as it moves forward through clinical care. You can respond to that; I promise I'll get to you, Gil. Yeah, just to amplify that a bit, I think it's another component of exceptionalism which is killing us, which is the legacy of determinism. And I think all that talk about determinism led people to think that the standard is that our predictions and our correlations would be 1.0. And it's of course ridiculous. None of us think that way. We all think probabilistically. And when you compare our predictions to other predictions in medicine, many of them compare very favorably or are superior to them. And yet, it's like, oh my God, no, you can't use this because it's not perfect. It's really a crazy situation and we just have to stop that. So we've been talking about, I guess, ambiguity between the lab and the clinician, and there's another level, and that's between the clinician and the family and the patient. And I think some families are gonna take the ambiguity and roll with it, and others won't be able to deal with it, don't understand it, won't be able to get beyond it. And so I think knowing the patient, the family, and if you're not interacting with them as the clinician, working with the clinician who does, to try to, from before the test is ordered till you get it back, and then you may get updates and all, to really work with the family, whether it's the physician, the counselor or someone, to help them through this and to understand where they are, because there are some families that maybe are not ready unless it's really clear, and others that will take the information and go with it. And I think that's always gonna be part of the interaction as well. So I'd encourage all of you to go back to your home state. You can probably do this online right now and look up the policy for your largest insurer, and what their policy is on exome sequencing and whole genome sequencing. So if you haven't done this, you should do that. And you will be shocked at the papers that we are publishing and how those are being used to prove why we should not be doing genomic testing and paying for this. And so if you haven't done that, I encourage you to do that, because I was pretty shocked as I started looking into this: what we're having as dialogue in this room is actually being used as ammunition for why we shouldn't be paying for this. And I'm not saying we shouldn't be publishing this, because that's not what I'm saying, but we have to understand that some of the discussions that we have are being used against us, and it's just worth looking at that.
And I think as a community, we need to start addressing those with our local insurers, and keep in mind that there are 50 states and there are multiple insurance companies, and they each have their own policy. And that's a big, big deal that we need to pay attention to. Yeah, Howard. I follow the metabolism listserv. I don't know how many people follow that here. And I just about broke out laughing the other day reading a case asking for help with their patient, and they listed, in a single-spaced list, probably half to two-thirds of a page of tests that had been done, some very expensive tests, some quite invasive tests that pose significant medical risk to the patients in the performance of the test, test after test after test, asking what other individual low-throughput specific tests could be done, because the insurer will not pay for sequencing. That's just insane. Bruce, do you want to speak to that? In this discussion about what level of evidence you need to interpret something clinically, I think we have to remember that a lot depends on the clinical situation, which I guess really informs the prior probability. I would say that when you have a phenotypically affected person in front of you, you don't know what the cause of that is, and now you found a variant and you can make a compelling case, that's a different situation than an incidental finding in a person who at the moment is perceived as healthy, or different, say, than a prenatal diagnosis in a fetus where you have at the moment no other evidence of pathogenicity. So it's not a single palette here in terms of what the clinical application would be, and therefore not a single kind of approach to defining evidence. Excellent point. I'd like to shift gears a bit, but actually pick up on a point that's been made earlier today, and that Mark made as well, and that's the issue of scale. And I might ask my colleagues who deal in scale, particularly in the genome sciences division, Jeff and Mike and Elise and Adam perhaps, to think about, when you get down to a patient, there's no scale, there's an individual, and yet we need databases that are at scale to be able to interpret those variants. So what kinds of data resources are feasible for us to generate, let alone what might be needed by this field, to be able to make those inferences? So Jeff, do you have any thoughts on that? I think this is a really good example of where the clinicians need to be working together with the basic scientists to define what really are the most useful data sets. We in the genome sciences think that the ENCODE data set is really useful. It seems very useful for biology. We have some indications that it's useful for giving people ideas of where to go if you want to conduct a functional test. That doesn't mean you can use the ENCODE data set to interpret a VUS directly, but maybe combining things like what we heard from high-throughput protein perturbations and ENCODE and so forth, depending on where you are in the genome, we can have enough hints to be able to, well, there are a couple of ways to look at this. First of all, not require the highest-throughput functional testing in zebrafish or mouse or something like that, but to be able to zero in on some of the tests. But then again, we need to know what are those high-throughput, not-perfect data sets that will best inform that.
Cricket did mention in this latest discussion, though not in her talk, that while there are certainly shortcomings to trying to test variants in iPS cells, because they're just cells, they're just cells in a dish, as was mentioned again up here, Cricket mentioned there are organoids on the way, and these are gonna be mixtures of different cell types in a microfluidic system of some sort that is hoping to mimic a tissue. And then you could presumably test that tissue under different physiological conditions. So these are all tools that we're trying to build to try to knock down, one at a time, some of these barriers, to be able to do the specific tests that we'd like to be able to do. I'm not sure that they're gonna get us to a three-to-five-day answer when you have a patient in front of you, but there are tool sets that hopefully, in combination, may enable us to work toward answers to some of these questions. I'll just mention that, for example, the new round of ENCODE is not only amplifying the catalog of potential regulatory signals, but is also working toward, number one, doing that in tissue from patients with diseases. So it's not just K562 cells, but it also has another component that's trying to actually zero in on developing methods to figure out which are the actual promoters and enhancers in a particular tissue. So we are trying to move in these directions with some of the programs. But again, this is with somewhat limited contact with clinicians who are trying to figure this out in patients. So we need to meet better in the middle. So we had Greg, who was gonna make a comment, and then Gail. Yeah, I was gonna say one follow-up to that. So obviously I'm a big fan of ENCODE and the data that comes out of it. But one thing that is a major sort of conceptual, philosophical difference is that what the VUS community wants is variant-level data, right? And ENCODE has, arguably quite reasonably, said we want lots of TFs, lots of DNA; we want a small number of cell lines and tissue types with lots and lots of assays. Whereas a variant focus would be, we want a limited number of assays on a whole bunch of variants, right? That gives me something where I find a variant and somebody's already tested it, which is very different than trying to figure out, well, it overlaps with this, that and the other thing. So I would say that it's arguably a different sort of philosophical thing. It connects with ENCODE in some ways, but it's arguably also a separate issue, which is: ENCODE has defined all of these basic element types and where they exist, and now the one major goal should be, let's saturate all those elements with mutation events and register what they do, which is again a different general goal. And let me just mention that we have a technology development program that's trying to scale up those kinds of assays, you know, kind of along those regulatory lines. So this is with my clinician hat on. I've had two patients referred to me for genetic counseling related to diagnoses. One was a VUS for an exonic splice enhancer predicted by two programs; it didn't change the amino acid, and the patient had been told they had the disease, go get the genetic counseling. And then the others are intronic variants; I think this one was nine or 16 bases into the intron, with no functional data anywhere. And so I think this is the other side, where some labs, I think, are totally overcalling or confusing clinicians.
And I think we have to be very careful with these kinds of regulatory things. You put it in a report, and even though it says clinical correlation and variant of unknown significance, it's in a clinical report and programs predicted it. And so that's why I think clinicians, wearing the clinical hat, haven't bought into all of the enhancers and the ESEs and all of that, because I think we really need functional data showing that this is relevant clinically. Deborah and then Mike Pazin. So this is gonna seem really radical, I think, to this group, and it's gonna go beyond Cricket's free-for-all. I have watched a number of people who have high school or college educations interact with their genome information. And they explore it, they look at it, they correlate what they're seeing with what they're experiencing in response to taking certain drugs, or taking vitamin D and having awful reactions to it, and of course they have that vitamin D variant that affects their metabolism of vitamin D. I go back to my comment: there's a patient in the bed, we're not just at the bedside. I really think there is valuable information that curious, motivated individuals can provide. Now, this isn't every individual, but there are a lot of people who can provide a lot of information in response to their genome information if given access to it and allowed to explore it, with maybe some support, someone they can go to to ask questions or to get feedback. I know that is not a medical genetics model classically, but I think we are missing a great opportunity by not involving patients, individuals, people, healthy people. Excellent point. So we have Mike Pazin and, sorry, did I get you, Gail? Did I skip you? Yeah, okay, sorry. Mike Pazin, Bob and then Wendy. I just first wanted to respond to something that Greg said and then to something that Jeff Schloss said. With respect to the ENCODE biosample portfolio, a lot of people misunderstand what's out there. We have, for a small number of cells, a very large number of assays, but then for a very large number of cell types, a small number of assays, and that includes hundreds of organs, primary cell types, tissues. A lot of people seem to think there are just six cell lines. There are plenty of people in the community that are taking advantage of this in their basic science and also in their translational studies. So for example, pretty recently, Jay Shendure published a paper where they hypothesized that perhaps, in cell-free DNA you could get from people's blood, the DNA breaks might reflect the cell type of origin, compared it to ENCODE chromatin accessibility data, and found out that there's some reason to think you might be able to tell who has cancer from this, and what cell type of cancer they have, even without other symptomology. That's a really powerful idea that could be followed up, and it couldn't be done without that basic science. But following up on what Jeff said, and also what Howard Jacob started us off with, I mean, basic scientists can't push the information out to clinicians. If the idea is that no one will believe a fly study or a mouse study, then no number of fly or mouse studies will ever convince that person. Or if the study has to be done in an ambulatory person, not in an organoid or a cell line, then no number of those studies will ever convince people. So we do need to hear from clinical practice what kind of information would be credible. Thank you, excellent point. I'm not sure we know, Mike; that's the problem. Bob?
Yeah, Gail's description of her case reminded me of my thinking this morning when Howard was asking, in his case presentation, at what level, at what point would you put this in the medical record? And I think, as I analyze why I raised my hand when I did, it had to do with the fact that I've done biology, I understand enough of the biology and the physiology behind this, and in fact it was the EM of the podocytes that did it, in part because I had seen a patient before with a rare genetic disorder that affected the podocytes, who had severe kidney disease. So I had some context that actually came out of a fairly rare knowledge base, I think, and that allowed me to make that jump. And I think the difficulty that most providers in practice have is that they don't have that comfort level with that kind of information. So that's one of the challenges I think we need to address. Wendy? So I think I'll jump off of your comment, and I actually pushed this button to respond to your comment, which is, I think that we're a bit stuck because the laboratories are there interpreting the variants, largely, and they don't really have the detailed clinical data. The patients have it. So I'm classically trained and all that, and I have the issues with direct-to-consumer and so forth, but I think that's an incredibly valuable way to engage people. But I wonder if we've kind of given up on the hard work it maybe takes to extract data out of our complex medical system, because it's so hard. And I have a much better belief in functional assays than I did yesterday, listening to a lot of this today, and I'm much more hopeful. But I do recall, I have my biases as well. In this case, it was a negative one, where in the ClinSeq data set there was a hereditary diffuse gastric cancer CDH1 variant, and the functional assay predicted it would be deleterious. So I'm reading all the evidence like I would with my clinical hat on back in Chicago, and I was pretty convinced. And then it turned out, because of the frequency data, you know, it really is not pathogenic. And so there's this fear of being misled. But I think I'm making the point: yes, we need multiple ways in, but it sounds like we may have forgotten that, let's say if we're talking about conflicts in ClinVar, and they are a serious proportion and they have to be dealt with, and we don't have to beat ourselves up too much, but for known phenotypes we could actually go pretty far to resolve those if we only collected more detailed phenotype information on patients that supposedly have a VUS but are affected, and vice versa. I'm seeing this largely as a question of how we do that. The reason this is a hard problem to solve, really, is because it's a systems issue in our healthcare delivery system. And to even start to get the partners together to really discuss that is really hard. But the data problem itself is a tractable problem. It's just a matter of the phenotypes being collected in a standardized way, and the tools to put them into a place where you can see it all together, and algorithms and so forth to make sense of it. And then you don't really have to extrapolate. It's not a model. It's the patient, or a set of patients, who have disease, or not, based on a series of findings. So I'd like to think there's sort of hope that we could also go in that direction at the same time. Great. Well, I might raise the issue that Monti raised early in the day about having the basic labs get access to the full patient data.
I mean, I wonder if in some cases, Monti, it's not even the full patient data, because there are questions that clinicians don't even know to ask, based on what you might find. So Les has a wonderful story about hereditary pressure palsies, that maybe I'll let you tell, that never would have been picked up if the question hadn't been asked of a patient. So, Monti? Yeah, I don't think it would surprise very many of you. It was a man who was in ClinSeq; we had actually picked up three people with HNPP out of a thousand, which is actually stupendous. And one of them was a man with type two diabetes who had a well-controlled, recent diagnosis of diabetes and came in with a diagnosis of diabetic neuropathy. No, he has HNPP. And, you know, the inference was all made, and in fact he did have a reasonable probability of having diabetic neuropathy at a later stage and with poor control, but with the genomic information, it totally changed the calculation of the relative likelihood of diabetic neuropathy versus hereditary neuropathy with liability to pressure palsies. And though it wasn't why he was sequenced, wasn't what we were looking for, it came up as not a red flag, as a green flag. So this is something that you need to know. Well, and something that the patient would not have volunteered unless you said, you know, you have this variant, but you don't have this condition. Oh, yes, I do, you know. Yeah, yeah, all right. So how does that translate to what Gail and Wendy were saying in the clinical setting, where the clinician orders the test, it goes to the lab director, and there's really little exchange of phenotypic information at all? That's really an important problem that we need to tackle. So I had another question along those same lines, about patient reporting. You know, sometimes when you work with these disease foundations and these groups, you know, they'll even say, well, I see my doctor once a year or twice a year. What do they really know about my condition? And then there are foundations, and I don't even think it's a foundation anymore, but PatientsLikeMe, you know, where people are self-reporting almost on a daily basis, you know, their phenotypes. And how much, I mean, this is a question, I don't really know the answer to it, but how much are we integrating with those groups to crowdsource that data from the patient population? Can I follow up on Erin's point? And I think Erin's point links back to what Bruce said a little while ago, which is that prior probability issue. And I think what great diagnosticians do, naturally, is they have, number one, a huge repository of clinical information that they carry around in their brains. And then, number two, they actually have the ability to do sort of a seat-of-the-pants, Bayesian and informal reasoning. And I often tell people, you know, the classic situation is you sequence somebody, they have a variant, then, oh, it's Noonan syndrome, and then some smart aleck comes along and says, tut, tut, tut, you didn't need to sequence the patient, I could have told you they had Noonan syndrome. And what I tell patients is, you know, if your physician is William Osler, you don't need to be sequenced. But if your physician graduated, you know, in the middle of their class from an average medical school, you might need to be sequenced, because they're not as good as William Osler.
And, you know, the data are useful, and the more data we have, the more useful it is. And if you think as Bruce was laying out, and you understand the priors, and you make conscious and deliberative determinations of which error you want to make, do you want to make an overdiagnosis or do you want to make an underdiagnosis, and in different situations different errors are appropriate, and then when you set those thresholds right, these data are incredibly useful for making those determinations. First of all, can you imagine what Osler would have done if he had a sequence at his disposal? And actually, it's an interesting point, because in a way what the sequence does is it sharpens your vision, because I think we have to get out of the mode of, we send the test, we get a report, we go back to the patient and we're done. What actually ought to happen is a dialogue between the clinician and the laboratory, because the lab may raise questions that you didn't think to ask when you saw the patient. You go back and then discover something that the laboratory actually stimulated to be found. So two comments. First, in terms of the patient disease groups and all, I mean, now when I diagnose someone, I tell them very soon, you're gonna be much more of an expert on this than I am, because if they're motivated, they go to the groups and they go to the meetings and all. It used to bother me a little. It doesn't bother me anymore, because there are so many diseases, and what you wanna do is not miss the obvious thing in surveillance, or something that's not related, or things like that. So I think it's great there are patient disease groups out there, and they can provide them things that I never could, because I don't have the time. The other thing, in terms of the disease, and I'll take the Noonan example. So I'd been sending Noonan testing and I'd never had one come back, and then our lab finally started a big panel, 13 genes. And the first one I got back was the newest gene, CBL, which has this high risk for JMML, and I had a variant of unknown significance, and it was like, oh my God, I was better off before I knew to order the test. And that's not true, but I think even where we know the phenotype, some of these newer genes are gonna have different risks and all, so it is important to do it. It's nice to get it right the first time, you pick the right panel or something, but I think there's so much more information now about all the different disease genes that it's definitely important, at least in my mind. So I'm gonna call on a couple of you, but I just wanted to reassure you that even though we're going beyond the time for this panel discussion, Carol and I long ago decided we didn't have anything wise to say at the end of the day. So we will finish at six o'clock, so don't worry about that. Go ahead.
So I don't know of any, and maybe there's something else out there, but I don't know of any other tests where you could do one wet-lab test and then that data becomes useful for a long period of time. And I think the disconnect between the physician, the informaticians, the sequencing and the follow-up and the patients, I think, Erin, is one of the big challenges that we have to face. And as long as everybody has to go get it someplace else, and there's a handful of clinical laboratories that are doing this, I think it restricts the value of this. And I think the physicians that are treating that patient, and anybody who's been in a case conference on this where the physicians are actually going through the data, it's a completely different experience than when you're in a case conference as a clinical laboratory and you have a paragraph of six or seven sentences that is trying to describe the phenotype. And so I think the balance is how do we figure out how to do that, and the discussion for the day is there's not a lot of people trained on it, how do we get it out there. But I think that's really where the dynamic phase of this has to be, having it at people's fingertips, and it's not just, you know, I don't know if there are any radiologists here, so I don't wanna offend anybody, but it's not just another X-ray, I mean. I was thinking about that too, having it at their fingertips, and then layering on this discussion about ontologies, because if it's gonna be data in all these disparate places, even if it's at their fingertips, you can't integrate it without any kind of ontology. But we also heard from Cecilia and others that that's really, I mean, a lot of people are using HPO, but that's not necessarily the end-all be-all, and there might be multiple ontologies that need to be mapped and linked together. So that's sort of an additional complexity, but if we could at least map some standards onto these disparate resources, it'd be easier to integrate, maybe not necessarily for the clinician upfront, but. Well, I might just interject before Collin talks, that the radiology analogy is not a bad one. If you think back 10, 15, 20 years ago, back in the olden days, we actually used to go down to radiology and review films with the radiologists, and we'd talk about the clinical correlation, and they'd say, oh, you know, and they'd hold it up to the light and they'd do all kinds of other things, and, you know, it suggests other studies and that sort of thing. That isn't happening with this test and really needs to, I agree. Is it? In Vermont, I bet it's happening. Well, in Vermont, and I know UCLA, at many places where they're doing the sequencing, they do have these interdisciplinary conferences where the results are discussed, and you do go back to the patient and you get more information. So I don't want the clinical laboratories to just be labeled as this place that's technical and goes off and doesn't know anything about medicine, because that's not the way pathologists work. So, I had just reached my point of not being able to keep quiet anymore. No, no, you may. What came out of the MedSeq study was that the availability of a genome completely changed the patient's engagement in their clinical care, even if they were in the well cohort, the healthy cohort. And so I do think, again to echo what Deborah said, that having the patients involved is actually going to be a vital part of moving forward.
And then the one other thing I was going to say is that a lot of what we've talked about today, all the way from the bi-directional interaction right the way through to building the new assays that you might need for proximate assessment of disease, is happening in the Undiagnosed Diseases Network. I mean, that is part of the way the network has been set up. Yeah, two things to build on Deborah's comment. One is that in the formative work prior to initiating the sequencing component of the MyCode Community Health Initiative, we did focus groups with, you know, over a hundred Geisinger patients, and almost to an individual they said, we would like all the data at some point. And the comments were like, well, we know you guys are busy, you have a lot on your plate; we're the ones who are willing to take ownership and help with this, because we know you're not gonna have the time to do it all. And if you think about that in the context of the discussion we had earlier, which is the updating issue, given the fragmented and disintegrated nature of healthcare in the United States, even if you committed to re-analyzing someone's sequence, the chances that they're still your patient, you know, at a system other than Geisinger, are low. And so how do you track them down, how do you get the information? The only constant actor in the healthcare system is the patient. And so in some sense, if that individual has some ownership of that and has tools that would allow them to go in and re-annotate and re-update, and then use that as an engagement point for the system, I think that's a very innovative approach to think about the problem of updating this information over time. Which is an interesting thought. Mark, you had mentioned GenomeConnect when I first mentioned patients, but GenomeConnect is unidirectional: you put in your phenotype, you put in your sequence. But what if that could become bi-directional? That would be amazing. Well, I think as part of ClinGen, as part of the resources that we're building, one of the things, and Erin can certainly, as Queen, speak to this, one of the things that we didn't necessarily appreciate at the beginning of the project was the idea that there would be different users of that resource. It was initially built as a clinical genomic resource, but relatively early on, as we engaged with patients from the perspective of getting their data, they were saying, well, what could we take from this resource? And now there's a very active group that's looking specifically at how do we build patient-specific resources into ClinGen, in addition to the laboratory, clinician and researcher user groups, and I anticipate that that will only increase over time. What's the last word? Yeah, although I'm not a fan of the patient actually physically having the data. Having practiced general pediatrics for two years, I know what the reliability is of a patient actually showing up with their vaccination record for their kid, and it approaches zero.
So I think having it be somewhere controlled by the patient is a great thing, but not actually physically having it. And I would agree with part of what Deborah said about the important role of the interaction of the treating and diagnosing clinician with the pathologist. And I'm sure, Terry, when you were down in radiology, that was a bi-directional conversation: you were adding clinical facts to that determination, which then changed the radiologist's perception of what they were even looking at, which needs to happen with us. But I do think it is essential that we keep clear and distinct the role of the laboratory in interpreting the pathogenicity of the variant, based on what's known about the variant, from the role of the interpreting and diagnosing clinician, whose job it is to decide whether the variant, with a given level of pathogenicity, is or is not the diagnosis for their patient. Those are two different activities, and it's important to keep the information separate and distinct, because the way we're currently doing it is very muddled: there's some phenotypic input on the front end, which affects what parts of the exome and genome are analyzed, and then that leads to a constraint of the testing and interpretation, and then the clinician says, oh gosh, the patient actually has these six phenotypic criteria, that means you must be right, and all that is doing is reinforcing an error that was made early in the process. So I do think we need to not pretend that the labs are diagnosing the patients. The labs are interpreting the variants; the physicians are diagnosing the patients. And on that note, I'm sorry, it's 6:02, but we did reasonably well. We'll start again tomorrow morning, 8:30 a.m., Howard McLeod in the chair. Thank you all very much.