I'd like to thank you for inviting me to speak about our work. Our goal is to address the problem of variants of uncertain significance through the development of high-quality, high-throughput functional assays, and I'm going to tell you about two applications of that to the tumor suppressor BRCA1. So although BRCA1 is the very famous hereditary breast and ovarian cancer risk gene, and has been sequenced in at least 1.5 million people and been the subject of countless research dollars, there are still 350 variants of uncertain significance across the gene in the literature, and there could be many more that have been reported by a myriad of different genetic testing companies. Somebody get the drum set. Mary, I need that drum set. Conventionally, we have tried to interpret the impact of variation through classical genetic experiments, like retrospective or familial studies, or one-off functional assays. But at the rate we're discovering VUSs in very important genes like BRCA1, these methods, while highly accurate, lack the necessary throughput these days. On the other hand, computational predictions of variant pathogenicity are obviously scalable, and as Daniel and Doug showed yesterday, they are getting much better, but they could still use some more work. So as a potential path forward that provides both validity and throughput, we've been developing a toolkit of methods for functionally assessing hundreds of thousands of variants in a single experiment. We do this with massively parallel functional assays, and they can look very different depending on your protein or gene of interest. We call them massively parallel reporter assays when we're assessing mutations in regulatory regions, deep mutational scanning when we're looking at protein function, and saturation genome editing when we multiplex genome editing to look at either protein function or splicing.
These assays measure different mutational effects, but they always consist of a few key steps. First, we generate a library that contains all the variants of the sequence of interest; we and others have developed several ways to do this. We then introduce those libraries into cells, and next we functionally characterize the variant library in a multiplex assay, meaning that although all the variants are present within the same assay, we can simultaneously measure their activity. For example, we might measure the abundance of each mutation within a population of cells, both before and after a competitive growth assay that depends on the protein of interest. The key here is that we can separate cells that harbor functional and non-functional variants, whether through flow sorting, affinity purification, or differential growth, and this is by far the hardest part of developing any of these experiments. We then measure the abundance of each of these variants before and after selection using next-generation sequencing. Finally, we analyze the changes in abundance to estimate the effect size of specific mutations on the functional activity of the sequence of interest. Today I'll tell you about our effort to apply deep mutational scanning and saturation genome editing to missense mutations in BRCA1. The first question when developing one of these assays is: what are the biochemical and cellular functions of the protein that we could build assays around? BRCA1 is a large protein encoded over 24 exons, and all of those clearly have to be spliced correctly. And, greatly oversimplifying the cellular role of BRCA1, the full-length protein is required for homology-directed DNA repair, abbreviated HDR (I'll be using that a lot), and the HDR activity is required for tumor suppression. BRCA1 also has to dimerize with another protein, BARD1.
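The core analysis step described above, comparing each variant's abundance before and after selection, can be sketched as a simple log-ratio calculation. This is a minimal illustration, not the speaker's actual pipeline: the function name, the pseudocount, and the normalization to wild type are assumptions for the sketch.

```python
import math

def enrichment_scores(pre_counts, post_counts, wt_pre, wt_post, pseudocount=0.5):
    """Per-variant log2 enrichment relative to wild type.

    pre_counts / post_counts: dicts mapping variant -> read count before
    and after selection. wt_pre / wt_post: wild-type read counts in the
    same samples. Variants depleted during selection get negative scores.
    A pseudocount avoids division by zero for variants that drop out.
    """
    wt_ratio = (wt_post + pseudocount) / (wt_pre + pseudocount)
    scores = {}
    for variant, pre in pre_counts.items():
        ratio = (post_counts.get(variant, 0) + pseudocount) / (pre + pseudocount)
        scores[variant] = math.log2(ratio / wt_ratio)
    return scores

# Toy example: variant "A" tracks wild type; variant "B" is depleted 10-fold.
scores = enrichment_scores(
    pre_counts={"A": 100, "B": 100},
    post_counts={"A": 100, "B": 10},
    wt_pre=100, wt_post=100,
)
```

With real data, each score would be estimated from multiple replicates with an error model, as discussed later in the talk.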
Otherwise, it doesn't fold at all. The two proteins interact at their N-terminal RING domains, and together they form an active ubiquitin ligase. Most amino acid substitutions that have been found to be pathogenic in the protein are actually in the RING domain or in the BRCT domains at the other end. What we really want to get at is whether or not variants affect BRCA1's tumor suppressor function, but we can't do those experiments directly. We do know, as I told you, that the RING domain of BRCA1 must bind to BARD1 in order for the entire protein to fold and function; if it doesn't, it just gets degraded. So to start, we developed two assays for two functions of the RING domain, and I'll tell you about those experiments first. Later, I'll tell you about our lab's effort to test the effects of SNVs on splicing in exon 18 using a different technology. First, we tested the effects of mutations in the RING domain on its ubiquitin ligase activity using a multiplex phage display-based system that I developed. Again, we used DNA sequencing to track the abundance of each variant before, during, and after selection for this function; variants with deleterious effects become depleted from the population. We visualized these effects in a sequence-function map like the one Doug showed you yesterday. On the x-axis, we have every position in the RING domain, and on the y-axis, all the possible amino acid changes; above are diagrams of the structural features of this domain. Shades of blue represent variants that were depleted relative to wild type, and red represents variants that were enriched during selection. We then took the same library and tested for effects of variants on the ability of the BRCA1 RING domain to bind to BARD1, using a multiplex yeast two-hybrid assay, again sequencing the population at multiple time points. And there are striking patterns in these maps.
So you can see here that these are stop codons, and they are bad for function. That's a nice control; we always like to see that working. Additionally, these areas in loops one and two of the RING domain are required for ubiquitin ligase function, and mutations there are in general more deleterious to that function than other mutations. Whereas in the same regions, for BARD1 binding, really only the zinc-coordinating amino acids that are important for the structure of BRCA1 are intolerant to mutation. We also found regions in the four-helix bundle; it's actually new information that these four-helix bundle amino acids are really required for binding. There are really interesting things in these maps. As a trained biochemist, I'll note, for example, that there's an arginine-to-lysine change that makes the ubiquitin ligase 100 times better. Who knew? These are really fascinating, but that's not what we're ultimately interested in. Ideally, we'd like to use the data we collected in model systems to estimate the likelihood that any variant would be either pathogenic or benign for hereditary breast and ovarian cancer. However, there are only 18 variants classified as pathogenic and three as benign in the domain, and that's not enough to build an accurate model. What we do have are HDR rescue experiments for many more BRCA1 variants, and thus far, the HDR activity assay for BRCA1 has been 100% specific and sensitive for both benign and pathogenic mutations in the protein. So we felt this was a good proxy. We trained a support vector regression to transform our ubiquitin ligase and BARD1 binding scores into HDR activity predictions. We next compared the accuracy of our model, using leave-one-out cross-validation, to variant effect predictors like PolyPhen and CADD, and you can see that they do a fairly poor job of predicting the effects of amino acid substitutions in BRCA1 on its HDR function.
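The modeling step just described, regressing HDR activity on the two multiplex assay scores and evaluating by leave-one-out cross-validation, can be sketched as follows. The data here are synthetic stand-ins, not the actual assay measurements, and the kernel and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)

# Synthetic stand-in features: column 0 ~ E3 ubiquitin ligase score,
# column 1 ~ BARD1 binding score, for 40 hypothetical variants.
X = rng.normal(size=(40, 2))

# Synthetic "HDR activity": a smooth function of the two assay scores,
# standing in for the real HDR rescue measurements used as training labels.
y = 1.0 / (1.0 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))

# Support vector regression evaluated with leave-one-out cross-validation:
# each variant is predicted by a model trained on all the others.
model = SVR(kernel="rbf", C=1.0)
preds = cross_val_predict(model, X, y, cv=LeaveOneOut())
```

In practice, the cross-validated predictions would be compared against held-out HDR measurements and against predictors like CADD or Align-GVGD, exactly as the talk describes.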
Molecular effect predictors, granted, like Align-GVGD, do a little bit better. But our empirically derived E3 ligase and BARD1 binding scores do a lot better at predicting whether or not the full-length protein is going to be functional. In addition, when we add the best-performing molecular effect predictor, Align-GVGD, to our data and retrain the model, we don't gain any extra information. Since we had trained a good model, we applied it to predict the HDR activity of the BRCA1 variants that have been identified in patients through clinical testing, and the results are shown in these histograms. All of the benign variants are predicted to have high HDR function, whereas most of the pathogenic ones have low HDR function. We misclassified this one. But these two are splice variants, and this is a caveat of using cDNAs; I'm going to tell you about assays that can get at splice variants next. What we also find is that six variants currently classified as VUS are predicted to have very poor activity, whereas 13 are predicted to be just fine. Beyond those seen in patients, we have an additional 1,200 variants for which we can predict HDR function, and there's a bimodal distribution of predicted effects. So while some variants are predicted to have poor HDR activity, many more would have wild-type HDR activity. We would predict that amino acid mutations that fall here have done so much damage to the protein that they would be pathogenic, whereas the ones predicted to have high HDR function might be benign, except, again, these are cDNAs, and we really don't know anything about splicing. So I'm going to tell you about our work to assess the effects of SNVs on exon 18 splicing using saturation genome editing, which is a slightly different method. The first step in a saturation genome editing experiment is to transfect CRISPR-Cas9 and a guide RNA targeting your sequence of interest.
And in this case, it was exon 18 of BRCA1. To edit the genome, we also transfect a library of repair templates that contains SNVs across the exon of interest. At a low rate, the repair templates are incorporated into the genome, and this creates a heterogeneous population of cells. After five days, the genomic DNA and the RNA are collected, and selective PCR that only amplifies the edited genomic DNA and cDNA is used to amplify the exon. Like before, each variant in the genomic DNA and RNA samples is counted by next-generation sequencing, and the effect of each variant on splicing is determined by looking at the ratio of RNA to genomic DNA; for cDNAs that are depleted, that ratio will be lower. Shown here are the ranked ratios of cDNA to gDNA for a single experiment. You can see that edits where a splice enhancer is created, shown in red, have a stable amount of cDNA, whereas when splice silencers or nonsense mutations that trigger nonsense-mediated decay are created, those RNAs tend to be depleted from the sample. And this is a sequence-function map for the splice effects of 98% of the possible nucleotide changes across exon 18. You can see that where there are large depletions of cDNA, it was either due to a position being changed to a nonsense codon, thereby triggering nonsense-mediated decay, or, in most cases, the changes were also predicted to affect splicing by the MutPred Splice algorithm. And this one right here is a VUS, a valine 1714 to glycine change, so I would say that's probably pretty bad. So clearly, we're in need of new technologies to deliver on the promise of genetic medicine, and we believe that massively parallel technologies, like the deep mutational scans of the BRCA1 RING domain and the saturation genome editing of exon 18 that I told you about, represent at least a partial solution to the problem of VUSs.
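The RNA-to-gDNA comparison at the heart of the saturation genome editing readout can be sketched as another log-ratio computation. As before, this is an illustrative sketch with assumed names and an assumed pseudocount, not the group's published analysis code.

```python
import math

def splice_effect_scores(rna_counts, gdna_counts, pseudocount=0.5):
    """log2(RNA / gDNA) per edited SNV.

    rna_counts: variant -> read count in the cDNA (RNA) sample.
    gdna_counts: variant -> read count in the edited genomic DNA sample.
    Edits that disrupt splicing or trigger nonsense-mediated decay
    deplete the transcript, giving strongly negative scores, while
    splicing-neutral edits score near zero.
    """
    scores = {}
    for variant, gdna in gdna_counts.items():
        rna = rna_counts.get(variant, 0)
        scores[variant] = math.log2((rna + pseudocount) / (gdna + pseudocount))
    return scores

# Toy example: snv1 is splicing-neutral; snv2's transcript is depleted,
# as it would be for a nonsense codon triggering nonsense-mediated decay.
scores = splice_effect_scores(
    rna_counts={"snv1": 200, "snv2": 5},
    gdna_counts={"snv1": 200, "snv2": 200},
)
```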
But I don't want to discount what we still have to do: as you might have noticed, BRCA1 is a lot bigger than its RING domain or just exon 18. In addition, there are many more genes that one might call high-value targets. In my mind, I see the hereditary cancer risk gene panels, but many people in the room probably have their own list of genes. There are challenges for scaling up. As I told you, the first step in these experiments is library construction and variant delivery, and we think we're actually pretty good at that. The next thing we would need are parallelizable assays for protein function. We're working, as Doug was telling you yesterday, on more generic assays that get at protein folding. But we're not biologists, so if anybody has assays for their favorite gene of interest, we'd be interested in hearing about them. It would also be great if there were a suite of genes that could be tested using the same assay, because this is the most difficult part. For example, BRCA1 and a lot of the other proteins involved in double-strand break repair can all be tested in the same assay, which is very nice. Sequencing of variants, we've got that. Doug's group has built a very nice computational scoring pipeline. And then again, we'd like to use these scores to calculate likelihood estimates for pathogenicity or damage to the protein, and we'll just need access to large-scale projects like ExAC and the forthcoming PMI cohort. So we believe that by doing more scans of high-value targets, that information could be useful immediately. It would create better databases, so that when somebody comes in with a new mutation or a new variant, those scores would already be out there. And then those data can be used to build better variant effect prediction models, like Doug was telling you about yesterday. I'd like to thank all the people who helped me with this work. Thank you very much.
We'll have, as you remember from the format, some time for a few questions for this presentation, and then we'll also have the open discussion. And the good news is there are only about 20,000 candidates for your list. Yeah. So I think you'll be in very good shape for a while. It's fine. It's just fine. Sorry? You can't. All right. Dan Rudin's being literal. So we'll have to, yes, 40,000. All right, Howard. So that was very impressive. I'm just curious about cost and scale. In terms of scaling up, what's the cost and what's the time window? Because part of the challenge in thinking about how we go forward on this is, can you give us some feedback on what the cost and time would be to do what you just did? Well, we were learning; I was exploring a lot of the space around experimental mistakes. So now that we've got some of that down, it takes a bit less. But again, it's the building of a functional assay that takes the longest. If somebody came to us with the biology in hand, we could build the libraries, get the sequencing done, and do the experiments within a few months. And then the data analysis on the other end is also a hard part; building these models, and building error models around whatever the assay is, is actually quite difficult. So would that be $250,000 for the gene and six months of two postdocs? I'm just trying to get a sense. I get that you have to build it first. But if we're thinking about scaling and trying to do 20,000 genes, or whatever the number is, it's just useful to get an idea of what that means. What does that mean? Oh, please. Go ahead, Weiss. It's not 20,000. I mean, most of what most of us are returning right now is the ACMG 56, so that's where you'd start. You start with something that's practical, manageable in the context of what we're actually trying to return: those genes where the VUSs are causing the biggest problem.
So I think it's a much smaller number than you're talking about. And even in the context of somatic sequencing in cancer, we might have 700 genes on our targeted panel, but only about 150 of them are regularly returning results that we even think about acting on. So you're right, the universe is not 20,000, but it's also more than one. Yes. And also, we're hoping to take advantage of large gene families. For example, there's a family of about 600 RING domains in the human genome, so we're hoping that some of our results will immediately build better predictive models for entire families of genes. Peter Dahl? Is your model trying to replicate the germline or the somatic situation? By that I mean the function might be different in the germline setting, where it's the only gene mutated, as opposed to the somatic situation, where there's a bunch of other genes that are also mutated. That's an interesting question. I think we're really just trying to get at the function of that protein. We're trying to build these assays so that they are sensitive to mutations in that protein and that protein only, so I guess it shouldn't really matter whether it's germline or somatic. But I'm not sure how else to answer that question. All right, it'll be Mike and Kalman and Gail, and then we'll have the next session. So, Mike? Really interesting work. I have a question about the splicing component, and perhaps it's too early to know the answer. But do you have a sense, when you looked at exon 18 of BRCA1, whether most of what you're finding would be generalizable to other exons, or was it something specific about the regulation of that particular exon-exon joining event? Because I'm wondering, are you creating a resource that you could use to study all of splicing? Right. So actually, most of what we found with the splicing is fairly similar to what's predicted by algorithms that are already out there.
So I don't think there was anything special about exon 18 in particular. It was more to show that we can do these experiments at scale in multiplexed assays; I think that was the point of that experiment. But yeah, there wasn't really anything special about exon 18. Kelly? You alluded to sort of universal assays, or more broadly focused assays. Have you tried any unsupervised or supervised learning methods with a simple cell system or something along those lines? There are some very successful phenotype-driven drug screens that have been powerful even for non-cell-autonomous phenotypes in cell-autonomous states. We have not tried that. We're looking at generic stability assays right now, and it will be interesting to apply those machine learning approaches to them. Gail? Yeah, I'm glad someone else mentioned the ACMG 56. One of the other classes in that, besides the cancer genes, is the channel genes and cardiac arrhythmias. And I've wondered about, it's not as scalable, but Xenopus oocytes, and looking at channels, and whether one could think of something like that for that group, and whether any of the folks here who work in some of the models could say something about that. So we have those reagents in our lab, and we're starting... well, we picked the wrong channel to work on first, because we picked a really big channel. So we're doing the same thing Doug did and honing in on a particular domain. The really interesting question is whether you can develop high-throughput assays that don't rely on electrophysiology. There are multiplexed electrophysiology assays you could do in multi-well formats that allow electrophysiologic readouts. What we're trying to look at are generic questions, like: does this variant prevent a channel from getting to the cell surface? Does this variant allow a channel to find its partners or not? Those are going to be easier to do at high throughput.
But at the end of the day, we may end up having to do the electrophysiologic measurement, and it'll probably be in a transfected cell as opposed to a frog egg, which is big and technically not the right way to do it. The other idea, of course, is to incorporate iPS cells into all this, which is sort of down the road, I think. Way down the road. Doug is nodding; we're not quite there yet. Yes. Yes. Leah, thank you very much. Excellent.