With Nancy's introduction I was remembering one of my favorite movies when I was a kid, Brewster's Millions, with Richard Pryor, who has to spend 30 million dollars in 30 days. But I did the math here, and it's a hundred twenty-five thousand dollars a second to try and convey an idea.

Okay, so let me start with a slide that's largely redundant with the conversation this morning. The interpretation of genetic variation is a mission-critical challenge for this Institute and this community. At least two examples of where it is currently rate-limiting are variants of uncertain significance and genome-wide associations. A related point, on which I think there's also probably consensus, is that simply sequencing more genomes isn't going to effectively address the underlying challenges.

So I guess the question is, which way forward? Computational prediction of variant functionality is certainly a viable and worthwhile goal, and we and others have put a lot of work into developing algorithms to integrate various functional annotations into a single score. Based on our experiences to date, I worry a little that trying to improve computational approaches to the place where we'd like them to be may turn out to be a bit like squeezing blood from a turnip, so to speak, where the turnip represents present-day functional annotations and there's only so much information content that we'll be able to squeeze out. So my point is not that we shouldn't be pursuing improved methods for computational prediction, like the FunVar initiative, which is great; it's that we need more and better training data to complement or precede these efforts.

Okay, so one path forward to address these challenges, in the framework of a large-scale program, would be to pursue experimental measurement of the functional consequences of a very large number of variants. For a budget of 10 to 20 million dollars in each of five years, I don't think it's absurd to think about numbers like 10 million variants as an estimate of what might be possible, which equates to about one in every thousand of all possible SNVs. I'll add that there are of course a lot of different ways to slice the question of how to pursue large-scale functional studies, and what I've really tried to focus on is how to maximize the number of variants one is looking at, perhaps at the expense of other considerations.

So 10 million is an ambitious number, but if you suspend your disbelief for a second and say we could do it, what would we gain? First, an absolutely massive training set for improving predictive algorithms. For coding regions this would be about a thousand-fold larger than the data sets that are available for training today; for non-coding regions it would be effectively infinitely larger, since we hardly have anything at all. Second, and this goes back to one of the points this morning about biology being a goal: sequence-structure-function maps for diverse classes of sequence elements, really informing biology at base-pair resolution in a way that isn't accessed by current efforts in functional genomics. And lastly, the measurements themselves, which could potentially improve interpretation of VUS as well as fine mapping of associations.
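As a quick back-of-the-envelope check of that one-in-every-thousand figure, here is a minimal sketch; the ~3.1 Gb haploid genome size and the three-substitutions-per-base count are assumptions, not numbers from the talk.

```python
# Back-of-the-envelope check of the "one in every thousand SNVs" figure.
genome_bp = 3.1e9                  # assumed haploid human genome length (bp)
possible_snvs = 3 * genome_bp      # three alternate alleles at every position
target_variants = 10e6             # the proposed 10 million variants

print(f"possible SNVs: {possible_snvs:.1e}")                      # ~9.3e+09
print(f"fraction: 1 in {possible_snvs / target_variants:.0f}")    # ~1 in 930
```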
So how would one in practice get to functional analysis of 10 million variants? The answer would likely require some large degree of multiplexing. This is a workflow from a collaboration with Len Pennacchio and Nadav Ahituv; Joe Ecker brought up this general class of approaches earlier today. In a nutshell, this involves first generating a comprehensive allelic series of, let's say, a single mammalian enhancer in one reaction, with each variant linked to a transcribed barcode; then putting this population of enhancer variants through a massively parallel functional assay; and finally using sequencing as a readout, where you read out the transcribed barcodes in RNA and infer or estimate the effect size of each possible mutation on the enhancer. And this is all in one experiment.

Here's an example of such an experiment. This is a 250-base-pair intronic enhancer of ALDOB, with position on the x-axis and log fold change on the y-axis. We've effectively generated remarkably reproducible and quite sensitive measurements of the functional effects of about 750 non-coding mutations in one experiment.

Of course, as came up this morning as well, we want to do functional studies on coding variants too. Here it's more challenging, because ideally you'd want to introduce one and only one mutation into a full-length ORF. So Jacob Kitzman, who just started his lab at the University of Michigan, recently worked out one method to do this, termed PALS, where one uses microarray-derived oligos to program an allelic series of all possible codon substitutions to an ORF in a single reaction. Some of my colleagues at UW, Lea Starita and Stan Fields, have applied this method to the analysis of BRCA1, in particular BRCA1-BARD1 interactions, and have generated just beautiful maps of the relative effects of nearly all possible amino acid substitutions in the region of BRCA1 that interacts with BARD1. These can then be correlated with structure to inform biological understanding of the interaction, but also evaluated for utility in predicting the pathogenicity of clinical variants.

There are a lot of groups that have developed methods related to the examples I gave, and the take-home message is that the underlying technologies that would be necessary are already there or are rapidly maturing. In practical terms, what would be the rate-limiting factors for scaling this? First, producing allelic series at scale would be one challenge, and the second, which would probably be the harder one, would be adapting a diversity of cellular and molecular assays to multiplexed readout. But in sharp contrast with other large-scale projects, sequencing would be the least of the challenges, so it would be a rather different portfolio of what's hard about this project.

In terms of actually generating allelic series, it's much easier to imagine how one would get to a number like 10 million if we are talking about dense or saturating mutagenesis, which encompasses both of the examples I gave you. You can imagine that tackling 2,500 one-kilobase regions would get you to 10 million, assuming three substitutions and a deletion for every residue or nucleotide, and this could be spread between coding sequences, let's say 500 clinically actionable genes, and several thousand distal regulatory elements.
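To make the barcode-to-effect-size step of that workflow concrete, here is a minimal sketch of how one might reduce barcode counts to per-variant effects; the function, the pooling of barcodes per variant, the pseudocount, and the toy counts are illustrative assumptions, not the actual analysis pipeline.

```python
import math
from collections import defaultdict

def variant_effects(barcode_to_variant, dna_counts, rna_counts, pseudo=1.0):
    """Per-variant activity as log2(RNA/DNA), normalized to wild type.

    barcode_to_variant maps each transcribed barcode to the variant it tags
    (None for wild-type barcodes); dna_counts and rna_counts map barcode ->
    read count in the plasmid (DNA) and expression (RNA) libraries.
    """
    dna, rna = defaultdict(float), defaultdict(float)
    for bc, var in barcode_to_variant.items():
        dna[var] += dna_counts.get(bc, 0)  # pool barcodes tagging the same variant
        rna[var] += rna_counts.get(bc, 0)
    # Expression ratio per variant; the pseudocount guards against zero counts.
    ratio = {v: (rna[v] + pseudo) / (dna[v] + pseudo) for v in dna}
    wild_type = ratio[None]  # wild-type enhancer as the reference
    return {v: math.log2(r / wild_type) for v, r in ratio.items() if v is not None}

# Toy example: two barcodes tag a hypothetical substitution, one tags wild type.
effects = variant_effects(
    {"ACGT": "pos12:G>A", "TTAG": "pos12:G>A", "CCAA": None},
    dna_counts={"ACGT": 100, "TTAG": 80, "CCAA": 90},
    rna_counts={"ACGT": 20, "TTAG": 15, "CCAA": 95},
)
print(effects)  # negative value: this variant reduces enhancer activity
```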
What I think is a bit more difficult to imagine with current methods, and this also came up a bit this morning in Joe Ecker's talk, is how to multiplex the functional analysis of sparsely located variants, for example viable candidates from GWAS associations. You can imagine that with CRISPR and other related methods there are possibilities, but on a per-variant basis this is probably going to be much more expensive than these dense approaches.

Okay, so the other challenge is functional assays, and the concepts on this slide are adapted from another of my colleagues at UW, Doug Fowler. The key point here is that one doesn't need to develop a thousand functional assays to study a thousand sequences of interest; rather, you can funnel related types of sequences into identical, or at least similar, assays. This of course applies to regulatory elements, which, at least for enhancers, can all be read out with transcriptional activation, but you can also imagine that all members of a particular class of protein domains could be read out using related assays. The second point is that once you have an allelic series constructed for a particular regulatory element or protein of interest, you can essentially put the same allelic series through a variety of biochemical or cellular assays, including, for example, many cell lines. So we're really talking about building a matrix where you have 10 million rows and the columns are the experimental measurements, and once you build the first column it becomes easier to do more.

The limitations of this kind of data are familiar; we've lived with the context problem for the whole history of molecular biology. Functional assays are not humans, and molecular functionality is not pathogenicity, but these are related attributes, and I think they can be informative about one another. You can also imagine that the choice of contexts to pursue would presumably be informed by other projects like ENCODE and Roadmap and so on, namely, which cell type is the relevant one in which to study a particular enhancer, for example.

So how would the resulting data be used? First, this would be a massive database of experimental measurements associated with individual SNVs or point mutations, dramatically larger than the training data sets that are available today. And, I imagine there are probably different opinions on this, but it could also be directly useful as effectively pre-computed interpretations of VUS in the most relevant genes. This contrasts with the model of first waiting to see the variant and then trying to go back and do the functional analysis, which I think is pretty inefficient. Second, the focus on translational implications is great, but again, as I think Eric brought up this morning, let's not lose sight of biology as a goal, and a data set like this would be informative about both structure and function at single-base resolution. Once you learn lessons about, let's say, mutations of particular residues in protein domains, you can then extrapolate those lessons to other members of a functional class.

So what would success depend on? If you work backwards from this kind of budget and a goal of 10 million variants, you get to about $30,000 per kilobase of DNA subjected to saturation mutagenesis, which to me seems reasonably generous. As it turns out, doped oligo synthesis using microarrays is already very cheap, so that in itself would probably not be the major cost barrier.
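Working backwards through that budget arithmetic, a minimal sketch; the $15M-per-year midpoint and the four-programmed-variants-per-nucleotide design are assumptions consistent with the figures above.

```python
# Rough check of the ~$30,000-per-kilobase figure, working backwards from budget.
budget = 15e6 * 5        # $10-20M per year for five years; take the $15M midpoint
variants = 10e6          # target number of variants
per_bp = 4               # 3 substitutions + 1 deletion at every nucleotide

kb_mutagenized = variants / per_bp / 1000         # 2,500 kb under saturation
print(f"${budget / kb_mutagenized:,.0f} per kb")  # $30,000 per kb
```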
But success would depend on having better methods for producing allelic series and then, probably more importantly, on developing functional assays that are compatible with multiplexing, scalable, and recyclable across different elements. So it's a different set of activities than sequencing, but I think the knowledge of NHGRI could prove critical in figuring out how to effectively coordinate and scale a project like this. A lot of the technical know-how to get started is already there, so you can imagine this proceeding in kind of a phased fashion, with lessons with respect to assay development and so on applied as you go along. So I'll stop there, and I'm happy to take any questions. Jay? Yeah.

Well, I wasn't sure if your assays covered this, but one gnawing question is that we often assume a base change is a missense change when in fact it could be a splice change. So is there a way to take the coding region of a gene and determine whether every single base change impacts splicing in one of these assays?

Yeah, one could. Again, there can be both coding and non-coding bases that affect splicing, so probably the best way to do it would be to adapt genome editing methods to similar kinds of multiplexing, where you're effectively doing dense mutagenesis of a region with those methods. That's probably the way to go, and I think it's certainly a possibility. Yeah, David.

So I think you're onto exactly the right thing; I personally think this is incredibly important. Two additions, and I think it's great. One is that there are going to be assays of convenience that report on some molecular function, and your slide says that molecular function doesn't equal pathogenicity. One way of framing this is that we need to train our understanding of which assays are relevant based on whether they faithfully report the relationship between genotype and phenotype. That's point one. Point two is that this is of direct relevance to pharmaceutical development, because the input to drug screening is an assay, and the key issue in drug development is: is that assay predictive of downstream clinical response?
Because if it's not, you're going to spend 10 years before you figure it out, when you have a drug and you do the clinical trials. So this is an incredibly important sort of triangle, if you will: the assay and its genetic variation, the human and its disease variation, and then that goes into screening, or informed screening. So let's not settle for the multiplexed, convenient, lots-of-stuff-we-can-measure assays; no, no, I'm saying let's use them to figure out what we should be measuring. Absolutely.

So I agree with everything you said, and I think you made this point indirectly earlier this morning as well. I think there's an argument for getting started with assays of convenience, and I think in many cases we'll be surprised by how little we know about which assays are the most relevant. But your point is fully taken, and I agree 100 percent.

So I totally agree with David, and a corollary to that is that we should, in tandem, link this to actual clinical genetic data, so that you could see, okay, these p53 mutations that I'm predicting to have this effect, do I really see them in either tumor-normal comparisons or in hereditary cancers and so on? You'd like to have that kind of synergy between the actual record that you're seeing and what you're measuring in the lab. But many of us agree that this kind of high-throughput functional screening is the way to go, particularly if you could...

Yeah, my guess is that there will be groups of alleles that behave similarly, and you can probably use what data is available to validate a particular subset as likely being clinically relevant and go from there. But yeah.

Thanks, Jay. Sorry to stop the questions, but it's getting late; I do want to move on.