Okay, welcome back, everyone. We're rejoining the open session of NHGRI's advisory council. We're going to have our last concept presented. This will be by Adam Felsenfeld, Program Director in the Division of Genome Sciences, and the title of his concept is the Molecular Phenotypes of Null Alleles in Cells pilot project. Adam, go ahead. Thank you, Rudy, and welcome back from the break, everyone. As Rudy mentioned, I'm here to talk about a concept called Molecular Phenotypes of Null Alleles in Cells, or MorPhiC. I'm speaking on behalf of a small group of people who worked on this, and I want to extend my thanks to Colin, Ajay, and Stephanie for their help. You've all seen the published vision document at this point, and as Eric mentioned earlier in the day, it makes a number of predictions, one of which is that the biological functions of every human gene will be known. The strategic vision lays out a number of compelling genomics research projects in biomedicine which follow from those ideas. One of them is to acquire an increasingly comprehensive view of the roles and relationships of genes, regulatory elements, pathways, and networks. When you drill down into that compelling-projects language, you find this: there's an unprecedented opportunity to decipher the individual and combined roles of each gene and regulatory element, and this must start with establishing the function of each human gene, including the phenotypic effects of human gene knockouts. So MorPhiC is aimed at this. It's easier for me to talk about the long-term goal first; this particular concept is only for a phase one, and I'll get to that in a little bit. But the long-term goal is to create a catalog of molecular and cellular phenotypes of null alleles for essentially all human genes. These would be assayed in vitro.
There are a couple of key ideas here. The first is that the phenotypes are intended to be consistent, in the sense that a similar depth, breadth, and quality of information will be collected for each gene. This will make it easier to use the data for analysis. The second key idea is that the effort should be comprehensive with respect to genes. Note that we can conceive of even more comprehensive efforts (at an extreme, for example, all alleles of all genes in all cell types with all assays), but this is going to be difficult for some time. By limiting ourselves here to a single null allele for each gene, it's now technically possible, we think, to obtain a useful subset of this type of data across all genes with informative assays. This will be very useful for interpreting data about other alleles and other contexts, as I'll expand on in a bit. This concept envisages that these assays will be done in cells and be as informative as possible regarding cellular and molecular phenotypes; we anticipate this means multicellular systems, such as organoids. The concept also anticipates that, at least for phase one, the focus will be on protein-coding genes. Even with these constraints, this is a challenging problem. I think the basic ingredients are available, but we don't know which are best in terms of information and scalability, including capacity for lower cost per gene. In a phase one with a target of 1,000 genes, we think we can learn this. Here's just a high-level view so you know what we mean by phase one; I'll show this again later in the presentation to discuss the details. The idea here is that today's concept is just for this phase. If this is approved and successful, we might be returning in several years to discuss what to do after that, but we need a phase one to develop a pipeline and to answer some basic questions about cost, throughput, technical challenges, data utility and interpretability, data quality, standards, formats, what the best assays are, and a host of other challenges.
So let me justify some of the choices here. Why null alleles, those that produce no functional protein? We propose them here as homozygotes for several reasons, beyond what I just mentioned about constraining the question to a subset of the functional problem. First, we know how to engineer them fairly reliably. Second, they're likely to be highly penetrant and expressive. Third, it's long established that null alleles are useful for interpreting other alleles in the same gene, or for interpreting similar phenotypes seen for other genes. Fourth, the results will be complementary to the extensive gene-knockout phenotype data from the Knockout Mouse Project (KOMP), which has anatomical and physiological data but little molecular and cellular data. Null alleles do have drawbacks, including the possibility that they will be so highly pleiotropic that interpretation will be complicated. It's hard to know in advance whether this pleiotropy will be useful or will be confusing in this setting; phase one can inform that. The concept also envisages that these assays will be done in cells in a way that is as informative as possible regarding the phenotypes. Again, we think this means multicellular systems such as organoids, which offer the possibility of displaying phenotypes that are more faithful to the ones seen in whole organisms. They have the potential to allow assays across multiple cell types, in order to capture more aspects of the function of a particular gene, including features related, for example, to cell autonomy or cell-specific function. Some general benefits: we are missing functional information for many genes, and MorPhiC seeks to establish a consistent base level for all of them. Even for genes where we do know an associated anatomical or physiological phenotype, and for which we have some molecular phenotypes like expression data, there's usually a gap between the two that we don't understand well. MorPhiC will provide data that can help fill that gap.
Finally, the question of how NHGRI will go beyond correlation and prediction to mechanism is an idea that's come up several times in recent functional genomics meetings. Cellular and molecular data in multicellular systems can help with this. These data will also inform molecular and cellular pathways, of course. Some additional benefits: another source of data for making inferences about function in association studies and other gene-function or perturbation studies. There's potential for the data to be quantitative, which will allow for improved analyses. Consistency of information is nice when doing analyses, and this project may yield other deliverables in addition to the knowledge generated, including, for example, cell lines, analysis tools, or computational tools. There are some challenges, and the success of phase one depends on how well it informs our ability to address them. First, can mutagenesis scale? We think this is at a stage that can now be pushed: there are commercially available knockout iPSCs, and there are existing libraries of CRISPR gene knockouts. We will need to be alert to some potential challenges, including off-target effects, genetic compensation, and maybe some others. Some of these are just to be avoided, but others could be informative, and a scaled effort could serve to characterize them more completely. It is not possible to assay every cell type, but enough cell types need to be assayed to see an interpretable effect for every gene. We think there are likely to be creative solutions to some of these challenges, possibly through scaling or multiplexing assays, cascade testing, prioritizing assays based on known spatial expression, and maybe others. Another potential challenge, again, is the problem of uninterpretable pleiotropy, or even cell lethality, especially with homozygous knockouts. How often will we see uninterpretably severe phenotypes?
We think there are probably creative ways to deal with this when it happens, including testing heterozygotes or testing other loss-of-function alleles. We expect there to be effects of genetic background that will be identified in this context as well; phase one should assess the possible range of these effects and how to address them. For the scope of phase one, we propose a four-year project with three components that I'll describe in a couple of slides; for now, the aim is to develop the elements of the resource. The objective, again, will be to demonstrate the scalability of all aspects of MorPhiC: making alleles and developing high-throughput yet informative assays. In phase one, we expect to learn about prospects for improvement of cost, throughput, and utility of assays. We think that a phase one target of 1,000 protein-coding genes will be adequate to do this. The FOA and the funded investigators will have to consider how genes will be prioritized. There are multiple ways this could be done, some suggested here. I don't want to say which of these it will be, because there are a variety of them and they all have their merits. For example, an available KOMP knockout for the gene could be a criterion; known or suspected human disease genes; genes with relatively unknown function; or deliberately taking a diverse set in terms of, for example, protein class or tissue, to make sure that phase one is testing the limits. Again, there are others, and they're not mutually exclusive. Phase one will also have to develop standards, for example for mutation QC, assay comparison, and data formats. Comparing assays is a key activity for phase one, because we need that information to know what to choose for phase two. The scale of phase one is not enough to be thorough in incorporating diverse samples, but it is enough to learn how to be systematic in the context of this project and to implement that in phase two.
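The kind of multi-criteria gene prioritization just described could be sketched as follows. This is purely illustrative: the criteria names, weights, and gene symbols are invented for the example, not part of the concept; the FOA and funded investigators would decide the actual scheme.

```python
# Hypothetical sketch of scoring genes against several non-mutually-exclusive
# prioritization criteria. Names and weights are invented for illustration.

def priority_score(gene, weights=None):
    """Sum the weights of the criteria a candidate gene satisfies."""
    weights = weights or {
        "komp_knockout_available": 1.0,  # existing KOMP mouse knockout
        "disease_associated": 1.0,       # known/suspected human disease gene
        "function_unknown": 1.5,         # little existing functional data
    }
    return sum(w for key, w in weights.items() if gene.get(key))

candidates = [
    {"symbol": "GENE_A", "komp_knockout_available": True, "disease_associated": True},
    {"symbol": "GENE_B", "function_unknown": True},
    {"symbol": "GENE_C"},
]

ranked = sorted(candidates, key=priority_score, reverse=True)
# GENE_A (score 2.0) ranks ahead of GENE_B (1.5) and GENE_C (0.0)
```

Because the criteria are not mutually exclusive, a gene can accumulate weight from several of them at once, which matches the point that the options above have overlapping merits.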
One way would be simply to sample a spread of populations among the phase one assays, and we'll probably need some overlap in the genes tested between different population backgrounds. Another aspect of phase one is to start to use the data and analyses to inform production and see how the data perform in those analyses: so, the best data types, data formats, and applications. There are always details about data that only become apparent once someone tries to use it. This can be a scientific issue, but it can also be a technical issue, for example with metadata formats. And finally, a phase one goal is to develop data infrastructure. That's all summarized in an expanded version of the slide I showed earlier. For phase one, we need expectations for success. The list of things that I just talked about is not a comprehensive one, but I hope it raises most of them. They all lead into the questions: is a phase two technically possible, is it justified, what's the best structure, what are the best assays, and how much is it going to cost? What follows from the scope is straightforward. We propose three components that will work in a consortium. All would use a cooperative agreement mechanism, and all are proposed for a duration of four years. The first of those components is the data production centers. They would lead the prioritization of phase one genes, decide on overlap in testing, and develop plans for comparison between assays. They would engineer the alleles, and they would carry out their proposed assays.
They would then submit data to component three, which I'll get to in a minute, and lead discussion about data standards, validation of alleles, and cost per gene. I should say that we propose four to six of these centers (we definitely need multiple centers here), at up to $1.4 million per year each for four years. Component two would be data analysis and validation centers: two to three awards, at up to half a million dollars total per year each. The idea, again, is to get people using the data so they can identify issues early and then feed that back to the consortium. Analysis could be anything related to the key analytic issues: pathway analysis, pleiotropy, cell type inference, aiding interpretation of other functional data sets or association studies; others are certainly possible. Mostly, they should be designed to illuminate key issues with the data's utility. They are also likely to identify new deliverables that we didn't anticipate that are of high interest to the community, and we'll have to think about those as they arise; that's happened in other programs. The third component we propose is a data resource. Funding ideally would be structured to meet demand, meaning that the amount of money would ramp up to be commensurate with the amount of data that's available. There would be one of these centers. It would receive, wrangle, annotate, and present data for consortium and community use; lead discussions about data formats and requirements; integrate data from the data production centers; and pursue opportunities to integrate with similar or complementary resources.
That would include leading efforts to integrate with complementary resources, for example integration with the Knockout Mouse Project regarding how to combine and present data on gene orthologs that are knocked out in both the mouse and human systems, and also bringing into the MorPhiC consortium relevant lessons learned from those other projects. The third component would also be the consortium logistics center and help with communications and tracking progress. We know that in the long term we need to be looking for opportunities to integrate data resources handling similar kinds of data, but certainly something like this will be needed, quite proximate to the consortium, in the near term. We always talk about relationships to other studies. We think that MorPhiC data will be valuable on their own, but they are likely to be most useful in combination with other functional and variant association data to provide insight into gene function and mechanisms of phenotype. There are clear relationships and connections to other projects; they're mostly complementary, though there might be some overlap. The ones here are just examples: disease association studies in Mendelian and common disease, clinical studies or resources that interpret variants, and other knockout studies such as KOMP. It's not at all hard to imagine some of these interactions, but more detail will require more specific planning between efforts. Again, there's an obvious connection in particular to KOMP here, which has the anatomical and physiological knockout phenotypes but not so much the molecular and cellular ones. In addition, they've thought carefully about an overlapping set of issues, and they could be useful in prioritizing genes as well as in thinking about appropriate assays.
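The metadata-format and data-standards work assigned to the data resource above can be made concrete with a toy completeness check. The required field names here are hypothetical illustrations, not a proposed MorPhiC standard.

```python
# Toy illustration of a metadata completeness check of the kind a data
# resource might run on submitted records before accepting them.
# Field names are hypothetical, not a proposed MorPhiC standard.

REQUIRED_FIELDS = {"gene_symbol", "cell_type", "assay", "genetic_background"}

def missing_metadata(record):
    """Return, sorted, the required fields absent from a submitted record."""
    return sorted(REQUIRED_FIELDS - record.keys())

submission = {"gene_symbol": "GENE_A", "assay": "RNA-seq"}
# missing_metadata(submission) == ['cell_type', 'genetic_background']
```

Checks like this are exactly the "details about data that only become apparent once someone tries to use it" mentioned earlier: agreeing on required fields up front makes cross-center integration tractable.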
So, in summary: the long-term goal is a catalog of molecular and cellular phenotypes of null alleles for all human genes, in vitro. Phase one would develop a pipeline, assess barriers to scale, address challenges, and would entail 1,000 genes over four years. As for structure: four to six data production centers, two to three analysis centers, and a data resource; fairly straightforward. Before I hand it over for questions, I want to thank my many colleagues who worked on this for extensive input on the ideas and the presentation. So now we're on to questions, and I've asked Dr. Chung, Dr. Ideker, and Dr. Chang to lead off the discussion. Adam, would you like to take down your slides so that we get a full gallery view? Thank you. Should I lead off, Adam? Yes, please. Okay, so I'm still trying to get my head around this concept. Obviously it has potential utility, a lot of utility. To me it's big in terms of scope, you know, 10 million per year for the four years. I'll be very interested to see in this pilot phase exactly how it works, because there are so many complexities. When I think about which genes to prioritize (and obviously those who apply will make these decisions), there are all sorts of dimensions: you could have genes that are known to be associated with disease, which has some utility, but also genes that we don't know what they do, which have completely different but also real utility. And when I just think about humans, they're so complicated to me, in terms of time and development, by organ system, by tissue; even cells within organs have such heterogeneity. I'm trying to get my head wrapped around how you have this single platform with a single cell type so you can do the cross-comparisons, but yet have the relevant ones for individual genes. That's what I'm still trying to think through: how's that going to work, and how's that going to be useful in these ways? And also, you
know, trying to think about how homozygous nulls work for some diseases but probably not others, and trying to think of the complexity of it. I get the idea to do knockouts, but would you look at heterozygotes and homozygotes? Would you do that based on the human genetics, or what we know about tolerance for haploinsufficiency, or what we know about mouse models in KOMP? Anyway, those are all the things in my head, and it's very complicated, so I'm sure the first pilot phase will help us to understand that complexity and the value. I just see the potential for a lot of complexity. Thank you, and I agree about the complexity. One of the challenges of writing an RFA is to make sure that we focus applicants on answering those questions, so any advice that you could give me on ways to do that, I'd be grateful for. Thank you so much for this presentation. I'm quite positive about this idea; I think this will really fill an important gap and really expand our knowledge of genes that are both familiar and unknown to the genomics community. We have several important considerations. First, you already highlighted the fact that the null phenotype depends on the genetic background, and we spent this morning talking about diversity. So it's important that in the pilot phase we test several different genetic backgrounds; we don't want to just lock in a single resource for only one genotype. Equally important, obviously, males and females, XX versus XY genotypes, should be tested in the pilot phase. This just gives a sense of really how different the outcome could be if you have a different genetic background to start with, and gives you a sense of how much of a dimension that parameter is going to be. The second point is that the so-called null allele may not be as straightforward as people think, and I would highlight that in the RFA, or when you think about it.
An allele that we conceptually believe to be a null can turn out to behave quite differently from a true null. There is already research showing that, in certain cases, such alleles can actually trigger compensation by other gene family members, whereas if you don't make the RNA at all, that doesn't happen. So actually documenting what happened in the perturbation to the target gene in question is very important for interpreting the results. The third aspect is the question that came up as to which cell types or systems people should focus on. One possible impact of this work is that these data and these cells will serve as a reference point for a lot of kinds of research, and so perhaps cell types and tissues that are very accessible in populations, like blood or skin or something like that, will be something you want to emphasize, because then that's something people can obtain from individuals and compare against data generated from the MorPhiC resource. Thank you. Okay. Thank you. Rafael and Howard. Thanks, Rudy. So I wrote down three considerations, Adam, and they cover the gamut of scientific and programmatic issues; they're kind of all over the map. One of them follows up on Wendy's comment about phase one and really trying to nail down what it can achieve. So one consideration is: how exactly can you articulate, now, your standards for evaluating phase one when it's complete? What does success look like? Does it mean that, of those 1,000 genes, 80% have been linked to some phenotype; is that success? Whereas if less than half show a phenotype by any assay you throw at them, that's failure? Is that the way to think about this, or how are you thinking about it? It's a combination; I can answer. Yeah, we can do this in order, in a sort of Socratic method too. So please. Sorry, yeah. So I've been thinking about this a lot.
I think there are two things. One is that it's useful to have a number or percentage of successes, but it's actually much better, no matter what percentage you have, to understand why you succeeded and why you failed. That's harder to state quantitatively, but that's really what I'm looking for: insight into why things work and why things don't work. Right, right. And just one scientific note; this is probably the least important of my comments, but it's an interesting one that occurred to me when I was thinking about what phenotypes you could be guaranteed to find. Guri Giaever has this paper from over 10 years ago with the title "uncovering a phenotype for all genes," where she looked in yeast. It turned out that by the time she had looked at 1,100 different chemical compounds, all assayed by growth, by the way, she had found a phenotype for 97% of the genes. Now, I realize yeast and people are very different, but I would point out that yeast had a complete genome duplication, so as for any suspicion you might have that paralogs would confound your efforts: in fact those paralogs, according to her, have diversified, even for simple phenotypes like growth in different conditions. So that actually augurs well for this effort, if yeast can be any judge. Now to my last point. You did mention that there are these connections to other efforts, such as IGVF, and in fact I had thought of IGVF as sort of an umbrella effort, almost, of which this would be a special instance. So I'm guessing you're going to disagree that this is a special instance of IGVF, but maybe we can talk about in what ways it is or is not. For instance, could someone propose to IGVF this exact type of project, where they're going to go in and delete every gene? Or does the language (if I remember the language correctly) prohibit a complete gene deletion in the IGVF program?
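The success-rate framing in this exchange, the fraction of knocked-out genes showing a phenotype in at least one assay, is a simple statistic to compute. The sketch below uses invented data and a hypothetical function name purely to illustrate the metric being debated.

```python
# Sketch of the summary statistic under discussion: the fraction of
# knocked-out genes with a phenotype in at least one assay.
# Gene names and calls are invented for illustration.

def phenotype_coverage(results):
    """results maps gene -> list of per-assay booleans (phenotype seen?)."""
    hits = sum(1 for calls in results.values() if any(calls))
    return hits / len(results)

results = {
    "GENE_A": [True, False],
    "GENE_B": [False, False],
    "GENE_C": [False, True],
    "GENE_D": [True, True],
}
# phenotype_coverage(results) == 0.75
```

As the answer above notes, the raw percentage matters less than understanding why individual genes fall into the hit or no-hit groups.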
So I would just like you to comment on whether this is distinct or is like a special instance of an IGVF-type project. Yeah, that's a hard question. I'm actually going to call on Stephanie, if she's able to weigh in, about whether or not IGVF specifically precludes this, because I'm sorry, I can't remember. Trey, my understanding here is far from perfect, and I'm also going to ask Mike Pazin to weigh in, especially if I say the wrong thing. My understanding is that what IGVF admits is broader, and because of that it is less likely to be comprehensive in any one dimension. So I don't know if you can look at it as an umbrella, but there's clearly a continuum. Adam, I'm going to have Dan speak, because he is one of our panelists; I'm not sure if Mike or Stephanie are. So, Dan, Sharon pointed out a good point: can you define what IGVF is, and then we can come back to the specifics of this? And I just want to clarify: IGVF is an approved concept, the applications are in, and that's moving forward. So, to your point, Trey, understanding how they relate is an important question, which is slightly different from adjudicating. No, that's the spirit of my question. Yeah, so Dan, do you want to go ahead? Sure, sure. So, to the question of what IGVF is: this is a program that we're starting up that's looking at the impact of genomic variation on function, considering genomic variation very broadly. And so what Trey said, to the effect that MorPhiC, or parts of MorPhiC, could be a very specific subset of what's covered in IGVF, I think is true. There could be components of IGVF grants that are looking at some whole-gene knockouts, but they would not be looking as comprehensively as MorPhiC is, and whole-gene knockouts are not the focus of the type of variation that IGVF will be looking at.
Does that help clarify things? Well, but it's likely that people will make nonsense mutations, right, and frameshifts, so... Yeah, so it sounds like this could have been proposed for IGVF, sorry. Oh no, that's what I was going to say, Trey. I think if this had been proposed for IGVF, well, those applications are at a stage where we can't really talk in great detail about what was or wasn't proposed. In the process of that, some things would be null alleles, but the real focus there is looking more specifically at variation, and I think the key point, to Dan's comment, is the level of functional characterization of the null alleles that's being proposed here in MorPhiC, which wouldn't be in scope for what's being done there. So we have the connections, and we are working together on how these relate. So yes, something like this could have been responsive, but I don't think the full dimension of this would have been; to your earlier point about going in different dimensions, this is going much deeper in one dimension, on really understanding the function of genes, as opposed to the impacts of variants. I know those are tied-together concepts, but there are key parts about the latter. And Trey, I sometimes think about MorPhiC as a way to run ahead in one dimension of that whole space of various perturbations by various phenotypes, et cetera. So it's one of the things you could do, something that could be put in place in a relatively finite period of time that would be extremely useful, again, broadly, for interpreting these kinds of data. Yeah, so maybe clarifying language, like what you and Carolyn just said, Adam, would be one thing you could put into the language of the program, if you haven't already. Okay. Were you done, Trey? Rafael and then Howard. Go ahead, Rafi. Thanks. So this question is similar to the one
some of us raised about IGVF: I would like to hear a little bit about the rationale for having this be a consortium, given how complex it is and how many dimensions there are, and how hard it's going to be to decide among all those dimensions and possibilities as a big group. I want to hear a little bit about the rationale for having a consortium instead of, at the first stage, letting every investigator do what they want to do on their own, seeing what works, and maybe from there deciding later to do something more organized. Yeah. So first, it's not a huge consortium; I don't consider this to be a huge consortium. I think that you sort of need a captive audience, and that means cooperative agreements, and preparing people to really sit down and hash out the advantages and disadvantages and the complementarity of their systems; to have enough different kinds of data that you can get people together in one place and talk about mundane but key things like data formats and QC; and to do things like force people to test the same gene in multiple systems, which might not be advocated by all, and also to force people, for example when doing diverse samples, to make sure everyone's working on the same range of diverse samples. So I think that you need something in between just having people be off on their own and trying this for a while. That's, I think, the primary reason why this should be a consortium. Okay, Howard and then Sharon. Wendy nicely introduced the concept of complexity, but I think it was only the tip of the iceberg. In these types of systems, it's not going to make much sense, for example, to look at a receptor knockout when the ligand for that receptor is missing from the system.
In regard to the complexity of null alleles: even with nonsense mutations, some alleles will be subject to mRNA decay and no protein production, whereas others will make truncated proteins with gain of function. I think it's going to be imperative that wise choices are made regarding the genes, alleles, cell types, and contexts that are studied, and I'm just not sure that the same groups that will be best at assuring efficient production of these cell types and cell lines are going to be the ones that would make the most informed choices regarding all that level of complexity. I'm wondering if there was consideration of having a mechanism for the broader community to nominate genes, cell types, alleles, and contexts for study in this initiative. I think that would be a better way of assuring success. It should also be noted that cellular phenotypes in artificial contexts will often give you results that are true and yet irrelevant to normal cell biology, physiology, or pathologic disease contexts, so I'd love to hear thoughts on that. Yeah. You know, I'll start with: I think it's a good idea. And I agree with you, again, about the potential complexity here, with a thousand genes and then some overlap. Even a community nomination process probably needs some direction, to make sure there's a diversity of factors that are tested. I think it can be done. It would probably start to cut into the thousand, but maybe that's not so important; we can probably learn quite a lot from 500, maybe as much as we could learn from 1,000. So I think that's a good idea; I'll have to think about it. You know, I might argue that you could learn more from 250 good choices than 1,000 bad choices. Sharon.
I'm going in the same direction as Howard, but I came to a slightly different conclusion, which is: there's the issue of making the alleles, which is right in NHGRI's wheelhouse, and there's the issue of doing all of the cellular phenotyping, which is really a much broader biology question. By your emphasis on organoids, people are probably going to make these alleles in iPSCs, and you're going to need experts in differentiating them into different cell types to really get any kind of reasonable assay. So I worry a little bit (it may just be that people will team up) that, the way it's written, there's so much emphasis on the engineering. I think a lot of the work is really going to be figuring out a set of, I don't know, 20 or 50 neurologic disease genes and then getting experts in those organoids, versus another set of 20 or 50 cardiovascular genes and getting experts in cardiovascular physiology or organoids. I'm just a little worried that, the way it's structured, you're not going to have the expertise you need to really set these systems up as well as you could. So I would think carefully, as you write the details of the RFA, about the degree of expertise you're going to need in these different physiological systems, in addition to groups that are just experts in things like cell cycle analysis and cell metabolism that will probably cut across many cell types. I did appreciate how you referenced KOMP; I would say (and I'll bring this up at the end) KOMP has put a lot of effort into data visualization of these kinds of phenotypes, which I think has been quite effective, and it would really be nice to see that. I'm somewhat biased here, because Baylor's part of KOMP, though not so much part of the data visualization part.
But I do think it would be important to benefit from what they've learned about visualizing this type of data for the community to use. I guess my sense is that it's going to be harder to find a phenotype than you expect, and you need some kind of triage process, or alternatively a top-down process. Like, is it prohibited to suggest you want to knock out 10 contiguous genes all at once, do the experiment, and see if you have anything in all your assays? Because if you don't, or 100 genes, or 1,000 genes, right? I mean, CRISPR systems make it very easy to make big deletions, so is that out of play if somebody said that's another way I can tackle this, or does it really have to be individual? No, I think the trick is to leave it as flexible as possible and still get the answer, and if that's a good way to get the answer, we don't want to foreclose on it. Other comments or questions for Adam or the team? Okay, that's enough time for people to find the mute button, so let's forge ahead. Can I get a motion to approve the concept? A second. All in favor? Thank you. All opposed? Was your hand up there, Sharon? It was so quick. We can't hear you, Sharon. Okay, hand is up. Howard, you also opposed? Anyone else? Three. Okay, anyone abstaining? Anyone abstaining? Okay, thank you very much. And thank you, Adam. Thank you.