Hello, everyone. Welcome to the pre-application webinar for a new NHGRI program called Molecular Phenotypes of Null Alleles in Cells, or MorPhiC. My name is Adam Felsenfeld, and I'm a program director here. I'm joined by my program colleagues, Colin Fletcher and Ajay Pillai. I also want to thank Sean Darren, who will be helping with timekeeping and the slides today, and Gerald Simani and William May for their technical support. Sean, if we could go to the next slide. This slide is just for reference: these are relevant links that may be useful to you. I think you're probably familiar with all of them. The only one you might not have seen is the one for the 2020 NHGRI Strategic Vision, which provides some higher-level context for the kinds of programs NHGRI is interested in pursuing over the next five to seven years. Next slide. Some notes on the format. I will give a high-level introduction to the program, then launch into a discussion of the first RFA, the Data Production Research and Development Centers, and then we'll stop for some questions. Next up will be Ajay, who will go through a brief presentation and then take questions on the Data Analysis and Validation Centers, and then Colin will follow with a presentation on the Data Resource and Administrative Coordinating Center, plus additional time for questions. To note, this call will be recorded and posted to the NHGRI MorPhiC web pages. Your questions will be rendered into general FAQs with our answers, and those will be linked from the MorPhiC web pages. The chat is disabled, so please ask questions in the Q&A. When you ask a question, you will have the option to ask anonymously, and you may do so. You can also upvote questions that appear in the Q&A. Next slide. On to the high-level overview of the program. Next, please.
We think it would be great to have a catalog of molecular and cellular phenotypes for gene knockouts of all human genes, especially together with the knockout cell lines as a resource. As an ideal, the data would be highly informative, that is, of multiple types and from multiple tissues. The data would be relatable to other kinds of data: other alleles, and anatomical, physiological, and disease phenotypes. They would be consistent: high-quality, standardized, well-characterized assays. And they would be complete, that is, a resource covering all genes. We don't think this is feasible yet. We don't know the best way to design an effort like this. There are a fair number of technical and scientific challenges or barriers, which I'll get to later. We don't understand scaling issues like costs and throughput. We don't understand the strategies to employ or the tools to be developed. We don't understand the trade-offs that will inevitably need to be made to best explore what's likely to be a very large experimental space, even though we've limited this to one type of allele. And of course, just getting the data is not enough. We need to understand how to use it, with all that entails: from data management and quality control, to making the data interoperable with many other types of data generated elsewhere, to demonstrating the scientific value of the data. We'll go to the next. Why is this interesting to do? First, there's a paucity of data on human knockouts, though of course there are resources like gnomAD that do yield some good information. There are mouse knockouts from the Knockout Mouse Project, but there's almost no molecular and cellular data for them. And while mice are an outstanding model for many reasons, they're still not humans. Knockout phenotypes are useful for interpreting other alleles. The data will complement a number of large efforts already under way to study gene regulatory variation.
Such a catalog would be a resource for insight into biological pathways. And if the cells on which the assays are done can be saved, they may ultimately represent a collection of disease models for further study. Next, please. With the consideration that we don't yet know how to realize the long-term goal of a comprehensive MorPhiC catalog, NHGRI is starting the program with a phase one. The purpose of phase one is to inform the feasibility, value, and design of a potential phase two. Ideally, phase one will illuminate and start to address the main technical and other barriers to the long-term goal, such as: how to select genes, how to optimize making null alleles, what cellular systems are best, what molecular and cellular assays are the most generally informative, what the potential for scale is, and what the key scientific issues are. We can anticipate some, like specificity, pleiotropy, cell autonomy, genetic compensation, and experimental and biological variability, and I'm sure there are many others. And we don't know how to get the maximum value from the data, not least because we don't have a lot of this kind of data to play with yet. Phase one intends to do these things by starting at a modest scale and, at the same time, giving room for some diversity of approach. It's fairly straightforward. Next slide. As I mentioned up top, there will be three components: data production research and development; data analysis and validation; and the data resource and administrative coordination center. Next slide. Switching gears now to some mundane but very important reminders that apply to all three FOAs. These are cooperative agreements. That means there will be substantial involvement by NIH program staff. There will be collaborative tasks across the consortium, for example with regard to sample prioritization, quality control, data format discussions, and data flow discussions. Please read the FOA terms and conditions for how this will be managed.
Another consequence of having cooperative agreements is that NHGRI has the flexibility to set and adjust milestones, which is needed in a complex program. Finally, there will be a kickoff meeting after the grants are funded to establish the consortium and start to outline some of the collaborative tasks. We ask that you please send in letters of intent. We cannot require them, but we do encourage them. The letter of intent date is September 15th. If you can't make that date, please send something in as soon after that as feasible. These are very helpful to us. We will contact everybody who intends to submit an application to talk with them. The other reason to send these in is that they really help our review branch recruit good reviewers, which always takes a lot of lead time. Next slide. Some more general advice. Please always read the review criteria section of these FOAs, or indeed of any FOA you are responding to. This is what reviewers must use to evaluate your application. Please read the instructions to applicants for the research plan sections; each of these FOAs asks for a specific format to organize the research plan. The FOAs all have a separate resource sharing section. Usually when these are included, they are considered but not included in the score. In this case, for all three of these FOAs, the resource sharing section will be considered in the overall score. Please read the section in the FOAs on review and selection. This section lists criteria that NHGRI may apply in selecting among well-scored applications. Please also read the budget section, which provides guidance on minimum time commitments and also states that you should reserve some of your budget for consortium interactions. Finally, please choose letters of support judiciously. If you have too many that aren't directly relevant, it may put members of the community in conflict who could otherwise be good reviewers.
But if there really is a collaboration, obviously you need one. A note about diversity in funding: NHGRI especially encourages applications from investigators from demographic groups or institutions that are generally underrepresented in genomic science, from new investigators, from experienced investigators who are new to genomic science, and from investigators who have not previously participated in an NHGRI consortium program. Finally, please look out for the FAQs, which will be on the website in the next few days. These will be updated as we get more questions from folks over time, so please do check back. At this point, I'm going to go right into the discussion of the first RFA, HG21-029, the Data Production Research and Development Centers. At a high level, what we're asking applicants to do is straightforward conceptually, though definitely not biologically or experimentally: generate molecular and cellular phenotype data in informative cellular systems for null alleles of a thousand protein-coding genes across the program in the initial five years. So, straightforwardly: identify and prioritize the genes, make the alleles in the appropriate cellular systems, and carry out informative assays in those systems. The overall focus — and a point I'll make over and over again — should be on generalizability: how is what is done contributing to understanding how to do this across the whole genome, going forward into a potential phase two? Next slide, please. Following on this, the data production research and development grantees will, as I said in the last slide, lead the consortium prioritization of genes and produce alleles in the chosen cellular systems. Diving into some of the others: the grantees will generate data from high-throughput molecular and cellular assays in in vitro human systems. Multi-cellular complex systems such as organoids are preferred but not absolutely required.
Molecular and cellular assays should be chosen based on the utility of the information they provide and their contribution to understanding how best to do this going forward. Grantees will ensure compatibility and reproducibility; develop metrics and quality standards; and standardize allele and assay validations to enable assessment of the advantages and limitations of different approaches. Grantees will share data, best practices, cells, protocols, methods, software, and other products of research. Grantees will collaborate with other consortia or projects developing complementary data sets, and generally develop an approach that will be informative for how we can eventually generalize and scale. Some notes about responsiveness of applications. Again, we're looking for generalizable approaches across multiple cell types, phenotypes, and classes of genes — so not just one disease, and not just one cell type. There's some discussion about that in the FAQs. These are not grants to study gene regulatory variation. These are not technology development grants: what is proposed must work at phase one scale. And each application must propose an integrated effort — from gene selection through generation of molecular and cellular phenotype data — and not just any single aspect of what we've asked for. Next slide, please. You will see in the FOA that there's a list of what we call main barriers to understanding how to get to the long-term goals of MorPhiC. I have them on the slide, but I'm not really going to read them; I just want to say something about them. They're not necessarily cues for specific aims — they're mostly higher level than that. These are things that we want to come out of the initial five years as a whole, and as such, they represent important motivations for phase one of the program. If we come out of phase one with a clear idea about these points, we'll consider it a success. This list is probably not comprehensive.
If you can identify other barriers in your applications, I think it will be helpful. Next slide. I've already talked about a lot of this, but for emphasis: grantees will have responsibility both for their own aims and also for consortium responsibilities. These are likely the key areas in which those will arise: developing gene priorities; developing quality metrics, data formats, and internal policies; sharing data plus other products; and quality control. There's always a possibility here — and an attraction to the idea — of running the same genes through multiple grantees' assays. We'll see if that's feasible; we won't know till we get there. Also characterizing variability, both experimental and biological, and helping to develop use cases for the data. Next slide. I know I already said this in the general introduction, but I really do want to emphasize it for this FOA in particular: it asks for the research plan to be presented in four sections — center overview and management; mutagenesis; biological sample systems and experimental assays; and data management, analysis, and integration. And then, as I mentioned before, there's a separate resource sharing section. Again, it is considered in the score. All right, I think at this point we can start with the Q&A. I have some pre-loaded Q&A in the next two slides, but we already have some questions from the audience. Okay. The first question is about whether we require stable null cell lines, or whether an approach that uses pooled knockout experiments is acceptable. Either of those are possibilities. If you look at how the FOA is worded and the review criteria, there's room for the second approach, but I think it really has to be carefully justified. There are some shortcomings of pooled experiments that I can think of — in terms of reproducibility, for example — and I think those would have to be addressed.
And you would also have to talk about how that's generalizable, again, against the other, higher goals of the FOA. Let's see. So, how defined does a null allele have to be? Applicants are going to have to state how they have validated their alleles and how they will be reproducible and give consistent results — so if you redo the experiments, you're going to get a consistent result. I'll leave it there. There's another question about the deliverables. It is not strictly correct that a requirement for a final deliverable is individual clonal human iPSC lines, but I think that would be a great thing if it came out of the program. That's a close reading of the FOA. And then there's a very specific question: for iPSC lines, how many vials of cells need to be frozen down for distribution? That is something for you to propose in the resource sharing plan — what you think is the right balance between cost and what makes a good deliverable. You are going to have to figure out what the right balance is and propose it, and there's a few of us who will take a look at it. The next question here is, I would say, a frequently asked question: how many individual genes should each application propose? The way we stated it is a thousand total over the whole program, over the whole five years. Given that we intend to make four-ish awards, that's 250-ish apiece, but please leave some room for overlap. I think it would be great if there's enough compatibility between what centers are doing, because that would help with QC and validation and extending the results. And then, finally, there's another question about an added dimension to this, which shows up in another FAQ that I already have, which is that every dimension we add adds a lot of work and requirements for resources.
That's absolutely correct. The dimension asked about here is, generically, sample diversity. It's unlikely that there are enough resources to really test sample diversity and get well-validated results. But I do think this program needs to come out of the first five years with a much more solid idea of what the variability is between cells taken from people of different sexes, from people at different stages of life, and from people of different ancestries. Again, I don't expect that this can be thoroughly explored, but I do think the program as a whole needs to explore it. And this is something that's explicitly discussed in the instructions to applicants and also in the review criteria, so please take a look at the FOA. Okay, and I think I've answered the next question. I want to be clear about this again, because maybe this question is about what the appropriate organoid systems and cell lines are that would be responsive. The answer is: the ones that strike the best balance between informativeness, generalizability, specificity, and all the other things that your plans are going to have to balance in order, over five years, to contribute to telling us how feasible this is and how you would design a phase two. So you're going to have to propose that, strike that balance, and justify what you choose in terms of the goals — where there are sometimes trade-offs between how informative a system is, how specific its assays are, how generalizable it is, and how expensive it is. And that's what we hope you can tell us how to do. Okay. The next question is important: a single grant may not be able to do everything.
Would it be responsive for one application to focus on creating stable null cell lines and another application to focus on molecular assays in those cell lines, in a consortium manner? We deliberately decided not to do it this way, and applications focusing on just one aspect would not be responsive. We really want to hear the integrated ideas. I think there's a significant chance that these different parts might have certain dependencies that we don't understand yet, and so it's not really possible to do it this way. In addition, because it's phase one, we want some room for differences between the centers that we fund — we want some diversity of approaches. That said, the extent to which something like this can be done is a very interesting possibility for a consortium discussion once successful applications are funded, so it is something we'll be thinking about. Right, here's another question: in selecting genes, should one have in mind that they should be functionally tested in different types of organoids, such as cortical organoids, liver organoids, et cetera? Yes, this is a possibility; it's a completely reasonable approach, although obviously it expands the number of tests and the costs. So again, there's some balance here — maybe there's a more clever way to do this. Right. I think these are all answered. I'm sorry, there's something going on in the chat that I'm not sure about. Let's see. Okay, there's a question: for computational teams, it's difficult to envision a data analysis center given that assays are still not defined. Will there be another opportunity to synchronize post-letter-of-intent period?
I'm going to flag this question and leave it for Ajay after his presentation. If it doesn't get answered, please ask it again at that time. The next question is what type of interaction between — I think this is meant to be production sites, or maybe funded sites — is expected. Whatever is useful; I think both are possibilities. Again, once the consortium is formed and people know who they are and what they're doing, it will be easier to imagine how to do this. And this comes up in several ways, in several different parts of these applications. My practical advice about writing around this kind of thing — where you have to come up with a plan and yet have to be flexible — is to come up with the plan that you think is best and justify it in a thoughtful way, and then indicate which parts of it are flexible and which limitations are realistic ones. I think that's the best way to do it. But I really think it's quite important that we and the reviewers hear what your opinion is of the best way to do this, and what the justification is. Next question: what kinds of assays are expected — gene expression, epigenome mapping, et cetera — or just phenotypic assays? We want both molecular assays, and that includes molecular omics, and informative cellular phenotypic assays. We want both together. Gene expression assays are well characterized; they're generalizable; we have a feel for how much they cost and how informative they can be — but not enough, I think, and I think we need to push beyond that.
Again, part of the goal is to be able to use this catalog in combination with other kinds of data that are not produced by this consortium to interpret them. That includes other phenotypic data and other cellular data — to be able to put it all together across different kinds of data resources, to make useful biological conclusions and ask useful questions. And I think the richer the data, the better the chance of doing that, though exactly what data — you have to be thoughtful about that. Okay, I think the next question is: can one create and test multiple null alleles of a gene? I think it's supposed to be for functional testing. I'm trying to understand this question and I may need a little more information — are you talking about, for example, different isoforms? I don't know. It should be a strong allele. Obviously, with a null allele we really want something that has as close to no functional protein as possible. That pushes things towards strong phenotypes that are interpretable, but it has a downside as well, which is that it's also likely to result in some pleiotropy in many cases. And frankly, I have no idea how that's going to play out across all genes, and I'm very interested to think about it — I think it has important implications for a potential phase two. It does say in the FOA that null alleles are the default, but if there is very good reason to believe that a null allele won't be informative — for example, because it's lethal — then it is perfectly fair to propose alternatives. The next question is: does one need to assess more than one organ system? Be careful here. The important thing is to consider generalizability.
And that means, at the end of the day, understanding different cell types and different tissue types — being able to select the 1,000 genes and test the 1,000 genes in a way that's maximally informative about the whole genome. I'm sure everybody has their own ideas about the best way to do that and the best way to prioritize those genes, and that is going to be a consortium discussion, in addition to something you're going to have to propose — or at least propose the rationale for — in your applications. But I can imagine getting generalizability in several different ways. So if you do propose a single organ system, please talk in the application about how that is going to be generally informative across the genome, or across different tissue types, or across different cell types. Okay. Yes. Adam, can I take one of the questions in here and ask you something very specific? Yes, you can, but you'll have to move closer to your microphone, because I'm having a hard time hearing you. Okay. One of the questions here goes: my understanding is that this RFA prefers the final deliverable to be individual clonal human iPSC lines for each knocked-out gene, for distribution to different labs in the community for further studies. I wanted to ask you about interpreting "community." What is the interpretation of the word "community" we are using in the FOA? Do you expect these to be made available to anyone who asks for them, from the entire biomedical research community, or just the consortium? Right. So making it available to the entire community could be prohibitive, especially if these have to be maintained and expanded indefinitely. I don't have a great answer to your question, Ajay — I think it's a really good question.
But I think applicants will have to propose what they can do and what the practical limits are. At a minimum, I think there should be the ability to exchange samples within the consortium — and because of the way the program is structured, that's between the four to five funded production groups. And then we can talk about more after that. There are just a whole bunch of complicated issues in addition to cost, which include how informative the lines are and things like that. So I don't think it's practical to require something as extreme as having preserved and expandable cell lines that can be extremely widely distributed, without adding something else to this program. All right, here's a couple more. Ajay, do you think that was a fair answer? Absolutely. Okay, two more questions. One is similar to one I answered before: many scalable pooled screens depend on redundant measurements and multiple sgRNAs rather than individually verified knockouts — is that still okay? It is not critical to have verified alleles in individual cell lines as an output. I think it would be a great thing to have that, but there are technical trade-offs, and the technical trade-offs, as I see them, could push you towards a different kind of system. Again, it's up to you to justify your choices against the FOA; there's certainly room for this in the FOA as I wrote it and as I read it. Another question: are human gastruloids an acceptable model? Yes, but you would have to justify them. Okay, we have no open questions at this point. We are about 40 minutes in, so we have about another 10 minutes. I'm going to read off some of the questions that I had prepared as potential FAQs that have not been asked already. One of them is: can we analyze our own data? Yes — and this is in the FOA — but there are limited resources, especially after doing all the other things this FOA is asking for.
So if you are going to do this, please prioritize analyses that are designed to characterize the quality and utility of the data for downstream applications — so, biological and technical variability, and data consistency. Analyses could include, for example, looking for correlations between assay data types, or comparisons integrating outside data — for example, from CMap or other perturbation-by-phenotype data. Those are all acceptable as long as they are mainly used to help characterize the performance of the system or demonstrate utility. That gives a lot of leeway, I think, but it does need to be disciplined because of budget constraints. Another FAQ that I anticipated, slightly different from the one that Ajay asked, is: do the sample cell lines, and the data derived from them, need to be shareable? This is a different kind of answer to that question. Increasingly, NIH policy is that data resources should be derived from samples from individuals who were originally consented for broad sharing. And this is very important here, because if the consents fundamentally limit sharing of the data, or of any other products that come out of this — derived data, or even in some cases potentially the cell lines — then the effort will be the poorer for it. So please discuss how the samples were consented in the resource sharing plan, with attention to how shareable the downstream data will really be. Here's a question: how well-defined does the strategy for functional assays in the proposal need to be? Should one group focus on a defined, proven strategy, or evaluate feasibility and scalability across multiple strategies and present it as such in the proposal? I think there is room for both. Every proposal needs to have at least a plausible phase one strategy for at least part of its work. That goes back up to responsiveness.
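As a toy illustration of the cross-assay consistency analysis just mentioned — the gene names, values, and threshold here are invented for the sketch, not taken from the FOA — one could check whether two assay readouts for the same set of knockouts agree:

```python
# Hypothetical sketch: checking consistency between two assay readouts
# for the same knockout lines. All gene symbols and values are invented.

from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Per-knockout summary scores from two (hypothetical) assays,
# keyed by gene symbol so only shared genes are compared.
assay_a = {"GENE1": 0.9, "GENE2": -1.2, "GENE3": 0.1, "GENE4": 2.3}
assay_b = {"GENE1": 1.1, "GENE2": -0.8, "GENE3": 0.3, "GENE4": 2.0}

shared = sorted(set(assay_a) & set(assay_b))
r = pearson([assay_a[g] for g in shared], [assay_b[g] for g in shared])
print(f"cross-assay correlation over {len(shared)} genes: r = {r:.2f}")
```

At scale, the same comparison would be run with a statistics library over full data releases; the point of the sketch is only that this kind of analysis characterizes data quality rather than pursuing new biology.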
This is not technology development, but clearly not everything is going to be optimized, so I anticipate everyone's going to have to do some optimization. So I think there's room for both; I don't expect every application to be exactly the same in this regard. But there should be some core element that you really can justify — that you think is going to produce useful data relatively soon in the program. I hope that answered the question. And it does say in the FOA that you shouldn't really be doing de novo technology development, but you can use funds for adopting technologies that have been developed elsewhere, or for optimization. You should talk both about what you want to do and about how you would bring new methods and new optimizations into what you're doing — because, again, I don't believe that everything is ironed out here, and I think there needs to be room for optimization over the period of these grants. I don't see any more open questions, and we have about two or three minutes left, so we'll wait a few minutes before starting the next one, in case people are on a schedule. All right, I think we can wrap up the discussion of the production research and development FOA here and move into the next one. Thank you all for your questions. And don't forget that you can always ask questions — if you have further questions, you can contact us at the MorPhiC NHGRI email address that was listed on, I think, the second slide. All right, Ajay, I think you can go ahead. Good afternoon. I'm Ajay Pillai, and I'll talk about the second RFA in the MorPhiC program, which is the Data Analysis and Validation Centers. The primary goal of the data analysis and validation centers FOA is to make sure that the consortium's data variability is controlled.
The data should be useful for understanding basic biological processes and usable for undertaking future hypothesis-driven science by the community. So the projects proposed within this FOA have to have a high potential to illuminate the strengths and weaknesses of the data being generated within the program, to ensure that the data are usable by the community, and to obtain needed feedback from the community for that purpose. Next slide, please. The FOA lists a number of non-responsive criteria, and I wanted to highlight a few of them here. In this FOA, wet-lab data generation would be considered non-responsive. Applications that do not propose to use MorPhiC data, that do not address collaborations within the consortium, or that do not have a data sharing plan will be non-responsive. There are other criteria listed in the FOA, and you should pay attention to them. Next slide, please. One of the key challenges for applicants to this FOA is to realize that year one is special, in that there will more likely than not be no MorPhiC data generation within the consortium yet. So we expect that you will bring your own data, so to say, or publicly available data, to address questions that go to the root of what Adam described and what the FOA describes as the basic goals of phase one of the MorPhiC program. Developing and sharpening your methods using publicly available data is one approach to dealing with the special year one requirements. Next slide, please. Some of the other challenges to keep in mind — and a lot of this is not very novel — are standard data analysis challenges: making sure that we can identify and correct technical bias in the data, that batch corrections can be performed, and that other issues of large-scale data analysis are addressed in the application.
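To make the batch-correction point concrete — purely a toy sketch with invented batch labels and numbers, not a method prescribed by the FOA — the simplest form of correction is centering each batch of measurements so that batch-level offsets don't masquerade as biology:

```python
# Toy sketch of batch-effect removal by per-batch mean-centering.
# Batch labels and measurement values are invented for illustration.

from collections import defaultdict

def center_by_batch(values, batches):
    """Subtract each batch's mean from its own measurements."""
    by_batch = defaultdict(list)
    for v, b in zip(values, batches):
        by_batch[b].append(v)
    means = {b: sum(vs) / len(vs) for b, vs in by_batch.items()}
    return [v - means[b] for v, b in zip(values, batches)]

# Second batch is shifted up by ~2 units (a pure technical offset).
values  = [1.0, 1.2, 0.8, 3.0, 3.1, 2.9]
batches = ["b1", "b1", "b1", "b2", "b2", "b2"]
corrected = center_by_batch(values, batches)
print([round(v, 2) for v in corrected])
```

Real large-scale analyses would use more sophisticated approaches (covariate-aware models rather than simple centering), but the sketch shows the shape of the problem the FOA is asking applicants to handle.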
You should also think carefully about what metadata need to be made available within the data releases for the consortium, and what types of quality control and quality assurance need to be undertaken to make sure that the underlying biology and the data generation challenges are reflected appropriately within the metadata. And as I mentioned before, one of the main goals is to have various labs within the community use this data to do further, more detailed studies of all of the knockouts and their biological roles. So you should think carefully about the methods and APIs and other things you want to propose that will make the data more easily accessible to the community. Next slide, please. There are also consortium responsibilities, as the FOA states in multiple places. This is phase one; we need to gather a fair amount of information about how this is going so that we can make decisions about scaling up to genome-wide. One of the questions that keeps coming up, for example, is how we are going to select the thousand genes as a consortium. In general, the questions are essentially how good and useful the data are, and how we work together to understand and evaluate the utility of the data, the metadata, and the APIs that the consortium is going to make available. Next slide, please. A few things about the application and review. As Adam described earlier, these are all cooperative agreements, so please follow the instructions in the FOA about the research plan and research strategy. Please note that the budget is $300,000 total cost per year, for a period of five years. Also, again, data sharing is required in a separate section, and it is reviewed and scored, as Adam mentioned at the beginning. There are also specific review criteria for this FOA that you should pay attention to. Next slide, please. And that's basically all I had to say; it's open for questions.
I think there was a question asked previously that I don't think was answered; it relates to something I said about year one. The question is: for computational teams, it is difficult to envision a data analysis center given that the assays are still not defined. Will there be another opportunity to synchronize after the letter of intent period? It is true that we will not know the precise nature of the assays that are going to be part of the consortium, and waiting until the letter of intent period, since the letter is not required, is not likely to answer the question. I offered one way in which you can approach the first year, when there is no data from the Morphic consortium itself. Generally, the FOAs, and also the question and answer sessions today, which will be reflected in the FAQ, indicate what types of assays you can expect the consortium to have. So that's what we will have to work with, and that's the short answer to the question posed earlier. If you have any more questions, please type them out. We have a few more questions. The first one is: will new algorithm development be considered responsive to this FOA? I think the answer is yes. There's another question which says it would make sense for each production center to design their own data analysis depending on the assays. I think Adam tried to answer this; let me repeat Adam's answer, and if he has anything to add, I'll ask him to chime in. Yes, the data production centers have to validate their own assays. They have to make sure that their assays are reproducible and that they give you sufficient biological signal. One of the roles of the data analysis centers is actually to look at data across all of the different production centers.
So there are consortium-wide goals that the data analysis and validation center applicants and grantees will have to contribute towards and answer questions about. There are also questions that cut across individual data production centers and that will need integrative analysis and other methods to look at the fidelity and biological signal within all of these data sets. Adam, do you want to add anything else to that? Yes. Ajay, I think what you said is right. The production centers can propose analyses on their own data, and it is very likely that certain analyses will depend on the data type, within the limits of the budget. I think the hidden question here, or the spin that I'll put on this question, is that it's also important for the data to get out and be analyzed, and be analyzable, by the analysis and validation centers, for all the reasons that Ajay gave. And this, together with the last question that was asked, is going to be a challenge, because the analysis centers can't anticipate everything that's going to be proposed; there's probably going to be a lot of discussion around this at the consortium level once applications are funded. To Adam's point, it is often much easier in some ways for data analysis to be done by the production centers, because they have a lot of metadata available to them, which people usually don't realize is important when you release data sets. So one way to think about these analysis centers is that they're your first outside group looking at the data, and looking at whether enough metadata is available to actually interpret the data accurately. So that's another thread in the same vein. Let me move on.
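As a concrete illustration of what "looking at data across all of the different production centers" might involve, here is a small sketch that scores agreement between effect profiles for the same knockout measured independently at two centers. The metric (pairwise Pearson correlation) and all the names and numbers are my own illustrative choices, not a consortium-mandated analysis.

```python
import numpy as np

def cross_center_agreement(profiles):
    """Pairwise Pearson correlation of knockout effect profiles.

    `profiles` maps a center name to a 1-D array of per-gene effect
    sizes for the same knockout, measured independently at each center.
    Returns {(center_a, center_b): correlation} for each pair.
    """
    names = sorted(profiles)
    out = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = np.corrcoef(profiles[a], profiles[b])[0, 1]
            out[(a, b)] = float(r)
    return out

# toy data: two centers measuring a very similar effect profile
profiles = {
    "center1": np.array([1.0, 2.0, 3.0, 4.0]),
    "center2": np.array([1.1, 1.9, 3.2, 3.8]),
}
agreement = cross_center_agreement(profiles)
```

A low correlation for a pair of centers would flag that knockout, or that assay, for closer technical review.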
The next question is: should we expect consortium-wide portals and channels for disseminating algorithms, tools, and results, or is the expectation that each of the data analysis and validation centers will have individual dissemination channels? Yes, we will have consortium-wide portals and channels for dissemination, so you don't need to build up your own websites for public consumption, and I think Colin will address that in his presentation. I'll also add that, in the end, the usability of your own tools is in effect your responsibility. How you communicate that to individual community members is going to remain your responsibility, but you don't have to set up portals. Okay, Sean, do we know how we are doing on time? I know there's a lot of it left. Yes, I believe we are ahead of schedule right now; by my estimation, for this RFA, 030, we are 15 minutes ahead of schedule. I think we should wait another two or three minutes for additional questions, and then, in anticipation that somebody might be waiting offline until Colin starts to present, we can start back up in maybe another 10 minutes or so. I will be sitting here looking at questions; if another question pops up, I'm happy to try to answer it. But I do think we should wait in case somebody was planning just to show up for the final presentation. Oh, Adam, I just realized I was texting everybody, not just the hosts and panelists. I think we should just go officially on break and tell people what time we're back. I think that's completely reasonable. So what time would that be? Back at 2:25. Okay. Sean, can I ask you to post a reply saying that we will be back for the final presentation at 2:25? Yes, I can. Thank you. And I ask the hosts and panelists to stay live, please. So, as Adam said, I think both he and I are available to answer any more questions.
We have another question that I think was addressed previously, but asked a different way, so I'm happy to answer it; it's about the production groups. There's a significant emphasis on organoids and iPS models. Does that mean that cell lines are not going to be responsive? Cell lines could be responsive, but you have to justify them in terms of generalizability. I'll go a little further than I did before. I can see that in some cases cell lines are going to have advantages. I don't know how that plays out across different cell types, or different tissue types, or different assay types, or informativeness, or any of that. But I can see that in principle there are justifications; for example, there are cell types for which organoids are not well developed. So it wouldn't be non-responsive. Technically speaking, for a program director and the review officer, non-responsive means that an application is either so obviously far off the topic, or that it is something listed under the responsiveness criteria. So it could be responsive, but my intuition is that it's going to take careful justification to do it well. I've got a question for you, Adam. Immortalized cell lines, cancer cell lines, cells that are not pluripotent: would those kinds of cell lines be appropriate or not? Again, I think it depends on what kind of advantages they have for striking the right balance between informativeness, generalizability, and scale. There are a lot of trade-offs here, and they don't play out evenly. In my thinking at least, and I'm sure the people on the phone have considered these issues more carefully than I have, things are developed unevenly; the state of the art is uneven across different cell types and across different kinds of assays. And the RFA does not absolutely exclude cell lines; it does not say cell lines are non-responsive.
It does state a preference for organoids, mostly because they can be extremely informative. All right, Colin, I think you're up. Just waiting for that slide. So, the third RFA is for the Data Resource and Administrative Coordinating Center. This is the FOA that's going to pull it all together for the entire program. I'm going to break it down based on the instructions in the research strategy section, which you should read very carefully. We've essentially laid out five different tasks for the DRACC, as we call it, though generally there will be two large components. One is being a data host, disseminator, and analysis center, which will include the web portal and the dissemination APIs, that sort of activity. The second is a tracking, collaboration, and coordination activity, which is the coordinating center. I have a few slides that cover these different topics in detail. So, the basic data resource, the database. An interesting thing here is the collaboration with the DAVs that Ajay talked about. There's going to be a relationship there where they work on the analysis and validation of the data, which they get from the production centers, and they should do that in a collaborative way. But the final analysis, annotation, and dissemination of the Morphic data will be handled by the DRACC. Another interesting task will be integrating external data and information. We think this program will benefit greatly from bringing in outside data to help with interpretation and with understanding the implications and biological ramifications of the findings generated here in cell lines; for example, KOMP data. And then finally, there's serving as the administrative coordination center. We can discuss these in a little more detail starting with the next slide, which is the data resource. So, imagine that the DRACC will primarily receive, wrangle, and QC primary data in collaboration with the data production centers.
That will require defining very detailed data output standards. This includes not only the primary data, but the associated metadata and protocols from the various centers; there will probably be overlap among centers doing similar types of assays, so that will have to be worked out on a consortium-wide basis. They will develop a database to store all this, which will obviously include molecular data, metabolomics, transcriptomics, that sort of thing, and probably imaging data, 2D and 3D imaging data, and other sorts of things. You'll have to anticipate that and do it adhering to the FAIR principles, and then make the data available for the consortium and the community to browse and use. Most immediately, of course, that means making the data available to the DAVs so they can work on the analysis of the data. The next task is collaborating with the DAVs, and since that's covered in the other FOA, which Ajay just talked about quite a bit, I defer to him on how he envisions that happening. Then, based on the outcome of that, of working with the data and understanding it, understanding how to analyze it properly, how to annotate image data versus quantitative data and categorical data, the final pipelines and the final repository of all that information will be in the DRACC. So, when you look up a gene and you look up an assay, you'll be able to find the R package that was used to analyze it, or a reference to where the R package can be downloaded, the metadata, the equipment that was used, that sort of thing, and the annotation terms that were applied to that analysis. That will identify outlier phenotypes in mutants. You'll use some kind of ontology or controlled vocabulary for annotation. In KOMP, we use the Mammalian Phenotype Ontology, and we use a variety of different controlled vocabularies for image data.
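To make the controlled-vocabulary idea concrete, here is a minimal sketch of annotating free-text assay observations with ontology-style terms. The term IDs and mapping below are invented placeholders, not real ontology identifiers; a real pipeline would map to a maintained resource such as the Mammalian Phenotype Ontology and route unmapped observations to curator review.

```python
# Placeholder term IDs, invented for illustration only; these are NOT
# identifiers from any real ontology.
PHENOTYPE_TERMS = {
    "increased apoptosis": "XX:0000001",
    "reduced proliferation": "XX:0000002",
    "abnormal morphology": "XX:0000003",
}

def annotate(observations):
    """Map free-text observations to controlled-vocabulary term IDs.

    Returns (annotations, unmapped): a dict of observation -> term ID
    for recognized phrases, plus a list of observations that need
    manual curation.
    """
    annotations, unmapped = {}, []
    for obs in observations:
        key = obs.strip().lower()
        if key in PHENOTYPE_TERMS:
            annotations[obs] = PHENOTYPE_TERMS[key]
        else:
            unmapped.append(obs)
    return annotations, unmapped

mapped, todo = annotate(["Increased apoptosis", "unusual nuclear texture"])
```

The point of the controlled vocabulary is exactly this split: annotations become machine-queryable across centers, while anything outside the vocabulary is surfaced rather than silently dropped.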
I think image analysis is going to be particularly challenging, but with new AI applications, maybe there will be some breakthroughs there. Histology and staining data, for instance: you can imagine that you do not want to be hand-annotating that, depending on expert review. So that's an interesting question. And then there's providing a web portal, where we expect you to also work on the user interface and experience, to facilitate interaction with the outside community coming in to find and discover data, and of course to provide robust APIs. I think the next slide talks about data integration. It's up to you to identify complementary information resources, to implement some kind of interoperability, and to make that data available. Obviously, my background is working on the KOMP program, so I'm very keen to understand how the cellular and molecular phenotypes for genes knocked out in this program relate to the intact-organism phenotypes we see for, say, the same gene knocked out in KOMP, where the mouse may be deaf, may have a bone disorder, may have obesity, and how that relates to the various cell-based or organoid-based assays taking place here. So if you have a way to federate that sort of consolidation of information, that would be what we want to see. The second major prong of this effort is going to be the administrative and coordination center work: providing communication platforms, facilitating interactions within the consortium, and tracking consortium activities, experiments, and data, which is probably going to flow naturally from defining the data upload standards, tracking the metadata, and then analyzing that to understand how progress is being made. And then, of course, there are the outreach efforts to promote the consortium resources to external users. That may be social media, meetings, workshops; there are a variety of different avenues you can imagine for that.
I think the final slide is next, or maybe two slides in. So, again, application and review is an important consideration, particularly if you have not previously had a U award. Ask yourself if you've been involved in a cooperative agreement with NIH; if you haven't, there are some things that are quite different about these kinds of applications. First, please follow the instructions in the research plan: the research strategy section lays out in detail the things you have to address. This is a little bit unique to U awards; we lay things out in much more detail. Second, check the application review criteria, because we add specific questions to the general ones for the reviewers to address in the review process. This should help you understand what the reviewers are thinking about when they go through your application. And finally, check the award administration section, because it contains the cooperative agreement terms and conditions. Look at that carefully, because it lays out a very detailed structure for how a cooperative agreement works; for those of you who have had U awards, you know how this works. The budget is $1.5 million total cost per year over five years. And I think that's most of the highlights of the FOA that I wanted to point out. I think that was my last slide. I have not received questions in the email box, so we can take questions now if there are any. Colin, there was a question to Ajay about whether everyone should have their own individual dissemination channels versus centralizing this; do you want to address that question again from the perspective of the coordinating center? I think Ajay was trying to dissuade people from doing that; am I correct? I think we want to have a uniform, one-stop shop that disseminates the Morphic data, that's authoritative, and that has version control. When you download from there, you know what you're getting.
We really don't need three or four different websites that somebody has to go to to make sure they're getting all the data. So we really want to consolidate it at the DRACC, put it in a stable database, and disseminate it from one portal. Colin, you have a question in the question and answer box: given the lack of year-one data, will this FOA address publicly available or user-supplied data, as for the analysis centers? I actually think there's going to be an awful lot to do that won't require Morphic data. For example, all you have to know is the analyses being planned by the production centers to be able to get to work on the markup, to specify what the upload is going to look like. Collecting that information, and collecting all of the metadata that needs to be associated with it, is a lot more time-consuming and harder than you think. It took us years to get that perfected in KOMP; I think we spent the first two years just working on that, facilitating and automating that whole process so that it worked reliably and robustly. So that's a lot of work right there. Second, data integration with an outside source like KOMP is going to take a huge amount of effort to get done. Third, you can get started on evaluating ontologies and controlled vocabularies as soon as you know what the assays are, and that's going to take a lot of work too. So yes, you won't have data to run QC analyses on; that work will be waiting for the data too. But I think you will have your hands full in the first year. So, is processing and analysis of the data expected to be done by the DRACC? Well, yes, in collaboration with the DAVs, and I don't know if Ajay wants to say something about how that shared responsibility and exploration of the data will work. The DRACC is not going to operate independently; it's not going to say, this is our final pipeline, this is the analysis procedure. They're going to have to do it collaboratively.
And we're really putting this resource forward to the DAVs to assist in really exploring the data. Yeah, in general, I'll use slightly different words to say what Colin said. A lot of these things are going to be consortium-wide discussions, such as optimization of pipelines. This may be more easily achievable for the more familiar types of assays than for others. We are going to get some amount of imaging data, and with imaging data there are a bunch of challenges, like cell segmentation and so on, that are going to take a lot of work to resolve; we may also need to think about making the images themselves available, so that other people are able to process the data, and things like that. So there are going to be a range of questions, with a range of different answers, that we will be addressing during the whole five-year consortium phase. But in general, as Colin said, we will make these decisions as a consortium: what is best, which pipelines are best, and how you bring in experience from other work that you have done and that other people inside and outside the consortium have done. Yeah, in KOMP we put a lot of effort into using micro-CT on embryos, so we had volumetric data, and there was a volumetric reference encyclopedia, so we could do automated analysis of that; that was worked on very hard by a lot of people. The histology data was handled by manual curation by experts, by a team of histology folks who got themselves organized to do that and used controlled vocabularies. I would say the lacZ expression data is a little more ad hoc, for lack of a better word. So there are a variety of different challenges once you get into imaging data, for sure. I think we should give it another couple of minutes for additional questions about this RFA.
And if we don't see any, we'll open it back up to general questions about any aspect of what we talked about today for the last few minutes, and based on the flow of questions, we will keep going or wrap up depending on the volume. I want to remind everyone that this is not your only opportunity to ask questions; if you are one of those people hesitant to ask questions in public, you can send us an email. I think we have another question for Colin here: any more details on what or how much external data to integrate? You know, we're doing a pilot; we're trying to explore what's really useful. I think we've made it pretty clear that we have somewhat open-ended requirements, so this becomes a review question, where it depends on your argument: you justify that this data is vitally important to integrate, and that other data is not so important. We're counting on you to put that forward in your application. That being said, I would opine that copying an instance of a database doesn't sound like the greatest idea. We have really robust APIs with KOMP, our IMPC database, and we know other people use them: they set up queries on their own portals, and they just come in and grab data out of IMPC via the API. So I would think a live, federated system like that would work better, but I defer to the data informatics experts. I'm looking at you, Ajay. Yeah, yes, there is really no point in replicating another resource, through ETL or any other method, into the Morphic resource. But there are things to be learned from all the other resources, and one can potentially write applications and algorithms that do integrated data analysis to some extent. And if we have to worry about providing those sorts of data sets within some cloud environment or something, there are other methods available at NIH to make that sort of thing possible. So that's going to be part of the learning experience.
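The federated, query-on-demand pattern described above can be sketched in a few lines: instead of mirroring a remote database, a portal composes a REST query per request and extracts just the fields it needs. The base URL and JSON shape below are placeholders I invented for illustration; they are not a real Morphic or IMPC endpoint.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint for illustration; NOT a real Morphic or IMPC API.
BASE_URL = "https://example.org/api/phenotypes"

def build_query_url(gene_symbol, page_size=50):
    """Compose a per-gene REST query instead of copying the database."""
    params = {"gene_symbol": gene_symbol, "rows": page_size, "format": "json"}
    return f"{BASE_URL}?{urlencode(params)}"

def parse_response(payload):
    """Extract just the phenotype annotations from a JSON response body."""
    doc = json.loads(payload)
    return [hit["phenotype"] for hit in doc.get("results", [])]

url = build_query_url("TP53")
# a hand-written sample response, standing in for a live HTTP call
sample = '{"results": [{"phenotype": "increased apoptosis"}]}'
phenotypes = parse_response(sample)
```

The design choice is the one Colin argues for: the remote resource stays authoritative and version-controlled, and the integrating portal never has a stale copy to maintain.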
I mean, there are obviously a lot of large-scale screens going on; there are RNAi screens going on, and there are databases for all of that; there are functional variant screens going on. I think there's a lot of stuff out there that is ripe for integration. I think this next question is for you too, Ajay, in the Q&A. Yeah. The question is: how do you see prioritization of data analysis applications in terms of their complementarity versus quality? I'm not sure I understand this question, so let me take a particular interpretation, which is what I think the question is about. If the question is about how the reviewers are going to figure out whether an application's data analysis method is directly applicable to a particular data production enterprise, then the answer is that the reviewers are not going to know. So I think the general expectation is this: Colin and Adam and I, and the FOAs, have talked about what you can generally expect the types of assays to be, and you should focus on what your core strength is in being able to address the analysis of those types of assays. I'm not sure that I have answered the question; maybe there is an interpretation of it that my colleagues have, or please clarify your question. Ajay, I read this, and the person who asked the question should please clarify if I'm wrong, as a question related to the selection criteria that we discussed way back at the beginning of this presentation. So I agree with Ajay: the reviewers won't know which other components are funded or not, and it's not a review criterion, so they shouldn't be using it anyway; they should be using the quality of the application and the review criteria. But there are selection criteria after that. It's a two-stage review process: one stage is the scientific review that assigns a score, and the other is the council-level review.
And in the past, NHGRI programs where there has been a big response, in particular, have done things like select between similarly scoring applications based on considerations like diversity of approach; for example, not having multiple grantees proposing exactly the same thing, or very close to the same thing. So there can be circumstances under which the selection criteria would push things that way. Adam, it's worth pointing out to people where they can find the selection criteria in the application. You mean in the FOA? In the FOA. I can do it: it's under Section V, Application Review Information, item 2, Review and Selection Process. I just want to clarify the statement in the Q&A box, which asks: if multiple applications are focused on the analysis of the same types of assays, would only one be selected? It was partly addressed, so I just want to be sure. That might be the case if they really were very similar applications; but, for example, if they took different approaches to the analysis of the same types of assays, then they might be different enough. It's too difficult to game out all the specific circumstances and make a decision tree here, so I can't give you that level of detail in advance. But it's the kind of thing we think about when we get to the funding stage. I'm going to give it another two minutes in case another question appears; if no more questions appear in two minutes, I think we can wrap up. Sean, maybe it's worthwhile putting up the first slide with all the links and the email addresses. All right, I have not seen any further questions. Again, you are not limited to this forum to ask questions; please use the program email contact. And I want to thank everyone for attending and for all your great questions.
And we are looking forward to seeing your applications. Any final remarks from Colin or Ajay? No, I'm fine. Thanks very much, everyone. Take care.