All right, why don't we move on to the next item? We're now going to move into the first of three concept clearances. The first one is by Elise Feingold on the future of ENCODE. And I just want to say that Dee Dee Meldrum is on the phone from Lyon, France. Is there anybody else who's on the phone? OK.

OK, so I'm going to give a summary of the concept clearance we have for three RFAs for ENCODE. Hopefully you'll have good context for these after my talk this morning. First of all, I want to say that we had input from several planning meetings. The first was the genomics of gene regulation planning meeting, part of the broader planning for the next strategic plan. As I mentioned earlier, we had an external consultants panel mid-course review of the ENCODE and modENCODE projects last April. And then we had a lot of relevant discussion at the Airlie House strategic planning meeting last July.

To summarize the collective input: there was recognition that after the modENCODE and ENCODE projects are finished, we will only have interrogated a small fraction of the cell states in the organism. So while the long-term goal of completing the catalogs of functional elements will still be a priority for NHGRI, we had to think about the fact that the data need to be generated at a reasonable cost and in a reasonable time frame. And we need to consider what work requires a consortium infrastructure versus what can be done in investigator-initiated work by the community. There was a recognition that current technologies are not sufficiently robust to totally complete the catalogs. There were some limitations in reagents. And there was also a need for enhancing data analysis activities. So we used this collective input in developing the initiatives that we're going to be talking about now. I should also say that at the February council, we addressed one of these limitations with a concept clearance in technology development.
And Michael Pazin gave a concept clearance that was approved for R01 and R21 pilot projects to improve the sensitivity and cost of identifying functional elements and also to stimulate development of better methods for high-throughput biological validation. So that addresses one of the issues that was raised. These next RFAs move forward with the rest of the program.

The first is on expanding the encyclopedia of DNA elements in the human and model organisms. In constructing this plan, we took into account several goals that we thought were important. The first is that we felt there was a need to balance technology development with continued data production; our experience is that having both technology development and production fuels progress. We wanted to expand the ENCODE projects to generate catalogs as complete as feasible within the limitations of current technologies. We really wanted to capitalize on the progress that has been made in establishing high-throughput and efficient production pipelines through the support of centralized production efforts, to take advantage of economies of scale, centralized management, and centralized coordination. And in this initiative, we wanted to focus the data production and analysis efforts to maximize the utility of the resources, thinking about the most useful data sets to collect.

So the proposed scope of the expanded ENCODE puts a big emphasis on continuing annotation of the human genome. This was felt to be a very high-payoff project, useful for the community. There's a lot of new RNA data now available that will fuel the continued annotation of the human genome. We also wanted to expand the annotation of the mouse genome, since this has proved so important for the human project.
We also wanted to expand the repertoire of data types in ENCODE, with particular emphasis on more classes of RNA molecules and functional elements within RNA molecules for the human and mouse genomes. Clearly, when the ENCODE project started four years ago, we did not have as good an appreciation of the different classes of RNA molecules as we do now, so we feel this is an important area to expand into. We also wanted to take existing data types more deeply into the human genome, with more limited studies in the model organisms: fly, worm, and mouse. These deeper studies would include mapping binding sites for all transcription factors using at least two cell types for each new factor. A really big limiting step in ENCODE is getting reagents to map all the different transcription factors, and we felt that if we can get at least some information, on at least two cell types for each factor, that will go a long way toward increasing our knowledge of binding sites and finding functional elements. We wanted to go deeper in cell types: mapping sites of open chromatin, mapping selected histone marks and other relevant chromatin proteins in more cell types, and mapping sites of DNA methylation in additional cell types.

In terms of the genomes to be studied, we feel it's important to have a primary emphasis on the human genome, a secondary emphasis on the mouse genome, and a reduced emphasis on the C. elegans and Drosophila melanogaster genomes. The fly and worm projects, the modENCODE projects, have been very successful, as I discussed this morning. We recognize that they are not complete at this point, but we feel that they've provided a firm foundation for the research community, and there is now much broader access in the community to the different technologies that are being used.
And we feel that individuals now have the ability to ask more biologically focused questions with these technologies, so there's a reduced need for consortium infrastructure. We'd also like to use the modENCODE catalogs in the model organisms for the next questions we want to ask, about interactions of functional elements and the regulation of gene expression; Peter is going to talk a little more about that in the next concept clearance. So we still think there's a lot we can use these model organisms for, but we'd like to shift the emphasis a bit. That was the first RFA.

The second RFA is a data analysis and coordination center for ENCODE. We'd like to support a single centralized effort to serve as a data coordination and analysis center. The data coordination role involves housing and maintaining the databases to track, store, and provide access to ENCODE data. We want to provide an informatics resource to ensure consistent data analysis and facilitate integrative analyses, and to work with ENCODE's analysis working group to identify the types of analyses needed and perform any necessary data transformations. What we're doing here is consolidating efforts: there was a data coordination center for ENCODE, a data coordination center for modENCODE, a data analysis center for ENCODE, and a data analysis center for modENCODE. So there are four activities that we're now proposing to condense into one, which we feel should result in operational efficiencies. One of the things that has been important to us, but that we haven't really stressed enough, is the analysis goal of trying to define what we're calling a minimum set of elements or marks needed to identify a unique molecular signature of a cell, to optimize data generation by ENCODE as well as by other related projects. There's a lot of interest in disease studies in generating all this omics data.
And we'd like to provide some guidance as to what the most powerful data sets to generate are, because we realize this will be very costly across all the different studies that are going to be undertaken. If there's any way we can help streamline that, we would like to try to do so through ENCODE analysis.

The third RFA is computational analysis of ENCODE data. This is sort of a new idea: we'd like to enhance data analysis activities beyond the ENCODE participants. This is something we've heard from our advisors and various working groups, that we really want to make sure this data gets out to the community and that people are using it. We'd like to bring additional people into looking at ENCODE data. So we want to support individual research projects that combine ENCODE data with related functional genomics data to drive new biological insights, that use ENCODE data to improve the analysis of disease mapping studies, and that develop new methods to improve the analysis and interpretation of ENCODE data.

This slide outlines the mechanisms of support and proposed budgets for these three initiatives. For the first one, ENCODE production, we'd like to continue the U54 cooperative agreement center mechanism. We're proposing somewhere between $15 and $25 million in total costs per year for four years. This contrasts with the approximately $33 million that we're spending now on modENCODE and ENCODE, and we anticipate, depending upon the level of funding, somewhere between six and eight awards. We'd certainly like your input on this target number. The second RFA, the data analysis and coordination center, we'd like to support through the U41 cooperative agreement mechanism, putting in approximately $3.5 million in total costs per year for four years, and we anticipate making one award. And lastly, the computational analysis of ENCODE data.
This will be a U01 cooperative agreement mechanism. We've set aside $3 million in total costs for three years and anticipate six to 10 awards. Belia, that was it.

Yes. So, I forget those numbers. It's about $5 million now, I think. Per year, it's about $1.5 million for the data coordinating center and about $1.3 million for the data analysis center. That's per year. That's for human, right, Peter? That's for human. Right. And about the same for mouse? Mouse is covered by the human. No, I'm sorry, modENCODE is about the same. ModENCODE is about the same. That's the total. So one question, certainly, is whether this is enough. We'd certainly like your input on that.

Can I just ask how you envision the distribution of responsibility or activity between the data analysis and coordination center and the computational analysis of the encyclopedia? There's an overlap between the charges for those two groups, and I'm curious how you're going to manage that overlap.

I'll start, and then I'll let Peter correct me. The RFA for these U01s is really meant to augment the analysis activities that are going to be supported through the data analysis center. The data analysis center does a lot of activities that organize the analyses and organize the data; it works with the analysis working group to identify what analyses should be done. But I'm going to turn this over to Peter, and he can expand on that.

I view the data analysis and coordination center as responsible for doing the work that's required to generate the ENCODE product, that being the derived data that the community will use, and an initial integrative analysis of this data to show the power of how it can be used. And that analysis is going to be run not necessarily by the data analysis and coordination center alone; it will be part of the analysis working group. It's the analysis working group, which represents all of the production centers and the DACC, that will do these types of analyses.
Whereas I view the third part, the other analysis awards, as not being as tied to responsibility for generating a product, but as doing analysis of where you can take that and go further with it, either by combining it with disease data sets or by methods development, which may feed back into the analysis working group, but may not. They're not as tied to this idea of generating a product of what we think ENCODE should be.

Claire? I'm assuming that the $15 to $25 million you have under the first initiative is what you believe you need to complete this project, and you used "complete" a couple of times in your slides in quotation marks. So it would be helpful, I think, if you could clarify what you mean when you say complete. And do you think that the technology has reached a point where it would be necessary to have this amount of money devoted for four years? Or is there the possibility that you could get to completion sometime before the end of four years and shift some of that money into the computational analysis? Because I see that as being a particularly exciting aspect of this whole concept clearance, and it just looks like, if you're contemplating six to 10 awards, $3 million per year is not a whole lot of money.

I think it's very hard to predict how close to completion we're going to be four years from now. We're certainly working towards that, but I think we're going to have to come up with a working definition of complete and know that we're not going to really get there. The big challenge for ENCODE is probing enough different tissue types and really getting to single-cell analysis, which is something we're pushing in the technology RFA. But I think it really depends on the scope of what we end up funding. You can be as wide as you want to be with all of these different elements.
And that's why we're hoping the analysis will drill down and give us this idea of what a minimum set might be: what are the most important elements to be probing? But that's something we're going to be working on, trying to define a reasonable endpoint.

Rex? So as I understand it, you're trying to roll the human ENCODE and modENCODE together. Could you talk a little about how you imagine that scaling proportionally? And could you talk about whether there are any cultural differences between the two communities that might make that more complicated?

Yeah, so part of our nonspecificity about how we're going to do that right now depends on how much money we have. That will certainly determine how much we're able to fund in modENCODE versus ENCODE. Our feeling is that there's so much biology the model organism communities can now do with the technologies and the data they have that we've really reduced the amount we're going to fund for these large consortium activities. We haven't put a specific number on it, but it's going to be a very small fraction compared to ENCODE, which will be unpopular, I'm sure.

And I would say that I think we need to find those activities that are best done in a consortium and focus on those very high-priority activities. And then, as Elise pointed out, the model organism communities actually want to do a lot of this in their own labs, because they want to do it on their own strain backgrounds. This data may help to some extent, but in the end they'll be limited. And just to expand on that point, I think the types of data we're thinking about being very useful would be those that can increase the quality of the data set and also fill in gaps.

Mike? So Elise, following up on Howard's question, the total, it sounds like, is sort of a 40% cut for data analysis and coordinating center activity.
And so I was curious, do you think that's going to come from the efficiencies of going from four centers down to one? Or are things sufficiently reduced to practice for the two projects that there's less to develop in terms of approaches? Where do you see that as coming from? Because this does run counter to some of the things we've been saying earlier today about more funds for analysis and computation.

Great. We do worry that that might not be enough. Peter, do you want to respond? So I think there will be some economies of scale, because if you think about what modENCODE and ENCODE support, they support somewhere on the order of 20 PIs, and they also support, I don't know how many, co-PIs. And if you talk to the DCCs, they struggle. It's the slide I showed earlier in the data management talk: it's the submission process that is really time consuming and that is driving a lot of their costs. So by having only six to eight producers, and making sure these are large producers and not fragmented, I think we'll drive the number of submitters down. Suffice it to say that we are concerned about this, and we seek council's advice.

Given that no matter how much effort is put in, this is still, in some ways, just scraping the surface, and you mentioned that the communities around model organisms are interested in taking this up: to what extent will there be an emphasis here on packaging the knowledge of how to do this in a way that makes the methods more accessible to these communities, so that it can be taken up very efficiently? It seems to me the technologies themselves are now so scalable that it is feasible, as you mentioned, for smaller labs to do this. To what extent will the aim here be to enable that?
Well, one of the activities ongoing now, with ENCODE and modENCODE working together, is developing community standards for RNA-seq and ChIP-seq, really describing what our consortia's experiences have been in testing out different sequencing depths, peak callers, those types of things. That's a very important part of what we want to accomplish here. And certainly with this last RFA, by bringing in additional people to use the data, we think this will be one way of disseminating at least the analysis of the data. If you have any other suggestions of how we can make this data more useful to the community, we'd be very happy to hear them.

Dave? Elise, you pointed out in your talk this morning how many GWAS hits fall into these regions. And that makes me wonder if you're going to be able to do cost sharing. Could you do cost sharing with other institutes that have their suite of diseases?

In terms of doing normal tissues, or? Well, in terms of funding additional work that makes it clearer how all this regulation works. We certainly can have dialogues with other institutes about this. What we're hoping to do, as I said before, is help those researchers figure out what the minimum set of data would be, and that should help cut down the costs in some respects for these disease studies. But we can certainly have dialogues. I know that Teri Manolio and other people have been talking with other institutes about possibilities for partnerships. We can explore that further. My guess is that we are not going to be able to get other institutes to contribute to the development of a fundamental resource. But if there are disease-specific projects that they want to initiate that use these approaches and want to apply them in their particular situations, then we could either collaborate, maybe even cost share, but certainly help advise them. Does anyone else have a question?
All right. This is Dee Dee here. Can I ask a question? Go ahead, Dee Dee. Can you hear me? Yeah, go ahead. OK. I didn't know if it was a good time or not; the video is lagging behind. So when we discussed this in February, Initiative 1 was about technology development and seemed to include the opportunity for technology development and innovation, although it did say one possibility under consideration would be to complete ENCODE within the limitations of current technology. And it seems that that's what has been implemented here in this concept clearance, in the first one listed, which states at the beginning that it will be done within the limitations of current technology. So will you entertain innovative new technology development projects in this, or will they be excluded?

There's going to be a separate initiative for technology that we discussed at the February council, but the grantees for that will certainly be encouraged to work with the grantees under ENCODE. I see. OK. So I realize, yeah, that's a separate initiative. OK. I just wanted to make sure about that. Thank you.

OK. Now we're in the dark. Dave? The other point I'd like to make is I just think it's a great idea to push the mouse in this, as it sounds like you are going to do, because you can just do a lot more perturbations in the mouse, and it's in vivo rather than in some cultured cell. So I think it's great.

Does anyone have any other comments about the budget level for the first RFA? Do you have ideas about how you want to distribute that between human and mouse? I think we're really going to have to see what applications we get in. But as I mentioned earlier this morning, the primary emphasis will be on human, with a secondary emphasis on mouse.

Howard? Please don't take the silence as meaning we're happy with that amount of money. It's just that we listened when Eric talked this morning. So, I mean, you do what you can.
But I think if there is a general opinion that this is a modest amount of money for what's been outlined, then that's worth having on the record, whether we can implement it or not. It is a modest amount of money for what's been proposed. But then you can ask the same question about the second RFA. We've alluded to the fact that there may not be enough. Would the recommendation of council be that this be increased?

I am worried about your putting four groups, in principle, together. You said maybe the model organism part of it is going to get de-emphasized, so maybe that's OK. And if you try to think of this as a bucket of money that you're just moving around, it does worry me that you're putting a lot of the emphasis for the success of this project on one group. You need to make sure they have the resources, especially as you think about the problem you alluded to earlier in your first presentation about data coming in. There are still a lot of connections that are going to need to be made, presumably. And especially since this has been handled by four groups in the past, there's going to be the fundamental problem of how you bring those four groups into alignment.

Ross? So along those lines, we were talking earlier about trying to get other institutes to buy in. And we've all recognized that individual labs can do the kind of work that's outlined there in the first part, and many, many are. To be really useful, all those data need to be available, and it's better if they're in the same place. It's really better if they've been subjected to common standards and quality controls and all of that. Now, was there the thought that, in an ideal world, the data analysis and coordinating center could handle not only the ENCODE data but also the ENCODE-relevant data from other investigators? If that were to be part of the idea, you'd definitely need to dial up that money big time. One more comment?
Well, I think you mentioned it. Just to follow up on that: in your earlier presentation there were three presentations from ENCODE and 20 from non-ENCODE at one of the representative meetings you put up there. So it sounds like there is a lot of potential for a lot of data to be generated out of R01s, out of foundation money, out of whatever other source. And if that becomes one of the goals, which I think is a good one to have, then that will have to be dialed up appropriately. Certainly we've seen in other networks that having one place makes it a little easier for them to say no, so certain types of data just don't go in the database anymore; if mom says no, you can't go to dad when there's only one parent. But I think you also need to be able to accommodate all the other aunts and uncles and cousins.

So is it fair to summarize this discussion by saying that as we move along in developing this, and as the resources that might be available come into clearer focus, if we ended up deciding to put more money into the production RFA or more money into the DACC, or a different proportion of the two than we've outlined here, council would not feel betrayed?

I'm not sure I understand what you said, but let me try this. I think what I heard was that if you needed to move some funds from production into the coordinating center in order to get an appropriately meaty activity there, my sense was that council would be behind that. I'm not sure council would be behind going the other direction. No, but also, if more funds were to become available, or we decided to put a larger proportion of extramural funds into this activity, increasing both within reason, council wouldn't object; there was the opinion that maybe $15 to $25 million for the production RFA might not be enough.
So I heard an indication of a certain amount of flexibility there too. And in parallel, is there some plan for, maybe not an RFA, but an RFP kind of thing, to bring in more R01 requests for some of the smaller groups to participate? Could that potentially be part of this? Well, I think that's what the third RFA is; that's more focused on analysis. You're talking about production? Yeah. We've always opened ENCODE up, but there's a question of what the benefit is to the small users to join and to provide their data. That's the challenge.

I guess what I'm saying, Rex, is that it's always difficult to make good recommendations about funding levels when you're talking about programs in isolation. Ultimately we're going to have to do this in terms of the overall program and addressing all priorities, but I just didn't hear strong feeling that we should necessarily be constrained precisely by the numbers we gave here.

Mark? Can I just say that it would be helpful for me, and I think a few of us, to see some of these numbers in the context of the extramural program in general, and if you're contemplating shifting numbers, to know a little bit more about how that affects, say, R01s, et cetera. Right, we've periodically presented that kind of analysis to council. We haven't done it in probably two years, so we should do that. Right. I just want to say, as I mentioned, that even the $25 million is a considerable decrease from what's currently invested in ENCODE and modENCODE production.

Ross, do you want to say something? Well, back to the fact that many institutes in their extramural programs are funding research along these lines, and we see the papers coming out, and we'd love to have the data as easily available, and actually subjected to quality standards, as was going on in ENCODE.
And I know that these concept clearances didn't really get into it, but as we were talking about what this could be doing, it would be very beneficial to the community to have a repository for the data being generated by many other investigators. I was really expanding the umbrella. It actually harkens back to a lot of the discussions we were having earlier today about data handling, data storage, and so forth. And maybe we'll get to another one of my concerns: we've used the term completeness more than once, and I think this next project will get to it, so I don't want to talk about it until he's gone through this, but everyone would love to have completeness. If you have more data coming in from a wider community, you're much more likely to approach something that might be considered complete, or at least useful. So I would really love to see support for bringing in the data, making it easier for people to submit their data. And Peter, you were saying, why would anybody want to join the ENCODE consortium if they aren't funded by it? That's true, but if we made it easy for people who are generating these data to get the data in, and people use it, we are judged also by how well our data are used; I think it would be a real positive. At any rate, the more information you can get into one place, the better your analyses can be, and the more complete the integrative analyses will be. So it's a long-winded way of saying I would really like to see that second initiative expanded, and with plenty of support for it.

Going back to my earlier talk, what you're basically saying is you want this ENCODE DACC to be a data broker for all functional genomics. Related to ENCODE. I would qualify it a bit, but that's concise, thank you.
But I guess now I'm slightly confused, because in your talk you also emphasized the fact that the operational cost of interacting with large numbers of groups could make a lot of these things very expensive; as a data broker, you would have to do exactly that. And yet the reason we were trying to do it this way was to reduce the number of production groups, to reduce the cost of taking in all that data. But if we then make that same group into something like a data broker for a whole bunch of R01 investigators, that's a huge responsibility, and you're going to be trading off what they can do with their data for ENCODE versus non-ENCODE groups. You would have to give them more money; although, since they're dealing with the same type of data, the data brokering would be easier than starting from scratch.

My first response to Ross is, isn't that what GEO is for? But you're right, you do need a certain amount of data brokering to make sure the data are in the right format, the quality values are clearly indicated, et cetera. The added complication is that all those other R01 projects are not under the same data release standards as the ENCODE projects are. But even if the data are not put into the repository until they're published, it's still a good thing to have the data in the repositories.

Yeah, I mean, I think Peter said it: isn't that what GEO is for? So then the decision is, are the requirements for putting your data in GEO sufficient in terms of metadata and other things, or would this be a better way to distribute the data? That's a whole other discussion, but I think you're right that what you're talking about is son of GEO, or child of GEO, or GEO part deux, or something. And that's not a bad thing, but we all know what the quality of some of the data in GEO is like, and that's not what you'd want to replicate.
Oh yeah, so to follow up on Ross's point: I think that if you were to go down that path, you really couldn't do it in a reasonable way without allocating some of that production money to the controls, the standards, and so on, so that other data producers know what to do and know that their data are of sufficient quality to be submitted and accepted. Otherwise the DACC part of the job will scale out of control. It's better to publish standards that people can validate against; and it's not just published standards, it's the appropriate controls that they can get and test, so they know their data are going to be of sufficient quality and in the right format to be accepted and appropriately annotated, so that it doesn't create a lot of extra work for everyone.

Are there any other comments about this concept proposal? If not, I need a motion to approve the concept. Let's ask: what are we approving? You're approving this as a concept, so that NHGRI can then go ahead and develop grant application solicitations, publish them, and receive applications. So this vote will be approval for NHGRI to proceed to issue this as a program. With the current budget as it was described, or with the modifications that we discussed in council? With the budget that was discussed as a fairly strong guideline, but contingent upon the institute's actual financial situation when the awards are made. But what about some of the other kinds of modifications, which were not necessarily budget modifications? I mean, there's the concept in principle, and then there's where the rubber meets the road, and the last five minutes or so of discussion were slightly of a modification nature, I thought. I'm just trying to understand, which is I think what Jeff was trying to do.
I think as we develop these, we take your advice into consideration, and we have a reality check of the budget numbers and of what other activities are going on at the institute and in other institutes, and we factor all of those things in as we develop these initiatives. We can't go too far in telling you specifically what's going to be in the solicitation, because that could potentially disqualify any of you who are interested in applying. So this has been a public discussion of the concept. We've heard the issues that you've raised, and we will take those into account in writing the specific solicitation. I don't think you'll be particularly surprised by it when it comes out, but it may be better because of this discussion.

I'll move approval. It's a great set of initiatives. I'll second the motion. Is there any further discussion? All in favor of approving the concept? Anybody opposed? Dee Dee, do you want to weigh in? Yes, I approve. Thank you.