Sometimes when we look at these diseases, they're being used as models, as examples of common disease or rare disease or whatever. So do you see it as being modeling of a disease, or can an investigator come in with a very, very specific disease focus that isn't necessarily relevant to other disease types? Because I think that makes a big difference.

Yeah, so I think I'll start. We haven't actually discussed in depth what types of diseases we study. I think we would like to see these as models for other diseases, because we do want to develop a generalizable approach. But I think we'd be open to a particular disease that was particularly compelling because of samples or other existing data. Having said that, I think we are especially interested in the generalizable components that could be learned. And while one might study two very different disorders, which have very different biology and phenotype, with respect to how best to study the genetic role of that disorder, how best to learn which variants are important, there we think there could be a lot of generalizability. Eric?

So I have a clarifying question that other people may have, but I'll ask now. What's the difference between the computational analysis centers, the data analysis center, and the coordinating center? I'm confused about that. Could you clarify, please?

Yeah, I'll start and then I can turn it over to my colleagues. The data analysis center is really going to be focused on creating the encyclopedia, analyzing the data that's been submitted, and figuring out how we can actually annotate the genome to identify the functional elements. The computational analysis groups are really the best ideas from the researchers about how to enhance the analysis, derive biological insights to learn about disease, and develop new methodologies.

I had a similar question to Eric's about the general scope there.
So the complexity of the data will only increase from this, because you've got the functional characterization and so on. And the new scheme in computational is investigator-initiated, so that's very good. But there's less in the data analysis center, right? From the discussions of the working group and others, is there sufficient analytical horsepower behind the data generation, given the greater complexity?

Right, so I think we were considering that, recognizing that this is a pretty large budget. Some of this analysis can be... we want to get the data out there so the community can analyze it, but I think your point is well taken that we do have new data types coming in that may require different kinds of analyses early on in understanding really what the data is saying. So I don't know, Dan or Mike, do you want to say anything?

I would just add that part of the ENCODE philosophy is that the data are out there as a resource for the community. People use the data in very different ways. So part of the analysis is use-specific: individual users say, I need to get this out of the data, and then they extract it. We don't think we could pre-compute all of that for everybody. We do some of the basic data processing, like uniformly processing the data, calling elements, and so forth.

So just a general comment. I want to express my enthusiasm for the transition from cataloging to using the information in the translational aspect. It's challenging. It's enormously challenging. But I think you all and the ENCODE investigators and the advisors, you know, the workshop you've had, are pushing that envelope as much as you can. I think it's greatly beneficial. Then one other comment: you emphasized in several places community engagement, which again I think is very important going forward. But the emphasis tends to be on samples and data. The community has samples and data.
I would encourage you to have focus groups and get questions from the community, so that the ENCODE investigators, whoever is funded in the next round, are working on problems relevant to the community. And again, that's going to be challenging, because investigators tend to be focused on a particular tissue or a particular assay. But as much as it can be pushed to work on problems that are relevant for the community, I think that's very important going forward.

And I'll just put in a shameless plug right now for the users meeting that we're going to be having at the end of June and beginning of July, which Eric referred to. I think part of that is we do want to get input from the research community about the utility of the resource and how we can improve it. If people are interested in that workshop, you can go to encode2015.org and register. That's my shameless plug.

So can we now move on, to go back to the specific initiatives? Rudy, how would you like to do this? Are there specific comments or questions about initiative one?

So I like this very much. And I would just ask: the focus on mouse I completely understand, but I would be more comfortable if it was written in a way that if somebody came in with an interesting idea that wasn't mouse-centric, and they could justify it and it was good science, that that was enabled, rather than restricting it to an organism. Because you're trying to capture more ideas. I completely understand that there's a lot of data in there, so it'd be hard to imagine that. But I always liked the flexibility that someone could put in an idea that we hadn't thought about.

So are you saying that we could expand mouse, or are you saying we'd be open to other organisms?

I think somebody could come in, I would say, with other organisms if there's a really, really good justification.

How was that worded in the past? It had to be that the new model organism had to somehow enable discovery in the human?
In the previous round of ENCODE, what we did was we didn't have it wide open. The initiative in 2012 was first priority for human, second for mouse, and then continuing modENCODE fly and worm if it was very well justified. But yes, it did have to support the analysis of the human genome. Bringing in additional model organisms creates a whole lot of other challenges for the data coordination and analysis centers. I don't know if anyone would comment on that, or other knowledgeable people.

I'll be more specific. I think zebrafish is becoming a more and more common model that's being used to rapidly assess functionality. So I just feel like that would be another place where it could come in. I understand it makes a complex environment, but I just don't like restricting if we can avoid restricting.

Harold? Okay. So I think where that comes in, Howard, is more on the functional characterization of the elements. But I don't think you're saying that you want to create a functional element catalog; you want to add another organism to the catalog part of this. And I think this initiative is more about defining the functional elements. Because of the amount of funds, if you spread that over too many organisms, now you don't do a deep enough cataloging effort. But I think the characterization part did include any model organism, as long as it was fully justified. So I think maybe these are different.

So I understand that. I just always find it uncomfortable when we restrict it to something. I don't know how somebody would go in and map this and make this effective. But if you don't allow ideas, we don't have a chance to get the ideas. I think you can say that, you know, there'd have to be some really special reasons why there'd be another idea.
So how would you do that if you had one group doing one data type in one organism, another doing another data type in another organism, and then you want us to fund five different organisms competing with human and mouse under the same RFA? It seems a little daunting.

My concern also is, so I understand your point, but you don't want people putting a lot of time into writing proposals. I mean, what is "fully justified" or "well justified"? I just worry a little bit about that part. On the other hand, you could put in here that any organism is eligible as long as it's really well justified, and then they have to contact the program staff to see if they can pull it off. But I would put in here: strongly encouraged to contact program staff before you go down this road if it's other than mouse and human.

Perfect. We have a couple of our council members who we had wanted to lead this session, Jay and Eric. Jay, did you?

I can chime in. So I tend to agree with Carol, and specifically for that reason. Particularly since we're already at 90-10 human-mouse, which I think is something that merits discussion, further diluting it is challenging. And also, if there's a programmatic desire to not fund more than human and mouse, then probably having people put in that work doesn't make sense unless they contact the program officers first.

So I'll just make a more general comment, which is I thought the workshop was outstanding. I was there, but I wasn't one of the organizers, and just hats off to Carol and Eric and the other organizers of that. And then also to Mike and Elise: I feel like the concept document that came out here really closely echoed or matched a lot of what the main themes were at the workshop. And you did a great job, I think, of capturing it.
On this first concept, or this first subpart, some of the questions that I still have, which I think have come up a little bit before: can you remind us, from a 2012 perspective when the funding was at the same level, is this a smaller pot that's going to mapping relative to the other parts? That's one. And then two, I was wondering, and I'm having a little trouble remembering, are there specific data types that you have in mind that have not been captured to date that are high on the priority list, or is it more that we want to keep things open in case people have a great idea?

There are a lot of questions in there. So for FY12, the data production and mapping centers had $22 million in activities that were continuing on through. We had a couple of bridge fundings, where we actually closed out or bridged some of the modENCODE projects, so the money was a little bit more. But the $22 million was for the mapping. Then that went down to 19.2, and currently it is 18.5. A smaller fraction would be devoted to the mapping program.

And then I guess the other question was whether you had specific activities or specific assays in mind to consider adding, or whether it was more of an open call.

So I think we are open to new ones. In the concept clearance, we specifically did talk about mapping all transcription factors in at least two cell types; I think that was laid out there. We're definitely open to new technologies, new methodologies that maybe have better information content, higher throughput, or are more cost-effective. We can imagine supporting more long-range interaction work, because we don't support a lot of that now, and we know that it's very useful. So we're open to all of those. The field is rapidly changing, so I think we would be foolish to be too narrow to start.

Yeah, I just wanted to echo the comments of Jay about the meeting.
I thought the presentation captured very well what was discussed at the meeting. I also wanted to comment on Howard's concern. I think the idea of using other model organisms within that functional category is wide open for the second thing that we'll talk about, the second part. Within that, it's fairly broad, and I think that would probably be the place to really explore model organisms, which I highly support. Within this particular mapping, the prescription of some ratio of mouse to human, I think it's difficult to get your arms around, because some of the disease focus, for example, in the mouse might be fetal, right? So you might have transgenic mouse models that mimic a disease and you want to use that as your model system, but if that's fetal, that wouldn't be part of the plan, I guess. So I just wanted to throw out the idea that both the mapping resolution as well as the disease models could encounter stages of development which might not be covered in the existing description of the center, and that we should be wide open to those kinds of possibilities.

One second. Bob Nussbaum, are you on the phone? Did you have comments?

I am on the phone, and I had no issues with this first functional element mapping center. Most of what I have to say is about two, three, and four.

Okay, we'll bring you back then.

So I find the proposal, the concept, very interesting. But I'm daunted. I find daunting the infinite dimensionality, and, you know, several times during the write-up you have this very carefully chosen phrase about bounding the space, or something like that, right? So I guess I'm wondering, in executing this, how you actually deal with the infinite dimensionality of things. You just mentioned it would be great to map all transcription factors in two cell lines. Well, okay, there's an "all." Just to point out, that very sentence sort of illustrates the problem.
And then if I think about, okay, you're going to translate that into a disease space. So I'm wondering, what is a disease sample? Is a disease sample a cell line? Obviously there are only certain diseases from which you can meaningfully derive cell lines. Are we talking about tissue samples? How do you map transcription factor binding sites from two cell lines, and how do you actually connect that in any meaningful way to disease studies? I'm very enthusiastic. But I just think about it operationally: how do you reduce the infinite dimensionality of this thing into something that applicants and reviewers can handle?

Yeah, I think that's a really great question. So the idea of this very, very large matrix, it's overwhelming, as we've come up against before. And this was discussed in the workshop. One of the ideas brought out was that if you could aggressively dimensionality-reduce the matrix and identify key elements, key cell types, key disease samples that would maximize your ability to impute to other spaces of this matrix, to make it more generalizable, that would be a conceptual direction. How do you do that practically? That's a research question. There were a couple of ideas brought up at the workshop. One idea was just to get the data where you can get the data, you know, get the samples that are easiest to study first, and then model based on that and try to identify areas of the matrix where you can extrapolate. Another suggestion was to start out with a modeling component where you try to figure out how to impute to some of these areas of the matrix, see where your models break down, realize that now we need samples in this area, and do iterations of that to identify regions of the matrix that you might fill in. But this is a great question.
And I think the idea was that we would have the applicants make a proposal and then the group would come together and strategize, you know, in the mapping centers but also the EDCAC as well. Jay, you wanted to... oh, I'm sorry, Jay and then Carol.

So the bounding questions Dan pointed out came up many times during the workshop, and I think in the end there were many different viewpoints among the participants about how that bounding could happen. A lot of it was dependent on the biological question at hand. So in this concept document the issues are brought up without a specific solution, because the idea would be that the investigators would have to justify that what they're proposing has a reasonable bound on it, rather than a particular strategy for that bounding being imposed here. It would be left as a principle upon which the proposals would be evaluated. That was my recollection.

Let me just echo things that circle around this table fairly frequently. We talk as a council, and actually the language of these clearances talks, about the need to expand the tent, to bring more people, more investigators, into the mix. And Carol, I like what you've just said, but I want to flip it around and frame this as a grantsmanship challenge. This strikes me as an enormous challenge for people who are not inside the tent: to read the comments that you just made, Carol, about how there are so many unspecified variables, and yet, as always occurs in a study section, the right answers emerge, right? So this strikes me as posing an enormous grantsmanship challenge for people who are not already completely inside the tent to figure out, okay, what is going to satisfy a study section with all of these unspecified dimensions? And I mean people with great ideas and great biological insights having to navigate and chart these grantsmanship waters.
It's the classic problem, right? You don't want to define things so tightly that you leave people out, but you have to define it well enough that people know how to respond. It's a classic problem. I wish I had a specific answer. We can also have a broader discussion of this big challenge, I think, with a broader swath of the community, to come up with ideas. Other questions or comments about the initiative? Joe?

Let me just comment. I think the users workshop will be a great opportunity to get input about the kinds of things the community is doing and would like to see. There may be unique opportunities. I think that was part of the discussion we had, where some systems may be really ripe for studying some developmental process, let's say just cells in a dish that are differentiated, and there's a well-validated disease model that you can overlay on top of that. That could be a collaboration between those expert labs that have spent 20 years developing that particular model, to then put it into the pipeline. So that would be one example where you could be opportunistic.

Five-second rule: if no one jumps in, then it's time to take an action. So can I get a motion to accept or approve concept initiative number one? And a second? All in favor? Aye. Thank you, Bob. Any opposed? Any abstentions? Thank you. On to initiative two then.

This is Bob. Could I speak up about initiative two?

Sure, why don't you lead this off, Bob?

Okay, thanks. I had a lot of difficulty reading this document and trying to understand exactly what element two was about, and after talking to Mike at length about it, listening to the conversation this morning, and then reading it and re-reading it, I think I finally understand it. The problem that I had really just has to do with some language, and I would like to recommend some different language to describe the functional element characterization centers.
I think the point is, it's not really that you're trying to create a set of well-characterized and validated functional elements; it's that you're trying to enhance the catalog of candidates by characterizing and validating those functional elements in healthy and disease states. And you have to say something in there to the effect that it's not enough to show a biochemical assay that is thought, or is implied, to have functional impact, from which one could then infer perhaps an impact on health and disease. There needs to be a demonstration of a connection between an element that has a biochemical signature of some kind and a functional activity in the healthy state and in disease. This is not different from what I think this concept is about; I just think the language needs to be clearer, so that people who are perhaps at the periphery of the tent, or perhaps even outside the tent, can really understand your intent a little bit better.

Well taken, thank you.

For example, if you go to the "what's new" page, which is page two, under the functional element characterization centers, you say it's a new activity targeted at validating and characterizing candidate functional elements. I think that really says what it is that you want, whereas on page one, in the overview, I think that actually gets obfuscated in the statement of the purpose.
I mean, I guess what really helped me understand this functional element characterization, and the difference from the mapping, more than anything else was a model, and the model that I used to think this through was BCL11A. You have an enhancer site within BCL11A that was identified by ENCODE as one of the catalog of candidate functional elements. You have the locus control region in the globin locus, which had been identified before ENCODE even existed as both a candidate and a real functional element. And it was a demonstration that variation in that enhancer site within BCL11A resulted in a quantitative difference in the expression of BCL11A, and therefore a difference in the degree to which beta-globin switching occurred, resulting in enhanced fetal hemoglobin and an improvement in sickle cell anemia because of having additional fetal hemoglobin. That connection, I think, is a poster child for what you'd like to see from the functional element characterization centers. It's a beautiful trans-acting mechanism. I'd like to see hundreds, thousands of those, particularly since the BCL11A one was a GWAS hit. So connecting it to GWAS, I think, would be an important thing to at least suggest, not require, but suggest, for the functional element characterization centers. That's it, that's all I had to say.

So I know when I read an FOA, one of the things that really helps me understand what it's all about is the section where they say "examples of such research projects could include." That's left out of these concept documents, and so I think the message we're getting is to give serious thought to how those will be framed. Thank you.

Aside from the generalities of getting consented samples, how would the use of samples that are specifically consented help? Or is that just an NIH-wide policy? Is the idea to go back to specific patients, or is that part of this RFA? How does that work?

The idea is that the data does not go through dbGaP.
To get to the data, the primary data, you go through the ENCODE portal.

I think that part is critical. I think keeping this data that's generated by ENCODE as utterly accessible as possible, even if it means giving up some good samples, is important.

The five-second rule is being imposed. Can I get a motion to approve or accept the concept as proposed? Approve. Second. All in favor? Aye. Any opposed? Any abstentions? Thank you. On to number three.

I had a question about three and four together, and I think this may echo what Eric said earlier, since I don't create software for analysis or work on databases: is separating the computational analysis expertise from the data and resources to support analysis a clear distinction, or should there be more emphasis on the coordination of those two elements? Bob?

I think this is less about separating out the basic data processing and more about separating out projects that are designed to use the resource in different analyses. And I would point out that this is the way the current ENCODE project is set up. We set out some broad areas in which people could put in computational projects: better ways to identify elements, better ways to predict their function, or better ways to look at elements associated with disease. Investigators suggested projects, like Robert Klein, who wanted to see if he could use the ENCODE data for disease, or others, like Sündüz Keleş and Colin Dewey, who wanted to learn whether there are better ways to map to repetitive elements, and so forth. So for the computational analysis, the idea is that these are individual projects that are not the core mission of the consortium, because the data analysis center would be doing things that are the core mission of the consortium, like defining how the basic data processing and the uniform data processing are done.

That makes sense, and I think the idea of essentially carving out a place for people with smart ideas about how to do specific analyses for specific
projects is a good idea.

Can I pick up that question? I think one of the things that ENCODE can be most proud of, and something you've already highlighted, is the enormous number of publications by investigators not funded by ENCODE, not just referencing ENCODE but using ENCODE data. I think it's fantastic. But when I look at this one, in light of that enormous community use of the ENCODE data, I get a little confused, because I ask: is this an invitation to users of ENCODE data to apply for funding to do their analysis of ENCODE data? And where's the line? I think you see where I'm going. I'd love to hear some commentary on what distinguishes the typical ENCODE user from the applicant for program three.

I think it's pretty standard, and we've heard it in today's discussion, this tension of how do we bring more people into ENCODE so that we get the best minds participating, versus how do we just generate the resource and leave it to individual investigators to use it. We're trying to do both at the same time, but we have to balance, to bring in a limited number of computational projects that would hopefully illustrate to other users how the data can be used. But yes, there is this tension of how many people we bring into ENCODE. One way to twiddle the dial, as Eric would say, would be to have nobody in this and solely leave it to outside users. This is the balance that we're looking at.

I'd just like to add that these groups, primarily, I think, are largely focused on methods development, and not just on their favorite disease or research project, although I think we are open to that in this initiative. And also, by being part of the consortium, they really are getting in on very early discussions about the data, looking under the hood, as it were, at the data, bringing in an outsider's view about processing the data, and bringing in specific expertise that's been
helpful to creating a better resource for the community. Joan?

I just wanted to comment that my experience with the current group is that they participate in a very integrated way, for example with the mapping centers. Although their function is to develop these novel tools, oftentimes during the conversations what they're doing initially will come up: we can use that to validate across all the data sets, which ones are outliers, for example; we've got some new approach to do that. So there's a real mixing, beyond their own interests, with the production groups, and ways of looking at that data that really are fertilized by having them as part of the consortium. Eric, do you want to say something?

I find the juxtaposition of investigator-initiated and U01 on the same slide sort of jarring. Maybe you're not the right person to make a comment on that. I can understand why you want to do it that way, but it feels like it's investigator-initiated, all right, but the institute is still going to sort of direct the way things happen.

If I had a chance to edit this slide earlier today as we were having a discussion, I would have, but it's already on here. I can understand that that's misleading. I think the idea here really is that the ideas come from the investigators. We're not saying we want this exact work being done, but it is an initiative with set-aside funds.

I would just add that the current computational analysis projects have a great deal of freedom, and a lot of the cooperative agreement part comes in in things like making sure that they follow the consortium practices, that they're sharing their software rapidly, and so forth, as opposed to the consortium or NHGRI staff telling them they need to change their project or this is what it is. They do their projects as they wrote them and as they see fit.

So as you were describing the function of these groups, I now got confused. We actually have three groups doing analysis here, right? We have those in the production centers, which
historically have been pretty strong, and now these investigator-led ones. Now I'm confused about the function of the DAC and the production center analysts themselves.

So the analysis under the production centers is really just to do quality assessment of their own data; of course they're free to publish on their own data. More than that, I'm not sure. And the DAC is to bring those data together from the different production centers.

So the DCC houses, accessions, and displays the data, and gets it into GEO and so forth, whereas the DAC, the data analysis center, is involved in the basic processing and harmonizing of the data, deciding how that should be done; the DCC actually does the computations. I mean, these are complex data types, and there are many decisions to be made. For instance, we wrestle with what's the best way to display transcriptome data: you could say this is expression by exon, this is expression by transcript, this is expression by gene. And once you've made one of these decisions, which some people will grumble about no matter how you make it, then how do you actually implement it? Again, there's no one best practice. So the DAC is very involved in that part, setting up the basic computation that should be done with each data type and making sure that the data are uniformly processed to maximize the chance of comparison across groups and data types.

Maybe that was not the intent when it was written, but the way it appears is that the model of download is still predominantly what will be done in this type of project, whereas that might not be the case with very large data sets and having people come and compute where the data are. So I wonder if any consideration was given to a different kind of setup in that case, whether having a separate coordination center and analysis center would be a good thing to do.

Are we jumping to number four?
Yes.

Well, it is related, because this three is assuming everyone will download data, which might not be the case.

But the model that we use now, which was suggested by the investigators as part of their application, is that not only do we have the data at GEO at term of award, we have the data at the Amazon cloud, and we also have the uniform processing pipelines at the Amazon cloud. So we think we are offering users both options: they can download the data they wish and analyze it with any tool they have, or they can use the data in the cloud if they wish and analyze it with any tool of theirs or any tool of ours. We are trying to give people as much flexibility as possible. So I apologize if that doesn't come across.

Sorry, my comment related to the next one; we were drifting a bit.

Let me ask you to clarify: how much of initiative three is we want high-quality analysis done of all these data produced in one and two, versus we want new methods of analysis? Is there a balance there, or is it trying to do both?

I think it is more focused on new methodologies and new approaches to analyzing data. Rather than "I have got my favorite data set," you really have to bring something novel and new to this; it has to be generalizable, not my favorite project.

Other questions about initiative three? Can I get a motion to accept or approve the concept? Second. All in favor? Any opposed? Any abstentions? Thank you.

You said you had a question on...

My question was just a clarification that this group will have additional functions that will have to be taken on that wouldn't be, for example, part of the other groups, either the production center analysis that happens there or the number three groups. They would be interacting with the community now to process their data and to work with them. That is going to be a completely new function embedded in this, and it really requires a dedicated effort, because we do want to capture... there is a lot of great science out there. These tools are now in the hands of many laboratories, but you need to have
standards that are in place, and quality checks, and people dedicated to that, to bring that data in to make the encyclopedia richer.

It's really open for any other comments.

I had a question about the outreach to the community and some of the tutorials and teaching. I've seen there are some good videos on how to use ENCODE data. I think there was an ASHG workshop which, if I remember correctly, was very quickly subscribed to full. I don't know how the registration is going for your June meeting, which is in Potomac. Could you just give me some idea about how the outreach and education component of this is going?

So that one was announced just recently, and, with that shameless plug in there, there is still room. I think we can have up to a couple hundred people attend, and we're trying to get the word out for that. Yes, you're correct that the ASHG workshops, I think, sold out very, very quickly.

Yes, it's fairly short notice for the Potomac meeting, but I'm just thinking in general that maybe some additional efforts with geographical localization, west, middle, and east, might be considered part of the EDCAC's responsibilities.

Yeah, I think that's a good suggestion. I'd also point out that we have plans to videotape much of the upcoming users meeting, and we're going to try to make many of those sessions freely available online as tutorials, to increase use of the ENCODE resources.

Great, thanks.

In the slide you have, you say you expect the DCC and the DAC to function as a single entity. So why then are you splitting them up into two funded mechanisms, and are you expecting pairs of applications to come in in a coordinated manner?
Yes. We felt that the expertise for the DCC and the DAC were sufficiently different that we didn't want to have people necessarily come together up front. If we have one application that scores incredibly well for the DCC part and not as well for the DAC part, or vice versa, it's more difficult to come up with a rational funding plan. So we'd certainly like your feedback on this. We've gone back and forth; it has worked well under the current funding for ENCODE, and so we decided to go along with that for this round.

I think if they were the same, you could realize some cost savings, and it might be easier to integrate, because it's often the case that whoever sets up the data analysis environment has to try it out, has to, you know, really know what kind of environment they need, get the permissions right, and everything. Having two different groups doing that is a bit of a coordination challenge.

I mean, I don't think we see a lot of duplication of effort at this point, but we can certainly consider that.

I think you're ready for lunch, so if there are no further comments, can I have a motion to approve? Second. All in favor? Any opposed? Any abstentions? Okay, thank you very much, team function or team ENCODE, whatever you're calling yourselves now. So we're going to take about an hour break. I want you to please reconvene at 1:30. But before you bolt from the table, this is council photo day. We need to update the photo that's on the web, so if you could just sit tight, there's a room and a photographer set aside for the picture. Everybody, everybody gets their picture taken.