So let's transition into the discussion for topic two. I don't want to dictate this, but let's give it a little structure and think about the relationship between what's happening in the genomics of disease within the NIH portfolio and how that portfolio could guide NHGRI in thinking about a next phase of ENCODE, phase 4, what it might look like, and what that relationship is. I could see this in multiple areas: one was a lot of, quote, follow-up of various signals; another, which Nancy focused on, is the role of transcriptional regulation in disease; and several of us mentioned annotation. Just to seed the conversation a little: several of us in this room have spent a lot of time over the last three years or so thinking about loss-of-function mutations and how we would leverage loss-of-function variants in proteins, mostly premature stop codons. Can that guide us? Not loss-of-function variants in proteins, but can we think about loss of function in ENCODE-style annotations or marks? Is that an analogy: loss of function in an enhancer, naturally occurring or engineered, and how could we use that in understanding disease? Maybe I'll call on Daniel first to pick up on that, and then Dana had a comment.

I think it's a fascinating idea, and Tully and I actually had some discussions yesterday about this idea of identifying these regulatory true loss-of-function mutations, which may be challenging. There are certainly people in this room who know far better than I do what proportion and what types of mutations within these regulatory regions would really result in these very severe effects on gene expression.
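Part of what anchors this analogy is that, on the coding side, "loss of function" has a crisp, rule-based definition for premature stops. As a minimal sketch of that discreteness (the function name and toy CDS here are illustrative, not from any real annotation pipeline):

```python
# Sketch: does a coding SNV create a premature stop codon (a PTV)?
# Assumes a 0-based variant position in coding-sequence (CDS)
# coordinates on the coding strand; names/data are illustrative only.

STOP_CODONS = {"TAA", "TAG", "TGA"}

def introduces_premature_stop(cds: str, pos: int, alt: str) -> bool:
    """True if substituting `alt` at CDS position `pos` turns that
    codon into a stop before the natural terminator."""
    codon_start = pos - (pos % 3)
    codon = cds[codon_start:codon_start + 3]
    mutated = codon[:pos % 3] + alt + codon[pos % 3 + 1:]
    natural_stop_start = len(cds) - 3
    return mutated in STOP_CODONS and codon_start < natural_stop_start

# Example: CAA (Gln) -> TAA (stop) in the second codon of a toy CDS.
cds = "ATGCAAGGATGA"  # Met-Gln-Gly-Stop
print(introduces_premature_stop(cds, 3, "T"))  # True
```

Real consequence callers (VEP-style tools) also handle splice sites, NMD escape, and last-exon caveats; the point is only that a stop gain is a discrete yes/no call, which is exactly what the regulatory side of the analogy lacks.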
But definitely, in the way that we know we can use coding protein-truncating variants to identify human models for gene inactivation, in a way that's very useful and that you were involved in, and we've got a number of projects in this area as well: if we could identify those tissue-specific knockouts, that would be phenomenally powerful for dissecting out the specific roles of particular genes in a particular tissue context. That would be a really powerful in vivo model for gene inactivation.

I'm certainly very interested in that topic of very deleterious non-coding mutations, as lots of people in this room are. I think a better comparison, though, is not to loss of function but to the one between synonymous and non-synonymous, in the sense that I don't think there's any mutation in, say, a TF binding site that's quite the same as a premature stop codon. There are things that disrupt the motif, and you can see they radically disrupt the motif, but I just don't think you're going to see something like a premature stop. Maybe other people will disagree with that. Nancy, do you have a comment on that?

What about deletions of enhancers?

Well, sure, fair enough. Deletions, exactly, that's the classic equivalent of a stop codon mutation in non-coding DNA. The question is how many, and what the total cumulative burden on disease is, and I would wager the cumulative burden on disease, well, we know it's much greater. So the point is, we all understand what the problems may be, but not all binding sites are the same. I showed you a very small example with three compelling factors that have very different effects. So these kinds of effects exist; so far we've not had a good way of pinpointing exactly which ones we should pay attention to.
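The point above, that motif disruption is graded rather than binary, can be made concrete with a position weight matrix: a variant shifts a continuous log-odds score rather than flipping a discrete switch. A toy sketch, where the matrix values and the 4-bp motif are entirely invented for illustration:

```python
# Toy log-odds position weight matrix (PWM) for a hypothetical 4-bp
# motif; one dict per motif position, keyed by base. Values are
# invented purely for illustration and describe no real factor.
PWM = [
    {"A": 1.2, "C": -2.0, "G": -1.5, "T": -2.0},  # information-rich: wants A
    {"A": -1.0, "C": 1.0, "G": 0.8, "T": -1.0},   # tolerates C or G
    {"A": -2.0, "C": -2.0, "G": 1.3, "T": -1.8},  # information-rich: wants G
    {"A": 0.5, "C": 0.3, "G": -0.2, "T": 0.1},    # nearly degenerate
]

def motif_score(site: str) -> float:
    """Total log-odds score of a 4-bp site under the toy PWM."""
    return sum(PWM[i][base] for i, base in enumerate(site))

def variant_delta(site: str, pos: int, alt: str) -> float:
    """Change in score if `pos` mutates from the reference base to `alt`."""
    return PWM[pos][alt] - PWM[pos][site[pos]]

site = "ACGA"
print(variant_delta(site, 0, "T"))  # large negative: information-rich position
print(variant_delta(site, 3, "C"))  # near zero: degenerate position
```

The same single-base change can be near-neutral or severe depending on the information content of the position it hits, which is why there is no single non-coding analogue of a stop codon, short of deleting the element outright.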
Despite having ENCODE, we still need a lot of work, and so having some examples where we know how to, I don't know, prioritize would be great.

Dana's next. So I have two comments. The first, regarding all these functional studies: I think they're important, but they should be taken in the context of understanding, of decoding, what types of mutations matter, because for genetics we have to be broad; we have to look at a lot of SNPs, and we're not going to be able to interrogate all of them experimentally. So we're going to have to think not about which experiments might be good for a specific disease, but about which experiments might give us an understanding, so we can impute which variations matter and how they might influence transcription. The second comment also relates to seeing which transcripts might change, and it relates to Nancy's talk: she showed, on the 30 GTEx cell lines, just how much we gain. I think one of our problems in using ENCODE and an encyclopedia for anything genetic and disease-related is that what matters is not only what the gene is but in what cell type it is expressed, and I think we haven't even begun to list the cell types that we have in our body, let alone begun interrogating which elements might be relevant for each cell type. So I think that needs a very big expansion.
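The cell-type expansion Dana is asking for is ultimately a combinatorics problem, the same multi-dimensional matrix of cell types, assays, and environments raised later in the discussion. A back-of-envelope sketch, where every number is a round illustrative guess rather than an actual ENCODE or GTEx figure:

```python
# Back-of-envelope scale of the cell-type x assay x environment matrix.
# Every number below is a round, illustrative guess, not an actual
# ENCODE, GTEx, or Roadmap figure.
cell_types   = 400   # distinct cell types/states one might profile
assays       = 15    # ChIP-seq marks, ATAC/DNase, RNA-seq, Hi-C, ...
environments = 10    # stimuli, time points, perturbations
individuals  = 100   # to fold in natural genetic variation

per_person  = cell_types * assays * environments
full_matrix = per_person * individuals

print(f"one reference individual: {per_person:,} experiments")     # 60,000
print(f"with individuals folded in: {full_matrix:,} experiments")  # 6,000,000
```

Even these conservative guesses land in the millions of experiments, which is what motivates the deep-dive-versus-broad-coverage debate that follows.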
So I'd like to put in a bit for Nancy's talk, which I heard as an advocacy for molecular endophenotypes as a way forward. I'd like to rewind at least five years, to when we sat down and I said one of the things I'd encourage you to do is a systematic study of gene expression in a number of different individuals' lymphoblastoid cell lines, and at that point we couldn't cross the streams of ENCODE and variation without somehow setting off some explosion inside of NHGRI. I think NHGRI has got to get over that. GTEx is a starting point, but there's no real reason why RNA is the blessed endophenotype for this. Nancy showed one way of using these things in disease association, but you can also use them for a lot of basic biology, so you understand which enhancers are doing what, and the links, and how environment feeds into this; a credible place for memory, for how long-lived environmental effects persist in some of these responses, is epigenetics on the chromatin, though it's not the only place. So I could imagine three credible good reasons to do molecular endophenotyping in cohort studies, which is basically what's going on now in effect. This is partly happening in BLUEPRINT in blood, because Henk's project has that. There's another project for you to track, a stem cell project in the UK between the Wellcome Trust and the MRC called HipSci, which is basically doing molecular endophenotyping of stem cells; I think that's going to be very useful, roughly 300 to 500 normal healthy volunteers, and it's ramping up. But I think there are going to be a lot of gaps in the tissues and in the endophenotypes to be done there. It's a much more complicated landscape than it was five years ago, because there are a lot more existing projects, so working out what one does, when, where, and how is tricky, but
at the very least things need to be coordinated between ENCODE and these other projects like GTEx. But I think there's much more space there than just coordination; there's really something missing.

I just wanted to come back to protein, because I think we have more data today on the transcriptome, and so it's possible to do some things with the transcriptome that are useful and interesting. We know something about how transcript levels carry through to protein levels, but there's no substitute for doing more in the protein space, and I don't think the opportunity to push that technology and get better-quality data and more data, in single cells, in tissues, can be avoided. This is the fundamental biology that we know is driving disease, and if that's our topic, we have to do better in the protein space.

That argues for shotgun proteomics, really. Let me interrupt here: I'm assuming that NHGRI has gotten over whatever phobia existed five years ago. Okay, good, because that's off the table, evidently. This was before your time, so maybe you can help me a little: there was previously an anxiety about doing a lot of ENCODE-type assays among individuals, looking at genetic variation, but if it's all water under the bridge, let's leave it. Aviv first, then Mike, then John.

I'll try to say a few words on proteins and transcripts. There are several items to consider if one would want to go after protein levels. The first is that without a good annotation, shotgun proteomics doesn't work; so no matter what one decides to do with the proteins, which I think should actually be measured and are very important, one would need a very good annotation. The second is that for certain applications, especially when you want to be quantitatively very precise, or go to very large numbers of individuals, or go to single cells, other approaches become a major piece of the puzzle. The
third aspect is that ribosome profiling sits in a very interesting place: it does not give you protein levels; it gives you a measure of protein translation, or actually of translation initiation, slash production. It's not all exactly the same, but for most proteins it would give you a lot of very useful information, again in the face of good annotation. So this is one of those problems where you would have to triangulate with various methods to get results reliable enough to map with, and so on, but I think we have a sufficient number of pieces in place, some of which can go into production like this and some of which will come later.

We're talking about several kinds of projects here, this last one being phenotyping, and I think it's an attractive space; it's the space in between GTEx and ENCODE, and it's valuable space to explore. Protein levels are one thing; you mentioned Ribo-seq, which is another assay that's certainly good and very amenable to high throughput. Another one that's incredibly high throughput is metabolomics. So if you really want to explore the space, there are several assays that aren't that expensive, that could be done, could pick up variants, could be mapped onto the genome in a useful fashion, and do give functional annotation to the genome. So there is that class of things. I think you introduced this originally, Eric, with the idea of other kinds of assays, and we've heard about CRISPR and, in the last talk, I guess, reporter assays. I think that is a useful thing to explore too, with all the limitations each has: obviously with CRISPR knockouts there are problems with redundancy when you're knocking out one element, and with the assays being sensitive enough if the effect is only subtle, but they're still incredibly valuable. It all comes back to this issue: every one of these assays does have some
value, some more meaningful than others. What I like about this last set of discussion is that these are real variants that affect people's physiology at some point, and so they certainly should have high impact in terms of human disease and human phenotypes.

On the proteins, I want to make two comments. One: there is a set of protein measurements that could directly intersect the current type of ENCODE data, and that is quantitative, accurate measurements of the levels of all the transcription factors, for starters. If we could actually achieve that, because that is a variable that has a direct impact and readout on what the other assays are measuring, it would also be incredibly useful in a broader context. So that's something fairly concrete. The other thing I want to raise, and this is something I was going to bring up yesterday, is this question of what we are not measuring, and what the role of the ENCODE catalog is in determining the things that get measured. I was just at a meeting in Monterey that was a kind of celebration of Goldstein and Brown's Nobel Prize from 30 years ago, and there was a litany of just amazing talks on starting with disease phenotypes and then coming down and unraveling the molecular basis. One of the amazing things that came up in one of the talks is that there is a class of proteins in the genome, short peptides, that are parked inside long, supposedly non-coding RNAs, and these have turned out to have an incredibly important role. They're not annotated systematically in the genome because they fall below the false discovery threshold that was picked for ORFs, and yet here they are; they found them. These things are 20, 25 amino acids, and you kind of wonder how much more
of that kind of stuff is out there that's incredibly important when you find it, but that's just not part of the catalog at a really basic level. Again, this general point, that if we don't know about it we can't go and measure it, has to be considered.

Other comments on this thread? There are obviously many excellent ideas. I'm just going to be very pedestrian for a minute and ask everyone to think about what we can do in the next few years with all these literally thousands, probably now tens of thousands, of variants, with respect to the many traits we've already mapped. In some sense that is declaring success and moving on, but beyond the basic understanding of what ENCODE and all these elements do, there should be some attempt to resolve the meaning of all these non-coding variants we've already found.

The problem is they're not precisely mapped, right?

Well, that's the problem that gets uncovered here. Add them all up and you're probably talking about a lot of genome, but you have to put them in the context of how they were mapped. Of course, I agree; I'm not trying to design the project. All I'm saying is there could be a few examples, however selective, where it might teach us a lot, even about finding the remainder that we know exists but haven't yet detected, and I think some aspects of those projects are solvable.

So are you suggesting a parallel effort of selected deep dives, intended to establish paradigms, assays, enough basic data in systems where the tissues have been looked at enough, or enough has been looked at, to then go and find a phenotype where this can be tested out? Right now the value of ENCODE is being tested by essentially random picks, and sometimes, yes, we know there may be a potential element, but the cell
type isn't studied, or we don't know expression, whatever. But it may be helpful to have a few examples, could be in the human, could be in the mouse, where we try to figure out whether, knowing the non-coding elements but not the full biology, we can not only map the identities of the variants we've already mapped but find the rest of what the heritability means. Anyway, I think that's the problem we need to solve.

Let me make one comment before we go on. How about looking at the problem this way: I think it was at this meeting that someone showed the multi-dimensional matrix of cell types, assays, and environments, and somebody calculated 18 million experiments (laughter). The idea is to use phenotype-driven choices to pick the cells for the deep dive, so you don't do 800 million cells; you pick the cells of interest to the community.

I would actually somewhat disagree with this. We obviously need the deep dives, but I think the biggest priority of ENCODE, and the power of ENCODE, is that we're starting to figure out the regulatory code, or whatever nice keyword you want to use. The thing is, we don't actually know what is going to be important, right? So we don't want to go backwards and say, okay, we're going to decide this is the most important thing and we'll focus on it; we'll find something, obviously. If we zero in on an important disease and on the variants around genes we sort of know are implicated, we will find nice stories, and there's no question that NHGRI should fund some of that, and hopefully some of the other institutes will pick diseases; every talk we've heard today is very inspiring that way. But I do think the fundamental value of ENCODE, and what we really could only do through a project like ENCODE, is this broad coverage, right?
And that's going to be where the disease value is. Even if you don't want the basic biology argument, if you want the purely disease argument: how are we going to figure out which of those non-coding variants are functional? By looking at which ones affect chromatin, right? And we see that from the broad coverage of ENCODE data. Of course we do need to get more functional; we need to look at enhancers; we need to think of functional assays. But in the context of ENCODE we really have to focus on high-throughput assays that let us look at these things, maybe not perfectly, but with a reasonable estimate of their accuracy, and random checking of the confidence levels of our functional assays, of how well they're working, is exactly what we need, because that will tell us how well things work. That would be my big push. I think this is echoing what Dana was trying to say earlier.

Let me get on the other side of the fence, just for discussion purposes. The problem with that is you end up investing a lot and doing an enormous amount of work that has no relevance to the human condition, and my guess is NHGRI is under pressure to continuously translate these resources into human health and disease. If we try to fill out all 800 million cells of that matrix, many of them will not be relevant to the human condition.

Just to quickly answer: I completely agree; my point is just that we don't know how to pick these cells. I do think we should pick cell types that are relevant to human conditions; I just don't think we should pick one cell, because we don't know which. We probably haven't seen the most important non-coding variants; that's what we're seeing from GWAS, right? Essentially, we're missing the vast majority of heritability.

Okay, Aviv. And I've totally lost control, by the way. To try to bridge the difference a little,
I think the assumption that we know which cell is relevant in which disease is not necessarily true. In a lot of the complex diseases there might be more than one, and it also might just not be the one you expected; one of the lessons from some of the GWASes is that it wasn't necessarily where people thought it was, and increasingly so as more and more genes come in. So that's one point. I also think that matrix is not going to be spanned just by hitting a few cell types, but it's also not going to be spanned by sparsifying completely and randomly choosing from the 800 million elements. There are going to be some choices to go deep and some efforts to go broad, just as there have been before, but how these dimensions work out is going to be different, and specific knowledge of what is expressed where, if you already know where the variants map in terms of genes, is going to be particularly important for that.

Okay, next, and then Mark. By the way, as Aviv said, to split the difference: I actually don't think there's a lot of difference. I just want to clarify one aspect of what Olga said, where she disagrees. I don't think there's much to disagree with, in the sense that much of the history of genetics is nothing but studying extreme phenotypes and mutants; that's how we've understood the normal. So I'm not speaking of taking a phenotype or a disease; I think we're looking at it perhaps incorrectly when we say we will learn normal biology and then apply it to disease. I think the disease itself will inform us about those aspects of normal biology that perhaps we need to focus on most. I agree we don't know how to choose, but my point is that practically we will be making choices among those however many billions of experiments, and an example can often focus our mind. This was the basis of me asking this question yesterday. The biology is easy to focus on, and the clinical applications may
be easy, but finding good examples that can make us bridge the gap, I think, is now a particular challenge.

Mark. So I just want to come back to the thread that was raised by, I guess, you, and Nancy was talking about how ENCODE relates to human variation. I want to point out what happens all the time when we talk about an annotation, and you very much noted this in your talk: you look at the annotation, then people talk about how it relates to disease variants, so there's always this desire for that connection. But really, I think the missing step is how natural variation fits into the middle of that. It would really help ENCODE if we had some sense of natural variation in the middle before we connected it to disease variation. And I should say, a lot of the critics of ENCODE, and we should think about this, have looked at what we've done and said we did it without really looking much at natural variation and evolution, and I really think that's an important thing to incorporate very integrally into the whole effort.

It seems to me nobody can avoid the variation issue, right? It's everywhere, and it's now emerged as a major application of ENCODE, but the role of the project is to provide the framework for the interpretation of the variation. There's just too much variation out there; that's why people were scared of crossing the streams and exploding, right? Because there's too much and you can't go and measure it all. So a focused effort to figure out a way to predict the consequence of variation, with enough testing to show that it works, and then deepening whatever other aspects of the project, can really only be addressed at the framework level. I think that removes the question of picking this disease or that disease on a formal basis; otherwise the rest of the
people are going to complain, right, if it's not their area. But if you make it known that you're focusing on the framework, enabling everybody, plus a few test cases, then it takes on a different complexion, and it's also more tractable.

Mike first, then Dana, then you. Oh, I'm sorry; I told you I've lost control. Well, I was going to say something John just said: obviously, if you focus on one area, a lot of people feel left out, for good reason. Now, Nancy, I think you showed us for type 2 diabetes that there was a lot of sharing of regulatory programs among tissues; I think that was on one of your slides. And would there be value, if we do choose something to focus on, in focusing on genes which are widely expressed? They're still regulated, and their regulation is important, but there are multiple cell types sharing these regulatory paradigms. So that might be one approach; that's the compromise position I'm offering.

I guess my thought is that if we chose several key systems, it could impact a lot of the discussion going on now. For example, the cardiovascular space is obviously huge, and you could pick something like cardiomyocyte differentiation as a good model system. If we picked half a dozen key systems that became part of the next phase of ENCODE, where we did deep dives, it would be a reference for a lot of things, could have impact in many ways, and could bring in all the variation issues we're talking about as well. Then you just pick some of the most important diseases there are now; granted, there'll be a tussle over that, but cardiovascular disease is the number one killer, so something in that space is reasonable. Diabetes would be great, although until recently I hadn't heard about good systems for measuring things; there are just some cell-based systems, which makes that part tricky. One reason we
picked liver was because liver diseases are out there, and now you can get hepatocyte differentiation going in an iPS cell. So I think you could drill into half a dozen of these key areas that ENCODE could do a deep dive on. It would be a great reference, for all the reasons that Olga and others said, that would help the community. You could finally get dynamics in, probably developmental dynamics in this case, which I think would give us some basic principles, and that would be really great. And then I think you can roll the variation in, either in collaboration with GTEx or somebody else, and possibly add other assays; I certainly would like to see that. I think you can marry all this together and have half a dozen reference systems that would be hugely impactful for many individuals around the world, for many scientists, in many respects.

So let me ask Elise a question: how much co-funding of ENCODE is there from other institutes? And, I believe Eric uses the word partnership frequently, is this a way of forging new partnerships to help support the future of ENCODE, through these deep dives relevant to particular conditions?

Currently there is no co-funding from other institutes, but clearly it would be an opportunity for forming bridges. We have had conversations with specific institutes that are interested in learning about ENCODE data, but we haven't moved into concrete ideas for proposals.
I actually want to make another plug for natural variation, not in terms of needing to catalog and characterize it for its own sake, but: we've been discussing a lot how we're actually going to see what's functional and what's not, and the limitations of all these functional assays, and natural variation is a very powerful natural assay for figuring out what's functional and what's not. These are variants nature has already chosen, and we can look at their impact on different regulatory elements and different epigenetic elements without having to engineer a lot of variation ourselves. That's one way to assay it, rather than all these CRISPR-derived or other genetic screens. Another advantage is that nature chose these variations to do something subtle; it's not a hard deletion removing an entire element, so we can actually get quantitative modeling from such natural variation. It's actually a very powerful way to understand what these elements are doing.

We haven't in this topic mentioned model organisms, and I just want to point out that there are huge advantages in building effective models in these organisms; they really give you opportunities that are very hard to match in other species. Drosophila, C. elegans, zebrafish: all of them give you the ability to do things at incredible scale and with incredible detail. In C. elegans, the lineage is remarkable to leverage for this. Whether this is part of an established program, or just yet another argument to continue work in model organisms, I don't know, but I think many discoveries about how chromatin components interact and feed into gene biology at a conceptual level will be made in those species, and so I think NHGRI should make sure the links to those leading model organism communities are kept in place.

Thank you. I think we're going to have to keep ourselves on time here, because we have a really packed schedule before everyone has to leave. During the discussion right after lunch we're really going to have to get concrete on some of these recommendations, and I'm seeing some themes already between the different topics, so I think it's going to be very feasible. But I want to thank all the speakers this morning. We're going to have a quick coffee break.