While we're getting set up, I just wanted to thank the organizers for giving me an opportunity to be here and to learn. I'm outside of ENCODE, always looking in, so it's great to actually be on the other side. I've changed my talk 50 times since I've been here this morning, so I might not even know what I'm talking about, because there are so many things that stimulated my thinking. So I think one of the goals of my talk is not only to describe my thoughts about improving the syntax for annotating functional elements in the human genome, but also how we can use those functional elements to understand disease processes. And so I really like this quote from Galileo because I think it's very fitting for ENCODE. And hopefully at the end of this, we can make measurable what is not currently measurable. But for the purposes of today's talk, even though we've been discussing that our catalogue of functional elements is incomplete, ENCODE is largely focused on those elements that control gene expression. While I won't have time to talk about it today, I do think it's important to expand that beyond the gene-centric view. And Ross touched on that a little bit. And our understanding of these functional elements, or at least how to annotate them, where they are in the genome, et cetera, has come very far in the last couple of years due to the development of high-throughput technologies: ChIP, RNA-seq, DNase-seq, the C's, everything. And from those types of data, we've been able to integrate them to come up with some rudimentary definitions of things that characterize gene elements, such as promoters, promoter classes, and transcriptional and post-transcriptional elements. But something that's also been a very big focus over the last several years are these non-coding DNA regulatory elements.
And certainly we still don't understand what the catalogue is, but one element in particular that I feel has gotten a lot of bang for the buck over the last several years is enhancer elements, or at least putative enhancer elements, I should say. So historically, enhancer elements, even though we've known about them for decades and decades, have been pretty hard to find in the genome. It's a little bit like a needle in a haystack because there were no conspicuous sequence features where you could say, oh, that's an enhancer. And most of what we knew about the early enhancers was born out of genetic studies. We know a few clues about how they work. And prior to, I would say, the studies of five, six, seven years ago, we only knew a handful of examples. And now current estimates, and I don't even know if this is right as of today, suggest that enhancers could make up 10% of the genome. That's a lot of real estate. And I still don't think we understand what the definition of an enhancer is, but I do think that we're starting to build the vocabulary in order to more discretely define their functions. So I'm going to focus on enhancers only because the way that my brain works is usually to come up with a specific example, really think about it a lot, and extrapolate from there. Maybe not the best way, but it is what it is. So very broadly, enhancers, as we all know, have gained a lot of attention because they're thought to control cell type-specific, temporal, and spatial expression patterns. If we know their target genes, then we can put together very clear gene networks that might really define cell state, and ask how disruption of those networks can actually cause changes in cell state, for example, what happens in cancer. So over the last several years, there has been a lot of correlative data that we've used to define enhancers in the genome. And some of those are chromatin marks. And so chromatin states have been used to define these throughout the genome.
Chromatin accessibility, DNase hypersensitive sites, and certainly transcription factor binding have been used too. I think one of the big surprises over the last several years is the observation that you can actually get Pol II binding, as well as RNA being produced from these regions. And we still don't know what the purpose of having polymerase transcribing the DNA into these small RNAs is in terms of enhancer function. But it turns out that the RNA levels actually seem to correlate with enhancer activity. I think we still need to learn a lot more about that. But I think that's one of the surprises from the ENCODE work and beyond. So we also know that these enhancers likely form loops with their target genes. However, there are different instances. In some, the loop is instructive: for example, the loop is formed when the gene becomes active. But there are other cases where the loop is already formed prior to the gene becoming active. So I think there's still a lot we need to know. We do know that these enhancers bind quite a few transcription factors. They can recruit chromatin modifiers, chromatin regulators, all kinds of stuff that's supposedly important for inducible gene expression. So here are some of the things that I think we still don't know about enhancers, and I use the term enhancer very loosely because I still don't think that we have a really good operational definition. But one of the key questions is how do we validate an enhancer? There are a few assays that we use, but I think that really has to involve computational, experimental, and genetic approaches, and all of these things have to come together to really understand the function of an enhancer. One thing that often comes up when we try to validate enhancers in reporter assays is what defines the boundaries of these elements. What is actually the minimal sequence necessary to produce a transcriptional output that is physiologically relevant?
So I think that in order to address some of these questions, we really need to very clearly analyze transcription factor binding and transcriptome data in multiple states of activity. And that could be by inducing particular cells and asking how enhancers become activated, but also by taking multiple states during lineage commitment. I think one thing that is very interesting to me is what is the rate-limiting step for enhancer activation? And so for example, in terms of the gene loops versus producing RNA, what has to actually happen to tip that balance? There are a lot of enhancers that appear to be in a sort of permissive state. Sometimes they become active, sometimes they don't. What's rate-limiting for that activation? And again, I think we really need dynamic cell state and cell cycle models to be able to start to ask these questions. And I think one of the things that is also very important is that we call enhancers very broadly. And in fact, the data really suggest that there are different classes of enhancers, in that not all enhancers are created equal. So I do think that we need to expand the dictionary to delineate these classes of enhancers. And we've touched upon this quite a bit. We also need large-scale efforts to really link enhancers to target genes in multiple contexts to understand the impact on transcriptional output. So with that being said, I'm just going to go through a few examples. And so to validate enhancers, oftentimes we use reporter assays. But that requires taking the enhancer out of context and guessing at what that element is in order to produce an output. I would say the models where you can actually look at enhancer activity at an organismal level are certainly a huge benefit. But most of the time you only get about 50% or 60% that validate. So why is that? Are we just taking it out of context and it needs other elements? How do we better understand what enhancer validation is?
A lot of times we use evolutionary data and we use the minimal region of conservation to define what is driving that enhancer. But in many cases enhancers are not conserved. So I think in combination with these reporter assays, certainly if you do see activity it's suggestive, but it's not always definitive. And I think more perturbation studies are needed to really understand an enhancer and what activates that enhancer. So just to touch on the different classes of enhancers, this is a nice review that was put out by Eileen Furlong. And it suggests that not all enhancers are created equal. And certainly the effect of transcription factor binding can't be measured just from ChIP-seq studies in static cell states. So for example, in any of these different models, if you just did ChIP-seq you would see transcription factors binding. And you really wouldn't be able to infer much more from that. But there are examples where you have cooperative binding, which is critical for, in most cases, inducible gene expression. In another case you have more flexible binding, where the order of the sites is not that important, but just having some of those transcription factors binding there is important, or some amount of protein-protein interaction is critical for enhancer activity. So I also think this will be of great interest: we know we can use chromatin states to define putative enhancers overall. But can we use chromatin state maps or DNase hypersensitivity or any one or several of those techniques to actually distinguish among these models? And again, I think we really need time-scale and perturbation studies to better understand how these various models dictate transcriptional output. And I just want to give you a really nice example of one of those models. And this really is borne out by early studies by Maniatis.
This is a beautiful example of really discrete protein-protein interactions. It's even the way that a protein is bound to the DNA: basically you can't even move the factors half a helix away on the DNA, or else you will not get the same transcriptional output. And you need all of these factors to actually bind cooperatively to respond appropriately to viral infection and to turn on this gene. So I'd be really curious to see if there are ways that we can actually distinguish these types of enhancers from other types of enhancers. And also this type of enhancer, where you have very strong cooperative binding and where the factors are bound on the DNA makes a huge difference, may be more susceptible to variation: common variants in these regions can certainly disrupt that type of output or that response. And so that's a good segue, because I would like to spend the last couple of minutes talking about how we can use enhancers to begin to reinterpret GWAS data. So, you know, this is something that everyone's been talking about, determining the impact of common variation on human diseases and traits. And another key aspect of that is, can we predict functional variants? Can we use these functional variants to predict upstream regulators and ultimately downstream targets? Well, that's a big question, and I'm going to focus on a particular set of traits just to show you a concrete example of how I think enhancers can allow us to interpret GWAS data. Lots of labs and lots of people that are here have been using this as a mechanism to try to understand these variants and have been very, very successful. But this is where we can use ENCODE data, as well as many other data types that have been produced and are available through NHGRI, to learn something new about a particular trait.
And I think for us, one thing that was really important is selecting a system where you could use all of these data sources to actually get phenotype-level information. So, I'm not going to go through this. Everybody here is very aware that genetic variation most of the time does not overlap protein-coding genes in these GWAS loci. So, what are they doing? The long and the short of it is that if you take enhancers and you ask which SNPs fall on enhancers in these loci, you can significantly reduce the search space for functional variants. So, doing this, you can choose loci, for example, that you want to prioritize based on SNPs that might overlap an enhancer that's active in your tissue of choice. And we chose QT interval because we know the target tissue. We know it's essentially a ventricular trait. And even using unbiased studies where you look at all the enhancers and ask which ones overlap the GWAS loci associated with it, you come up with active cardiac enhancers. So, we think it's a really good way to fine-tune the search for causal or functional variants that are associated with it. And also, there's the fact that oftentimes variants in enhancers can alter transcription factor binding sites, and this can go in either direction. You can either create or destroy transcription factor binding sites. And these may also help you to prioritize variants that you want to test on a more detailed level. So, I just want to give you a proof-of-concept example. So, here's a locus that we identified with SNPs that overlapped an enhancer. We then scanned this region using a number of data sets, here mostly in heart: left ventricle chromatin-level data as well as DNase hypersensitive sites. And we identified a SNP that was in a DNase hypersensitive site in an enhancer and was in between this dip in chromatin marks.
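The overlap step described here, asking which SNPs in GWAS loci fall on enhancers active in the tissue of interest, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the pipeline used in the talk: the function name, the tuple layouts, the 0-based half-open coordinates, and the example identifiers are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def snps_in_enhancers(snps, enhancers):
    """Return IDs of SNPs that fall inside any enhancer interval.

    snps:      iterable of (snp_id, chrom, pos), pos 0-based (assumed layout)
    enhancers: iterable of (chrom, start, end), 0-based half-open (assumed layout)
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in enhancers:
        by_chrom[chrom].append((start, end))

    # Sort and merge overlapping intervals so one binary search per SNP suffices.
    merged, starts = {}, {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = []
        for s, e in intervals:
            if out and s <= out[-1][1]:
                out[-1] = (out[-1][0], max(out[-1][1], e))
            else:
                out.append((s, e))
        merged[chrom] = out
        starts[chrom] = [s for s, _ in out]

    hits = []
    for snp_id, chrom, pos in snps:
        if chrom not in merged:
            continue
        # Index of the last interval whose start is <= pos.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < merged[chrom][i][1]:
            hits.append(snp_id)
    return hits


# Hypothetical example data, for illustration only.
example_snps = [("snp_a", "chr1", 150), ("snp_b", "chr1", 500), ("snp_c", "chr2", 20)]
example_enhancers = [("chr1", 100, 200), ("chr1", 180, 300), ("chr2", 30, 40)]
print(snps_in_enhancers(example_snps, example_enhancers))
```

In practice the enhancer set would be restricted to the target tissue first (here, heart), since that is what shrinks the search space.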
So, I'm not going to go into too much detail about this, other than to say that the approach I'm telling you about for just one locus is certainly scalable to a large number of loci. So, one of the things that we wanted to look at is whether or not, just using ENCODE-type data, you could detect allele-specific differences. Is binding actually disrupted at this locus, based on the DNase data that we had? And you could actually get genotype-level data from the DNase reads, and you could see that there was a predicted nuclear factor there. And if you look at the actual differences in the DHS reads at each allele, it clearly looks like there's something going on here. So, for example, the lower reads suggest that something is bound there, whereas the higher reads suggest that something might be missing. So, you can use this, and looking across a number of individuals, if you start to see that there's a correlation with a particular genotype, that suggests that there might be something going on there. So, you can also use 4C. And here's an example of where the particular trait that we decided to use was very good, because you can use cell type information. We could use iPS-derived cardiomyocytes to look for potential target genes, and you can also use these cells for phenotype screening. So, you can engineer iPS cells to either rescue a variant or create a variant and ask what the consequences are for gene expression, but you can also ablate that gene and look at the consequences for electrophysiology or other traits that are known to be associated with QT. So, the great thing about it is there's also an organismal model that is really great at mimicking alterations in QT interval. So, you can test these enhancers and see whether or not they actually regulate expression in the heart. For example, zebrafish is a fantastic model for looking at action potential duration, which is the cellular correlate of QT interval.
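One simple way to make the allele-specific comparison of DHS reads concrete is an exact binomial test on the reads carrying each allele at a heterozygous SNP. The sketch below is not the analysis from the talk: the function names, the 50/50 null hypothesis, and the example counts are assumptions for illustration, and it ignores reference mapping bias and overdispersion, which a real analysis would need to handle.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k out of n."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so tied outcomes are not lost to rounding.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


def allelic_imbalance(ref_reads, alt_reads):
    """Return (reference-allele fraction, p-value) for one heterozygous site.

    The null assumes both alleles are equally accessible, so reads split 50/50.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return None, 1.0  # no coverage, nothing to test
    return ref_reads / n, binomial_two_sided_p(ref_reads, n)


# Hypothetical read counts at one SNP in one heterozygous individual.
fraction, p_value = allelic_imbalance(30, 10)
print(f"ref fraction = {fraction:.2f}, p = {p_value:.4g}")
```

A single individual only hints at an effect. As noted above, the signal becomes more convincing when the same direction of imbalance tracks with genotype across a number of individuals.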
And based on the genes that we identified that might be linked to this enhancer, you can start to test these in functional assays. So, we could knock them down, and we could show that our target genes, there were three that we found, actually do result in changes in action potential duration. And so, that led us from a genotype, a change in a SNP, all the way to an actual phenotype, a change in action potential duration that may be a function of loss of this particular gene. And interestingly enough, this gene was recently implicated in something called Brugada syndrome, which has exactly this type of phenotype in humans. What more we were able to gather from this is that this gene, which is next door, also has a mild phenotype, and these two are in the same family. They have the same domain, something called the Popeye domain. And so, we were able to search more loci and found other genes associated with enhancers in these potential GWAS loci that also had these domains. So, that allowed us also to find potential families that were involved in these traits. So, this is pretty much my last slide. One of the, I would say, biggest complaints over the years about these GWAS studies is that the significance threshold is set so high due to multiple hypothesis testing that there's a lot that we miss, right? So, how can you get the best bang for your buck? Well, one way to do this might be to exploit functional elements to prioritize novel disease genes that don't reach genome-wide significance. And so, you could take everything you learn from these enhancers that overlap GWAS loci, and then you can ask, are there other loci that share these same characteristics that are in the sub-threshold region? And so, that's what we've been doing, and it looks like that's going to be a relatively good approach. And of course, that needs validation, but that allows us to get more information from existing cohorts.
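The sub-threshold idea can be illustrated with a small filter: keep associations that miss genome-wide significance but still fall under a looser cutoff, and that also sit in an enhancer of the relevant tissue. This is a hedged sketch rather than the method used in the talk. The 5e-8 threshold is the conventional genome-wide cutoff, while the looser `suggestive` cutoff, the record fields, and the example values are assumptions made up for illustration.

```python
GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold


def prioritize_subthreshold(associations, suggestive=1e-5):
    """Pick sub-threshold SNPs that overlap a tissue enhancer.

    associations: list of dicts with keys "snp", "p", "in_enhancer"
                  (hypothetical record layout)
    suggestive:   looser p-value cutoff (an assumed, tunable value)
    Returns the selected records sorted by ascending p-value.
    """
    picked = [
        rec for rec in associations
        if GENOME_WIDE <= rec["p"] < suggestive and rec["in_enhancer"]
    ]
    return sorted(picked, key=lambda rec: rec["p"])


# Hypothetical association results, for illustration only.
example_assoc = [
    {"snp": "snp_1", "p": 3e-9, "in_enhancer": True},   # already significant
    {"snp": "snp_2", "p": 2e-6, "in_enhancer": True},   # sub-threshold, in enhancer
    {"snp": "snp_3", "p": 4e-7, "in_enhancer": True},   # sub-threshold, in enhancer
    {"snp": "snp_4", "p": 1e-6, "in_enhancer": False},  # sub-threshold, no enhancer
    {"snp": "snp_5", "p": 3e-3, "in_enhancer": True},   # too weak
]
for rec in prioritize_subthreshold(example_assoc):
    print(rec["snp"], rec["p"])
```

Anything selected this way is a candidate only and, as said above, still needs validation.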
So, there are a lot of things that still need to be done. We need new computational and experimental approaches to understand the predictive value of these SNPs and enhancers, or the predictive value of SNPs and loci in general. And can we use functional elements to prioritize novel genes within existing cohorts? It's going to be really important to connect enhancers to cognate genes so that we can understand how these SNPs might disrupt these interactions or the expression of a gene. In these types of studies, we can start to leverage the GTEx and ENCODE data to learn more about how expression might be disrupted. And I think some key points moving forward that it would be great to have more data on are allele-specific expression and how variants might actually affect allele-specific expression. And something that hasn't gotten a lot of attention, but that I do think is going to be very valuable, is copy number variation in the genome and how it might affect genome organization and expression. And I think that one great thing that ENCODE could do, in my opinion, is to support individual labs with unique systems or technologies to close the gaps, and also to provide opportunities for greater collaboration between labs outside ENCODE and those that have this current infrastructure. So, with that, I will stop and take questions. So that's it. So, I'm just going to ask you that, you know, all of us sort of buy into this model of sequence specificity at the level of enhancers. You know, there is this thing in genetics called neocentromeres, right? When you don't quite meet the canonical structure. So I know we look for conservation and other ways of looking for the sequence specificity. Is it really known how enhancers can seed themselves? So, I still don't think we can predict an enhancer based on sequence. And I think that's something that's going to be very important to report. Again, we'll take one question at the end of the session.