That was great, and I want to thank Sharon and the panelists once again. Our next speaker is Brad Bernstein from the Broad Institute, and Brad's going to be talking about a systematic survey of human epigenomes.
Well, needless to say, it's really a great pleasure to be here on this very special occasion. I have the privilege of representing a burgeoning field that was enabled by the Human Genome Project, if not created by the Human Genome Project. That's the field of epigenomics. So the field really is about the study of the physical and structural organization of genomes, with sort of the bold goal of ultimately connecting genotype to phenotype. As you will see, thanks to major technological advances and coordinated efforts that were really inspired by the Human Genome Project, a lot of progress has been made. So for my talk, which is separated into three sections, I'm going to give an introduction on the epigenome and its features. I'll talk about the mapping technologies that are now being applied very broadly. I'll talk about the NIH Epigenome Mapping Centers to tell you about some of the biology and how epigenome maps really inform on not only lineage but developmental potential, and also with the goal of making the community aware of this resource, whose value bridges across, I think, all of the institutes here in terms of disease. And then I'll turn to the ENCODE Project, really near and dear to the National Human Genome Research Institute, and talk a little bit about how epigenomic mapping is now taking us beyond the genes and shedding light on the non-coding or the dark matter of the genome, I think with important implications, as you'll see, for human disease. So the fundamental challenge here is that we have a three billion base sequence of DNA, and this is two meters of DNA that needs to fit into a nucleus that's a tenth of the size of the head of a pin, right? So how has biology dealt with this immense problem?
Well, the DNA is, as we know, spooled around histone proteins to form nucleosomes, and these nucleosomes are then sort of arrayed into complex structures and fibers that then form into these higher-order structures that make up chromosomes, right? But the structure is not static, right? This is a dynamic structure, and it varies greatly across the genome and with the genes and elements, right? So this is just a picture showing a Drosophila polytene chromosome, and what you can see here, the point is that if the genes are transcribed, the chromatin opens up and nucleosomes spread apart so that RNA polymerase and the transcriptional regulators can get access to the DNA itself, right? Whereas much of the rest is compacted. This can also be seen in this image of the nucleus, where these silver grains show sites of transcription, which primarily occur internally in the nucleus, and these white structures reflect the heterochromatin, the peripheral heterochromatin that is compacted and sequestered away, right? So in this way, the genes and the control elements are made accessible to the polymerase and to the other regulatory factors, which helps those factors find these beacons of information in the genome. So one of the critical elements that has enabled the field of epigenomics is that these structures are really predicated on chemical modifications. So these chemical tags exist directly on the DNA in the form of methylation of CpGs, or cytosine bases. This is just shown here. It's often a repressive event that's associated with heterochromatin and transposon silencing. And in addition, as is just shown here, the nucleosome and the histone tails are subject to many different kinds of chemical tags or post-translational modifications, as we've heard about a little bit in Eric's talk today. So these modifications are intimately connected to the structure of the genome.
Open structures have particular characteristic modification patterns, closed structures have yet others. And as I'll show you, the information content here is very high and can tell us a great deal about the DNA elements as we survey across them. So you know, a lot of the excitement in the field has stemmed from its connection to this concept of epigenetics, right? Epigenetics in some ways is a fancy way of thinking about development, of how a single genome can begin early development and then, through a pluripotent cell, give rise to all these different lineages without essentially any change to the DNA sequence. Yet we know that these different cell types or lineages have, you know, markedly different phenotypes that are maintained throughout the lifetime of an organism. The different phenotypes are closely related to cell-type specific gene expression programs, right? Each of the different cell types expresses different genes, and the gene expression patterns are in turn, as you might imagine, related to how the DNA is organized into chromatin. And to a certain extent, some of these modifications, such as DNA methylation and a select few of the histone modifications, such as the modifications associated with Polycomb regulation, are epigenetic in themselves: they actually help propagate and maintain the state of an individual cell and thus are epigenetic. And this is sort of the partial but not complete relationship between epigenetics and the epigenome. Epigenomics really studies the full structure and all these modifications, which can help us understand epigenetics. So I told you about development. What about human disease? Why is this so important? Well, it's, you know, increasingly being highlighted across many diseases. So cancer is a genetic disease, but it's also an epigenetic disease. Aberrant DNA methylation is a hallmark of adult cancers.
There are now increasing numbers of prevalent mutations being identified in a wide range of chromatin enzymes, and other chromatin-associated proteins are turning up in the large-scale sequencing projects such as the TCGA, which we'll hear about next. You know, there's increasing evidence for epigenetic dysregulation in neuropsychiatric disease, metabolic disease, developmental disorders: mutations in chromatin enzymes, aberrant epigenetic landscapes. And there's this sort of, I think at this point, admittedly ill-defined concept that we know there are long-term consequences of early environmental exposures, and an open question remains, you know, how does this relate to epigenetics or epigenomics? And this is something that I think the field is also wishing to explore. More broadly, as I'll show you today, epigenomes enable a functional annotation that really can help us understand the way genotype gives rise to a phenotype in any tissue or disease, and that can identify the places in the genome where the activity is going on. And that's why I think it's very broadly applicable across different disease states. There's consequently this urgent need for, I think, reference human epigenomes. There are, of course, many of them, for the different cell types and different disease tissues. And there's a need for tools, importantly for tools to enable disease researchers to study and to understand the defects that might be present in their particular model system or disease. There are a number of groups working on this; NHGRI and ENCODE have put a lot of effort into it. I'll talk also about the epigenomics project, which has now spawned an international-level project as well. Now, you know, again, going back to the genome project, it's really pushed technology to such a great extent. We've heard about the sequencing technologies and how they've changed our ability to sequence genomes. They've also changed our ability to understand epigenomes, right?
So these are the features: DNA methylation, accessibility of DNA, the histone modification patterns, higher-order structures. And at every level, next-generation sequencing has really just changed what we can do. Whole-genome bisulfite sequencing is now possible. You can sequence all the methylated DNA, pick those up. DNase-seq can be used to sequence hypersensitive sites, the open genome. Chromatin IP, ChIP-seq, as I'll show you, can be used to map histone modifications across the genome. And of course, I won't really talk about RNA-seq, but here's yet another technology. I don't want to say that, you know, we've found all the answers, right? There's a lot of challenges that remain. You know, I think key things in the field are that people want to know how to do direct readout of methylcytosines. Right now, everything we do is a little bit indirect. Single-cell analysis is critical. All our studies are ensemble analyses. We still need better ways to handle tissues and heterogeneity, a lot of challenges. You know, Eric alluded a little bit to how we're now studying higher-order organization of genomes, and we're just beginning. But I think what I'm going to really focus on today are histone modifications and some of the DNA methylation patterns, because these are relatively mature technologies, and I think they provide an opportunity for researchers across many different fields. So first, a couple of vignettes from the epigenome mapping centers that the NIH has initiated and that have now really extended to an international effort. One of the key milestones, by Joe Ecker and Bing Ren and colleagues in the consortium, was the whole-methylome sequencing of ES cells that was published almost two years ago now. Basically, bisulfite sequencing technology was used, in which unmethylated Cs are converted to Ts. Deep sequence to 20x coverage, and you get a genome-wide map of methylation patterns in a given cell.
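As an illustration of the bisulfite readout just described, here is a minimal Python sketch of how a per-cytosine methylation level could be estimated from aligned reads. It is not the consortium's pipeline. The function name `methylation_levels` and the input layout are hypothetical, and the sketch assumes ungapped reads already aligned to the forward strand of a short reference. A read base of C at a reference C is counted as methylated (protected from conversion), and a T as unmethylated (converted).

```python
def methylation_levels(reference, reads):
    """Estimate the methylation level at each reference cytosine.

    reference: forward-strand reference sequence (string).
    reads: list of (start, sequence) pairs, assumed to be ungapped
           forward-strand alignments (a simplifying assumption).
    Returns {position: fraction of covering reads that show C}.
    """
    counts = {}  # position -> [methylated_reads, informative_reads]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue  # only reference cytosines are informative
            if base not in ("C", "T"):
                continue  # sequencing error or SNP: ignore in this sketch
            tally = counts.setdefault(pos, [0, 0])
            tally[1] += 1
            if base == "C":  # unconverted, so interpreted as methylated
                tally[0] += 1
    return {pos: meth / total for pos, (meth, total) in counts.items()}


if __name__ == "__main__":
    toy_reference = "ACGTCGA"
    toy_reads = [(0, "ACGTTGA"), (0, "ACGTCGA"), (1, "TGTTG")]
    for position, level in sorted(methylation_levels(toy_reference, toy_reads).items()):
        print(position, round(level, 2))
```

With deep coverage (the talk mentions about 20x), each fraction is averaged over many reads, which is why the result is an ensemble measurement rather than a single-cell one.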
So building upon this now, Joe Ecker, Jamie Thomson and colleagues have taken a very broad look at many different ES cells and induced pluripotent stem cells, right? These are the reprogrammed cells, and they asked about the patterns of DNA methylation in these cells. They found a striking difference in the iPS cells. These are just sites across the genome, and you can see the white and red in all these cells on the right, in these iPS cells here, which really indicates there are many regions where there's a difference between the iPS cells and the ES cells. And in many cases this difference appears to reflect something remembered from the cell that was initially reprogrammed, right? Clearly a lot more work to be done, but it's an example of how this technology can help us to vet and quality control and analyze reagents for regenerative medicine, such as iPS cells. Alex Meissner and Kevin Eggan at the Broad Institute and Harvard Stem Cell Institute have undertaken a similar analysis of DNA methylation patterns in ES cells and iPS cells. What they found was a very wide variation in DNA methylation patterns. Some sites in the genome tended to be very similar between cell types, but other sites in the genome, such as some of these genes, S100 and some of these other genes, had a very wide range of methylation patterns across these different cell types. Is this noise or is this functional? Well, what they were able to show is that the methylation patterns in an individual iPS cell or in an individual ES cell actually predicted the different lineages that the ES cell would be capable of differentiating into. So I'll just give you one static example, of the CD14 locus, which is important for macrophages. These are some of the HUES ES cell lines. Here's HUES6, and you can see that there's partial methylation of the CD14 promoter in HUES6, and these cells, HUES6, can actually be differentiated into macrophages.
The CD14 promoter now comes on, the gene comes on. In contrast, they show that the HUES8 line, which is completely methylated at the ES cell stage, cannot turn on CD14 and cannot make macrophages. So here's just one example where epigenomic maps can actually read out the potential of cells, of clear value I think for regenerative medicine. So switching gears now a little bit to histone modifications. This is sort of, you know, another way to read out cellular state. Basically, as we heard this morning, we can use antibodies to a modified histone to pull out the chromatin that's marked by that particular modification. We can then use these new sequencing technologies to deep sequence the enriched DNA. Basically, you take the reads, you count the reads, and you integrate them into density profiles that tell you about the histone methylation patterns in these regions. Part of the beauty of histone methylation is that there are many different marks that light up different kinds of entities. So for example, there's this green mark, lysine 4 trimethylation. This red is the lysine 27 methylation, a repressive mark. In ES cells, many developmental genes such as PAX5 are marked by bivalent domains, where you have sort of a mixture of these two marks. And increasing evidence suggests that this reflects sort of a poised state, where the gene can be turned on depending on the fate of the cell in differentiation. And in fact, one can now look down the lineage at hematopoietic stem cells. PAX5 is still off in hematopoietic stem cells. You still see sort of a bivalent marking, suggesting that it may still be poised in the hematopoietic progenitors. And as you can see in B cells, where PAX5 is a classic B cell marker, a B cell lineage determinant, here you have a switch. You keep this on mark, you lose the repressive mark. And this is a K36 methylation mark we heard about earlier today, a mark that's telling you about a region of the genome that's transcribed.
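The read-counting step described above (take the ChIP-seq reads, count them, integrate them into a density profile) can be sketched in a few lines of Python. This is an illustrative simplification, not the mapping centers' software: the function name `read_density`, the 200-base bin size, and the 200-base read extension are all assumed values chosen for the example.

```python
def read_density(read_starts, region_length, bin_size=200, read_extension=200):
    """Turn aligned read start positions into a binned density profile.

    read_starts: 0-based start coordinates of reads within one region.
    region_length: length of the region in bases.
    bin_size, read_extension: assumed parameters for this sketch; each read
        is extended to approximate the underlying DNA fragment.
    Returns a list with one read count per bin.
    """
    n_bins = (region_length + bin_size - 1) // bin_size
    density = [0] * n_bins
    for start in read_starts:
        if start < 0 or start >= region_length:
            continue  # read falls outside the region
        end = min(start + read_extension, region_length)  # exclusive end
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size
        for b in range(first_bin, last_bin + 1):
            density[b] += 1  # the extended read overlaps this bin
    return density


if __name__ == "__main__":
    print(read_density([0, 50, 150, 900], region_length=1000))
```

Running one such profile per antibody (for example, one for lysine 4 trimethylation and one for lysine 27 methylation) gives the colored tracks that are compared across cell types in the talk.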
So here's the switch, and it basically kind of illustrates how one can use epigenomic maps to sort of follow gene expression and the fate of the cell. So the epigenomic mapping centers have looked very broadly at different classes of cells: pluripotent ES and iPS cells, different blood lineages, solid tissues, and then there's also cultured cells from the ENCODE project you'll hear about. We've mapped these cells using principal component analysis. You can take information from the methylation maps and project the cells into two-dimensional space based on these principal components. And I think you can see that regardless of the modifications we look at, we see sort of tight clustering of the pluripotent cells, and each of these other types of lineages, blood or solid tissue, and cultured cells, seem to segregate into different parts of the map. So I show you this for a couple of reasons. One is it's a resource: we can then project disease cells, regenerative medicine cells, and other lineages upon this, and we can learn about how closely their epigenomic landscape maps onto what one would see for a normal tissue, right? Or whether it's looking a little more like culture has altered the epigenome. And the other reason is I can show you a little bit of some of the biology that's going on. And again, this is the K27 methylation mark associated with epigenetic repression. ES cells are clearly in their own space up here, and there's this axis here in PC1. What does this axis reflect? So in the next slide we'll look a little bit at this. And I think what I'm trying to show here is this is a large locus. And what you can see in ES cells is that the background for this repressive modification is very low, and there's some peakiness across the genome. But as you differentiate into some of these terminally differentiated cell types here, you see a spreading out of this repressive modification.
And this is something we see again and again in almost all of the tissues we've looked at. It's a thing that distinguishes pluripotent cells from committed cells. There are large portions of the genome sort of sequestered into repressive chromatin. We think this reflects sort of a compaction of chromatin, shown here. We can recreate it in cell culture by taking ES cells and differentiating them into EBs, neural progenitors, and into neurons, and this leads to the sort of spreading or compaction of the genome. So I think this sort of, you know, together with the DNA methylation data, shows how the epigenome mapping center data is highlighting a prominent role for epigenetic restriction, both at specific loci but also across broad regions, in lineage fidelity. There's a rich resource here too for developmental, regenerative medicine, and disease research. So for the last section of my talk now, I want to sort of explore a little bit more deeply. What's going on in these very large portions of the genome here where you don't have any genes? Why has the epigenomic machinery gone to such lengths to sequester these large portions of the genome away from activity? What's going on there, even when they're non-genic regions? What's going on in this dark matter? And so I'm going to turn to another project. This is the ENCODE project, and I'll tell you a little story from that project. Here we are using signature histone modification patterns, so sort of combinations of histone modifications, to very powerfully identify non-coding DNA elements such as enhancers or insulators or silencers, right, these things sitting out in the non-coding genome. What we found is that combinations of modifications are a very powerful way of picking up these different types of regulatory elements across the genome. We've collected very large data sets in the context of ENCODE.
It's a matrix or a compendium of all these different cell types across all these different modifications, mapped genome-wide in each of these cell types, for well over 100 maps. We've heard a little bit about the importance of computational biology and software engineering this morning, and so I do want to use this opportunity to just really emphasize that we were so fortunate here to have a team of exceptional computational biologists, led by Manolis Kellis at the MIT Computer Science Department, really mapping out an effective strategy for analyzing this very large data set, as you'll now see. So the goal was to annotate the genome very robustly by chromatin state. The strategy is to take an unbiased look, use one of these hidden Markov models, and learn what the recurrent modification patterns are. So which modifications like to show up together? Which show up in different patterns? What are the different kinds of combinatorial patterns that exist across the genome? We call these different patterns states. Once we learn these patterns, we then can annotate the genome by these states, and we can pick up enhancers and all these other regulatory elements, right? So this is a little complicated, but this is sort of the different states. We found about 15 chromatin states. They each have different combinations of these methyl marks. I'm not going to go into detail here too much, but I want you to understand that, you know, these top three states, these are promoter states. They coincide with annotated TSSs. They are, you know, accessible and there are TFs bound to them, right? Some of them are active promoters. Some of them are sort of poised promoters. We have a number of states here in orange and yellow that are really, I think, important for the story here. These look a lot like enhancer states. Some of them are known enhancers. They bind different TFs, right? And they're accessible and they're very cell-type specific.
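To make the annotation step concrete, here is a toy Python sketch of how a hidden Markov model, once its parameters are known, assigns a chromatin state to each genomic bin with the Viterbi algorithm. Everything numeric here is an assumption for illustration: the three state names, the two marks (H3K4me3 and H3K4me1; the latter is not named in the talk), and all probabilities are hand-set. In the work described the patterns are learned from the data and about 15 states are found. Each bin is represented as a tuple of 0/1 flags saying whether each mark is present, and marks are treated as independent given the state.

```python
import math

# Hypothetical, hand-set parameters (NOT learned from data).
STATES = ["promoter", "enhancer", "quiescent"]
START_P = {s: 1.0 / 3.0 for s in STATES}
TRANS_P = {a: {b: (0.8 if a == b else 0.1) for b in STATES} for a in STATES}
# Probability that each mark is present, in the order (H3K4me3, H3K4me1).
EMIT_P = {
    "promoter": (0.90, 0.30),
    "enhancer": (0.10, 0.90),
    "quiescent": (0.02, 0.02),
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path for a sequence of mark tuples."""
    if not observations:
        return []

    def log_emission(state, obs):
        # Independent Bernoulli emission for each mark.
        return sum(
            math.log(p if present else 1.0 - p)
            for p, present in zip(emit_p[state], obs)
        )

    best = [{s: math.log(start_p[s]) + log_emission(s, observations[0])
             for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda r: best[-1][r] + math.log(trans_p[r][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + log_emission(s, obs))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    bins = [(0, 0), (0, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 0)]
    print(viterbi(bins, STATES, START_P, TRANS_P, EMIT_P))
```

The sketch covers only decoding. Learning the emission and transition probabilities from the maps themselves, which is the "unbiased look" described in the talk, would require a training procedure such as Baum-Welch and is omitted here.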
So that's an example of an enhancer state. We have insulator states. We have regions of transcription. So we're basically marking up the genome now based on chromatin to learn about the functions of all the individual regions and bases. So, you know, this kind of approach allows us to compress a very large amount of information, almost 200 different ChIP-seq experiments, 2.4 billion reads and about 100 billion bases, into a single genome-wide annotation for each of the cell types, right? So we can then show you what the annotations look like. They're colored according to those states that I just showed you. Here's ES cells, erythrocytic cells, immune cells, hepatic cells, right? These are the patterns. We can see coordinated changes. For example, this is the Wntless (WLS) gene in ES cells. It's got this little pink here, poised. It's off, but poised. In these cells here, it's active, it's transcribed, and there's a lot of these orange enhancers across the region, right? So we can study coordinated changes in chromatin states. I should tell you that we've spent a lot of time validating that when we call something an enhancer, when we light it up orange and we then go and do an enhancer assay, it in fact does enhance luciferase reporter activity. And our calls are cell-type specific. If you take an enhancer from an immune cell and you test it in a liver cell, it doesn't do anything. It has to be an enhancer annotated in the liver to work in the liver. So in the last couple of minutes, I want to talk to you about how we're trying to take this information and integrate it into a regulatory network that goes beyond just incorporating the genes and their interactions, but now brings in these distal elements, these enhancer elements, and links them into a genome regulatory network. So the first thing we need to do is to take elements such as promoters and cluster them based on their patterns of activity across the different cell types.
We did this for promoters, and what we find is that promoters that are specifically active in lymphoblastoid cells, well, they tend to regulate immune genes. Promoters active in hepatic cells tend to regulate cholesterol transporter and lipid genes. But we can do the same thing for enhancers. We can cluster enhancers, and again, enhancers in immune cells tend to be near, not right at, distal, but near to genes that have immune functions. So we're doing something right here. Here are the ones in the liver cells, involved in lipid metabolic processes. What can we do next? Well, we can now take coordinated patterns of enhancer activity, and we can correlate those patterns to gene expression patterns of nearby genes, and we can begin making probabilistic linkages between enhancers and genes. So now we're linking the enhancers to predicted target genes based on patterns. I don't have a lot of time to tell you all the details here, but basically we're using correlated patterns to link genes, and there's evidence that we're doing something right here, because there's a rich field of quantitative trait loci studies. We haven't heard a lot about this today, but it's a genetic approach for linking enhancers or regulatory elements to distal genes. The linkages that we've drawn to genes, based on patterns of chromatin activity across many cell types, actually correspond nicely, with a highly significant enrichment, to those linkages drawn from genetic studies. So there's something that we're doing right here. We're beginning to draw a model of the enhancers and their gene targets. What about the upstream regulators? We can also link enhancers to upstream regulators, upstream TFs, by identifying enriched motifs, looking at the expression patterns of the cognate transcription factors, identifying signatures.
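The enhancer-to-gene linking described above can be illustrated with a small Python sketch that correlates each enhancer's activity across cell types with the expression of nearby genes. This is a deliberately simplified stand-in, not the method actually used in the project, whose details the talk does not give. The names `pearson`, `link_enhancers`, the `nearby` mapping, and the 0.8 correlation cutoff are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def link_enhancers(enhancer_activity, gene_expression, nearby, min_r=0.8):
    """Link each enhancer to nearby genes with correlated activity.

    enhancer_activity: {enhancer: [activity in cell type 1, 2, ...]}
    gene_expression:   {gene: [expression in the same cell types]}
    nearby:            {enhancer: [candidate genes within some window]}
    min_r:             assumed correlation cutoff for calling a link.
    Returns a list of (enhancer, gene, correlation) tuples.
    """
    links = []
    for enhancer, activity in enhancer_activity.items():
        for gene in nearby.get(enhancer, []):
            r = pearson(activity, gene_expression[gene])
            if r >= min_r:
                links.append((enhancer, gene, r))
    return links


if __name__ == "__main__":
    activity = {"enh1": [1, 2, 3, 4]}
    expression = {"geneA": [2, 4, 6, 8], "geneB": [4, 3, 2, 1]}
    print(link_enhancers(activity, expression, {"enh1": ["geneA", "geneB"]}))
```

A hard cutoff is used here only to keep the example short. The talk describes the real linkages as probabilistic, which suggests a score per enhancer-gene pair rather than a yes/no call.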
Again, not a lot of time to go into all the details here, but what we can see is that an ES cell-specific enhancer cluster, well, it's highly enriched for Oct4 motifs, and it's on when Oct4 is on, right? Liver enhancers tend to be enriched for HNF sites, and in fact there's evidence that HNF is regulating these. We can then go and do, again, our enhancer assays. We can take those enhancers, we can mutate the HNF sites that we predict are activating these enhancers, and sure enough, the activity is lost or goes down significantly in these examples, okay? So to summarize here, we've got now a model, a regulatory model, where we've identified enhancers and we've linked them to TFs and to downstream genes. It's probabilistic, and it's just a start. We're only linking about 10% of the enhancers that we've identified. There's much more work to be done, but here's where it gets interesting. So we've now intersected these enhancers with the SNPs from the GWAS studies. So these are the SNPs associated with human disease. Most of these SNPs are non-coding. So what are they doing? Well, it turns out that disease SNPs are enriched about two-fold in the enhancer chromatin states. If you look at control SNPs, they don't show any preference for any of our different chromatin states. Moreover, the correspondence between SNPs associated with a particular disease and the enhancers is cell-type specific. So erythrocyte phenotype SNPs are associated with this erythrocytic cell line. The SNPs fall in enhancers that are specifically active in the cell line. SNPs associated with lipids sit in enhancers active in hepatic cells. SNPs associated with immune diseases sit in enhancers active in immune cells. Here's just some connectivities that we're drawing here.
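The "about two-fold" enrichment mentioned above is a ratio of two fractions: the share of disease-associated SNPs that fall inside enhancer-state regions, divided by the same share for control SNPs. A minimal Python sketch follows. The function names and the toy coordinates are made up for illustration, and it assumes a single chromosome with non-overlapping, half-open enhancer intervals.

```python
import bisect


def fraction_in_intervals(positions, intervals):
    """Fraction of positions inside any (start, end) half-open interval.

    Assumes the intervals do not overlap one another.
    """
    ordered = sorted(intervals)
    starts = [s for s, _ in ordered]
    ends = [e for _, e in ordered]
    hits = 0
    for pos in positions:
        i = bisect.bisect_right(starts, pos) - 1  # last interval starting <= pos
        if i >= 0 and pos < ends[i]:
            hits += 1
    return hits / len(positions)


def fold_enrichment(disease_snps, control_snps, enhancers):
    """Ratio of enhancer overlap for disease SNPs versus control SNPs."""
    disease_fraction = fraction_in_intervals(disease_snps, enhancers)
    control_fraction = fraction_in_intervals(control_snps, enhancers)
    if control_fraction == 0:
        return float("inf")  # no control SNP overlaps: ratio is undefined
    return disease_fraction / control_fraction


if __name__ == "__main__":
    toy_enhancers = [(100, 200), (500, 600)]
    toy_disease = [150, 550, 300, 700]
    toy_control = [10, 120, 250, 400, 520, 800, 900, 950]
    print(fold_enrichment(toy_disease, toy_control, toy_enhancers))
```

Repeating the calculation with the enhancer set of each cell type separately is what reveals the cell-type specificity described in the talk, for example lipid-associated SNPs being enriched in hepatic enhancers in particular.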
To give you just a few examples of looking at the individual SNPs, for the blood lipid study you can see clearly there's a nice little stripe of enhancer activity going down in orange and yellow, specific to HepG2, liver hepatic cells, right? Drawing the connectivity, these things are falling in enhancers that are specific to the liver cells. Same for erythrocytic phenotypes. Same for SLE. We can posit, based on our model, some of the target genes that might then be regulated by these enhancers, possibly affected by these SNPs. And getting a little speculative, but I really think this is kind of the future of what we're going to see here, we can now, I think, assist in the interpretation of genome-wide association studies in the following way. First of all, in some cases, the lead SNP in a study is sitting in sort of an unannotated region, but an associated variant is sitting in an enhancer with just the right cell-type specificity. This may be a means to try to triage variants. This may be a likely causal variant, as it's sitting in an enhancer. It's a speculation, but I think it's something that should and will be followed up upon. We can also make predictions about regulatory interactions with TFs. So what we find is that some of the SNPs affect the binding sites for some of the predicted regulators. So here's an example for an erythrocyte phenotype GWAS where the SNP is affecting a site for GFI1B, which is one of the predicted, actually, repressors of enhancers in erythroid cells. And finally, there's an example from SLE where there's an Ets site that is affected in an immune-specific enhancer element. And in fact, Ets-1 has already been associated with lupus, through a variant in the ETS1 locus itself. So my last slide. I tried to convey to you how valuable epigenomic maps are for finding the non-coding elements in the human genome, and even for determining their cell-type specific activities.
I've shown you how chromatin dynamics can help us link enhancers to regulators as well as to target genes. I've shown you some evidence that these annotations and regulatory predictions can be useful for interpreting GWAS studies. And I would like to just finish by arguing that integration of ENCODE data, and there's a lot more data here for transcription factors and RNAs, hundreds and hundreds of data sets, as well as epigenomics data, the mapping of the in vivo tissues that are really the site of these disease processes, can provide an incredibly rich resource for interpreting many different types of human disease. And I've acknowledged people as I went along. So thank you very much.
We have time for a question.
That was just a fantastic talk. Hi, I'm Steve Lincoln from Complete Genomics, and obviously one of the problems that we wrestle with is the question of how to sort of make the bridge from exome sequencing to genome sequencing. When will it be that, perhaps using the techniques that you described here, one wouldn't just be able to apply the annotations to GWAS studies as you described, but actually have an equivalent of SIFT or PolyPhen, if not better, that can help annotate variants in genome sequences outside of coding regions?
Yeah, well I think what's clearly happening is that the field is moving so quickly. The sequencing technologies are moving so quickly, thanks to places like Complete Genomics, that very quickly whole genomes will simply be sequenced, and that will be the most straightforward approach. So I think then this kind of information on epigenomes, possible functions of the non-coding regions and the non-coding elements, can become even more important as we get more data down the line.
If you use the same cell line to make iPS or embryonic stem cells, which one will be more lineage restricted? Say, making brain cells or heart cells.
The issue here is you can make iPS cells, but you can't make ES cells.
So I think the question then becomes, which is the best cell to start with to make an iPS cell? And I think I'm a skeptic in science. I think we need a lot more data. I think we need to take a much more in-depth look at the different ES cells and the different iPS cells, as my colleague Alex Meissner has shown. There's a lot of variability across different ES cells and across different iPS cells. And I think we need to get a better handle on the nature of this variability before coming up with a clear picture of what the best approach is. More data.