Thank you for the opportunity to be here and to talk a little bit about how I see past and future interplays between questions of basic biology, fundamental mechanisms, and unbiased approaches to collecting genomic data. All good? Okay.
I want to note that a lot of the things I want to talk about today come not only from my head but also from discussions with John Lis, who unfortunately cannot be here today, so I want to make a plug for the ideas that he and I would probably both be presenting during these discussions.
So, from the context of basic biology, one of the things that we've heard again and again is that as we've started to describe what these functional elements, or these regulatory elements, are, it's going to be important to understand how they really function. And I think we've heard a few examples of where we can't assume that all enhancers or all silencers are the same, that they function identically, and that the logic or the grammar underlying them is going to be identical. One of the things that I think we've learned over the past few years is that these early studies showing that enhancer regions were actually regions of transcription weren't just about the locus control region for beta-globin. That's actually a really generalizable thing we can say about enhancers and these regulatory elements: they're not just sites where transcription factors bind and function only at a distance, but there are local events taking place. There's local transcription, and understanding the transcription of non-coding RNAs that's happening locally is something that I think really needs to get its way into our catalog of these regulatory elements. And so I think this really raises the question of what new readouts and what new techniques we should be developing and focusing on.
And so I think we've heard a decent amount about possible readouts for regulatory elements, everything from reporter assays to some really lovely phenotypic examples. I think we really need to think hard about, if we're going to attack function, how we're going to define function in specific contexts. But I do think it's important in our description of these elements to take the step beyond chromatin state, beyond histone modifications, and to start asking questions about how these things are affecting gene expression and, in fact, how the expression of these elements, these non-coding RNAs, is being regulated.
And so I'd make a push for not just doing this by thinking that expression is summed up in an RNA-seq experiment. That tells you about processed, stable, polyadenylated RNAs, and there are really cheap, really easy experiments that you can do that tell you so much more. So I'm going to argue that we need to start looking at nascent transcription, nascent RNA species, so that we can study the direct and proximal effects of enhancers. We've heard arguments that enhancer effects could be either amplified or suppressed by everything that functions downstream of that, right? If you're specifically looking at transcription at an enhancer and at nearby genes, then you don't have to worry as much about processing or stability or localization or translation of the protein, et cetera. And so I really think that we need to come at this from both angles. I think we need better assays of the proteome and I think we need better assays of the nascent transcriptome to really understand how these two things interface with each other, because there's a whole world in between there that we don't fully understand. Okay.
So I also want to make the argument that doing so doesn't just tell us something about basic biology. If you can figure out a way of taking, let's say, human-derived iPS lines or some sort of differentiated tissue and looking at nascent RNA, you can quantitatively and simultaneously measure the activity from an enhancer and all the local genes in that region, as well as understand how changing expression in that region affects all the other genes in the genome. So I think you basically get a quantitative readout of how changing, for example, the sequence of an enhancer, or deleting it, affects not only local target genes but the entire genome, and then you start to get some insights into networks of interactions. Okay.
So again, I'm going to suggest that we need to look at more sensitive measures of transcription and how it's affected by regulatory elements, not just identifying the direct target in the local region but also getting at these big networks. And I want to point out that if you can identify where transcription takes place in an enhancer, then understanding the transcription factors that are regulating it is easier. I'll give you examples of how identifying where enhancer transcription takes place pinpoints a spot, a nucleotide, around which you can look for transcription factors, and that gives you a lot more focus, a lot higher resolution, than looking at a 5 kb region that has some histone modification. All right.
And also, one of the things that we heard from Joe yesterday is that ultimately, to understand how these regulatory elements function, we need to start to get a handle on how they might be affecting gene expression at the level of initiation or elongation, RNA processing, RNA stability. And by defining exactly where transcription starts and ends, I think you can start to put a bound on what these non-coding RNAs do.
Are they really short? Are they long? Can you find hidden ORFs by better defining the actual extent of that transcript? And so I think we need to work harder to find assays that can be high throughput and that require a small number of cells. Ideally they would be able to yield meaningful data in a cheap way, meaning that you can get some information without having to sequence incredibly deeply, which is an issue with some of the histone modifications, so that we could actually multiplex and evaluate different time points, different conditions, multiple patient samples, or deletions and SNPs. Okay.
So this is my main point: I think that in a next iteration of ENCODE, we can look more closely at RNA, and not just at full, stable RNA but also at nascent and newly produced species, because I think this is an efficient and direct way to look at function. And I just want to give a couple of examples of how unbiased genomic data collection has already interfaced with fundamental biological questions and how they've both really led to new techniques that have then fed back and fostered... Again, as Aviv pointed out, I don't see these as two sides where if one gives the other has to take. I actually see these things as really functioning together. Okay.
So the example I want to give derives from the studies of John Lis. For many years, he studied a couple of genes in the fly, and he had discovered this phenomenon called promoter-proximal pausing, where the polymerase would transcribe a short RNA and then just sit in the promoter-proximal region waiting for a signal, in this case heat shock, to come and release that pause and let the polymerase go rapidly into the gene. And he studied this in isolation for quite a long time before we had the ability to look at things genomically.
And so even though a few other genes had been demonstrated to be regulated in a similar fashion, before we could do unbiased data collection we had no idea how broad this fundamental biological process really was, right? We thought this was a rare phenomenon. We thought this was something that happened at a handful of genes. And so this is a pathetic example of an experiment I did 10 years ago in John's lab, and this was the state of the art. I did some ChIP, and I had some primer pairs, and I said, look, there's a peak of polymerase at the promoter of this gene. That's where we were.
Fast forward to being able to do ChIP-chip and ChIP-seq. Ginger Muse in my lab did these experiments in Drosophila S2 cells and looked across the entire genome, asking: are there other genes in the genome that are regulated using the same strategy? So fundamental biological questions identified a strategy, but we needed genomics. We needed unbiased data collection to ask how broad this strategy is, how broadly used it is in cells. Just as an example of this, here I'm showing some ChIP-chip. For all 16,000 Drosophila genes, there's a heat map where I think you can see that, for every gene that has polymerase at the promoter, the intensity of the signal is really enriched right around the promoter. So pausing isn't something that happens at five genes; it's really a regulatory strategy that we now have to integrate into the way we think about how genes are regulated.
And so I really think that's where genomics enabled basic biology to make a leap. But then basic biology had a question. It said, well, here's Pol II, and yeah, it's sitting there at the promoter, but is it engaged in transcription? Is this actually the same thing that John Lis had been talking about for three decades?
And so both Leighton Core in John's lab and Sergei Nechaev in my lab wanted to develop techniques to look at the nascent RNA generated either specifically by paused Pol II or by elongation-competent transcription complexes, again across the genome. As many of you know, Leighton developed this global run-on-seq (GRO-seq) technology, where you isolate nuclei, feed them a labeled nucleotide, and then purify RNA containing that labeled nucleotide, so you can identify very specifically, with very beautiful signal-to-noise, where transcription complexes that are competent to elongate were sitting in the genome. This gives a much nicer estimation or evaluation of the actual transcription rate in a cell at a specific time than you're going to get from RNA-seq or from Pol II ChIP-seq or anything of the like. It really tells you something about where engaged elongation complexes are. The strategy that we took, just focusing on transcription start sites, doesn't give you information about the level of transcription going through the entire gene.
So anyway, the basic biology question that this answered was that we could find nascent RNAs almost everywhere we could find Pol II. All right, so that was the answer to the basic biology. But what John and I realized, or what a number of people have realized, is that these techniques have a life beyond pausing, right? They have a life way beyond what our geeky little initial question got us into. And now I think these techniques should be more broadly utilized to address genomic questions.
Just to say something about our technique, which is Start-seq, and John is now doing a five-prime cap version of GRO-seq: they really focus on the five-prime end of these nascent transcripts. And so this gives you single-nucleotide resolution of exactly where transcription is starting, right? We pull these either from the chromatin fraction or out of the nucleus.
We get them when they're short and still associated with the polymerase, by and large, and we purify them by making use of the fact that they have a cap. So we can decap, and we can identify that first nucleotide. And the nice thing about doing something like Start-seq is that you don't require the polymerase to put another nucleotide in there. So you can get these RNAs even if the polymerases are not stably paused, even if they're just actively transcribing through, if they're arrested in transcription, or if they're in the process of termination. You can get these short RNAs from wherever transcription is initiating. And again, the thing is that all these reads get focused at the five-prime end, so you don't need a billion reads to cover your sequence space, because you're really just looking at exactly where transcription is initiating. And so I'll give you some examples of that.
Okay, so here's just an example of what the data looks like. You have the Start-seq; the same thing can be said for the five-prime GRO-cap. You get sense-strand reads that line up at transcription start sites, and you get antisense reads that tell you something about divergent transcription. So this was a surprise, again something that people would never have guessed at, but genomics told us it was there. In mammals, the polymerase goes in both directions at many coding genes, and in the upstream antisense direction it makes a non-coding transcript. Again, we know this because of these genomic assays, because of unbiased data collection. But then this raises questions that basic biology can delve into further.
All right, so we can use these techniques to specifically say, well, I think the start site of this gene is slightly different from the annotation in this cell type. We can use this to annotate exactly where the antisense transcription is originating. And again, we can also use these to define the end of that transcription.
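The point that five-prime-end reads pile up at the initiating nucleotide can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the Start-seq or GRO-cap analysis pipeline: the read tuples are made-up toy data, coordinates are assumed to be 0-based and half-open, and the helper names `five_prime_end` and `call_tss` and the `min_reads` threshold are hypothetical.

```python
from collections import Counter

def five_prime_end(start, end, strand):
    """5' end of a read with 0-based, half-open [start, end) coordinates."""
    # On the minus strand the RNA's first nucleotide is the rightmost base.
    return start if strand == "+" else end - 1

def call_tss(reads, min_reads=3):
    """Pile up read 5' ends per strand and return the most-used position.

    reads: iterable of (start, end, strand) tuples (hypothetical toy format).
    Returns {strand: (position, read_count)}; a strand is omitted when its
    best position is supported by fewer than min_reads reads.
    """
    pileup = {"+": Counter(), "-": Counter()}
    for start, end, strand in reads:
        pileup[strand][five_prime_end(start, end, strand)] += 1

    calls = {}
    for strand, counts in pileup.items():
        if not counts:
            continue
        # Highest count wins; ties are broken toward the smaller coordinate.
        pos, n = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
        if n >= min_reads:
            calls[strand] = (pos, n)
    return calls

# Toy reads around one promoter: sense reads on "+", divergent reads on "-".
reads = [
    (1000, 1030, "+"), (1000, 1042, "+"), (1000, 1025, "+"), (1002, 1040, "+"),
    (820, 851, "-"), (810, 851, "-"), (815, 851, "-"), (830, 860, "-"),
]

calls = call_tss(reads)
sense_tss, _ = calls["+"]
antisense_tss, _ = calls["-"]
print(calls)
print("sense-to-antisense TSS distance:", sense_tss - antisense_tss)
```

With these toy reads the sense start site is called at position 1000 and the antisense start site at 850, 150 bp apart. Because every read contributes exactly one position, a few reads per site are enough to call it, which is the reason shallow sequencing can suffice for this kind of assay.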
So now here's just an example of what the data can be used for: you can start to get a handle on promoter structure. These are in mouse macrophages. You align all the data by the Start-RNA sense strand, and you rank-order the genes by how far away the antisense transcription start site is from the sense. You can see, as Frank Pugh showed for yeast, that you actually have two distinct Pol II complexes going in opposite directions. And you can see TBP is there, this is the TATA-binding protein. And you can see that you have this big nucleosome-depleted region in between these two sites. So now we have an understanding of promoter structure that we didn't have before we could precisely identify where this divergent transcription was coming from. And again, this is new information from a technique that is being repurposed, I would say, to do a more unbiased search of the genome.
Okay, so this is not just something that can be used to look at promoters or divergent non-coding transcription around promoters; it can also give us insight into some of these unstable non-coding RNAs that are generated at enhancers. What I want to point out here is that enhancers are often described by the presence of DNase hypersensitivity and a number of histone modifications. And while this is really beautiful and it's given us lots of sites to look at, one thing I should note is that the regions defined are awfully broad. Here I'm just showing an example from this Cell Stem Cell paper, where they were identifying the enhancer regions in embryonic stem cells, and then later, after those stem cells were differentiated into epiblast stem cells, or EpiSCs. The things that are labeled here in blue are the things that they define as enhancers in mouse embryonic stem cells. And you notice that a lot of these key enhancers right around this KDM5B gene are actually lost in the EpiSCs.
And so you lose the monomethyl, the DNase, the H3K27 acetyl, et cetera. And then they would argue that there are new enhancers that are coming up. So again, here's a situation where the perturbation, or in this case the time course of development, is informative because you see these things changing.
Okay, so is there a way, then, of better defining enhancers and asking whether or not everything in this region is identical? Or are there hot spots? Are there hubs or specific loci within this enhancer that are potentially more relevant to its function? One way we can think about this is: in this big region, where's the action? And so I just want to show work that we've done in collaboration with Joanna Wysocka, characterizing the same transition from ESCs to EpiSCs. This is super preliminary data. We've done one lane on a MiSeq of our Start-seq data. Here in the mouse embryonic stem cells, this is the forward strand, and then this is the divergent transcription on the reverse strand. And you can see that, in fact, in these loci that are described as enhancers, there is both forward and reverse transcription in the embryonic stem cells that's totally gone in the EpiSCs. So again, in a lane of MiSeq, you can identify where transcription is taking place within these regions.
And one thing I just wanted to note is that these reads line up really beautifully with the DNase, which I think tells us that there's a focal region within these enhancers that we need to be thinking about: the transcription and the DNase are pinpointing the exact same region within this multi-kb locus. Okay, so the thing I want to point out here is that, again, in one lane on a MiSeq, we can easily monitor changes in enhancer activity in these altered contexts.
Okay, and then again, because it's a high-resolution definition of five-prime ends, we've looked around these five-prime ends and tried to understand which transcription factors are present and absent in the ESCs versus the EpiSCs, to try to get a handle on this. It's easier to do if you're looking in the 200 base pairs upstream of a start site than in a 6 kb region.
Okay, so what I've tried to get at here is that you can take techniques that were developed to get at a very specific biological problem and use them more broadly. I focused on GRO-seq and Start-seq, but I'm also in love with this metabolic labeling, 4sU-seq, that Aviv has done a lot of, because it gives you this broad, agnostic data collection possibility, and yet you can use it for everything from annotating functional elements to answering very specific biological questions. So I would argue that NHGRI should continue supporting technology development, even if it seems like it might be set out to answer a specific question. Almost everything that is done in ENCODE was started that way, and then it can be repurposed to ask the same kinds of questions in a bigger context: how broad are they?
And I would just like to say that I think investing in new technologies is important. But I also wonder if some of the large consortium projects could be structured in a way that lets them flexibly integrate new kinds of techniques and new sorts of data analysis, you know, add a new collaboration onto a consortium to integrate some of these new data sets. I think these are things that are already in the works, but in the next phase doing even more of that would be great.
Okay, so these are the questions that I was asked. I just want to say that I think we need better assays of direct consequences and better readouts of phenotype and function of some of these elements.
And I think there are ways of doing this in an efficient manner that can then be easily scaled and multiplexed across many perturbations, many time points, many conditions, which I think is ultimately a goal that I hear reiterated here: to understand the functional genome, you sometimes have to push it a little bit and ask how it responds. But we can't do that with 800 different assays. And so there is probably some low-hanging fruit, and I would just argue that RNA should be among it.
We're running behind schedule, so we're going to leave the questions to the discussion session and move straight on to Anjana Rao, who will talk about analyzing cytosine modifications in genomic DNA.