there at the end; it's a crucial part of ENCODE 3. So now we're going on to the functional annotation of ENCODE elements session, and function is what we sometimes jokingly refer to as the F word in ENCODE, because where you draw the line on function is a little bit in the eye of the beholder. We'll start with Tim Reddy from Duke University talking about his really cool assays. Carry this around, is that gonna be okay? Yeah, okay. Much more comfortable moving. So I'm gonna talk a lot today about some work that we've been doing as a sort of functional follow-up to work that was done in the second and third rounds of the ENCODE project, looking basically at the question of: now that we've identified all these ChIP-seq peaks, what are they actually doing? Can we start to identify some of the underlying regulatory mechanisms? This is a picture from a trip down to Patagonia last year, and the jagged mountains kind of remind me of looking at the genome browser, but in real life. Okay, so I think we're all familiar with the challenge that we're facing, but I'll just make sure we're all on the same page. When you complete a ChIP-seq study, this is typically what we end up seeing. We have some number of binding sites that we find; 10,000 has sort of got the right number of digits, usually. Then you can knock down the gene or overexpress it and figure out how many genes are being regulated directly by this transcription factor, and oftentimes we find that number is on the scale of hundreds. So we've got this vast excess of transcription factor binding sites in the genome, and now we need to figure out what all these sites are doing. To me, that's the major question once you've completed your ChIP-seq studies: how do we actually understand the biological mechanisms underlying that gene regulation?
The model system that I'll talk about today, and that we focus a lot on in my lab, is the cortisol system. Cortisol is a stress hormone. I'll talk about how it's released in a second, but it's a small molecule that is basically responsible for suppressing the immune system and reducing inflammation, and it also increases your blood pressure and your blood sugar. The idea is that if you're being chased by a bear, your body releases cortisol, and that makes sure the inflammation in your knee that's really sore goes down so you can run away, and it pumps a lot of sugar into your blood so you have the energy to run away from that bear. It's that anti-inflammatory effect that we really love, and that's why we use cortisol, or cortisone, in creams, for example to get rid of inflammation due to poison ivy. But there are a lot of problems associated with cortisol as well. It's that elevated blood sugar: over the course of weeks and months, if you're taking it systemically, say for rheumatoid arthritis, you'll end up developing all sorts of metabolic issues, including diabetes. So how is cortisol produced, just real quickly? It's released as part of a signaling cascade that starts in the hypothalamus, where you have corticotropin-releasing hormone, a small peptide hormone that acts on the pituitary, leading to the release of ACTH through some cleavage steps of a prohormone. ACTH travels through your bloodstream to your adrenal glands, which sit above your kidneys; they receive it and translate that into the release of the actual steroid, cortisol. Cortisol then feeds back and represses a lot of these systems to keep everything in balance. From the point of view of studying gene regulation, the reason we really love this system is that it's extremely easy to turn on and to turn off. Cortisol freely diffuses through cell membranes.
It binds its receptor, GR, the glucocorticoid receptor, shown here in purple. Without cortisol, or without glucocorticoids, GR just hangs out in the cytosol and doesn't really do anything. But once cortisol comes into the cell and binds it, GR is released from its chaperone proteins, dimerizes, translocates into the nucleus, and binds DNA, where it acts as a transcription factor. So you can basically take cells, grow them in a dish, throw cortisol onto them, and induce nuclear localization of this receptor; take the cortisol away, and it goes back. It's a system where we can induce and repress the activity of this transcription factor very quickly and really comprehensively, which makes it a great system for studying how these TF binding events impact gene regulation and gene expression. This gets back to where we started. We did ChIP-seq for GR several years ago now, and, as for any transcription factor, we typically find around 10,000 binding sites across the genome, probably more than that, but on that order, and on the order of hundreds of genes that are regulated by those initial binding events. To start to understand the activity of these sites, one of the first things we did was use the luciferase reporter assays that we talk about a lot. Briefly, you take the glucocorticoid response element, the GRE, stick it in front of luciferase, and treat your cells either with ethanol, which is just a control treatment, or with dexamethasone, a synthetic glucocorticoid, a cortisol analog that we use in the lab a lot. Once you treat the cells with dexamethasone, luciferase activity increases, shown on the right; you can get really nice dose-response curves, and you can recapitulate a lot of what you would expect pharmacologically from adding glucocorticoids.
So it's a really nice system; these luciferase assays work really well for studying the activity of these GR binding sites. The problem is that we have 10,000 sites, and if you've ever done a reporter assay or know someone who has, generating these is slow and tedious; getting to 10,000 would, I think, be infeasible. That's the motivation for a lot of the high-throughput versions of reporter assays we're seeing. There are lots of ways you can imagine making a reporter assay high throughput. Basically, they all boil down to this: instead of measuring luciferase, can we use high-throughput sequencing to read out the different reporters? One way to do it is to take a small molecular barcode, basically just a random DNA sequence, and put it somewhere in your reporter gene, luciferase or GFP, whose protein readout you effectively ignore. If you know which barcode, shown here, is linked to which binding site, shown there, then you can just sequence the barcodes and measure the activity of the different sites. There have been several examples of doing this with a barcode in different parts of the gene, et cetera. We use a different approach that we've really enjoyed using. Rather than putting a barcode in the GFP and a regulatory element in front of the gene, we just take the regulatory element and put it in the 3' UTR of the reporter gene. This is the STARR-seq approach from the Stark lab. What happens when you do this, and we know this now, is that regulatory elements such as enhancers don't care whether they're in front of the gene or downstream of it; even in the 3' UTR, an element can still act to regulate expression of the reporter gene. But when it does, it's now regulating a gene whose mRNA transcript contains the regulatory element itself. It's sort of self-transcribing.
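The barcode-counting idea can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline any particular group used: it assumes a lookup table from barcode to binding site already exists (typically built by sequencing the plasmid library), and the function name, barcodes, and element IDs are all hypothetical.

```python
from collections import Counter


def element_counts_from_barcodes(barcode_to_element, observed_barcodes):
    """Aggregate sequenced barcodes into read counts per regulatory element.

    barcode_to_element: dict of barcode sequence -> element ID (assumed to exist).
    observed_barcodes: iterable of barcode sequences read from reporter RNA.
    Returns (counts per element, number of reads with an unknown barcode).
    """
    per_element = Counter()
    unassigned = 0
    for barcode in observed_barcodes:
        element = barcode_to_element.get(barcode)
        if element is None:
            unassigned += 1  # sequencing error or barcode missing from the table
        else:
            per_element[element] += 1
    return dict(per_element), unassigned


# Toy example: two barcodes point at the same hypothetical binding site.
table = {"ACGTAC": "GBS_1", "TTGACA": "GBS_1", "GGCATT": "GBS_2"}
reads = ["ACGTAC", "TTGACA", "ACGTAC", "GGCATT", "NNNNNN"]
counts, unknown = element_counts_from_barcodes(table, reads)
print(counts, unknown)  # {'GBS_1': 3, 'GBS_2': 1} 1
```

Using several barcodes per element, as in the toy table, is one way to average out barcode-specific effects on the reporter transcript.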
In other words, if an element is very strong, you get a lot of copies of a GFP mRNA that has your regulatory element in its UTR, and if you have a different element that's very weak, there aren't very many copies of it. So we use high-throughput sequencing: basically we do RNA-seq, but we're only sequencing the 3' UTR of one gene, so it's a very specific RNA-seq reaction. Then we just count the number of times each regulatory element appears in the mRNA, and that gives us a proxy for how active that element is. This system has some really nice advantages compared with some of the other approaches. We, and others as well, have been able to use it to assay millions of different regulatory elements at once. Of particular relevance for studying complex gene regulatory interactions, these elements can be hundreds of base pairs or kilobases long; the largest we've measured have been a couple thousand base pairs. So we're not limited to elements of a few hundred base pairs, as you often are with synthesis. And all you have to do is put your regulatory elements in the UTR. That's a simple ligation reaction, and people have been generating libraries with simple ligation reactions for decades. So what we've done now, again to study all these GR binding sites, is take our ChIP-seq libraries. Up until now, those were basically sequencing libraries that sit in the freezer after they've been sequenced, and you don't do anything with them. We thaw them back out, ligate them into the 3' UTR of this reporter gene, and now, all of a sudden, we have a massively parallel reporter assay that can assay the entire ChIP-seq library all at once. Generating this library, which I would love to say is four steps, is more like six months of work.
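A minimal sketch of turning STARR-seq read counts into a per-element activity score. The talk describes counting how often each element appears in the reporter mRNA; dividing by the element's abundance in the transfected plasmid library is an assumption added here (a common normalization) so that elements that are simply over-represented in the library don't look more active. Function and variable names are hypothetical.

```python
def starr_activity(rna_counts, dna_counts, pseudocount=1.0):
    """Activity per element = RNA reads per million / plasmid DNA reads per million.

    rna_counts: element ID -> reads in the reporter 3' UTR RNA-seq library.
    dna_counts: element ID -> reads in the plasmid (input) library; this
        normalization is an assumption, not something described in the talk.
    pseudocount: added to both counts so elements with zero RNA reads still get a score.
    Elements absent from dna_counts are skipped.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    activity = {}
    for element, dna in dna_counts.items():
        rna = rna_counts.get(element, 0)
        rna_cpm = (rna + pseudocount) / rna_total * 1e6
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        activity[element] = rna_cpm / dna_cpm
    return activity


# Toy counts: equal plasmid representation, but GBS_1 yields three times the RNA.
rna = {"GBS_1": 300, "GBS_2": 100}
dna = {"GBS_1": 200, "GBS_2": 200}
act = starr_activity(rna, dna)
print(act)
```

Scaling each library to reads per million first keeps scores comparable when the RNA and DNA libraries are sequenced to different depths.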
We can transfect this into cells, treat them with our drug, cortisol, dexamethasone, whatever it might be, or with our control, and then again just collect the mRNA and sequence it. What we get, in theory, is a functional measure of the activity of every single GR binding site that was pulled down in the ChIP-seq study. So, cutting to the chase a year later, what we've done with this is assay all, in this case, 12,000 GR binding sites that were in the ChIP-seq libraries we started with; we started with a few different libraries. The bottom line, and this is concordant with lots of other high-throughput reporter approaches for measuring the activity of transcription factor binding sites, is that on the order of 10% of these regulatory elements actually seem to have function. In our case, because we're looking at a hormone response, we're sort of internally controlled, so we're actually asking how many of these sites are dex-responsive, that is, how many respond to hormone, and it's about 10%. That's shown here. Each dot on the scatter plot is a different GR binding site. On the x-axis is the baseline activity, the activity of the element in the control condition, and on the y-axis is the log fold change: going up from zero are binding sites with increased activity in response to hormone, and going down are those with decreased activity. About 10% of those dots, shown in red, we can confidently say are different from zero. Interestingly, essentially all of the elements that respond are activating. I'm not really convinced that this 5% of repressive elements is anything other than false discoveries, but I'm not going to get into that today; that's a whole other story that we're looking at. So that's our take-home, and the question now is how we interpret it.
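The dex-versus-control comparison on the scatter plot's y-axis can be sketched as a per-element log2 fold change. This is only an illustration: the fixed two-fold cutoff below is a hypothetical stand-in, since the talk doesn't describe the statistical test used, and real calls would come from replicate-aware count models rather than a single threshold.

```python
import math


def classify_dex_response(activity_ctrl, activity_dex, min_abs_log2fc=1.0):
    """Label each element as activated, repressed, or unchanged by dexamethasone.

    activity_ctrl / activity_dex: element ID -> normalized activity (e.g. RNA/DNA ratio).
    min_abs_log2fc: hypothetical cutoff; 1.0 corresponds to a two-fold change.
    Returns element ID -> (label, log2 fold change).
    """
    calls = {}
    for element, ctrl in activity_ctrl.items():
        log2fc = math.log2(activity_dex[element] / ctrl)
        if log2fc >= min_abs_log2fc:
            calls[element] = ("activated", log2fc)
        elif log2fc <= -min_abs_log2fc:
            calls[element] = ("repressed", log2fc)
        else:
            calls[element] = ("unchanged", log2fc)
    return calls


ctrl = {"GBS_1": 1.0, "GBS_2": 2.0, "GBS_3": 1.5}
dex = {"GBS_1": 8.0, "GBS_2": 2.2, "GBS_3": 0.5}
calls = classify_dex_response(ctrl, dex)
responsive = sum(label != "unchanged" for label, _ in calls.values()) / len(calls)
print(calls, responsive)
```

Working on a log scale makes activation and repression symmetric around zero, which is why the scatter plot's y-axis is centered there.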
One possibility is to say that only 10% of the GR binding sites in the genome have any activity, the other 90% are places where GR binds but doesn't do anything, and that's the end of the story: we've solved ENCODE. I don't think that's the answer at all. The other possibility harkens back to what we're doing with these reporter assays: we're taking these elements, extracting them from the genome, and putting them into a plasmid where they're by themselves. When you do that, 10% of them work, and maybe the other ones just need more: more context, certain promoters, or something else that depends on their surroundings. What I'll talk about for the rest of today is what we're digging into in that realm. Getting back to GR in particular, and I don't think it's unusual here, I think lots of transcription factors have these properties, there are dozens of different models of how GR binds to the genome; I'm just showing a few here. The three that we think about the most are, first, direct binding, shown at the top, where GR binds a transcription factor binding motif, the GRE shown below, binds directly to the DNA, and responds. This is our traditional transcription factor binding model. There are other models where GR binds in a kind of cooperative fashion at composite sites, most notably with AP-1: you could have an AP-1 site, shown as a triangle, and a GRE next to it, or maybe a weaker version of the GRE, and basically the two factors help each other bind at the same place, creating a composite element.
Then there are other mechanisms where GR doesn't bind the DNA directly at all, but binds indirectly: you can have an AP-1 site bound by AP-1, the green triangle, and GR interacts with AP-1 through a protein-protein mechanism, so it binds the genome without actually contacting the DNA. In some ways we can distinguish between these models just by looking at the DNA sequence. If we see GREs, we expect direct binding; if we see AP-1 sites, we expect tethered binding; and if we see both, we might expect some sort of composite binding. So we went through this: we looked at the motifs for GR and for a lot of known GR cofactors, AP-1, CREB, FOXA1, and RELA, or NF-κB, and tried to explain the activity we see in our reporter assays. Basically, what we found is that out of all of these motifs, the only one that gives us any ability to predict activity is the GRE itself; none of the others predict anything. In other words, all of the activity that we're seeing is really due to direct binding events, not to cooperative or tethered events. This creates a problem. On the one hand, we know that if you treat liver cells or lung cells or whatever cells with glucocorticoids, there are dramatically different responses at the gene regulation level between cell types. We also know from ChIP-seq that binding is dramatically different: GR binds lots of different places in different cell types. Several people here have worked on this; John's worked on this. The problem is that those differences in transcription factor binding are largely attributed to cofactors like AP-1, right?
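The sequence-based reasoning about binding models (GRE only suggests direct binding, AP-1 only suggests tethering, both suggest a composite site) can be written as a toy classifier. The exact-match consensus patterns are a simplification assumed for illustration: AGAACAnnnTGTTCT for an idealized palindromic GRE and TGA(C/G)TCA for an AP-1 site. Real motif scans use position weight matrices and score degenerate sites. Both patterns are palindromic, so scanning one strand is enough.

```python
import re

# Idealized consensus sites (assumption: exact matches only, no PWM scoring).
GRE = re.compile(r"AGAACA[ACGT]{3}TGTTCT")
AP1 = re.compile(r"TGA[CG]TCA")


def binding_model(seq):
    """Guess a GR binding model from which consensus motifs a peak contains."""
    seq = seq.upper()
    has_gre = GRE.search(seq) is not None
    has_ap1 = AP1.search(seq) is not None
    if has_gre and has_ap1:
        return "composite"
    if has_gre:
        return "direct"
    if has_ap1:
        return "tethered"
    return "no consensus motif"


for peak in ["ccAGAACAgggTGTTCTcc", "ttTGACTCAtt", "AGAACAaaaTGTTCTggTGAGTCA"]:
    print(peak, binding_model(peak))
```

As the talk notes, motif presence is only a starting hypothesis; the reporter data are what show which of these classes actually carry activity.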
So if the GR binding sites that are actually doing anything are the direct sites, and all the cell-type-specific binding is due to indirect interactions, then if those indirect interactions don't actually do anything, we're really going to have problems explaining cell-type-specific glucocorticoid responses. That gives us really strong evidence that these indirect sites are doing something; they're just not doing anything in this reporter assay, and the question is why. The model we've come to, and that we've been working on, is that these AP-1 sites modulate the activity of the direct GR binding sites, and they do it in a way that requires more context than the site by itself. This is potentially general to other cofactors, but I'm just going to focus on AP-1 for now. In other words, we think these AP-1 sites are interacting with the other GR binding sites, and you need the two together to observe the activity of some of these more tethered interactions. That's the model we're proposing. We have some, I think, pretty strong evidence to support it. Here's an example of just one region of the genome that we're looking at. What we see is that, just as for other transcription factors, GR binding is heavily clustered in the genome: on top are the peak calls, and the ChIP-seq data are in the next two tracks below. There are lots of GR binding sites in this one small region around IκBα (NFKBIA), and when we do STARR-seq to measure regulatory activity, only one of the sites in that cluster has any activity in the reporter assay by itself; the other ones are probably these tethered, or indirect, binding sites.
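Grouping ChIP-seq peaks into clusters and checking how many sites per cluster are active in the reporter assay can be sketched as below. The merge rule (a peak within a fixed gap of the previous peak joins its cluster) and the 10 kb gap in the example are hypothetical choices for illustration, not the talk's definition of a cluster, and the peak positions are made up.

```python
from collections import defaultdict


def cluster_peaks(peaks, max_gap):
    """Group peaks on one chromosome into clusters.

    peaks: list of (position_bp, is_active_in_reporter) tuples.
    max_gap: hypothetical merge distance; a peak within max_gap of the
        previous peak joins that peak's cluster.
    """
    clusters = []
    for pos, active in sorted(peaks):
        if clusters and pos - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append((pos, active))
        else:
            clusters.append([(pos, active)])
    return clusters


def fraction_active_by_size(clusters):
    """Fraction of reporter-active sites, pooled over clusters of each size."""
    tallies = defaultdict(lambda: [0, 0])  # size -> [active sites, all sites]
    for cluster in clusters:
        tally = tallies[len(cluster)]
        tally[0] += sum(active for _, active in cluster)
        tally[1] += len(cluster)
    return {size: active / total for size, (active, total) in sorted(tallies.items())}


peaks = [(1_000, True), (6_000, False), (9_000, False),
         (50_000, True), (200_000, True), (203_000, False)]
clusters = cluster_peaks(peaks, max_gap=10_000)
frac = fraction_active_by_size(clusters)
print(frac)
```

In this toy input every cluster contains exactly one active site, so the active fraction falls as clusters get larger.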
Here's an example where we have one active site among a cluster of sites, and what we found more generally is that as we look at clusters of different sizes, pairs of sites, lots of sites together, the fraction of sites that are responsive in our reporter assay goes down as the clusters get larger. What this says to me is that individual GR binding sites are potentially nucleating these larger clusters, again through these interactions. Sometimes they nucleate a very small cluster of just a couple of sites, sometimes clusters of lots of sites, but to me this is consistent with the idea that one direct GR binding site is nucleating binding at several other GR binding sites nearby. Another way to think about this: we would expect these GR/AP-1 interactions to be limited by distance along the genome. If GR is really looping in to interact with these AP-1 sites, then the AP-1 sites that gain GR should be fairly close to a direct site compared with the AP-1 sites that don't gain GR. That's what we're showing here, in red and in black; the x-axis is the distance to one of these direct binding sites that's responsive in a reporter assay. What we find is that the AP-1 sites that gain GR are on average about 45 kb from one of these direct binding sites, whereas the AP-1 sites that don't bind GR are about 140 kb away. So they're roughly three times farther away when they don't gain GR. Again, we think this is due to indirect chromatin interactions between the direct sites and these AP-1 sites, which form these clusters. Another way we can think about these clusters is in terms of where CTCF is.
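The distance comparison can be sketched as a nearest-neighbor search using binary search over sorted positions. The positions below are invented toy values on a single chromosome, not the talk's data; with the averages quoted in the talk, the ratio is about 140 / 45 ≈ 3.1.

```python
from bisect import bisect_left
from statistics import mean


def distance_to_nearest(positions, anchors):
    """Distance in bp from each position to the nearest anchor (same chromosome).

    anchors: positions of reporter-active direct GR sites; must be non-empty.
    """
    anchors = sorted(anchors)
    distances = []
    for p in positions:
        i = bisect_left(anchors, p)  # index of the first anchor >= p
        candidates = []
        if i < len(anchors):
            candidates.append(anchors[i] - p)
        if i > 0:
            candidates.append(p - anchors[i - 1])
        distances.append(min(candidates))
    return distances


direct_sites = [100_000, 500_000]
ap1_gain_gr = [130_000, 470_000, 560_000]
ap1_no_gr = [300_000, 700_000]
gain = distance_to_nearest(ap1_gain_gr, direct_sites)
no_gain = distance_to_nearest(ap1_no_gr, direct_sites)
ratio = mean(no_gain) / mean(gain)
print(gain, no_gain, ratio)  # [30000, 30000, 60000] [200000, 200000] 5.0
```

Sorting the anchors once makes each lookup logarithmic, which matters when comparing tens of thousands of AP-1 sites against thousands of direct sites.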
CTCF has insulating activity, so we would expect that within these clusters, between the direct sites and the AP-1 sites that bind GR, there would be a depletion of CTCF, a depletion of insulator elements that would interrupt that looping interaction, and that's exactly what we see. I won't explain all of our permutations, but if you permute, you get this distribution out here in gray, with red and blue showing what we actually observe. We see many orders of magnitude of depletion of CTCF between some of these interacting domains. So based on these and other data that I don't have time to talk about today, we're pretty confident that a lot of the GR binding sites we're observing in the genome are actually direct GR binding events that then loop in to AP-1 sites, where GR binds in an indirect fashion. That might be interesting just from the point of view of understanding the structure of the genome, but we still have to explain these cell-type-specific responses, right? So what we need to know is: do these AP-1/GR interactions actually alter the dex response? Do they modulate the activity of these sites in a way that might help explain some of the cell type specificity? People have looked at GR/AP-1 interactions for a long time, and we've now done some of that ourselves. This table isn't laid out the best, and I should reorganize it, but if you look at GR binding sites by themselves, shown here, you see a response to hormone: the gray bars go up a little bit. But when you add one or two AP-1 sites, you see massive increases in the activity of the sites, 10- to 20-fold. These AP-1 sites have no dex-responsive activity by themselves in a reporter assay, but put them next to a GR site and you get something like a 20-fold signal amplification. So I think there really are some synergistic activities going on.
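One way such a depletion test could be set up is a permutation test: count CTCF sites inside the intervals between paired direct and AP-1 sites, then compare with intervals of the same lengths placed at random. The null model (uniform random placement on one chromosome), the function names, and the toy coordinates are assumptions for illustration; the talk doesn't specify how its permutations were done.

```python
import random
from bisect import bisect_left, bisect_right


def ctcf_in_intervals(intervals, ctcf_sorted):
    """Total number of CTCF sites inside (start, end) intervals, ends inclusive."""
    return sum(bisect_right(ctcf_sorted, end) - bisect_left(ctcf_sorted, start)
               for start, end in intervals)


def ctcf_depletion_test(intervals, ctcf, chrom_length, n_perm=1000, seed=0):
    """One-sided permutation test for fewer CTCF sites than expected by chance.

    Assumed null model: each interval keeps its length but is moved to a
    uniformly random position on the same chromosome.
    Returns (observed count, mean permuted count, empirical p-value).
    """
    rng = random.Random(seed)
    ctcf_sorted = sorted(ctcf)
    observed = ctcf_in_intervals(intervals, ctcf_sorted)
    null_counts = []
    for _ in range(n_perm):
        shuffled = []
        for start, end in intervals:
            new_start = rng.randint(0, chrom_length - (end - start))
            shuffled.append((new_start, new_start + (end - start)))
        null_counts.append(ctcf_in_intervals(shuffled, ctcf_sorted))
    as_low = sum(c <= observed for c in null_counts)
    p_value = (as_low + 1) / (n_perm + 1)  # +1 so p is never reported as exactly 0
    return observed, sum(null_counts) / n_perm, p_value


# Toy chromosome: CTCF every 10 kb; five 8 kb direct-to-AP-1 intervals that avoid them.
ctcf = list(range(5_000, 1_000_000, 10_000))
intervals = [(s, s + 8_000) for s in (6_000, 106_000, 206_000, 306_000, 406_000)]
observed, expected, p = ctcf_depletion_test(intervals, ctcf, chrom_length=1_000_000)
print(observed, expected, p)
```

Because of the +1 correction, the smallest p-value this can report is 1/(n_perm + 1); supporting a depletion of many orders of magnitude would take far more permutations, or a summary such as observed versus mean permuted counts.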
Another way to look at it is to vary the distance between a GR binding site and an AP-1 binding site. This is a series of reporter assays; on the x-axis is the distance between the GR site and the AP-1 site. When they're very close, this is the sort of composite interaction I talked about, and we see pretty strong changes in activity in response to hormone, 10- to 20-fold. At an intermediate distance, around 150 base pairs, which is consistent with a distance where there can't be a direct interaction between the sites but the DNA isn't yet flexible enough for them to loop over and interact, the fold changes actually go down. I'll point out this is on a log axis, so we go from 30-fold changes in activity in response to hormone down to two- or three-fold changes; you see a big drop in the overall dex responsiveness of this configuration. Then, when you separate the sites even more, you get to the regime where the DNA is flexible enough to loop over and interact, and we see the dex responsiveness go up again and skyrocket, up to 100-fold. We think this is again consistent with the idea that once these sites can interact, you get massive synergistic effects through these tethered GR/AP-1 interactions. So, to summarize real quickly what we're thinking now: where ChIP-seq shows, for example, three binding sites in a cluster, what we think we might actually be seeing, at least for GR, is just one site that's nucleating a lot of tethering and chromatin configuration changes, which then appear as multiple sites that are cross-linked together in ChIP-seq.
So just to summarize: we've developed a high-throughput approach that allows us to measure the activity of, for example, every GR binding site, and as I said, the starting point for this assay is just whatever sequencing libraries you have sitting in the freezer. I know hundreds or thousands of ChIP-seq experiments have been done, and we can now thaw those libraries out, ligate them into a reporter gene, and do a reporter assay to measure the activity of all the sites at once. What we're finding is that there's real functional diversity among GR binding sites, suggesting these different interactions. Based on some other data that I'm not showing today, we think this type of diversity is general to the estrogen receptor, other steroid hormone receptors, and other types of transcription factors as well. So it's helping us go through those 10,000 sites and get a much more nuanced understanding of what they're doing, more nuanced, at least, than just active versus not active. I think there's a lot more going on there that we're starting to look into. This is just one part of what my lab is doing, and it's work that started as part of the ENCODE project. I want to put in a plug: we're now continuing this work, what I've talked about today and a lot of other stuff, through one of the Genomics of Gene Regulation (GGR) projects, where, instead of looking at just one time point of glucocorticoid treatment, we're looking at 12 different time points across 12 hours. We're measuring output with gene expression, ChIP-seq, actually Hi-C, DNase-seq, and STARR-seq. We're basically doing a mini ENCODE-style project focused on this glucocorticoid response time course, paired with a whole bunch of statistical modeling and genome editing to model and perturb this network, respectively.
The reason I'm mentioning this now is that we've already released some of the first data. It's on the ENCODE portal that everyone's using today; under Project there's a little link where you can get to GGR, and we've got some RNA-seq and DNase-seq data from this time course up already. I think the ChIP-seq data, as well as many other data sets, will follow very shortly. We're putting this into the ENCODE portal because it kind of looks like ENCODE data and kind of acts like ENCODE data. So if you're comfortable using ENCODE data, I invite you to come download our GGR data and start plugging away at it. It's up there right now. Also, I'm looking for postdocs. If you're interested in what we've been doing, I'm looking for experimental and computational scientists to join the lab to continue the work I've talked about today, and also other work where we've been applying this in the context of human disease, in some different genetic studies, and to specific diseases: a whole variety of projects that I'm looking for postdocs to join. So I'd love to hear from you if you're interested. And finally, the people who did this. A ton of people have contributed to all the different aspects of this. I'll highlight in red three of the students in my lab who did most of the work I talked about today. Chris really developed the assay that I talked about, Tony's helped a lot with the interpretation, and Ian is a really strong statistician in the lab who's been driving a lot of the analysis. Chris is graduating from my lab in two weeks, oh God, and he's going to be looking for a postdoc soon, so if anyone's interested, get in touch with him. Other than that, I'll take any questions. Thanks. Someone's monitoring. Yeah, that was awesome.
Do you think that the GR that's tethered to AP-1 is the same protein that's directly bound at the other site, or is there dimer formation, or are both in a large complex? Do you have any sense of how those dynamics work? My guess is it's the same. I mean, GR forms a dimer, so my inclination is that the GR you're seeing at AP-1 is the same GR. But keep in mind that we're looking at batches of 20 million cells. So, for example, if we have a cluster of several different sites, is that one GR binding at several different sites? I don't think there are enough interaction domains for that to happen. It's probably different loops forming at different times across billions of cells. Here, Tim. Fascinating work. I was just wondering how deep you need to sequence your libraries, because the input from ChIP-seq to your STARR-seq is not just 10,000 binding sites. It's millions of fragments, right? Yeah. How deep do you need to go? I don't remember what we've actually done. I mean, we basically sequenced to 50, 60 million. Keep in mind that there are 12,000 sites, and there are millions of reporter assays just on those sites. So even though each element is at low coverage, we have lots and lots of effective replicates at each of those sites. In fact, the stronger the ChIP-seq peak is, the more power we have, in a way, right? So it's not like you have to sequence to a hundred-fold what you normally would. I forget the actual number, but it's comparable to what you would do for a normal ChIP-seq. I see. That's great to hear. The second is more of a comment. I was just thinking, if you do a standard STARR-seq, you probably don't need to do it on the whole genome. Just do it on BACs covering a few tens of megabases, then compare the ChIP-STARR-seq with the regular STARR-seq; maybe you can get the percentage of GR-binding enhancers in the overall enhancer landscape.
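A back-of-the-envelope version of the depth answer, assuming (unrealistically) that the 50 to 60 million reads were spread evenly over the roughly 12,000 sites:

$$
\frac{5\times10^{7}\ \text{reads}}{1.2\times10^{4}\ \text{sites}} \approx 4.2\times10^{3}\ \text{reads per site},
\qquad
\frac{6\times10^{7}\ \text{reads}}{1.2\times10^{4}\ \text{sites}} = 5\times10^{3}\ \text{reads per site}
$$

Reads are not spread evenly in practice, since stronger peaks contribute more fragments, but the order of magnitude shows why ChIP-seq-like depth still gives thousands of effective replicates per site on average.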
Yes. I didn't talk about it here, but we've done about 10 megabases, maybe five megabases, of BACs that include GR binding sites. There are maybe 40 or 50 GR binding sites in those regions. The results largely agree with what we see if we just assay the GR binding sites. We found one site that is responsive in the assay and doesn't have a GR binding site, and I'm not convinced that that's not something anomalous; I'm not yet sure what it is. So looking across, it seems like we're not missing too much, and the sensitivity and specificity are comparable. Thanks. Right here. Very interesting talk. I have about 50 questions, but let me ask three quick ones, right here, sorry. One is the size of the fragments you're actually cloning into these reporter plasmids. The second is, have you tried different promoter constructs, and do you get different results? And the third is, if you put these into different cell types, cell lines, recipient cells, do you get the same results in terms of the activities of the individual peak fragments? So the sizes are 100 bases up to about 2 kb; the median's around 350 base pairs. We see, for example, that larger fragments are more likely to have activity, which you might expect, and the larger fragments also have stronger baseline activity, which is interesting. We haven't fully pulled on that thread yet, but we think there are probably some interesting things happening there. The second question was promoters. The promoter that we're using now is pretty strong. We've replaced it with a minimal promoter, and basically what we see are largely concordant results, but the fold changes are much greater, probably because we're starting from a lower baseline activity.
We used a really strong promoter to start with because we were interested in the potential for repression as well. That said, we haven't put the minimal-promoter data set through the same analysis yet, but it looks largely concordant. Different cell types, we don't know; that's what we're working on right now. Yeah, and that's kind of what we think. What we actually want to go back and do now, because we can push these libraries larger, is make libraries with kilobase-sized fragments, because that's the size at which we'll start to see some of these multiple sites together, and then ask some of those questions as well. Tim's very popular, so you guys can hold office hours or something later and ask him your questions. We have to move on to...