More generally, the environment can affect gene expression, and we're interested in how chromatin domains form, how they're regulated, and how they can influence gene expression. Then, at an even more global level, we're interested in how the genome is organized in the nucleus. There's ample evidence from a lot of systems that where a gene is positioned can influence its expression, and that this can be regulated.

I don't have to convince you that the worm is a good model system, but just to point out: for looking at chromatin regulation, a really great thing is that we have a very small, very well annotated genome. It's 30 times smaller than the human or mouse genomes, and that makes high-throughput studies a lot faster and a lot cheaper. The complement of chromatin proteins is really similar to human, which means that the things we learn are applicable. We have a lot of chromatin mutants, including many temperature-sensitive mutants, which are very useful. With RNAi we can easily turn off genes at will, and now, obviously, there's CRISPR, which everyone is embracing. But it's also a natural system. It's a multicellular organism, so we can look at effects in development, as opposed to a cell culture system. Those are useful too, but we really want to know how these processes work normally.

So, to start: we're studying transcription, and I want to begin by telling you about something we did a couple of years ago. There's a kind of dirty secret that a lot of people don't appreciate. If you're studying transcription, obviously you need to know where promoters are and where transcription starts. Until a few years ago, we didn't know where transcription started for almost any gene, because of trans-splicing. The majority of genes in C. elegans are trans-spliced, and the annotated TSS in WormBase is still mostly not the transcription start site but the transcript start site: the start of the mature transcript after trans-splicing.

Here's what happens. Say this is the true TSS for a gene of two exons, and here's the primary transcript; its 5' end is at the end of that little black bit. A spliced leader, usually SL1, is trans-spliced on, so you get both cis- and trans-splicing. Here's our mature transcript. Its start is the one in WormBase; the little bit upstream is degraded and you never see it. So we don't know where transcription started, and you can't really study transcription without knowing where the promoters are.

Karen Adelman's lab had developed a technique to map transcription initiation in the nucleus. She was using it to map pausing, and we realized that we could apply it to find initiation sites. Essentially, you isolate nuclear RNA that is short (nascent transcripts of 100 nucleotides or shorter) and capped, because the cap is a hallmark of RNA polymerase II. You sequence those and capture their 5' ends before they're trans-spliced off, before you have the mature transcript. We also made libraries of nuclear RNA to map transcription elongation: from the same preparation, we made libraries of RNAs longer than 200 nucleotides, so we could monitor elongation in the nucleus.

This worked really well. Here are some examples. Here's a non-trans-spliced gene; on the browser we now map just the first base of those short transcripts, and here we can see the reads piling up.
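To make the "map just the first base" step concrete, here is a minimal Python sketch of reducing each short, capped nuclear RNA read to a single initiation base. It assumes reads are already aligned, that the reported strand is the RNA's own strand (in practice this depends on the library protocol), and that each read spans a whole short RNA; the `Read` type, the function names, and the `max_len` cutoff are illustrative, not the lab's actual pipeline.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    """An aligned read in 0-based, half-open coordinates [start, end)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"; assumed to be the strand of the RNA itself


def five_prime_position(read: Read) -> int:
    """Return the 0-based coordinate of the read's 5' end.

    On the plus strand the 5' end is the leftmost base; on the minus strand it
    is the rightmost base, which is end - 1 in half-open coordinates.
    """
    if read.strand == "+":
        return read.start
    if read.strand == "-":
        return read.end - 1
    raise ValueError(f"unknown strand: {read.strand!r}")


def initiation_counts(reads: Iterable[Read], max_len: int = 100) -> Counter:
    """Count 5' ends of short reads, keyed by (chrom, strand, position).

    Reads longer than max_len are skipped, mimicking the size selection for
    short nascent RNAs (an assumption; real libraries are size-selected
    before sequencing).
    """
    counts: Counter = Counter()
    for read in reads:
        if read.end - read.start <= max_len:
            counts[(read.chrom, read.strand, five_prime_position(read))] += 1
    return counts


reads = [
    Read("chrI", 100, 130, "+"),
    Read("chrI", 100, 150, "+"),
    Read("chrI", 200, 240, "-"),
    Read("chrI", 0, 500, "+"),  # too long for a short nascent RNA: skipped
]
print(initiation_counts(reads))
```

The piled-up counts per position are what a browser track of "first bases" would display.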
There's a start that's right where the WormBase start is. And here's a gene like most genes, one that's trans-spliced: here's our start, upstream of the annotated start, and in the long RNA we can even see reads over the region called the outron, so named because it's like an intron that gets spliced off.

So we could identify the start sites, and we learned some really interesting things. First, unlike what you learn in the textbook, at most places in the genome transcription doesn't initiate at a single base. Most promoters are a broad cluster of initiation events. These are all windows of 100 base pairs showing clusters of initiation events. Seventy percent of initiation sites genome-wide look like these little clusters, with very abundant and less abundant positions; only 6% of initiation sites across the genome are essentially a single base. Those tend to have TATA boxes. TATA boxes are also very rare (only about 5 or 6% of sites have one), and they tend to have sharper initiation sites.

Genome-wide, we clustered initiation regions and called them TSS clusters, and we have about 75,000 initiation sites. We could assign about a third of them to protein-coding genes, and about two-thirds didn't map near protein-coding genes; I'm going to come back to that in a minute. The other thing that was very clear genome-wide concerns strand. We had strand-specific libraries: these are plus-strand initiations, and those are minus-strand.
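Here is a hypothetical Python sketch of the clustering described above. It merges initiation positions from one chromosome strand into clusters when they lie within `max_gap` bp of each other, then calls a cluster "sharp" if a single position carries at least `sharp_fraction` of its reads, and "broad" otherwise. Both thresholds are assumptions for illustration; the actual TSS-cluster calling used in this work may differ.

```python
from typing import Dict, List


def cluster_positions(counts: Dict[int, int], max_gap: int = 100) -> List[Dict[int, int]]:
    """Group initiation positions from one chromosome strand into clusters.

    counts maps a 0-based position to the number of 5' ends observed there.
    A new cluster starts whenever the gap to the previous position exceeds
    max_gap bp (a hypothetical threshold).
    """
    clusters: List[Dict[int, int]] = []
    previous = None
    for pos in sorted(counts):
        if previous is None or pos - previous > max_gap:
            clusters.append({})
        clusters[-1][pos] = counts[pos]
        previous = pos
    return clusters


def classify_cluster(cluster: Dict[int, int], sharp_fraction: float = 0.9) -> str:
    """Call a cluster 'sharp' if one position holds most of its reads, else 'broad'."""
    total = sum(cluster.values())
    top = max(cluster.values())
    return "sharp" if top / total >= sharp_fraction else "broad"


def shape_summary(clusters: List[Dict[int, int]]) -> Dict[str, float]:
    """Fraction of clusters called sharp versus broad."""
    calls = [classify_cluster(c) for c in clusters]
    if not calls:
        return {"sharp": 0.0, "broad": 0.0}
    return {shape: calls.count(shape) / len(calls) for shape in ("sharp", "broad")}


plus_strand = {100: 50, 105: 10, 130: 5, 500: 40, 501: 1, 2000: 3}
clusters = cluster_positions(plus_strand)
print([classify_cluster(c) for c in clusters])
print(shape_summary(clusters))
```

A gap-based merge is only one option; fixed 100 bp windows, as in the plots, would give similar clusters when sites are dense.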
They were generally paired, so we'd have a plus and a minus separated by about 120 base pairs. So we have bidirectional transcription initiation in most regions of the genome, and we see many initiation sites everywhere. That was true both at the sites we could identify as protein-coding promoters and at these other initiation sites.

The initiation sites that we could assign as promoters have chromatin features of promoters: generally high H3K4 trimethylation, which is a mark often found at active promoters, and lower H3K4 monomethylation. The ones that we couldn't assign as protein-coding promoters more often had chromatin features of enhancers, with higher H3K4 monomethylation than H3K4 trimethylation. This suggested that they might actually be enhancers rather than promoters. To address this, we asked whether these unassigned initiation events mapped to transcription factor binding sites; the modENCODE consortium had mapped a lot of transcription factor binding events genome-wide. We found that these initiation sites almost always mapped to transcription factor binding regions, and they had bidirectional transcription initiation. So we have an active enhancer signature, which is transcription factor binding plus transcription initiation, and a similar definition has been used in humans.

One thing I want to point out is that most transcription factor binding regions across the genome initiate transcription; we can detect initiation there. The converse is also true: the majority of initiation in the genome occurs at transcription factor binding sites. Putting these together, you can conclude that transcription at these enhancer regions accounts for a large fraction of non-coding transcription. So when you see the non-coding transcription that people talk about and detect, most of it in the worm genome, and I think probably in all genomes, actually comes from initiation at enhancer elements.

Now, to show you a little about what these initiation patterns look like: there are different kinds. This is the simplest pattern. Here's a pair of divergently transcribed genes: that gene goes this way and this gene goes that way. The red and the blue are the two initiation sites, so these are the two promoters for these two genes, and they share a small nucleosome-depleted region in the middle. Then we have this one, which is a bit more complex: this is, I think, the core promoter, with probably some enhancer elements upstream. We also have very complicated regions like this one, with a hundred initiation sites in a 30-kilobase region. If you look at all of these across the genome, they're probably doing different things. It's hard to imagine that all of these initiation sites are just promoters and enhancers. They probably all correspond to transcription factor binding (there are a lot of transcription factor binding regions here), but it's unlikely that they're all acting as individual enhancers. Maybe regions like this are active to generate open chromatin that has some other property, rather than specifically enhancing transcription in certain tissues. What these very active regions do is something we're interested in trying to understand.

So from all of that work, we could say that enhancers and promoters have similar properties: they bind transcription factors, they recruit RNA polymerase II, they initiate bidirectional transcription, and they can drive productive elongation. So what's really the difference between enhancers and promoters, and what's the function of transcription at enhancers? We actually don't have the answers to those questions; they're things that we and others are thinking about. For example, if transcription initiates at an enhancer, perhaps with productive elongation, it could help deliver Pol II to a promoter that's not too far away. If there were a loop, you could have re-initiation at the promoter. Transcription through a region could also alter chromatin accessibility. I think a lot of gene regulation has to do with altering chromatin accessibility to allow factors to bind, and transcription could, for example, remove nucleosomes in the process. And possibly enhancers could act as alternative promoters. Maybe elements act as enhancers in some contexts and as promoters in others; given how similar they look, I think that's quite possible, and we decided to explore whether enhancers at least have that activity.

So, to try to understand what these elements might be doing, we took individual elements defined by transcription factor binding and initiation sites and asked whether they could promote transcription in a transgenic context. We have a histone-GFP reporter with no minimal promoter, and we clone individual elements in front of it and ask whether they can drive histone-GFP expression. Here's an example. This is a gene, this is its promoter, and here are some predicted enhancers upstream. If we take this whole region, we get this expression pattern in the embryo. It's not in every tissue; it's missing here.
This is the gut, which it's missing from. What we found from doing a number of these is, first, something surprising: the proximal promoter, the element very near the gene, often drives the full expression pattern of the whole region. We don't know whether it's as robust, but overall the spatial and temporal patterns look very similar. Second, about half of the enhancers we test can function as promoters if you just put them directly in front of histone-GFP. When they do, they always drive a subset of the gene's full pattern. This is a very typical example: these cells at the tip of the tail in this embryo are the same ones that are here, and the expression is generally weaker and a subset of the pattern. That was very interesting. What we haven't done yet is, for example for this gene, to remove these elements and see whether they're actually necessary in this context. One can wonder what these four are doing, and whether they're doing anything at all. Obviously they're there, so I think they must be doing something, perhaps making the pattern more robust. There's work from Mike Levine's lab on shadow enhancers showing that some enhancers seem to function under stressful conditions to make sure a gene is expressed properly. Those were all upstream enhancers.
We've also tested some intronic enhancers. In a normal context, it's possible that these enhancers could actually function as tissue-specific promoters, given that once you have trans-splicing you don't know where the transcript initiated. This one is in an intron, and it also drives a very nice pattern; this is the actual known expression pattern for this gene, bro-1.

The other thing we've tested, given that these elements have bidirectional initiation, is what happens if we invert them: do they work in both orientations? Here's an example from an enhancer. The cases we've tested so far have all worked in both orientations, but they've driven slightly different expression patterns in the two orientations. This is color-coded for a number of tissues, and you can see that it's expressed in some tissues in this orientation, and in a subset of those tissues, more weakly, in the other orientation. We've never seen completely different patterns in the two orientations.

We're now expanding this so we can test hundreds of elements and learn more principles from more examples. This assay is very low throughput compared with the high-throughput assays of enhancer and promoter function you may have seen in papers. Those are fantastic, but they have to be done in one cell type rather than across the many cell types of an animal, so we can't get that throughput here. But in our assay, an enhancer or promoter is tested in every single cell type of the animal, so it will definitely be tested in the one where it would normally be active. For the enhancers that are inactive in this assay, and also for the ones that are active, we're comparing their activity as a promoter versus as an enhancer, using a new minimal-promoter construct. Then we use genome editing to test the hypotheses that come out of this in vivo.

Okay, alongside this work, in order to understand the regulatory principles, we need to know where all the regulatory elements are in the genome. So we're nearing completion of a project to identify regulatory elements genome-wide across development. This takes advantage of the accessibility of regulatory elements to nucleases. Where transcription factors bind, and also at promoters, the DNA is generally nucleosome-depleted: enhancers and promoters are both usually nucleosome-depleted, so they're accessible to digestion by DNase I or micrococcal nuclease. More recently, you've probably heard of ATAC-seq, which uses Tn5 transposition. If you apply these to nuclei that still have nucleosomes, you can isolate either the little DNA fragments that are released or the nucleosomes, and then sequence them. In this example the fragments were isolated, and you can see a peak, a hypersensitive site, which shows you where a regulatory element is.

We've done ATAC-seq across six developmental stages in the worm; it's a much more sensitive method than DNase hypersensitivity mapping. But we've also done DNase mapping at different concentrations, because that uncovers different information, and I'll show you a little data from that. We've mapped transcription initiation at the different stages, so we can map promoters; we've profiled nuclear RNA, so we can see where transcription is elongating in the nucleus; and we've mapped a set of histone modifications. We now have most of these data. Here's an example of the ATAC-seq at the different developmental stages, and you can see developmental changes. We've identified 31,000 elements across the genome. Here are some embryonic peaks that are present in embryos and essentially absent from the other stages. Here's a peak, probably an active promoter, that's present in all stages. About 5,000 of those 31,000 are found in only one stage, and about two-thirds of the peaks show developmental regulation.

I also want to point out the DNase data. At low DNase concentration (classic DNase hypersensitivity mapping) you see a peak where it's cutting, where you can isolate those little fragments; that's very similar to ATAC-seq. At somewhat higher DNase concentration, the middle is digested away and you start seeing the +1 and -1 nucleosomes that flank the hypersensitive site. The red are promoters and the blue are enhancers, and you can see that promoters have these characteristic, very labile nucleosomes flanking the hypersensitive site, whereas enhancers don't. At high DNase concentration, the hypersensitive site and the flanking nucleosomes are digested away completely. So promoters are much more nucleosome-depleted and much more accessible than enhancers, and we also use this to annotate these elements.

Okay, so the second thing, the one I want to spend the most time on, is chromatin domains. Zooming out from enhancers and promoters: is there a domain organization to the genome? What kinds of genes are in the domains? How might active or inactive regions of the genome be formed and regulated?
We know that histone tails can be modified and that these modifications are associated with different levels of gene activity, even if we don't know what a lot of modifications actually do, and you can describe activity by patterns of marks. For example, H3K36 trimethylation often marks the bodies of transcribed genes; trimethylation of H3 lysine 4 often marks active promoters; and H3K27 trimethylation often marks inactive chromatin. I'm going to talk about that last one a bit more; it's put on by the classical Polycomb complex, PRC2. We've mapped a lot of different histone modifications and factors, both through modENCODE and outside it, because knowing where a factor is bound, and its pattern, can tell you what that factor might be doing. I think it's important to have that description before you start trying to understand what might be happening in a mutant, so we do a lot of this mapping to uncover patterns.

To apply this in a developmental context and look at how chromatin might be regulated, we decided to compare chromatin at two developmental stages. Susan Strome's lab, through modENCODE, had mapped chromatin in early embryos, which are essentially undifferentiated, dividing cells, and we mapped chromatin in L3 larvae, in which about 85% of cells are differentiated and about 15% are mitotic germ cells. We used the same antibodies to map the same histones and histone modifications at both stages, so that we could look for similarities and differences.

To summarize the data, we used chromatin state mapping, a technique pioneered by Bas van Steensel's lab. It's a way to summarize data when you have a lot of different patterns. Each of these tracks is an individual factor; black is where there's signal, and you can see that some of them share patterns and some look different. You build a hidden Markov model that scans along the genome in windows (it doesn't need a particular segment size; it finds it) and it finds reproducible combinations of factors across the genome. Each combination is assigned a number, and here they've been assigned colors, so wherever you have black across the genome there's a similar combination of factors, and likewise for yellow or red. This is very useful because you can then go back and find that certain colors, or combinations, associate with different features of the genome.

We applied this using our 17 histones or histone modifications. Here's the chromatin state map for the L3 autosomes, with 20 states. Once we annotated where these states are in the genome, we could see that state 1 is where promoters are located. There are transcription elongation states on the bodies of genes, associated with introns or exons; enhancers are in states 8, 9, and 10; and the states we've put at the bottom are those most associated with gene inactivity: the Polycomb-associated states, which are enriched for H3K27 trimethylation.
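As a rough illustration of chromatin state decoding, the sketch below runs Viterbi decoding over windows of binarized marks, treating each mark as an independent Bernoulli variable given the state (the emission model used by tools such as ChromHMM). It is a toy under stated assumptions: parameters are set by hand rather than learned with Baum-Welch, windows are fixed-size, and the two states and three marks in the example are hypothetical. It is not the model used for the 20-state map.

```python
import math
from typing import List, Sequence


def log_emission(obs: Sequence[int], probs: Sequence[float]) -> float:
    """log P(obs | state), treating each binarized mark as an independent Bernoulli."""
    return sum(math.log(p) if present else math.log(1.0 - p)
               for present, p in zip(obs, probs))


def viterbi(observations: List[Sequence[int]],
            start: Sequence[float],
            trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]]) -> List[int]:
    """Most likely state path for a sequence of windows.

    observations[t][m] is 1 if mark m is present in window t, else 0.
    All probabilities must be > 0, and emission probabilities < 1, so every
    logarithm is defined.
    """
    n_states = len(start)
    # score[s] = best log-probability of any path ending in state s so far
    score = [math.log(start[s]) + log_emission(observations[0], emit[s])
             for s in range(n_states)]
    backpointers: List[List[int]] = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: score[r] + math.log(trans[r][s]))
            pointers.append(best)
            new_score.append(score[best] + math.log(trans[best][s])
                             + log_emission(obs, emit[s]))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: state 0 = "active-like", state 1 = "Polycomb-like".
# Hypothetical mark order: H3K36me3, H3K4me3, H3K27me3.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.9, 0.6, 0.05], [0.05, 0.05, 0.9]]
windows = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1], [1, 0, 0]]
print(viterbi(windows, start, trans, emit))
```

The transition matrix is what makes neighboring windows tend to share a state, which is why decoded states form contiguous segments rather than flickering window by window.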
You can also see H3K9 methylation there, which is also a mark of gene inactivity. We ordered the states so that the top five are those most associated with active genes, the genes in the top 20% of expression, and states 16 to 20 are those most associated with genes in the bottom 20% of expression; all the other states are in between. We did this because we wanted to see whether there were gene activity domains in the genome and to look for patterns.

Here are the genes across the genome, with the five active states, and it's very striking that the active states form clusters spanning many genes. You can see these large clusters, and the same for the bottom five, inactive states, while the other ten are interspersed. Across the genome, we found extended regions of either active or inactive genes, and these groups of states were larger than you would expect by chance. So we decided to call domains based on them. The orange domains, which I'll call active domains, are called from the highly active states; the black ones are called from the inactive states; and the gray zones are the border regions between active and inactive domains.

I want to point out that although I'm going to call the domains active and inactive, they're not purely active or inactive, and they're not uniform in activity. They're called based on states that are associated with a particular kind of activity. In fact, 11% of genes in the bottom 40% of expression, which are very lowly expressed, are in active domains, and 20% of genes in the top 20% of expression are in inactive domains. So there are definitely active genes in the inactive domains, and lowly active genes in the active domains. The domains are just called based on the states.
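One simple way to ask whether active (or inactive) genes cluster more than expected by chance is a permutation test on run lengths along a chromosome. This Python sketch assumes each gene has already been given a single label from its chromatin states (the labels "A", "I", and "O" are hypothetical) and compares the observed mean run length of a label against shuffled gene orders. It illustrates the "larger than chance" idea only; it is not the domain-calling procedure used here.

```python
import random
from itertools import groupby
from typing import Sequence, Tuple


def mean_run_length(labels: Sequence[str], target: str) -> float:
    """Mean length of consecutive runs of `target` in a gene-ordered label list."""
    runs = [sum(1 for _ in group) for label, group in groupby(labels) if label == target]
    return sum(runs) / len(runs) if runs else 0.0


def run_length_permutation_test(labels: Sequence[str], target: str,
                                n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Compare the observed mean run length with shuffled gene orders.

    Returns (observed mean run length, one-sided permutation p-value). The +1
    in numerator and denominator counts the observed order as one permutation,
    so the p-value is never exactly zero.
    """
    rng = random.Random(seed)
    observed = mean_run_length(labels, target)
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_run_length(shuffled, target) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical gene labels along one chromosome: A = active, I = inactive, O = other.
clustered = ["A"] * 20 + ["I"] * 20 + ["O"] * 20
interleaved = ["A", "I"] * 30
print(run_length_permutation_test(clustered, "A", n_perm=200))
print(run_length_permutation_test(interleaved, "A", n_perm=200))
```

Shuffling whole gene labels preserves how many genes carry each label, so the test isolates their arrangement along the chromosome.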
These are just called based on the states Now we repeated this in the early embryos and it was very we very strikingly So first we saw the same sort of pattern, but strikingly the patterns were almost identical between the early embryos and the L3 larvae. I'll show you that Here a bit to put getting rid of the states and lining up another region We found that 85% of the border positions are in common between early embryos and L3 larvae and 90% of the base pairs if you just look at the base pairs and the active in the inactive Regions were in common Despite the fact that they don't have cell types in common early embryos versus L3 larvae are completely different Almost completely different cell types So we this we can conclude that these chromatin domains are relatively stable across development and cell types Now we looked at at what modifications might be associated with these. It's very striking that There's a very close correspondence of active domains with H3K36 trimethylation and of inactive domains here with 8 H3K27 trimethylation and Even though there are some differences that these patterns look very similar in early embryos and L3 larvae So to then we decide to investigate the properties of domains and see what if we can learn something about What's inside them and how they might be formed and all these plots will have on the left the inactive domains Then this middle. I don't hope you can see very faint lines these two in here That's the border domain. 
That's the region between inactive and active, and on the right are active domains, all lined up in an aggregate plot across the genome. Not surprisingly, RNA polymerase II is much higher in the active domains than in the inactive domains. Histone H3 is depleted at the transitions from active to border and from border to inactive. Very strikingly, we see an enrichment of transcription factor binding sites and enhancer chromatin states in the border regions, and I'll come back to that in a few minutes.

What about the types of genes in these domains? Again, not surprisingly, ubiquitously expressed genes are mostly in active domains, and genes we classified as silent, because they weren't detectably expressed at any stage, are in inactive domains. But, quite strikingly, genes that we could annotate as germline-expressed were in active domains. Not only that: most genes in active domains could be annotated as germline-expressed. Using maternal gene expression, the best annotation we could find, 85% of genes in active domains have maternal expression, and the true figure is likely higher because the annotation is probably incomplete. It may be that every gene in active domains is expressed in the germline.

Okay, so we're getting some idea of what these domains are. We wanted to look a little more at the different types of genes that might be in them, and to do that we turned to an analysis based on the coefficient of variation of gene expression.
I want to explain what that is. Essentially, it's a measure of how much a gene's expression varies across development and cell types. If a gene is expressed in every cell type at every stage, its expression level won't vary much, and it will look like this. This is RNA-seq at all the different developmental stages, and this gene is expressed at a very similar level at every stage, so it has very low variation. The CV is the standard deviation of the expression level divided by the mean expression level, and it will be low for a gene like this. Here, in contrast, is a developmentally regulated gene. By definition, a gene with a high CV can be annotated as regulated, because you can only have a high CV if the gene has high expression in some cells or tissues and low expression in others. This is a good example: very low expression in embryos; in L1 and L2 larvae it's high; and then it goes down again.

So we first took genes that were significantly expressed in at least one stage, so that we're only looking at genes whose expression we can detect. We annotated the genes with the bottom third of CV values as broadly or stably expressed, and those with the top third of CV values, genes like this one, as developmentally regulated. Then we asked where these two types of genes are. Very strikingly, the broadly or stably expressed genes were predominantly found in the active domains, while the developmentally regulated genes are almost all in the inactive domains, even though all of these genes are detectably expressed in the RNA-seq data.

So we can say that there are two types of domains. Active domains contain broadly expressed genes, which are almost always expressed in the germline, and they're marked by H3K36 trimethylation. Inactive domains, on the other hand, contain developmentally regulated genes, and genes I call silent, although those are obviously going to be expressed at some time we probably haven't identified. So we could define them as regulated at some level, maybe not developmentally, maybe environmentally, and the genes in inactive domains are marked by H3K27 trimethylation.

One very interesting thing is that a paper last year from the Guigó lab found that developmentally regulated genes fail to be marked by H3K36 methylation when they're expressed; that work was mainly in Drosophila. That's very similar to what we're finding here. Despite H3K36 methylation being a mark of transcription elongation, it doesn't seem to always be present during elongation: if a gene is regulated, for example temporally regulated, it doesn't seem to acquire H3K36 methylation in Drosophila in that paper, and essentially we don't see it here either.

This association between germline gene expression and H3K36 trimethylation led us to look at the relationship between these domains and MES-4, a germline-specific H3K36 methyltransferase that shows genetic interactions with the Polycomb system, which deposits the H3K27 trimethylation found in inactive domains.
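The CV-based classification described above can be sketched in Python as follows. The CV is the standard deviation divided by the mean; this example uses the population standard deviation (the ranking would be the same with the sample version, since every gene has the same number of stages), assumes non-negative expression values per stage, and uses a hypothetical `min_expr` threshold as a stand-in for "significantly expressed in at least one stage". Gene names and values are invented for illustration.

```python
import statistics
from typing import Dict, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Standard deviation across stages divided by the mean (population SD)."""
    return statistics.pstdev(values) / statistics.mean(values)


def classify_by_cv(expr: Dict[str, Sequence[float]],
                   min_expr: float = 1.0) -> Dict[str, str]:
    """Label expressed genes by CV tercile.

    expr maps a gene to its (non-negative) expression level at each stage.
    Genes never reaching min_expr in any stage are dropped first, so every
    remaining gene has a positive mean and a defined CV.
    """
    expressed = {g: v for g, v in expr.items() if max(v) >= min_expr}
    ranked = sorted(expressed, key=lambda g: coefficient_of_variation(expressed[g]))
    n, third = len(ranked), len(ranked) // 3
    labels = {}
    for i, gene in enumerate(ranked):
        if i < third:
            labels[gene] = "broad"          # lowest CVs: broadly/stably expressed
        elif i >= n - third:
            labels[gene] = "regulated"      # highest CVs: developmentally regulated
        else:
            labels[gene] = "intermediate"   # middle third: not used in the comparison
    return labels


# Hypothetical expression values across six stages.
expr = {
    "g_flat":  [10, 10, 10, 10, 10, 10],
    "g_flat2": [20, 21, 20, 19, 20, 20],
    "g_mild":  [5, 10, 5, 10, 5, 10],
    "g_mid":   [2, 8, 2, 8, 2, 8],
    "g_peak":  [0, 1, 30, 30, 1, 0],
    "g_spike": [0, 0, 50, 0, 0, 0],
    "g_off":   [0, 0, 0, 0, 0, 0.1],  # below min_expr everywhere: excluded
}
print(classify_by_cv(expr))
```

Filtering before ranking matters: a gene that is essentially off everywhere has a near-zero mean, which would otherwise give it an unstable, inflated CV.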
To give you some background, there are two H3K36 methyltransferases in the worm. MET-1 is a SET2-family, transcription-coupled enzyme that puts down K36 methylation co-transcriptionally; it travels with RNA polymerase II and associates with its C-terminal domain. MES-4 is a different type, an NSD-family methyltransferase whose activity is transcription-independent, and it's germline-specific. Susan Strome's lab has shown that MES-4 marks genes transcribed in the germline with K36 methylation: once genes have K36 methylation, it continues to put K36 methylation on them. That mark is inherited by the embryo, MES-4 is also provided maternally to the embryo, and MES-4 carries on putting down this mark, transcription-independently, until it runs out in mid-embryogenesis. So it epigenetically transmits the memory of germline H3K36 marking to the progeny.

Here's what they found. We have a germline-expressed gene with H3K36 trimethylation, and here's a gene that's inactive in the germline, a somatic gene, marked by H3K27 trimethylation. In mes-4 loss of function, the germline genes acquired H3K27 trimethylation. So MES-4 seems to inhibit PRC2, which isn't surprising: H3K36 methylation is known biochemically to inhibit PRC2. So we asked whether MES-4 activity has a role in domain definition, using data they generated: they mapped H3K36 and H3K27 trimethylation in wild type and in mes-4 mutants. Here's the wild-type data, with our domains, the K36 trimethylation, and the K27 trimethylation. In the mes-4 mutant, if we look at where our domains are, you can see H3K27 trimethylation encroaching into the active domains, and here's another example. The K36 domains get a bit smaller, though we still see them. This is RNAi.
So this is not a complete loss of function, but the K36 domains shrink and the K27 domains get larger. Looking genome-wide in a heat map, first at K36 across inactive, border, and active regions: in wild type there's a lot of K36 methylation extending from the active region into the border, and in mes-4 RNAi we lose all of that and the active region gets smaller. We also see K27 methylation spilling out into the active regions. Based on this, we can say that germline chromatin organization, since MES-4 is a germline-specific enzyme, helps define the domains and is important in organizing them.

Just a little aside, because I think it's really interesting: we don't know the basis of this regulation, but we see some very striking remodeling of chromatin at borders. Here are the inactive, border, and active regions, and this is H3K27 monomethylation, a single methyl group. It has a very strong peak right in the border region and is low in both the inactive and the active regions; it's very specific to borders. This changes dramatically in L3 larvae: we no longer see the peak at the border, but we gain a lot of H3K27 monomethylation in the active region. As I say, we don't know the function of this, but it's a very striking remodeling that we're going to look into. At the same time, the levels of K36 trimethylation in the border regions drop from early embryos to L3 larvae, although I don't have a slide to show you this.

Okay, so I want to come back to the border regions. Transcription factor binding sites and enhancer chromatin states are enriched in border regions relative to inactive or active regions, so we wanted to explore what might be different about border regions compared with the others.
since transcription factor binding sites and enhancers are transcriptional regulatory elements, and these are often found in intergenic regions, we decided to compare intergenic region length to see whether it looks different in border regions compared to inactive or active regions. And indeed we found that intergenic regions in borders are significantly longer than those in the inactive domains or in the active domains; this is on a kilobase scale here. It's also striking that the active regions actually have quite short intergenic regions compared to those in the inactive regions. In borders we also find the highest density of enhancers, based on transcription factor binding sites, compared to inactive or active regions. This positioning of borders between active domains and inactive domains, and this association with transcriptional regulatory elements, suggests that transcriptional activity might contribute to domain separation. This is something we're trying to test now. So to put all this together in a model: say all of these gray boxes are genes in the genome. MES-4 in the germline can put down the H3K36 trimethylation mark on germline-expressed genes, and these also tend to be broadly expressed genes. This mark will then be inhibitory to the PRC2 complex, which can then mark other genes, the ones not marked by K36 trimethylation in the germline, with H3K27 trimethyl; these would be the developmentally regulated genes, or genes regulated in other contexts, and lowly active genes. We have this data on mes-4 mutants from Susan Strome's lab, and we now want to do mapping in a PRC2 mutant to try and test this model. In addition, we found that, besides this interaction between MES-4 and PRC2,
we have longer intergenic regions at the borders between the inactive domains and the active domains, and a higher density of transcription factor binding sites, and this may also contribute to the separation of domains. So how does this relate to other organisms? You're probably familiar with the Polycomb system and how it marks Hox genes and other developmentally regulated genes in Drosophila and helps to maintain gene repression, a memory of repression. But PRC2 can also put down H3K27 monomethylation and dimethylation, and these marks have been less studied. If you look at K27 trimethyl in Drosophila, it doesn't form the big blocks that we have observed, and that Susan Strome's lab (Gaydos et al.) had pointed out, these blocks of K27 across the genome that we see in the worm. But, very excitingly, in the mouse and in Drosophila people have very recently shown that there are extended blocks of H3K27 dimethyl that look very similar to the trimethyl blocks we see in the worm genome, and that are anticorrelated with H3K36 trimethyl. Here you can see that, in the mouse and in Drosophila, they mapped H3K27 dimethyl; above the line are long, large blocks, and this is anticorrelated with Pol II, which is what they were looking at: where you have low K27 dimethylation, you generally see Pol II. They haven't done K36 in this example. So it may be that this separation into domains of H3K36-methylated regions and H3K27-methylated regions is similar in other animals. Okay, so in the last few minutes I want to tell you how we'd now like to understand how these chromatin domains, which we define based on marks, relate to 3D organization, to try to get some handle on how the genome is organized and also to understand how enhancers and promoters might interact with each other. We've decided, as others have done, to turn to these 3C methods that Job Dekker has pioneered, where you can
probe chromatin interactions: you fix nuclei, then digest the DNA in these nuclei, and if regions of the genome that are very far apart in linear space are close together and fixed together, you can ligate them after you cut them. So it's a way to tell which regions of the genome are in close proximity in the nucleus, by having them ligate together. I don't have time to go through how all these methods work, but if you apply these methods on a smaller scale, you can get very high-resolution interactions and say that this region of the genome, the promoter, interacts with this region here where there's a peak, so this is, say, an enhancer-promoter interaction. And if you apply it genome-wide, people have found these topologically associating domains, so you have domains within which interactions occur, but you can't really see any specific interactions, because the drawback of these methods is that the resolution is very low. If you want to look genome-wide, which we do, to compare our chromatin domains and the 3D structure, you generally have to bin all the interactions in windows of 10 to 100 kilobases, because the data are very sparse. The average gene in the worm is five kilobases, so we can't really get very far if we have to use a 10 kb bin: that's already two genes in one bin.
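The resolution problem can be made concrete with a small sketch of how Hi-C contacts are binned into a contact matrix. Only the 10 kb bin size, the finer 500 bp bins and the 5 kb average gene length come from the talk; the contact coordinates and the function names are hypothetical.

```python
from collections import Counter

def bin_contacts(contacts, res):
    """Count ligation contacts per (bin_i, bin_j) pair at resolution `res` bp.

    Each contact is a pair of positions on one chromosome; position p falls in
    bin p // res, and each pair is stored with bin_i <= bin_j."""
    counts = Counter()
    for a, b in contacts:
        i, j = sorted((a // res, b // res))
        counts[(i, j)] += 1
    return counts

def genes_per_bin(res, mean_gene_len=5_000):
    """Average number of genes covered by one bin, given a mean gene length in bp."""
    return res / mean_gene_len

# Invented contacts: two short-range ones within the first ~10 kb,
# one longer-range contact, and one near 16 kb.
contacts = [(1_200, 8_700), (2_300, 9_100), (1_900, 41_000), (15_500, 16_100)]

coarse = bin_contacts(contacts, 10_000)  # 10 kb bins
fine = bin_contacts(contacts, 500)       # 500 bp bins
print(dict(coarse))
print(dict(fine))
print(genes_per_bin(10_000), genes_per_bin(500))
```

At 10 kb the first two contacts fall into the same diagonal pixel (0, 0), so they can't be told apart, while at 500 bp they land in separate pixels (2, 17) and (4, 18). Finer bins only help if there are enough reads to fill them, which is why the sparsity of the data matters.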
And the domains I've told you about are generally around 15 kilobases, so this is not going to work for us. So we've been working to develop a more streamlined, high-resolution Hi-C method using DNase instead of restriction enzymes, so that we can cut in a more precise location. I don't have time to tell you exactly how this works, but I want to show you some results, and I'm going to skip over this in the interest of time to show the validation. Barbara Meyer's lab has generated the one Hi-C map there is for C. elegans, and she applied it to study dosage compensation: the dosage compensation complex downregulates gene expression on the C. elegans X chromosome, and she showed that this leads to a change in the structure of the X chromosome. When they applied this method, they could find these little domains across the X chromosome, with boundaries between them, and the dosage compensation proteins tended to be enriched at these boundaries; that's these green things here. They also found a statistically significant enrichment for interactions between these sites, which are called rex sites, where these complexes bind, so they have this little peak: these sites tended to interact with each other. So we used this to compare our data to theirs. Our data is this yellow and black at the bottom, and we could see the same boxes that they could see here, and we could call essentially the same boundary positions, so we are very happy about that. Then, in looking at these interactions between domains, this is where we wanted to look at our resolution. We take our data and, instead of binning at 10 kilobases, we can look at 500-base-pair bins. Here's a little 200 kilobases of chromosome X, and this is the data from the Meyer lab. Each of these little loops in
black is an interaction that they reported in their data, and in red are their rex sites, which are statistically significantly enriched for interactions. So you can see the enrichment in the data overall, but you can't actually see it visually. In our DNase Hi-C, we can now actually see these interactions, so the resolution is much better and our signal-to-noise is very good: you can see very nicely the interactions between rex sites that they report. So we're very excited about this and about being able to apply it to mutants to try and understand how the genome is organized. We've done some replicates now, and if we look at a linear plot of all the interactions between 500-base-pair bins, the replicates look very similar. We have 77,000 interactions that we can call. 83% of the interaction ends overlap a ChIP-seq transcription factor binding site, and these are usually enhancers and promoters, so we think we're identifying enhancer-promoter interactions. 64% of them overlap a transcription initiation site, and the median interaction distance is 10 kilobases. So now we're just starting to look at how these interactions relate to our domains. We haven't done any quantitative analysis of this yet, but we can see an enrichment for interactions that stay within domains: 80% of the interactions stay either within active domains or within inactive domains. You can see these nice examples where this active domain has very high interactions across it, skipping this inactive domain here, and within here. Here a lot of interactions, in black, stay within the inactive region, but in blue are interactions that span from active to inactive, so we definitely see interactions like that too. We now need to spend more time on this; we're busy analyzing these data, so I can't tell you
more about this yet. But because we have access to so many chromatin mutants, and because this procedure is very quick, we're now gearing up to generate these Hi-C maps in a lot of different chromatin mutants, and then we can ask which factors are required for particular types of interactions between promoters or enhancers, or for generating particular kinds of domains. So, just to summarize: in the beginning I told you about mapping transcription initiation sites, and that we found that a large fraction of non-coding transcription occurs at enhancers; that enhancers and promoters are functionally similar, in that both initiate bidirectional transcription; and that enhancers can function as protein-coding promoters. Then, that the genome is organized into relatively stable domains: inactive domains marked by H3K27 trimethylation containing regulated genes, and domains of relatively active transcription containing broadly and germline-expressed genes marked by H3K36 trimethylation; that domain definition involves germline events, because the germline histone methyltransferase MES-4 regulates these domains; and that the borders separating the domains have large transcriptional regulatory regions, so there may be a role for transcription in helping to separate these domains. To tell you who did the work: the transcription initiation work I told you about at the beginning was mainly done by Ron Chen, in collaboration with Thomas Down. The enhancer-promoter function work was done by Chiara, Ron Chen as well, Carson and Eva. The chromatin domain work was mainly done by Ni Huang, and Kenneth Evans, Mike Chesney and Przemyslaw Stempor also helped. For mapping the regulatory elements, Alex Yan and Jurgen contributed. And finally, this new Hi-C method has been developed by Ni Huang and student Wei Kang. I'm happy to take any questions.
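The border comparison summarized above (longer intergenic regions and a higher density of transcription factor binding sites at borders) can be sketched as follows. This is a hypothetical illustration under simple assumptions: genes are non-overlapping (start, end) intervals on one chromosome, each intergenic region is assigned to the domain containing its midpoint, and the coordinates are invented so that the toy output mirrors the trend described in the talk.

```python
from statistics import median

def intergenic_regions(genes):
    """Gaps between consecutive genes; genes are non-overlapping (start, end) pairs sorted by start."""
    return [(e1, s2) for (_, e1), (s2, _) in zip(genes, genes[1:]) if s2 > e1]

def domain_of(pos, domains):
    """Label of the (start, end, label) domain containing pos, or None if none does."""
    for start, end, label in domains:
        if start <= pos < end:
            return label
    return None

def summarize(genes, domains, tf_sites):
    """Per domain label: (median intergenic length in bp, TF binding sites per kb of intergenic DNA).

    Each intergenic region is assigned to the domain that contains its midpoint (an assumption)."""
    lengths, site_counts = {}, {}
    for start, end in intergenic_regions(genes):
        label = domain_of((start + end) // 2, domains)
        if label is None:
            continue
        lengths.setdefault(label, []).append(end - start)
        n_sites = sum(1 for p in tf_sites if start <= p < end)
        site_counts[label] = site_counts.get(label, 0) + n_sites
    return {
        label: (median(ls), 1_000 * site_counts[label] / sum(ls))
        for label, ls in lengths.items()
    }

# Invented coordinates (bp) on one chromosome, chosen only to illustrate the calculation.
domains = [(0, 20_000, "active"), (20_000, 30_000, "border"), (30_000, 60_000, "inactive")]
genes = [(0, 4_000), (4_500, 9_000), (9_800, 15_000), (15_600, 19_000),
         (26_000, 29_000), (32_000, 36_000), (38_000, 42_000), (44_500, 50_000)]
# The site at 10_000 lies inside a gene, so it is not counted as intergenic.
tf_sites = [4_200, 10_000, 20_500, 21_000, 22_000, 23_000, 24_500, 25_500, 30_000, 37_500]

summary = summarize(genes, domains, tf_sites)
for label, (med_len, density) in summary.items():
    print(f"{label:<8} median intergenic = {med_len} bp, TF sites/kb = {density:.2f}")
```

A real comparison would also need a statistical test between the length distributions and an explicit rule for intergenic regions that straddle a domain edge.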
I think Melissa was first. Yeah, so currently we only have data from one time point. This is data that was generated by Susan Strome's lab, and they just did early embryos. So that's something we'd like to do: later in embryogenesis, when zygotic expression starts, MET-1 becomes more active, and we can see whether we get some change. And we really want to do that alongside the PRC2 components. They're all essential, so it's very hard to get large quantities of these mutants; they did this by RNAi. There is a ts mutant in one of the PRC2 components, MES-3, and we're seeing whether that has a strong enough effect. Or there's the new auxin degron system, where you tag the protein with the AID, and maybe at large scale we can get depletion; we're trying to do that. Yeah. Besides the ChIP for K36 tri, what's the evidence in C. elegans that MES-4, an NSD-family protein... I think the... But it's based on immunofluorescence that Susan Strome's lab has done.
It's all antibody-based. Basically, both my lab and her lab have spent a lot of time trying to validate the antibodies using peptides, so the antibodies we have look at least specific based on peptide binding and also some other competition assays. We don't have any evidence that the tri antibody recognizes di: at least on peptide arrays, we can't detect the tri antibody binding to di. But that isn't proof, and we haven't done mass spec, which would be a good thing to do, and actually we probably should do that, to really be sure which modification this is. So C. elegans doesn't have CTCF, but it does have cohesin, and cohesin is often found at CTCF sites. We have actually mapped cohesin in the worm, and we do see it enriched all through the active domains and particularly enriched at the edges of the active domains. But we haven't yet found any transcription factors that are particularly enriched there. We're now doing bioinformatic analyses of the border regions to see whether any motifs are very specifically enriched. We've looked for trends in the transcription factor binding data sets that exist, and we haven't found any factor
that really seems to be very highly represented. What we see is that most transcription factors are enriched at border regions; there are some that are a little bit more and some a little bit less enriched. But we're going to start a project now to look at this from the sequence point of view. To see whether your Hi-C data shows a relationship between some of the factors and, you know, the borders... But beyond that, what I found interesting is that when you see divergent transcription, and this seems to apply to other organisms as well, you see a different profile for the sites that are transcribing into the active gene-coding region versus those that are divergent, transcribing away and making non-coding RNA. So you mostly don't... diffuse profiles. So did you... is this something that you observed? That's okay. Sorry, this... That's right. Yeah, so if we line them up, the initiation sites tend to be a bit stronger: if you take protein-coding genes, the initiation sites are stronger in the forward direction than in the reverse, although they tend to have one in the reverse orientation. Well, the reason why they look sharper is that we're aligning them on the forward strand, and that distance isn't exactly 120 bases; 120 is the average. So it would look the same the other way round: if we aligned on the reverse site, it would be sharp on the reverse one and broad on the other one. But what we do see is different when you take tandemly transcribed genes, so where you don't have a protein-coding gene in both directions, just a gene in one direction, and look at the antisense RNA.
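The alignment argument in this answer (the forward peak looks sharp because every element is aligned on its forward initiation site, while the reverse site sits about 120 bases upstream only on average) can be checked with a small simulation. The sketch also pairs each reverse-strand initiation site with a nearby forward-strand site, in the spirit of the bidirectional elements described later in the Q&A. The 300 bp cap, the 30 bp Gaussian spread, the 10 bp bins and all function names are assumptions, and the data are simulated.

```python
import random
from bisect import bisect_left
from collections import Counter

def divergent_pairs(plus_sites, minus_sites, min_gap=1, max_gap=300):
    """Pair each minus-strand initiation site with the nearest plus-strand site
    lying min_gap..max_gap bp downstream of it (head-to-head orientation).

    Returns (forward_site, reverse_site) tuples. The gap limits are assumptions;
    a plus site can be reused by several minus sites, which a real pipeline
    would need to resolve."""
    plus_sorted = sorted(plus_sites)
    pairs = []
    for m in sorted(minus_sites):
        i = bisect_left(plus_sorted, m + min_gap)  # first plus site >= m + min_gap
        if i < len(plus_sorted) and plus_sorted[i] <= m + max_gap:
            pairs.append((plus_sorted[i], m))
    return pairs

def aggregate(pairs, anchor="forward", bin_size=10):
    """Pile up both sites of every pair relative to the anchor strand's site,
    in bins of bin_size bp. Returns (forward_profile, reverse_profile)."""
    fwd_prof, rev_prof = Counter(), Counter()
    for f, r in pairs:
        origin = f if anchor == "forward" else r
        fwd_prof[(f - origin) // bin_size] += 1
        rev_prof[(r - origin) // bin_size] += 1
    return fwd_prof, rev_prof

def peak_fraction(profile):
    """Share of all sites in the tallest bin (1.0 means perfectly sharp)."""
    return max(profile.values()) / sum(profile.values())

# Simulated data: reverse sites lie ~120 bp upstream on average (sd 30 bp, an assumption).
rng = random.Random(0)
plus = list(range(1_000, 101_000, 1_000))
minus = [p - max(1, round(rng.gauss(120, 30))) for p in plus]

pairs = divergent_pairs(plus, minus)
fwd, rev = aggregate(pairs, anchor="forward")
fwd_r, rev_r = aggregate(pairs, anchor="reverse")
print(len(pairs), peak_fraction(fwd), round(peak_fraction(rev), 2))
print(round(peak_fraction(fwd_r), 2), peak_fraction(rev_r))
```

Aligning on the forward site puts every forward site in a single bin, so its peak fraction is exactly 1.0, while the reverse sites spread over many 10 bp bins; aligning on the reverse site flips this. A sharp peak on the anchor strand is therefore partly a consequence of the alignment itself.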
We don't see any productive elongation in that direction. We can see initiation very strongly, but in the nucleus we don't see any evidence of elongation; if the RNA is made, it must be extremely unstable, because we can't detect it. So we don't really know why one direction is productively elongated and the other isn't. In other systems there's evidence that splice sites can promote elongation and transcription termination sites can stop elongation, and that you have more termination sites in the reverse direction. We actually don't see an increase in termination sites in the reverse direction, but I think we have seen an increase in pyrimidines just in the region where it would be stopping, and we're looking to see whether there is a real signal there. About the transcription start sites: you showed an example of a locus that had a hundred or hundreds of start sites, and some of those were within introns... transcription start sites? So that's a bit hard to answer. Bob Waterston's modENCODE group did a lot of very deep RNA-seq, and if you take all those billions of reads, they can detect SL1 trans-splicing at low frequency to almost every exon. So it's actually hard to know what the frequency is and whether that's important or not. No, we do sometimes, and in those situations a smaller transcript has been annotated. But generally we do see a lot of initiation sites and transcription factor binding sites in introns, and they don't always correspond to promoters. So the first question is really interesting, and we haven't done it yet: trying to put different elements together to see how they might interact and produce different sorts of patterns, or interfere. We're doing that now. What we did in our initial study is we just took 500 bases, because we were centering on the peak, and maybe about a third of our data is from that. But after the initial results, and looking more at these
initiation sites, we are now just taking elements that have bidirectional transcription initiation, going from one initiation site to the other and taking the region in between. Those are usually between 120 and, say, 300 bases, so they're very small, just these little tiny elements defined by bidirectional transcription initiation. Well, by definition, if it's a protein-coding promoter, then it's transcription at a protein-coding gene. At enhancers, we see transcription, and generally we see some productive elongation, but it doesn't elongate very far. I'm not sure if I'm answering your question. Oh, that's right. You mean, if you see transcription at an enhancer, do we see its interacting promoter transcribed? We don't know, because we're just starting to map interactions between the enhancers and promoters; we haven't looked at that. But what we can say is that we can generally see the enhancers associated with the promoter, because we see elongation across enhancer regions, and if we measure transcription at the enhancers and at the nearby genes, that's strongly correlated. So we have a strong positive correlation between transcription at upstream enhancers and at the protein-coding genes downstream; those are correlations of about 0.7, quite strong. Thank you, Joy. It is this month I've been 25 years