All right, as you probably heard me say already, I'd like to thank the NHGRI for organizing this meeting. It's sort of bittersweet in a way. It's been great to work with everybody in the modENCODE project, on the NIH side and also all these scientists. I'm sure we'll work together in the future, but it's nice to have a meeting like this to put everything in perspective at the end. Today I want to talk specifically about how the modENCODE data, the modENCODE chromatin data, might inform us about human chromatin. I want to start off by saying that there's no question that these model organisms, in particular flies and worms, have informed us about the function of human chromatin. I think it's safe to say that most of what we know about the function of human chromatin was learned in model organisms; I don't think that's really in doubt. So that's the baseline from which we're working. Specifically in flies, there are people sitting in this room, like Sally Elgin, who discovered and characterized DNase I hypersensitive sites in flies in the late 1970s and early 1980s. Polycomb, trithorax, heterochromatin protein 1: all these things that we know about were learned about in models, and that continues to be the case today. And in worms, of course, RNAi, microRNAs, and a lot of other things that are associated with chromatin were discovered as well. Okay, so we have generated a lot of data in the modENCODE project. Just to give you an idea of the types of data that are out there: for worms we have a total of 291 data sets, mostly over three developmental time points.
These are whole-animal experiments, mostly early embryo, third larval stage, and adult, although there are other stages. We've done 133 profiles of 30 different histone marks and 147 profiles of 72 different non-histone chromosomal proteins, a couple of which I'll talk about today. We took a strategy of generating antibodies to these proteins: we generated a total of 451 polyclonal antibodies, 288 of which were validated by at least one assay that meets the consortium standards, that is, immunofluorescence or Western blot. A subset of these were the ones used for the ChIP experiments, so they work in chromatin IPs as well. Most of this data was from our group and Gary's group, and I think this also includes some data from Dave. The fly groups have 601 data sets, mostly over three cell lines, so they have the advantage of cell lines, which are not available in worms, plus some whole-animal experiments in embryos, larval stages, and adults as well. They have 288 profiles of 28 different histone marks and 313 profiles of about 50 different non-histone chromosomal proteins. So we have fairly comparable data sets in terms of scope and size. But I'm not going to talk about how we can use all this data together; we've already heard talks about that, from Mark Gerstein, who talked about how we can use all this chromatin data to predict gene expression states, for example, and build networks, and from Manolis Kellis about how we can use this data to define chromatin states.
That is, how combinations of these marks work together to specify biological function. Instead, what I'm going to do is tell two stories that I think led to unique insights generated by the modENCODE project, and show how the modENCODE data might be used by researchers to translate into human knowledge. All right, the first has to do with centromere specification, that is, how centromeres are specified. Centromeres are the DNA sequences that allow chromosomes to segregate during mitosis. The second has to do with interactions between chromosomes and the nuclear envelope, in both C. elegans and human cells, and how that might relate to human disease. Okay. The first thing to know about centromeres is that centromere DNA is not conserved, so you might think this is a horrible place to use model organisms to study human biology. In budding yeast, for example, the centromere is only 125 bases long; it's exactly one nucleosome wide. In fission yeast it can be from 40 kb to 100 kb; in the fruit fly it's about 400 kb, with a lot of repetitive sequence involved; and in humans it can be up to five megabases of repetitive DNA, mostly alpha satellite but also other repeats. But the interesting thing is that this alpha satellite DNA is neither necessary nor sufficient for centromere function. So the DNA sequence at the centromere is not conserved, and it isn't even necessary or sufficient for centromere activity. And these repeats cause a lot of problems in studying human centromeres, because repetitive DNA is very difficult to get a handle on. So C.
elegans, along with some other insects and plants, has a different strategy: these organisms are called holocentrics, which means they don't have a point centromere; the centromere is essentially the entire chromosome. In worms, as you'll see, this allows us to study centromere function, because we can get around all these repeats. What is conserved among all these organisms are the components that make up the centromere. Kinetochores, the protein component of the segregation machinery, assemble on chromatin containing a histone H3 variant called CENP-A. This is an alignment of CENP-A with H3; basically there's a lot of variation in the N-terminus. CENP-A substitutes for histone H3 in the altered nucleosomes at the kinetochore. In C. elegans you can see that the entire chromosome acts as a centromere; it's coated by kinetochore proteins, and this is CENP-A in red. In Drosophila you get a more traditional point centromere with CENP-A incorporated there. I'll also mention that CENP-A and its chaperone are overexpressed in many human cancers, and if you overexpress CENP-A or its chaperone in flies, you can get ectopic kinetochores, chromosome missegregation, and aneuploidy. Okay, so just to give you an idea of how it looks: here's the DNA, and this is the centromere. There's a structure called the inner plate and the outer plate. The centromeric chromatin is thought to be arranged so that CENP-A is incorporated into the DNA in patches that are then looped in a way that presents them to the outer surface of the inner plate. So there are these domains, but they form a continuous surface when they're arranged properly, and a big question has been: how do you
propagate this organization from one cellular generation to the next, or, in an extreme case, in the very first division of an early embryo? The idea has been that you do it by looking to see where the old CENP-A is, and wherever there was CENP-A before, you put in a new batch. After DNA replication, some CENP-A is associated with both daughter strands, and you incorporate new CENP-A where the old one is to fill in the gaps. This has been shown to be uncoupled from DNA replication itself; it occurs in late anaphase and early G1, in mammals and in Drosophila. It's also known that this process, whatever it is, requires specialized machinery, which is also conserved, including a protein called KNL-2, which I'll mention again in a second. Now, there's some evidence that this old-to-new hypothesis may not be entirely accurate, because there are some very interesting, although rare, cases of neocentromeres in humans. For example, these are chromosomes from a two-generation family. This is chromosome 4, and normally the centromere is here. You can see that the alpha satellite DNA is still here, but the CENP-A, which is in green, has moved to a new location. The people in this family are perfectly normal; their chromosomes segregate normally, but the centromere just moved. So this happens occasionally. In addition, centromere repositioning is common among very closely related species, so centromeres can move around, even away from the repetitive DNA that's usually considered a hallmark of centromere function. Secondly, if we look in C. elegans, the question is: how do you set up these domains in the early embryo? There's no CENP-A in C. elegans sperm, and this is true by microscopy. The sperm are over here, and if we have a CENP-A::GFP fusion, right?
We can see there's no CENP-A in sperm at all. This is just to show that there's some background fluorescence in sperm, but basically there's no CENP-A in sperm; it's all in oocytes. We can also do this by quantitative immunoblotting: we titrate recombinant CENP-A down and show that with purified sperm you get basically no band, at a loading where we would see one if it were there. This tells us that there are fewer than 300 CENP-A molecules per sperm. There are also experiments on what happens to the sperm chromatin after fertilization. The C. elegans oocyte is fertilized by the sperm, and eventually the sperm and oocyte nuclei merge and undergo the first division. Here's the wild-type oocyte, and if we look at the sperm nucleus, which is what we're looking at here, you can see that it acquires CENP-A only after it enters the oocyte cytoplasm; it recruits CENP-A from the oocyte cytoplasm. If CENP-A is depleted from the oocyte, basically no CENP-A gets incorporated for the first division. So the neocentromeres, the fact that CENP-A is absent from mature sperm and recruited from the oocyte cytoplasm at fertilization, and the fact, which I didn't show you, that CENP-A is removed and reloaded in meiotic prophase, all argue against the idea that old CENP-A is used to mark the position of new CENP-A. So if it's not old-to-new, then what is it? As I told you before, the repetitive nature of human centromeres has prevented their study by, for example, ChIP-seq or ChIP-chip analysis. What a holocentric chromosome allows us to do is study centromeres that are basically in euchromatic regions. So Arshad Desai, who's a member of our consortium, did ChIP with CENP-A, the histone variant.
At the time we were using genome tiling arrays, but the data look beautiful. Here's a representation; you all know the ChIP procedure: we hybridized the samples to very high-density oligonucleotide arrays. Here's what the data look like for CENP-A. This is shown as z-scores of log2 ratios, and you can see very discrete regions where CENP-A is incorporated across the entire length of the chromosome. There's a very broad distribution: 47% of the genome seems to be covered by CENP-A. We also ChIPed KNL-2, the conserved protein I mentioned before, and the pattern is identical between CENP-A and KNL-2, so this is independent verification. Here's the relationship between KNL-2 and CENP-A, so that's good. This brings up a point that was made yesterday in the panel discussion: what does occupancy mean, and how do you interpret this signal? What Arshad's group did, again using quantitative immunoblotting, was show that, given the number of CENP-A molecules per nucleus, at most 4% of the nucleosomes can be CENP-A nucleosomes. But we enriched for 47% of the genome, so how is that possible? The reason, and I imagine this is true for almost every ChIP experiment, is that you're not actually ChIPing where the protein is in every cell; you're ChIPing regions that are permissive for binding of that protein. So a region we call CENP-A-positive actually has CENP-A incorporated at low density: maybe one in ten nucleosomes in that region are CENP-A, but they can be incorporated anywhere within that window. These are the kinds of experiments that are required to start to interpret what occupancy means. So what defines these permissive regions?
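The occupancy argument above is simple arithmetic: if at most 4% of all nucleosomes are CENP-A nucleosomes but 47% of the genome scores as enriched, the density inside the enriched regions has a hard ceiling. A minimal sketch, assuming (our simplification, not a claim from the talk) that every CENP-A nucleosome lies inside an enriched region and that nucleosome density is uniform genome-wide:

```python
def max_density_in_enriched(max_fraction_genome: float, enriched_fraction_genome: float) -> float:
    """Upper bound on the fraction of nucleosomes that are CENP-A inside enriched regions.

    Simplifying assumptions (illustrative only): every CENP-A nucleosome lies inside
    an enriched region, and nucleosome density is uniform across the genome.
    """
    if not 0 < enriched_fraction_genome <= 1:
        raise ValueError("enriched_fraction_genome must be in (0, 1]")
    return min(1.0, max_fraction_genome / enriched_fraction_genome)


MAX_CENPA_FRACTION = 0.04  # at most 4% of all nucleosomes can be CENP-A nucleosomes
ENRICHED_FRACTION = 0.47   # 47% of the genome scores as CENP-A-enriched

density = max_density_in_enriched(MAX_CENPA_FRACTION, ENRICHED_FRACTION)
print(f"at most {density:.1%} of nucleosomes in enriched regions, about 1 in {1 / density:.0f}")
# prints: at most 8.5% of nucleosomes in enriched regions, about 1 in 12
```

The ceiling of roughly one CENP-A nucleosome in twelve is in line with the "maybe one in ten" estimate; if some CENP-A lies outside the called regions, the density inside them is lower still.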
At first it looked like, in embryos, it was simply a matter of an inverse correlation with RNA Pol II and gene expression. For example, this is CENP-A data from early embryos, and this gene, CLE1, is not expressed until after embryogenesis, whereas these genes are expressed during embryogenesis. It looked like CENP-A might simply be excluded from regions that are transcribed in the embryo. That's a very simple model for how you'd set it up. And when we looked at RNA Pol II, we got a beautiful inverse correlation between RNA Pol II distribution and CENP-A, which was consistent with that interpretation. But there are some puzzling aspects to this model. The first is that there's no significant zygotic transcription until the 30-cell stage, yet this organization is present during all the early divisions, so how would you set it up? The early divisions are also completely normal if you inhibit RNA Pol II completely, so that didn't seem consistent with the model. The other thing is that there's no change in the CENP-A pattern during development from the early 8-cell stage to the greater-than-250-cell stage, even though gene expression patterns are very dynamic. These all argue against the simple model in which polymerase kicks out CENP-A and the other places retain it. The clue to what might be going on came when we started dividing genes in embryos into categories and asking about the relationship between RNA Pol II and CENP-A in each; you can just focus on the ubiquitously expressed genes here.
For those, we saw the typical relationship I just showed: if you have a lot of RNA Pol II, you don't have a lot of CENP-A. But there was one class of genes with no RNA Pol II in the embryo and also no CENP-A, and these were genes expressed only in the germline and not in the embryo. So this was a clue to what was happening. Here's another example of these doubly negative genes: these four genes are all expressed only in the germline but not in the embryo, and you can see that there's no polymerase in the embryo and also no CENP-A in the embryo. So these are regions of exclusion that also don't have RNA Pol II. The model based on this was now different: it's actually germline transcription that defines the regions from which CENP-A is excluded in the embryo. So we're talking about a truly epigenetic phenomenon, where the maternal germline essentially defines the centromere positions in the developing embryo. If germline transcription was on, a region was not permissive for CENP-A; if germline transcription was off, it was permissive. We can test this model by driving ectopic expression in the germline, using a mutant in met-1, a gene that is not required for fertility. Loss of met-1 causes certain genes to be overexpressed in the germline; one of them is 370-fold overexpressed. We can also measure H3K36 trimethylation in embryos, which marks germline-expressed genes throughout embryogenesis, as was shown separately by Susan Strome and Bill Kelly.
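The shift from the first model to the germline model can be made concrete as two predicates over genes, each scored against observed embryonic CENP-A. A minimal sketch; the `Gene` fields, the three toy genes, and their true/false calls are hypothetical illustrations of the gene categories described above, not modENCODE data:

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Gene:
    name: str
    germline_expressed: bool  # transcribed in the maternal germline (hypothetical call)
    embryo_pol2: bool         # RNA Pol II detected at the gene in embryos
    embryo_cenpa: bool        # CENP-A enrichment observed in embryos


def embryo_pol2_model(g: Gene) -> bool:
    """First model: a region is CENP-A-permissive wherever embryonic Pol II is absent."""
    return not g.embryo_pol2


def germline_model(g: Gene) -> bool:
    """Revised model: a region is CENP-A-permissive wherever the germline did not transcribe it."""
    return not g.germline_expressed


def mispredicted(model: Callable[[Gene], bool], genes: List[Gene]) -> List[str]:
    """Names of genes whose observed embryonic CENP-A status the model gets wrong."""
    return [g.name for g in genes if model(g) != g.embryo_cenpa]


# Hypothetical toy genes, one per category described in the talk (not real data).
genes = [
    Gene("ubiquitous", germline_expressed=True, embryo_pol2=True, embryo_cenpa=False),
    Gene("germline_only", germline_expressed=True, embryo_pol2=False, embryo_cenpa=False),
    Gene("somatic_postembryonic", germline_expressed=False, embryo_pol2=False, embryo_cenpa=True),
]

print("embryo Pol II model misses:", mispredicted(embryo_pol2_model, genes))  # ['germline_only']
print("germline model misses:", mispredicted(germline_model, genes))          # []

# met-1 test: forcing germline expression of a normally germline-silent gene
# should flip its prediction from permissive to excluded under the germline model.
target = genes[2]
target_in_met1 = replace(target, germline_expressed=True)
print("permissive in wild type:", germline_model(target))        # True
print("permissive in met-1:", germline_model(target_in_met1))    # False
```

Only the doubly negative, germline-only class separates the two models, which is why it was the informative category; the met-1 experiment then tests the germline model's prediction directly at genes whose germline expression has been changed.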
So here's a gene that's ectopically expressed in the germline, and we can show that there's no RNA Pol II at this locus in embryos. Then we can ask: is this locus depleted of CENP-A in the embryo? The answer is yes. Here's the wild-type CENP-A pattern, and here's the pattern in met-1, where this gene is expressed in the germline but not in the embryo, and we can see that CENP-A is depleted from this region, even though it would normally be there if this gene hadn't been expressed. There are many examples of this: we were able to show this relationship for over 75 of the 132 genes classified in this category. So that confirms that the regions permissive for CENP-A incorporation are determined by a memory of which genes had been transcribed in the maternal germline. There are many candidate mechanisms for this, some of which depend on other modENCODE data, namely H3K36 trimethylation and the distribution of MES-4, an H3K36 methyltransferase, which is also published. There's also a possible connection to a protein called CSR-1, which is an Argonaute, and to 22G small RNAs. So we don't know the mechanism yet, but it's a truly interesting phenomenon that would never have been possible to discover in human cells. Furthermore, we don't yet know whether this mechanism is relevant to humans, but the fact that all of the chaperone machinery and the components of centromeres are conserved makes it at least a strong possibility, especially given that you can get neocentromeres without the repetitive DNA that's often associated with centromeres. Okay, so the conclusions are that CENP-A nucleosomes can be guided by cues that are not pre-existing CENP-A nucleosomes. We have a truly epigenetic phenomenon,
not just in name: it's transgenerational epigenetic memory of gene expression that regulates histone variant incorporation. And finally, germline expression, that is, which genes are expressed in the germline, might influence which sites are chosen for centromere repositioning during evolution. So this is one example of a result that really required a model organism to get to the bottom of. Okay, in the last five minutes I'm going to quickly tell you about a story that has to do with mapping interactions between chromosomes and the nuclear envelope, which Susan Mango alluded to earlier, and its connection to a human disease, Hutchinson-Gilford progeria, and other laminopathies. C. elegans is actually a nice model for nuclear lamina function because, first of all, it has a nuclear lamina, unlike S. cerevisiae; budding yeast don't have these proteins. And it has only one lamin, not both A- and B-type lamins like humans, so it's probably the simplest model for lamina-chromosome interactions there is. Instead of the lamin itself, we used a protein called LEM-2, a nuclear membrane protein that interacts with the lamins and also with chromatin, and did chromatin IP; we have a very nice antibody against LEM-2. We got this really striking result that Susan introduced, which is that the arms of the C.
elegans chromosomes are very highly associated with the nuclear membrane, while the centers are looped out; these are just representations of the chromosomes. Also, these large LEM-2 domains consist of small subdomains. If we take a closer look at these regions on the arms, they're not continuous: the associations are discontinuous, with small gaps that break up the subdomains within these larger regions. Here's another example of these gaps. A typical gap is about 12 kb, and a typical subdomain might be 60 kb. So you end up with a picture like this. The X is a little different from the autosomes: the autosomes are for the most part plastered against the nuclear membrane, but if you zoom in, there are these little loops that emanate out from the membrane. By putting the data together, we found out what's inside those loops. Here's LEM-2 by array and here's LEM-2 by ChIP-seq, and the data are totally consistent. If we take a gap like this and look at what's in it, you see RNA Pol II, H3K4 trimethylation, and HTZ-1: basically all the marks of transcription. We also showed that the size of the gap is inversely proportional to the level of transcription: the smaller the gap, the higher the level of transcription tends to be. So these little bubbles might be wells that concentrate factors, maybe promoting longer residency times and component recycling at the membrane. But importantly, and this is why the connection to human biology is a little different in this case, this work made us comfortable working with nuclear membrane proteins. We developed protocols for them, and it inspired us to apply for another grant from the Progeria Research Foundation to do ChIP experiments on human
lamins. I'll just say that these relationships are conserved among different organisms. The reason we applied to the Progeria Research Foundation is that Hutchinson-Gilford progeria is, basically, a premature aging disease (it's more complicated than that): the average lifespan is 13 years, and the causes of death are ones you'd normally associate with aging, such as heart attack and stroke. It's a rare disease, mostly caused by spontaneous mutations, and those mutations occur in the human lamin A/C gene. They're dominant mutations that activate a cryptic splice site, causing a deletion of 50 amino acids. This protein is farnesylated, and the mutant version retains the farnesylation, whereas normally the farnesylated part is proteolytically cleaved off. So these dominant mutations delete the region recognized by the protease that would normally cleave off the farnesylated part. There are also many other laminopathies. We were able to successfully ChIP lamin A from human cells; as far as we know, this is the first successful genome-wide ChIP of lamin A. We can show that it associates with active regulatory regions: it overlaps with marks like H3K27 acetylation. This is different from lamin B, which gives the sort of expected result, and lamin A is inversely correlated with the distribution of lamin B.
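The LEM-2 and lamin A analyses described above rest on a few interval operations: merging enriched windows into domains, measuring the gaps between subdomains, and asking whether one set of sites overlaps another. A minimal sketch with plain half-open coordinate tuples; the helper names and all coordinates are hypothetical, chosen only to echo the ~60 kb subdomains and ~12 kb gaps mentioned, and are not real data:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bp on one chromosome


def merge_windows(windows: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge enriched windows into domains, bridging unbound gaps of at most max_gap bp."""
    merged: List[Interval] = []
    for start, end in sorted(windows):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gaps_between(domains: List[Interval]) -> List[Interval]:
    """Unbound stretches between consecutive, non-overlapping domains."""
    return [(left[1], right[0]) for left, right in zip(domains, domains[1:]) if right[0] > left[1]]


def overlap_fraction(query: List[Interval], reference: List[Interval]) -> float:
    """Fraction of query intervals overlapping at least one reference interval."""
    if not query:
        return 0.0
    hits = sum(
        any(q_start < r_end and r_start < q_end for r_start, r_end in reference)
        for q_start, q_end in query
    )
    return hits / len(query)


# Hypothetical toy coordinates (not real data).
lem2_windows = [(0, 60_000), (72_000, 130_000), (142_000, 200_000)]
subdomains = merge_windows(lem2_windows)                     # no bridging: three subdomains
large_domain = merge_windows(lem2_windows, max_gap=15_000)   # bridging: one arm-scale domain
gap_sizes = [end - start for start, end in gaps_between(subdomains)]
print(len(subdomains), large_domain, gap_sizes)  # 3 [(0, 200000)] [12000, 12000]

# The same overlap test asks whether (hypothetical) lamin A sites fall in H3K27ac regions.
lamin_a_sites = [(1_000, 2_000), (5_000, 6_000), (9_000, 9_500)]
h3k27ac_regions = [(1_500, 2_500), (8_000, 9_200)]
print(round(overlap_fraction(lamin_a_sites, h3k27ac_regions), 3))  # 0.667
```

Real analyses would normally use a dedicated interval tool (bedtools `merge` with its `-d` distance option does the bridging step, for example); the gap-size versus transcription relationship would then be a correlation between gap lengths and a per-gap Pol II signal.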
So we get clusters of lamin A association where lamin B is missing. Of course, the next experiment was to look in cells derived from progeria patients, which have progerin, the mutant form of lamin A/C. For the most part the results are the same, but we do identify chromosomal regions that lose their association with the membrane in progerin cells, and we found many more regions that lose association with the membrane than gain it, which is a little surprising. There's also a drug, a farnesyltransferase inhibitor, currently in clinical trials for progeria, and we're now doing ChIP experiments in cells treated with that inhibitor. The interesting thing about that drug is that it only partially relieves the symptoms: it relieves many of the cell morphological phenotypes, and it partially relieves symptoms in cell culture and in mice. The clinical trial in humans is ongoing, but it's clear that the drug doesn't rescue all the defects, so it would be of great interest to know which membrane-chromosome interactions might be rescued by this drug and which aren't. Okay. So I've told you two stories, one about centromere specification and the other about interactions between chromosomes and the nuclear membrane, where I think model organisms made unique contributions to understanding the situation in human biology. These are the people who did the work. The CENP-A work was done mostly in Arshad Desai's lab: Reto Gassmann in Arshad's lab and Andreas Rechtsteiner in Susan Strome's lab really drove it forward, and it was recently published. Kohta Ikegami in my lab did the LEM-2 and progerin work, and he has also done some really nice work on nuclear pores that I didn't talk about. Here are all the PIs in the worm chromatin group: the Ahringer, Dernburg, Desai, and Strome labs; and our data analysis folks are Shirley Liu and Eran Segal
and the fly chromatin group. I didn't get a chance to talk about the fly stories, but that group is Sally Elgin, Gary Karpen, Mitzi Kuroda, Vince Pirrotta, and Peter Park, who's doing a lot of the joint analysis; Dave MacAlpine has a separate grant but works with them a lot. So thank you very much.
That was great, Jason. I'm just wondering how you think about the fact that worms are diploid. There's one set that came from the father and one set that came from the mother, so you have maternal germline expression and paternal germline expression. I know CENP-A gets cleared off and the sperm don't come in with it. So it seems to me that either the domains mapped in the embryo are actually separate, a set of paternal domains and a set of maternal domains, or, alternatively, the RNAi component is significant and it's maternal RNA that does the job on both the paternal and the maternal chromosomes. Do you have any thoughts on that?
Yeah, it's really intriguing. There has to be some sort of trans factor that acts on the sperm chromatin to specify the maternal expression state. We don't know what the mechanism is, but what you're saying is true. My guess is that there are not two separate states, and that there's some trans factor in the oocyte that specifies the positions on the sperm chromatin. But it's difficult to see how we can distinguish the paternal and maternal chromosomes; maybe you can use polymorphisms.
Yeah, we're doing some work with interspecies crosses that might be useful for that.
I wonder if you can distinguish whether it's the lack of germline transcription that's specifying centromeres or the late replication of those regions in the germline, which might be strongly correlated but could be a completely different mechanism.
Right. Just off the top of my head, my initial response is that these domains are relatively short and they're demarcated pretty specifically by gene boundaries, so they correlate with the gene transcription units pretty tightly.