Okay, well, it's a pleasure to be here today. While I'm not an official member of the modENCODE project, I kind of represent another project that's been involved in making mutants in the Drosophila genome, which has been very, very dependent on the quality of the annotation, including the many contributions of modENCODE. And I'll mention several other aspects where modENCODE has had a very important input into the work that I'm going to be describing today. So what I really want to tell you about is how we did this project and how we got some very interesting basic biology as a byproduct. The subject of this work goes back to two of the really incredible discoveries of the last century about the genome: the discovery by Barbara McClintock that there are transposable elements in the genome, and, not that much later, the discovery by Roy Britten that a major amount of the DNA in animal genomes is highly repetitive. I'm kind of pleased with both of these discoveries. Both were made in my small institution, the Carnegie Institution: this one in the Department of Genetics at Cold Spring Harbor, and this one here in Washington, in the Department of Terrestrial Magnetism, which is the first place that there was a cyclotron in the United States. So they had the first P32 and could do experiments on biomolecules before anyone else. And now, fast-forwarding to the era of genomics, we know that this repetitive DNA and transposons are just a huge factor in the structure of genomes, and I feel that those elements and what they do are not appreciated as much as perhaps they should be. So if we just look at the Drosophila or human genome: in Drosophila, for example, about 30 percent of the genome is repeated and related to transposons, much of it clustered around the centromeric regions, the heterochromatin shown in green, and in humans even more, an estimated 50 percent or more of the genome, is a product of transposition.
And in Drosophila, there are over 180 different transposons that have been characterized, and these fall into families, and many of these families are very, very similar. So some individual elements can be recognized between these two genomes, a couple of which, for example the mariner class, piggyBac, or the P element, are among the DNA transposons. In Drosophila, some strains have zero, but many have several of these elements, and they move around the genome. Interestingly, in humans there are 53,000 full-length copies of the mariner element, but they all contain point mutations, and similarly with piggyBac, 500 copies. So these elements currently don't appear to be active. For the P element, another transposon, there are no full-length copies in the human genome, but there are 12 genes that are related to the P transposase, which is encoded by the P element, and the function of these genes is still very poorly understood in the human genome. So our general picture, and this used to be controversial but I think it's becoming accepted, is that transposons are incredibly important in driving the evolution of regulatory sequences, which we now believe is the major modality of evolutionary change, at least over shorter periods of time. These transposons move enhancers and things around and can put them in front of many new combinations of genes, and this is a very efficient way to evolve new modes of regulation. It also is very important in cancer, because when you get double-strand breaks in a genome, this leads to recombination, and it often occurs at sites where a repeated sequence was already present, not necessarily even a full-length sequence. So these are often found at the breaks of translocations, and this is something that's true from yeast to humans.
So these elements have a big effect on the evolution of the genome, but we know relatively little about how individual transposons interact with the genome: why do they go to certain spots called hot spots and avoid others, and how different are the behaviors of one type of transposon and another in terms of where they go? Another general question that we really have very little understanding of is why so many of the transposon-rich and repeat-rich regions of the genome replicate late in S phase. It's a very universal observation, but there's little understanding. In fact, I would say late S phase might be the biggest aspect of the cell cycle which remains almost completely not understood. Okay, so the project I was referring to started as the Drosophila Genome Project. It was supported by NHGRI for 10 years. And then the portion that had to do with making these mutations was renamed the Gene Disruption Project and continued with the support of NIGMS. The purpose of this was just to generate insertional mutants, using transposons, of all the Drosophila genes so that the community could do functional studies. But as a byproduct, the best data for how transposons interact with the genome was generated at the same time. That's because we used a very simple paradigm: the kind of mutation that we were interested in is just having one transposon in the genome. It's very hard to understand how transposons move around in their wild state, where there are multiple copies, defective and intact elements. So the paradigm is, first of all, to use a strain that doesn't have any of a given element. So you just put in one copy and you put a marker in it, such as this white eye gene shown here. And if you just have one copy, if that element moves in the genome, you'll see an anomalous change in the normal Mendelian inheritance of white eye color.
So you can immediately pull out the fact that this red-eyed fly, which shouldn't have been red-eyed, must have had a transposition event. As a result, just by simply looking at the flies, you can basically pull out every transposition event from a known starting site, and do this thousands and thousands of times. And then you can sequence the flanks and localize it in the genome. So basically you can get huge data sets of "you started here and you moved here," and here's how frequently you went to this site or that site. And you can do this with multiple starting sites, and with different markers that might have more or less tendency to be suppressed by the sites where they land. And a key aspect of this that I'd just like to acknowledge at this point is that when you get one of these sites, there's no function here. You're just seeing by sequencing that your element is now, say, here. Well, you depend totally on annotation to know whether this is likely to be a mutant of this gene or this gene, and which are the transcripts of that gene. So the success of this project has been greatly dependent on the quality of the annotation, which in the fly is quite good and has improved considerably thanks to the modENCODE project. Okay, so there were certain issues we wanted to address in these screens, because much of the existing data on transposition has problems. And that is because most screens aren't really interested in collecting every insertion; many go to very similar sites, so they don't seem very useful. Many don't use markers that can identify insertions that go to regions of the genome where the chromatin is not very conducive to gene expression. But we can use markers that are very, very insensitive, or we can do it in the background of a suppressor of variegation. And another problem is that if you get an insertion that's in a repeated region, you can't uniquely localize it.
But by sequencing long stretches from both ends and looking with very, very high accuracy, we worked hard to localize as many inserts as was possible, even in repetitive regions. And thus, I think that while we undoubtedly missed insertions due to suppression and due to repeated sequences, more were recovered in these experiments than probably in other collections of transposon insertions. Well, the results: here we did this with these three different types of DNA elements. And the net result for the community is that two-thirds of all the genes have an insertion that's very close by and provides access, if not a direct mutation. These have been very widely used. And for the remainder, our current strategy is that we now have phiC31 sites in these insertions, and these allow you to swap in DNA at the site of the insertion, which allows you to go to the rest of the genes basically by homologous recombination. So what I'm going to focus on is the site specificity of these three transposons, using these databases collected as I described. And what you're going to see is that each transposon actually is a unique entity that interacts with the genome in a unique way that's quite different among the transposons. And knowing this is actually rather useful if you're interested in using transposons to engineer genomes. And many of these transposons will work in just about any genome that you put them in. So our mariner-class element is called Minos, and here's a description of its transposition mechanism. It integrates at TA sites, but there are obviously many of those throughout the genome, and it functions widely in organisms from cyanide and clostridium. And one of the things that is characteristic of this element is that it's really, really random in its transposition throughout the genome.
So if you just divide the genome into a bunch of bins and you take a lambda equals one, so let's say you have 4,000 bins and you look at 4,000 insertions, that would be a lambda equals one. And a pure Poisson distribution would suggest you'll have what's shown in this yellow bar: an equal number of zero-hit and one-hit bins, etc. And what you see is that the mariner-class element is very, very close to the yellow. So it's just a little bit less random than a Poisson distribution. And now we just contrast this with what I'll be describing next, which is piggyBac and the P element, where you see that they're very, very different. There are very many more still-unhit bins, and there are many, many more bins with multiple hits out in this tail, showing the non-randomness. So mariner, however, is a really random transposon. So if you're trying to saturate all the bins, using mariner is a very good choice. You're really only falling off slightly from a Poisson distribution, which would be the best you could do. And we can look to see what is non-random: why isn't it exactly like Poisson? And what you find is a kind of simple and interesting explanation. And that is that most of the cold spots, the regions that mariner doesn't hit, are polycomb group regions. And this also is something, of course, that the modENCODE data helps you localize very precisely, and then you can see this correlation even better. So here's the most famous such zone, the Ultrabithorax complex. And you see, here's the Ubx gene, abdominal A, abdominal B. And then the ends of the complex are pretty much where this black line goes. And then each one of these bars is an independent insertion, or sometimes more than one. And here you see that there's just a gap for all three of these transposons, notably mariner as well, throughout this zone. And this is true elsewhere. This is just a table of the cold spots for mariner; you see the zeros here.
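The Poisson expectation described above (4,000 bins, 4,000 insertions, lambda equals one) can be written down in a few lines. This is only a minimal sketch of the null model, assuming insertions land uniformly at random; the function name `poisson_bin_counts` and the cutoff of five hits are illustrative choices, not part of the project's analysis.

```python
import math

def poisson_bin_counts(n_bins: int, n_insertions: int, max_hits: int = 5) -> dict[int, float]:
    """Expected number of bins that receive exactly k insertions (k = 0..max_hits)
    if insertions land uniformly at random (Poisson approximation)."""
    lam = n_insertions / n_bins  # mean insertions per bin; 1.0 in the talk's example
    return {
        k: n_bins * math.exp(-lam) * lam**k / math.factorial(k)
        for k in range(max_hits + 1)
    }

# The example from the talk: 4,000 bins and 4,000 insertions, so lambda = 1.
expected = poisson_bin_counts(4000, 4000)
for hits, n in expected.items():
    print(f"{hits} hits: {n:7.1f} bins expected")
```

With lambda equal to one, the zero-hit and one-hit classes come out equal, which is the "yellow bar" shape. An element with hot spots and cold spots would show more unhit bins and a longer multi-hit tail than these numbers.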
And here's the predicted number of hits in the same zone. And every one that has this bar is a polycomb group gene. And so pretty much where it's not hitting are polycomb group genes, plus a few additional loci. And this explains how it deviates from randomness. And since not everything in biology is ever going to follow an exact rule, there are a few polycomb group regions which are not affected in this way. And what we would expect is that, since these polycomb group regions are being mapped in other cell types or even in tissue culture, but transposition is in germ cells, these must be polycomb group regions that are not actually assembled in the germ line, and thus the elements are free to move there in the germ line. But it's surprisingly similar. And this may be related to things that have been called transposon-free regions in mammalian genomes, because these are often also polycomb group regions, or at least homologs of polycomb group regions in Drosophila, and there's a number of them that correspond. So it's certainly something that one would think of as a mechanism that might have an impact on transposition in other genomes besides Drosophila. Well, let's turn to the piggyBac element. piggyBac is interesting: it goes to a TTAA site. So it has a specific target site, and here's its transposition mechanism. It actually picks up expression of its transposase by a kind of enhancer trapping or protein trapping. So it depends on where it's located, where it has transposed. This is the element that has been domesticated and drives things like DNA elimination in a number of organisms that do this process. It's a very widespread element. And it's not nearly as random as mariner. piggyBac, like the P element, likes to go into genes and five prime ends more than into non-genic regions. And the interesting thing is that we calculated from our data all the hot and cold spots for this element. So here now you see, these are cold spots.
You see all the zeros here in the piggyBac column. And this is the number predicted from the size, how many should have been in there. And the thing that was surprising about this collection is that the types of genes that are in these cold spots seemed not to be random from a gene ontology perspective. There was a very high enrichment in things like membrane proteins and receptors. And then in the hot spots, there seemed to be a greater than expected number of genes involved in neural development and behavior. And so what this made me wonder is whether this ancient transposon has in a sense evolved in its site specificity to target certain classes of genes, presumably by the way it interacts with certain types of chromatin or transcription factors, in a way that might allow it to cause less deleterious interference with its host, and maybe even, under conditions of stress, to hit genes that might provide beneficial variation to allow survival. At least if you were interested as an evolutionary biologist in such an idea, I would strongly recommend looking at piggyBac rather than some other transposons. Well, now I want to turn to the Drosophila P element, because this element has a very interesting history. It's only recently been in Drosophila melanogaster. It's horizontally transmitted. It can spread rapidly throughout wild populations. So in just a few decades, it's been documented that the P element spread worldwide throughout Drosophila populations, in part because of the shipments of fruit all over the world, which carry fruit flies, and you get some P elements, and pretty soon they're going everywhere. And you can simulate this in the laboratory. You can just insert one element into one fly and put it into a population in the lab.
And within a relatively short time, the genomes of the flies throughout that population look like the genomes of wild strains, which might have 15 complete elements, 30 defective elements, etc., and they've become repressed. They can evolve to repress their own movement. Well, this brings me to the question of how any of these elements, and the P element in particular, spread like this. Because cut-and-paste DNA transposons have a problem, in that if they moved in, say, G1 from one site to another, there would be absolutely no increase in copy number. So various mechanisms have been proposed for how DNA elements can increase in number, as in that experiment I described. One is that if they can limit their transposition until after they've replicated, then now you have a homologue that still has a copy of the element when one copy jumps somewhere else. The double-strand break will invade here, and it will repair the whole element back in there. So now you've gone from basically one to one and a half copies. However, if you also, by chance, jumped into a region of the genome that hadn't replicated yet, and now a replication fork comes along, you'll have achieved replicating twice in the same S phase. And so that gets you another half of a copy, and you can go from one to two copies. However, this has mostly been a theoretical idea, because the extent or the mechanism by which such events could be programmed by a transposon hasn't really been shown. This is known to happen, and it's been thought to be the major mechanism for element copy number increase. But again, how would the element know when it's in S phase? That has not really been understood. Another thing that's striking about P elements, and that was frustrating in our project, is that they have a very strong promoter preference, although that has been wonderfully useful in terms of manipulating genes.
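The copy-number bookkeeping above (one to one, one to one and a half, one to two) can be checked with a small calculation. This is a sketch under simplifying assumptions that are not from the talk: one starting element, exactly one jump per cell cycle, and copies counted as the average number inherited per daughter cell. The function name and its three flags are hypothetical.

```python
from fractions import Fraction

def mean_copies_per_daughter(jumps_after_donor_replicated: bool,
                             gap_repaired_from_template: bool,
                             target_already_replicated: bool) -> Fraction:
    """Average element copies per daughter cell after one cut-and-paste jump,
    starting from a single element in the genome."""
    if not jumps_after_donor_replicated:
        # Jump in G1: the element simply relocates, and S phase then copies
        # the new site once, so each daughter inherits exactly one copy.
        return Fraction(1)
    # Donor site: two chromatid copies after replication; one excises, and the
    # break is either repaired from the remaining template (back to 2) or not (1).
    donor_chromatids = 2 if gap_repaired_from_template else 1
    # Target site: one chromatid copy if it lands in replicated DNA,
    # two if a replication fork passes through it later in the same S phase.
    target_chromatids = 1 if target_already_replicated else 2
    return Fraction(donor_chromatids + target_chromatids, 2)

g1_jump = mean_copies_per_daughter(False, False, False)
repair_only = mean_copies_per_daughter(True, True, True)
repair_plus_late_target = mean_copies_per_daughter(True, True, False)
print(g1_jump, repair_only, repair_plus_late_target)
```

Under these assumptions the three cases give 1, 3/2 and 2 copies, matching the figures quoted in the talk.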
But they have a very high degree of hotspots, and they don't go randomly throughout the genome. They were the lowest curve on the saturation curves that I showed you. And in fact, this shows you how strong their promoter preference is: within just plus or minus 100 around the promoter, the enrichment for P elements is 16-fold. And even at plus or minus 500, it's still substantial compared to these other elements. When you look at the list of the hotspots shown here, there's no obvious physiology or logic to what they would be. And they've resisted all attempts to explain why these genes were hotspots. Well, one early idea was that they correlated with transcription in the germline. This was like open chromatin in the germline, and that's what they were going to. Well, when we could actually measure expression levels in the germline, here's a plot of the correlation between germline expression and P element targeting, and as you see, it's virtually zero. However, thanks to the modENCODE data, I noticed a very striking correlation beyond the promoter correlation that suggested a lot of interesting answers to some of the questions that I just raised. So what I've plotted here are the modENCODE ORC sites. This is replication origin data from Drosophila cultured cells, and here are the insertions. Now there's a lot of overlap between ORC sites and promoters. And this is not unexpected, because some of the same transcription factors that activate genes also activate replication origins. And we know from studies of developmental biology that in a tissue, often the most abundantly expressed genes have origins right near their five prime ends, ones that actually are very close to the promoter. However, these correlations for the ORC sites actually looked a little better. Like here, in this set of five inserts, there's an ORC site, but there's no promoter, at least no known promoter.
Maybe that's a poor annotation, and eventually there'll be one there. But consistently, we seem to see a better correlation with ORC sites than with promoters. And one of the ways you can see this is if you plot, since there are quite a few of both promoters and ORC sites, the promoters that have multiple P inserts: then more and more of them, in a nice correlation, are ones that have an ORC site as well as a promoter. And remember, because the data is coming from tissue culture cells, we would assume that the fact that the correlation isn't 100% could be just because there are differences in replication between a germline cell and a tissue culture cell. Okay, but I think the thing that can hopefully convince you that ORC sites are more relevant than promoters is shown on this side, where I've compared the degree of enrichment of P element insertion in various classes of promoters with and without ORC sites. So first of all, if you just look at ORC sites all taken together, or the ORC sites common to all three cell lines, you get higher enrichment than if you just take promoters taken together. And if you have ORC sites that are at a promoter, you get over 50-fold enrichment, which is a lot of enrichment. And then perhaps the most convincing of all is that if you take promoters with ORC sites, you get a very big enrichment, but promoters without ORC sites are almost unenriched. So we think that P elements literally recognize replication origins as their target sites. And this can explain a number of factors that were never really understood. Another interesting factor about P element site specificity is that they absolutely don't want to go into genes that are tandemly repeated, like the chorion genes, things that are tissue specific and often clustered. And for this type of gene, like here's one of the cuticle protein clusters, this CPR cluster, it's got lots of promoters, but there are no ORC sites in this region and also no P insertions.
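The fold-enrichment figures quoted above (16-fold near promoters, over 50-fold at promoters with ORC sites) compare the share of insertions landing in a class of sites with the share of the genome those sites occupy. The sketch below shows that arithmetic only; the function name and every number in the example are made-up placeholders chosen to give a round answer, not values from the Gene Disruption Project data.

```python
def fold_enrichment(hits_in_feature: int, total_hits: int,
                    feature_bp: int, genome_bp: int) -> float:
    """Observed fraction of insertions inside a feature class divided by the
    fraction expected if insertions were spread uniformly over the genome."""
    observed_fraction = hits_in_feature / total_hits
    expected_fraction = feature_bp / genome_bp
    return observed_fraction / expected_fraction

# Hypothetical illustration: 10,000 promoters with a +/-100 bp window each
# (2,000,000 bp in total) in a 120,000,000 bp genome, and 400 of 1,500
# insertions falling inside those windows.
promoter_windows_bp = 10_000 * 200
example = fold_enrichment(hits_in_feature=400, total_hits=1_500,
                          feature_bp=promoter_windows_bp, genome_bp=120_000_000)
print(f"fold enrichment: {example:.1f}")
```

With these placeholder inputs the result is about 16. The same function applied separately to promoters with and without an overlapping ORC site would reproduce the kind of comparison shown on the slide.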
And when you do find insertions, so here's another one of these clusters, a UGT cluster that has many promoters, only two of which are ever hit. And look, those are the two that happen to have an ORC site also. So there's a lot of data of this type, which I'm not showing you, that further reiterates this. P elements seem to be recognizing origins, and this can explain many of the facts that we observed previously on P element specificity, and it also coincidentally tells you that at least two thirds of tissue culture origins are probably used in germ cells. Well, this can potentially provide a mechanism for the connection to S phase that I was talking about, which is so important for the propagation of these elements. So if a transposon is inserted very close to an origin, then because origins are assembled into these pre-replication complexes, which have unique collections of proteins, you can just postulate that as long as those pre-replication proteins are there, that inhibits transposition. And only after replication will those pre-replication proteins turn into moving forks and leave, so the element will suddenly know that it's replicated because it isn't associated with these proteins anymore, and that can activate it to transpose now, presumably relatively early in S phase, or whenever the origin where it's sitting fires. So that provides a guarantee that there will be a homologue that still has a transposon when one of these elements excises, leaving a double-strand break, and moves to another site in the genome, explaining the first mechanism for a P element increase in copy number. Now another interesting possibility about this is that it might also provide a mechanism for replication timing.
Because the P element interacts with these pre-replication proteins, and it might have these two choices, say, because it likes to go to origins, but if this one has already fired, the pre-replication proteins will only be at this one instead. And so that would allow it to go to an origin that had not yet fired, which means that it's still going to be replicated later. And thus that will allow the second copying. This copy will be doubled by repair, and this copy will be replicated again. So that means that now this element could take advantage of both mechanisms to maximize its copy number increase. However, there's one kind of Achilles' heel to this strategy. And that is that every S phase, the transposon would be going from an earlier origin to a slightly later origin in S phase, the later-timed origin. And then in the next round, this one wouldn't start until the later time. So the only ones left that hadn't fired would be even later in S phase. In other words, it would provide a kind of fundamental push of transposition towards later and later in S phase. And this type of movement to later-firing origins provides a benefit to any transposon, not just a P element: any transposon that jumps to a later-firing region of the genome will potentially increase in copy number by replicating twice. So I think this is very, very simple, but it could be a very fundamental connection between repetitive DNA and late S phase. And it could explain why there's this big collection of transposons that we find in late-replicating regions: rather than the transposons causing the late replication, late replication might attract intrinsically mobile elements for their own selfish reasons. Likewise, genomes may fight back. In Drosophila, the piRNA loci that generate the regulatory RNAs for transposons are also located in late-replicating regions. They might be there almost like traps, knowing that the transposons are going to want to go to those regions.
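The "push towards later and later in S phase" argued above can be illustrated with a toy simulation. This is only a sketch of the idea, under assumptions that go beyond the talk: origin firing times are drawn uniformly at random, a single element lineage is followed, and in every S phase the element jumps to an origin chosen uniformly among those that fire later than its current one. The function name and parameters are hypothetical.

```python
import random

def simulate_late_drift(n_origins: int = 1000, generations: int = 10, seed: int = 0) -> list[float]:
    """Follow one element lineage that, each S phase, moves from its current
    origin to a randomly chosen origin that has not yet fired."""
    rng = random.Random(seed)
    # Firing times as fractions of S phase (0 = earliest, 1 = latest), sorted.
    firing_times = sorted(rng.random() for _ in range(n_origins))
    position = 0  # start at the earliest-firing origin
    history = [firing_times[position]]
    for _ in range(generations):
        unfired = range(position + 1, n_origins)  # origins that fire later
        if not unfired:
            break  # the element has reached the latest origin and is stuck
        position = rng.choice(unfired)
        history.append(firing_times[position])
    return history

history = simulate_late_drift(seed=1)
print([round(t, 3) for t in history])
```

In this toy model the firing time of the occupied origin can only increase from one cell cycle to the next, and the lineage can end up stranded at the latest-firing origin, which is the trade-off the talk goes on to raise.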
Once there are some copies in these loci, they can start making RNAs that will slice up the transposase messages. Well, another aspect: if late-replicating regions tend to have a lot of flux of transposons, this also creates another interesting effect that could explain some of the character of heterochromatin. And that is that you're going to have a lot of double-strand breaks due to transposon activity. And we showed, using again these substrates from our genome project, some of which happened to go into tandemly repeated DNA, what happens when you have a transposon mobilized from tandemly repeated DNA. What happens is perhaps very much expected from basic genetic studies in many organisms. You get a double-strand break and it does that repair reaction, but because it's a tandem array, the double-strand break doesn't always find the homologous repeat copy. Sometimes one strand goes to here and another to here. You get all kinds of combinations, and these result in either a decrease or an increase in the number of copies. So we think that this flux of transposon activity in any tandem repeat that's in heterochromatin will tend to expand or contract the arrays, thus allowing these things to balloon to large size if there's any selection for them to do so. So I think this provides at least a reason to think that you would have a lot of tandemly repeated sequences in heterochromatic regions with a high flux of transposon activity, as opposed to other sites in the genome. And indeed you see transposons inserted in many of these regions, including in single copies of some of these sequences. Okay, well, there is a problem with this severe addiction to going to these regions, in that you could go to the latest regions in the genome and then you wouldn't be able to move anymore. So we think that it may be that all transposons have to follow this strategy some of the time, but they can't really become completely dependent on it.
And this suggested an interesting idea for a phenomenon that's been seen in many transposons, called local transposition. So even shortly after McClintock discovered the Ac element, it was found in maize that quite a few of the transpositions from a transposon actually just land nearby, a so-called local transposition, within maybe zero to 200 kb. It varies with the particular transposon and organism, and it could be 30 to 70%. This was considered a problem, for example, in Sleeping Beauty, which is a mariner element, in mutagenesis of the mouse genome, because they said too many of the inserts are going locally. But it's never exactly been explained why there is local transposition and what its mechanism might be. Well, one of the things about local transposition in Drosophila gave a clue. And here's some old local transposition data. Here you see the starting transposon, which is oriented the way this arrow shows. And what you'll notice is that almost all the insertions that are on the five prime or left-most side are oriented in the opposite orientation. And the ones on the other side, well, there's only one shown here, but take my word for it, they tend to be in the same orientation or random. Whereas normally, in a distant transposition, the P elements will go equally in both orientations, and that's been shown many, many times. So why would there be an orientation preference locally and not at long range? And I think that the association with origins provides an incredibly simple mechanism to explain this. You figure that the P element transposon has just been derepressed due to the activation of these pre-replication complexes. And it takes a finite period of time for all those proteins to dissociate from the replication forks. So as these forks just start up, there are perhaps still some of the proteins present that we know the transposon likes to insert near, the ones it uses to recognize unreplicated or inactive origins.
So for a brief period the transposon might tend to insert into the forks just as they're leaving. And many transposons are not actually right in the middle; they recognize proteins that are associated preferentially with the leading or the lagging strand. Often it's the lagging strand: PCNA on the lagging strand has been shown for a number of prokaryotic transposons to interact with the transposon. So if that was the case, if this is on the lagging strand, then if it jumps in this direction it's going to switch to the other strand, which will cause that inversion of its orientation. And if it goes in this direction it'll still be on the same strand, and that will cause it to be in the same orientation as the starting element. So it's the association with replication that provides this easy explanation for the local transposition data. So I would like to just conclude, then, that by studying the specificity of transposons as a byproduct of our project to make mutants in all the genes, we've seen the individuality of individual transposons and where they jump in the genome, and we've gotten some new ideas about how transposons may be involved in generating some of the major features of genomes, their heterochromatic, transposon-rich, repeated regions. And I would just like to thank all the people who've worked in the Drosophila Genome Project and the Gene Disruption Project, and especially my other PIs for the current Gene Disruption Project, Hugo Bellen at Baylor and Roger Hoskins at Lawrence Berkeley Lab, and I also have worked closely with Bob Levis in our department. Thank you very much. Those are some really nice models there, especially tying it into replication. I'm wondering if you've thought about whether there could be another layer of control, in the transposons influencing replication.
So if we get rid of the replication licensing mechanisms, we see that re-replication specifically occurs in the heterochromatin, and the transposons around piRNA clusters and several others are specifically much more over-replicated than the rest of the repetitive sequences. Well, I think that's an extremely interesting observation. You know, I think there's probably more than one potential mechanism to explain that. I guess you're suggesting that maybe the transposons are carrying replication origins within them or something else. I mean, I think that the evidence is certainly a lot stronger, based on this, that they can interact with origins. Maybe they can interact in a positive as well as a negative way. I think it's a very interesting initial observation that could lead to a mechanism, I think, with some further study. Yes? So, very good talk. I really enjoyed it. A few questions about the transposons. First of all, do you think that these transposons could potentially be the origin of things like some of the hotspots we are observing for DNA binding? I mean, a sort of transposon hotspot could give rise to a kind of binding hotspot for lots of things? Well, I think, you know, there are a lot of copies of some of these sequences in the genome and they will bind to proteins. Just like a lot of repeated sequences are major binding sites for proteins that don't seem to have any obvious relation to them. I mean, most of the topoisomerase 2 in Drosophila is bound to one of the satellite DNAs. So they're going to be affecting the distribution of protein just by virtue of the fact that they're there and they have affinity, whether it's specific or nonspecific, but some of it at least is going to be fairly specific. I think what you need to...
finding examples where there's functionally important protein binding, we know of a number of examples where the regulation of a gene has changed due to a transposon insertion nearby, and that's due to the proteins that now bind to that transposon. I mean, one of the cases would be the amylase gene in humans, which used to be expressed only in the gut, and then during very late stages of primate evolution it began to be expressed in the salivary glands as well. There's a repetitive element in front of the salivary gland version, and if you just take it and put it by the gut gene, now it's expressed in the salivary gland. And Sue Wessler has shown in plant evolution now many, many examples where a transposon movement caused an evolutionarily selected change in gene expression. I think the number of GWAS hits that fall in these non-coding regions, which according to Eric Lander often have transposons associated with them, suggests that this is a major theme, and it would be due to the proteins, I presume, that are brought in by these elements. I love the model. I'm wondering whether, if you look more deeply into the modENCODE data and similar kinds of data, you can, you know, get some other information. So for example, you know, Dave mapped early and late domains. Again, tissue culture cells. And then the other is... I did look at that. And you know, the problem with that early and late origin data, I mean, just in concept it's great, but 40-some percent of the genome is called early and, you know, it's not very precise. What I would like to see is smaller windows that give a more thorough, you know, parsing of replication. So with that data, I tried to see if I could document this tendency to move from early to late with those definitions, and I didn't see it. So either my idea is just wrong, which, you know, I'm not willing to admit yet.
Or I think we just need some better data to see this effect in origin timing, but it is something you can experimentally test. But also the ORC binding data: by sort of combining Dave's data with the chromatin and the chromatin states, there are distinctions between, you know, promoter-associated and non-promoter-associated origins in terms of the chromatin marks and remodeling factors and things like that. I'm wondering if there are further distinctions, you know, correlations with more specific subsets of chromatin types among the origins. Well, it's possible, but a lot of this work was done before all that data was out, and I did the best I could downloading various data sets, but maybe with the assistance of some of the experts here I could do a more thorough job. But I do have to just give a plug. I mean, the quality of that ORC data is absolutely superb, or you would never get a 50-fold enrichment for something that's a completely different paradigm. So I think, hopefully, that's just a sign of the quality of all the modENCODE data, but it's really fabulous. Has there been any correlation between transposition sites, either general or local, with Hi-C data? Not that I'm aware. I mean, you know, again, you have this problem of what cell type, and are you really looking in the right type. The correlation might exist, but unless you get the data, ideally maybe germline kind of data, you might miss it. And it might be missed because I haven't focused very heavily on Hi-C data in my analyses of this transposition data, and it's all public, but I just don't know how many other people found that model very attractive for integration at local sites being associated with replication origins. Could that potentially be an explanation for local hopping? Well, that's the model. I think, you know, I always tell the people in my lab, you've got to have a model. You can't just work on something and not have some hypothesis.
No matter how absurd, it's better than nothing. So these other transposons? So, like, I think this... I don't know of a better working hypothesis for local transposition, but if someone can put up a better one, then I'll shift to that one. But I think this one has a lot of attractive features: based on what we know, it explains the distance and the fact that they're really clustered in close, which is where the proteins would still be most like an unfired origin, and it explains the orientation effect. We observed that if we build a P element that has an additional repetitious element in it, in our case we're interested in 1360, we see preferential insertion of those elements back into some of the heterochromatic or piRNA-associated loci, for example, at the base of the second chromosome arm and telomeres 2R and 3R. So we're wondering if there is any tendency of those elements to interact with the associated proteins in the germline. Do we have any evidence as to whether the P element itself interacts with some of the proteins that you would find in an origin recognition complex? Well, there has to be some mechanism to make it go to origins, so I would say it's almost certain that either some DNA sequence or maybe a protein that binds to the P element interacts with a protein at replication origins. This would argue that now you have to go and find the biochemical mechanism that's drawing them to origins. And I think you're well aware of the history, you know, going back to the Escargot locus, where if you put a certain piece of DNA in a P element, it can affect, usually it doesn't, but it can affect its targeting dramatically, so that you can get preferential insertion in a certain region upstream from Escargot by having that same region there, suggesting there's a protein that binds that region, and then a protein-protein interaction tethers this replication complex preferentially to that region so it'll insert somewhere nearby.
And I think that could possibly explain some of the observations you just made, which others have seen. For example, the only P elements to break that Polycomb barrier in bithorax are P elements that contain pieces of Ubx. So something about having a piece of Ubx allows them to get through the Polycomb screen and to insert into those types of regions. So there are definitely effects like that, but they're relatively rare. Most pieces of DNA you put in P elements don't affect their targeting, and we've looked at that. Well, even in this study, we had at least seven or eight different structural elements, and you can compare the different distributions. They all hit the same hotspots, et cetera. Okay, but you don't have a favorite candidate yet. Excuse me? You don't have a favorite candidate in terms of the proteins that might be involved. It's really, I just haven't... I don't have a good enough idea to favor one over the other at the moment. I have a very different question. Do you have any idea where the P element came from 50 years ago? Yeah. I mean, it wasn't my work, but what people have shown is that there's a group of Drosophila called Drosophila willistoni, and they've... You can tell by looking at the transposons in a genome how long they've been there by comparing the sequence variation among the different elements. The archaeology of the transposons in the human genome has been beautifully worked out using that approach. When you take the P element in Drosophila, in wild strains, they're virtually identical. You know, there's maybe one nucleotide difference or something, but in willistoni, they're 30% identical, and the same sequence is the one that's in Drosophila. So it's a... And then there's a big phylogenetic range in between that doesn't have these elements.
So, you know, it's very hard to say otherwise than that some mechanism, some mite or virus or something, or, you know, a renegade, somebody who was injecting DNA before we knew about it, somehow got that element from that group, not willistoni itself but one of its very closely associated species, into the melanogaster germline. We know that if you do that, the element just jumps right into the chromosomes. And that start... And then it'll spread by the mechanisms that I proposed or others. And so we now have it in world populations. And this has actually been documented repeatedly with some types of elements. There's an ability to do this horizontal transmission. Any evidence for any sort of rapid selection or subspeciation as a result of this? I think this happens a lot more often than people realize. You know, it's not the only element that has fairly recently come into the Drosophila genome. I mean, the I element, which is a non-LTR transposon, not even the same category: active elements have only recently been in Drosophila, but there were some old defective elements in the piRNA loci that presumably had been keeping these things under control for years. And now just recently, that seems to have broken down, and now you can get wild strains that have active I elements. So I think transposons are in a constant state of flux in terms of their regulation and how active they are, and even, through horizontal mechanisms, coming into the genome and maybe going extinct. And that's partly why, you know, the heterochromatin to some extent is like the immune system of the germline. It has all these old transposons that used to be there, and as long as piRNA is being made to those, they can't be reinfected. But just like your own immune system, if the antigen doesn't come back, slowly your immunity decays away. Okay, thanks.
The possibility, instead of an origin, of having it be a break, or, I mean, what are your thoughts on breaks contributing at any point to the insertion? Well, you mean a break before, just going to where there is a spontaneous break? Right, where there's just some random break in the genome and the P element inserts through that type of mechanism as well, or... Well, it wouldn't explain the tremendous correlation to the ORC binding sites. So, I mean, maybe the most prominent reason there's a break is that when these forks start up, there's single-stranded DNA and whatever. Is that what you mean? That origins are just the sites that have the biggest chance of having a break? Or just in addition to that process at an ORC? Well, you know, not every insert is at one of these sites. So, until we can map them in the appropriate cell type, we don't know; maybe that's another class. And I think the local jumps are a little bit off from the origins because they can jump up to 100 kb off to one side. So, I think there's going to be a class of transposons that are not precisely at ORCs, but I think you'd almost have to start to go in vitro. You know, there is an in vitro P element transposition system and that would probably be the best way to test that. And, you know, transposition is a reaction that you can really reconstitute in vitro, and for many prokaryotic transposons the biochemistry is just gorgeous, and that's why those mechanisms I showed you have been very well worked out. So, I think that these eukaryotic transposons are all moving in that direction as well, and one could see the influence of the substrate, you know, in terms of its double-strand or single-strand break character. But it wouldn't let you tell if there was an origin so easily in vitro. Right. If willistoni has had these elements for much longer, does that mean that willistoni also has some natural defenses against them? Do you see such evidence?
I couldn't understand your question because he was saying that we have to move on. I can skip it, but it's about natural defenses. We need to move on to the panel discussion now. Thanks, Manolis. Sorry. I'll ask you... Yeah. Natural defenses, what about them? Yeah. So, next we'll be having a panel discussion with some of our speakers from earlier today and yesterday. And the topic is going to be the use of modENCODE data for understanding human biology and disease. And Brian Oliver is going to be our moderator, and I'll ask him to introduce himself and the panel members.