Hello. I'd like to thank the organizers for inviting me to speak here today. My name is Elena Helman, and I'm a graduate student in the Harvard-MIT Health Sciences and Technology Program, and I work under Matthew Meyerson in his lab at the Broad Institute. Today I'm going to discuss RetroSeq, which is a tool to discover somatic insertions of retrotransposons. We often hear about transposons being used as an artificial system to induce oncogenesis and reveal important genes, but I'm more interested in retrotransposon activity that occurs naturally in cancer.

So what are retrotransposons? Retrotransposons are mobile genomic elements that mobilize via a copy-and-paste mechanism across the genome. Here we see a retrotransposon element in the reference. That DNA is transcribed into RNA, which in turn is reverse transcribed back into DNA. And that DNA is inserted elsewhere in the genome, resulting in two copies of the original element.

Retrotransposons have been described as drivers of genome evolution. They comprise over 40% of the human genome, whereas protein-coding genes comprise only 1.5%, so these clearly make up a huge portion of the genome. Most are no longer active, luckily for us. But some do remain hot, as it's called: they retain the ability to retrotranspose across the genome. And it's recently coming to light that they're a major source of genetic variation. Recent 1000 Genomes Project studies have revealed up to 10,000 polymorphic retrotransposon insertion sites in the human genome. And it's estimated that two European individuals differ by 600 to 1,000 polymorphic retrotransposon sites.

So just a brief background on two of the most abundant retrotransposon elements, because we don't really hear about them much in cancer research. They are LINE-1, or L1, and Alu. L1s are about 6 kb long. They compose 17% of the genome, and about 100 of them are still highly active. They're autonomous.
That is, they have two open reading frames, which encode the reverse transcriptase and endonuclease needed to reverse transcribe and insert into the genome. The Alu element is 300 base pairs long. It composes 11% of the genome, and it relies on the L1 retrotransposition machinery.

When a retrotransposon inserts itself into the genome, it can have multiple effects depending on where it lands. It can disrupt the function of the protein if it lands in an exon. It can affect a promoter and thus alter gene expression. It can create or disrupt sites for RNA splicing. And due to homologous recombination, even if it lands in an intergenic region, it can lead to further genomic rearrangement.

So, not surprisingly, retrotransposons have been implicated in cancer. There's episodic evidence of an L1 inserting into an APC exon early in colorectal cancer progression. There's also evidence for an L1 in a MYC intron affecting splicing in breast cancer. And more recently, a study from the Devine lab found nine somatic L1 insertions in six out of 20 primary lung tumors that they looked at using an experimental assay.

So the overall goal of my project is to identify the extent of somatic retrotransposon insertions throughout the cancer genome using paired-end sequencing data. And of course TCGA is a great resource when looking at questions of extent.

So to that end, I developed RetroSeq. The way RetroSeq works is as follows. We align paired-end reads to a retrotransposon consensus sequence. We then locate the pair mates of these reads that align uniquely to the human reference genome. We find these reads and cluster them, and this provides evidence for a potential retrotransposon at that site. Now, if the pair mates that we identify are at a normal distance from each other, then this provides evidence for a reference retrotransposon at that site.
If, however, the distance between these aligned reads, where one is in the retrotransposon and the other is uniquely mapped, is non-concordant with the fragment length distribution, then this provides evidence for a putative novel, non-reference retrotransposon insertion at that site. And that's what we're looking for. In particular, we're looking for ones that are present in the tumor genome and not in the normal.

Of course, there are many more steps to RetroSeq, and parameters and nuances that I won't go into for the sake of time. But I guess my poster is still hanging up there, so feel free to ask me about the various steps at any time. I would like to mention, though, that RetroSeq does go back to the reads that align to the retrotransposon consensus sequence and reassembles them de novo, to really try to get at what particular retrotransposon element it was that inserted at this location.

So here's an example result from a RetroSeq run. This is an IGV view. At the top, you see the normal genome, and all the reads are normally aligned, no problem. On the bottom is the tumor genome, and here you see clusters of colored reads. These colored reads represent reads whose pair mates align to a retrotransposon somewhere else in the genome. So this shows evidence for a retrotransposon insertion at that blue line. And interestingly enough, this example is from the CSMD3 gene, which has been discussed at length during this conference for being frequently mutated. So I guess it's mutated in this way as well.

To test the performance of RetroSeq, we did a simulation. We took a BAM file and artificially inserted 226 L1s and 732 Alus into the BAM file. Then we ran RetroSeq to see if we could recapitulate them. And we did pretty well: the sensitivity is pretty high, and the specificity as well. So that was a good validation for our approach.
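As a rough illustration of how a simulation like this might be scored, here is a minimal Python sketch. It is not RetroSeq's actual evaluation code: the function name `score_calls`, the 100 bp matching window, and the example coordinates are all hypothetical. Precision is reported in place of specificity, because true negatives are not well defined for insertion-site calls; that substitution is an assumption of this sketch, not something stated in the talk.

```python
from bisect import bisect_left

def score_calls(truth, calls, window=100):
    """Compare called insertion sites against simulated (known) sites.

    truth, calls: lists of (chrom, pos) tuples; truth sites are assumed unique.
    A call matches a simulated site if both lie on the same chromosome and
    within `window` bp of each other (hypothetical tolerance).
    Returns (sensitivity, precision).
    """
    # Index simulated sites by chromosome, sorted for binary search.
    truth_by_chrom = {}
    for chrom, pos in truth:
        truth_by_chrom.setdefault(chrom, []).append(pos)
    for positions in truth_by_chrom.values():
        positions.sort()

    recovered = set()       # simulated sites hit by at least one call
    matching_calls = 0      # calls that hit at least one simulated site
    for chrom, pos in calls:
        positions = truth_by_chrom.get(chrom, [])
        i = bisect_left(positions, pos - window)
        matched = False
        while i < len(positions) and positions[i] <= pos + window:
            recovered.add((chrom, positions[i]))
            matched = True
            i += 1
        if matched:
            matching_calls += 1

    sensitivity = len(recovered) / len(truth) if truth else 0.0
    precision = matching_calls / len(calls) if calls else 0.0
    return sensitivity, precision

# Made-up coordinates, purely for illustration.
simulated = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
called = [("chr1", 1040), ("chr2", 900), ("chr1", 5005)]
sensitivity, precision = score_calls(simulated, called)
```

In this toy input, two of the three simulated sites are recovered and two of the three calls match a simulated site.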
As an initial study on real data, which is a bit noisier than our simulation, we looked at colorectal cancer, in part because of that event I mentioned in the APC gene. We took nine whole-genome sequences from tumor-normal pairs, and we used a retrotransposon consensus sequence of the L1 family in this particular case. And here you see the results. The pink lines represent the number of germline L1 insertions, and the green lines are the somatic retrotransposon insertions. So you see that there's a wide range of somatic insertions in colorectal cancer.

In terms of the composition of these events: for the germline events, on the left panel, more than half of them are actually known polymorphic events. They've already been annotated in the 1000 Genomes Project and other recent studies. But we do identify around 500 novel, rare germline retrotransposon insertions. And on the right, we see the breakdown of the somatic events: most of them land in intergenic regions, but some land in gene introns and a few even in exons.

So for future studies, we're currently validating these events experimentally. This is just the first round of some gels, where you see the germline event happening in both the tumor and the normal, and the somatic event missing in the normal. We're going to extend this to other tumor types. We've already started on lung squamous, and the results are consistent with the Devine lab's. And we're going to integrate other sorts of data, such as expression and methylation, which are known to play a role in this.

So in conclusion, RetroSeq leverages paired-end sequencing data to computationally localize somatic retrotransposon insertions. Using it, we discovered novel retrotransposon insertions that are present in the tumor but not in the normal.
And some of these insertions land in genes and regulatory regions. This provides some evidence for the reactivation of retrotransposon mobilization in cancer. So with that, I'd like to thank my colleagues, especially Mike Lawrence, Chip Stewart, Gadi Getz, and my advisor Matthew Meyerson, and everybody at the Broad Institute Cancer Genome Analysis Program. And with that, thank you for your attention, and I'll take any questions.

Questions for Elena.

Hi, very good talk. My question is, how do you deal with the transposon with the repeats in the original sequence?

Sorry, can you say that again?

The transposon may be a repeat of the original sequence.

Yes, they are, right there.

How do you distinguish them?

Well, that's why we use this method, so that we localize a novel insertion in a new place. We take advantage of the paired end that's uniquely mapped to the genome, and that's how we identify the novel insertion.

Also, maybe the original repeats, right?

No, because then the distance between the read pairs would be normal and concordant.

Okay, thank you.

I had a question. Okay. Is there any association between these insertions and early or late replicating DNA? Well, I mean, CSMD3 has insertions.

Yeah, we haven't looked at that yet, but that would definitely be, yeah, for sure.

Yeah, I find myself worried about early and late replicating DNA now for some reason. Okay, anything, there's one.

I was just wondering whether you just throw away non-uniquely mapped reads. I mean, many retroposons are highly repetitive.

Yeah, so if both ends align to the repeat, we don't look at that, because we don't know what to do.

You only consider uniquely mapped reads.

We only consider paired reads where one read is unique and the other is not.

All right, thanks.

Okay, we should move on. Thank you, Elena.
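The read-pair logic described in the talk and in the answers above (one read uniquely mapped, its mate in the retrotransposon, the pair not concordant, support present in the tumor but not the normal) can be sketched in Python as follows. This is only an illustration under stated assumptions, not RetroSeq's implementation: the `Pair` record, the function names, and the `max_gap`, `min_support`, and `slop` thresholds are all hypothetical, and a real pipeline would derive these flags from aligned BAM records.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pair:
    chrom: str           # chromosome of the anchor read
    pos: int             # reference position of the anchor read
    anchor_unique: bool  # anchor read maps uniquely to the reference genome
    mate_in_te: bool     # mate aligns to the retrotransposon consensus
    concordant: bool     # mate also sits at the expected fragment distance

def supports_novel_insertion(p):
    # Unique anchor + mate in the element + non-concordant distance.
    # Pairs with both ends in repeats (anchor_unique=False) are ignored.
    return p.anchor_unique and p.mate_in_te and not p.concordant

def cluster_anchors(pairs, max_gap=500, min_support=3):
    """Group supporting anchor reads into sites: (chrom, start, end, n_reads)."""
    anchors = sorted((p.chrom, p.pos) for p in pairs if supports_novel_insertion(p))
    clusters, current = [], []

    def flush():
        if len(current) >= min_support:
            clusters.append((current[0][0], current[0][1], current[-1][1], len(current)))

    for chrom, pos in anchors:
        # Start a new cluster on a chromosome change or a large gap.
        if current and (chrom != current[-1][0] or pos - current[-1][1] > max_gap):
            flush()
            current = []
        current.append((chrom, pos))
    flush()
    return clusters

def somatic_sites(tumor_clusters, normal_pairs, slop=500):
    """Keep tumor clusters with no supporting anchor read nearby in the normal."""
    normal = [(p.chrom, p.pos) for p in normal_pairs if supports_novel_insertion(p)]
    return [
        (chrom, start, end, n)
        for chrom, start, end, n in tumor_clusters
        if not any(c == chrom and start - slop <= pos <= end + slop for c, pos in normal)
    ]

# Made-up reads, purely for illustration.
tumor = [
    Pair("chr8", 100, True, True, False),
    Pair("chr8", 150, True, True, False),
    Pair("chr8", 300, True, True, False),
    Pair("chr8", 200, True, True, True),    # concordant: reference element
    Pair("chr8", 250, False, True, False),  # both ends repetitive: skipped
    Pair("chr2", 1000, True, True, False),
    Pair("chr2", 1100, True, True, False),
    Pair("chr2", 1200, True, True, False),
]
normal = [Pair("chr2", 1050, True, True, False)]
candidates = cluster_anchors(tumor)
somatic = somatic_sites(candidates, normal)
```

In this toy input, the chr2 cluster also has support in the normal, so it would be treated as germline, and only the chr8 cluster is kept as a somatic candidate.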