I'm glad to have this opportunity to present our work on gene fusion detection using RNA-seq data. The first question is why gene fusions are important. Three decades ago, the first fusion gene, BCR-ABL, was discovered from the Philadelphia chromosome. In 2005, in this paper, Arul Chinnaiyan's group discovered the TMPRSS2 gene fusion in prostate cancer using high-throughput sequencing data. This is probably the first study to use sequencing data to identify fusion genes. In TCGA, we have high-quality RNA-seq data for a large number of samples, which allows us to identify gene fusions with confidence, even at very low frequency.

To do this, we developed a tool called PRADA, the Pipeline for RNA-seq Data Analysis. PRADA is used to identify gene fusions, but it is more than fusion detection. It is composed of four modules: the processing module, the expression and QC module, the fusion module, and the GUESS module. The processing module performs basic operations on RNA-seq data, including read alignment, recalibration, duplicate removal, and so on. The expression and QC module calculates RPKM values and generates quality control metrics for an RNA-seq data set. The fusion module detects candidate gene fusions across all genes in the genome. The GUESS module is a supervised search module that can detect whether a given fusion is present in a sample; it is a very quick script for detecting gene fusions.

Can you go back to the third slide? No, it doesn't work. It's pretty slow. I probably scrolled too quickly. OK, so they can control that. The third one.

A central step in PRADA is the alignment strategy. We use a multi-tiered alignment strategy in which we align short reads to both the transcriptome and the genome. By mapping the reads to the transcriptome, we capture all the transcript variants. By mapping to the genome, we capture the unannotated transcripts in the RNA-seq data.
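
For reference, the RPKM values that the expression and QC module reports follow the conventional definition: reads per kilobase of transcript per million mapped reads. The sketch below only illustrates that standard formula. The function name and the numbers in the usage note are made up for illustration and are not taken from PRADA.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)
```

For example, a 2,000 bp gene with 500 reads in a library of 20 million mapped reads gets `rpkm(500, 2000, 20_000_000)`, which is 12.5.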
As an RNA-seq data analysis pipeline, PRADA uses many established tools and functions as infrastructure. This shows a detailed diagram of PRADA. We use BWA and SAMtools for alignment, and SAMtools, Picard, GATK, and RNA-SeQC for the RNA-seq quality controls.

As an RNA-seq data analysis pipeline, it is important for PRADA to output RPKM values. The RPKM value is an estimate of gene expression from RNA-seq data. A little memory refresh: back in 2010, we published a Cancer Cell paper in which we divided glioblastomas into four subtypes, which we named proneural, neural, classical, and mesenchymal. We made this classification based on array data. Now that we have RPKM values from RNA-seq, we want to compare the RNA-seq-based classification with the array-based classification and see how consistent the two are. We did this for around 160 samples, and this table shows the comparison. Overall, we get a concordance rate of more than 80 percent. Keep in mind that we divide the samples into core and non-core samples, where non-core samples are those we cannot unambiguously classify into any of the four subtypes, and 25 percent of the samples are non-core. Considering that, this concordance rate is actually pretty high.

For fusion detection, this is the rationale PRADA uses. PRADA uses two lines of evidence: the first is discordant read pairs, and the second is fusion-spanning reads. A discordant read pair is a pair of reads in which one end maps to gene A while the other end maps to gene B. If gene A and gene B are fused, then we also expect to see some reads mapping to the junction, while the other end of such a junction-spanning read maps to either gene A or gene B. In PRADA, like many other fusion detection tools, we require both lines of evidence to call a fusion candidate.
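
The two-evidence rule just described can be sketched as follows. This is a simplified illustration and not PRADA's actual code: the record types, the function name `call_candidates`, and the minimum-count thresholds are assumptions, and the sketch ignores read orientation and the mate of each junction read.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class ReadPair(NamedTuple):
    gene_end1: str  # gene that the first end of the pair maps to
    gene_end2: str  # gene that the second end of the pair maps to


class JunctionRead(NamedTuple):
    gene_a: str  # gene on one side of the junction the read spans
    gene_b: str  # gene on the other side


def call_candidates(
    pairs: Iterable[ReadPair],
    junction_reads: Iterable[JunctionRead],
    min_discordant: int = 2,  # hypothetical threshold
    min_junction: int = 1,    # hypothetical threshold
) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Return gene pairs supported by BOTH discordant pairs and junction reads."""
    # A pair is discordant when its two ends map to different genes.
    discordant = Counter(
        frozenset((p.gene_end1, p.gene_end2))
        for p in pairs
        if p.gene_end1 != p.gene_end2
    )
    spanning = Counter(
        frozenset((j.gene_a, j.gene_b))
        for j in junction_reads
        if j.gene_a != j.gene_b
    )
    candidates = {}
    for gene_pair, n_discordant in discordant.items():
        n_junction = spanning.get(gene_pair, 0)
        # Both lines of evidence are required to call a candidate.
        if n_discordant >= min_discordant and n_junction >= min_junction:
            candidates[tuple(sorted(gene_pair))] = (n_discordant, n_junction)
    return candidates
```

A gene pair with discordant read pairs but no junction-spanning read is not reported, which mirrors the requirement that both kinds of evidence be present.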
An important issue in fusion detection is false positives, so PRADA has many filters to remove them. The number one filter is that the gene partners in a fusion cannot have significant sequence similarity; we call this the homology filter. We also require that the ratio of fusion-spanning reads to discordant read pairs be within a limit, which is determined by the library size and the read length. We also look at gene partners, junction patterns, and other features from the fusion identification.

We applied PRADA to kidney cancer and glioblastoma. We identified recurrent fusions, which include the SFPQ-TFE3 fusion in kidney cancer, the FGFR3-TACC3 fusion in glioblastoma, and the TFG-GPR128 fusion in both cancer types, which I will discuss in the next few slides.

In kidney cancer, we identified 80 fusions in about 15 percent of the cohort. The most frequent fusion is SFPQ-TFE3, which occurs in five samples. The second recurrent fusion is TFG-GPR128, which occurs in four samples. Interestingly, TFE3 translocation has been reported to be associated with a rare subtype of adult kidney cancer. To evaluate the accuracy of PRADA, we selected 13 fusions and used RT-PCR to validate at least 11 of these 13. We also validated the fusion identification in glioblastoma, where we had a comparable validation rate; there we used whole-genome sequencing data to validate our fusions.

In glioblastoma, we identified a total of 232 fusions in around 70 percent of the samples. The recurrent fusions include the FGFR3-TACC3 fusion, the TFG-GPR128 fusion, and EGFR-involved fusions. These fusions are important because, for example, the FGFR3-TACC3 fusion was reported in Science by Singh et al. in July, and they showed by experiment that it is a transforming fusion in astrocytes.
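
The first two filters mentioned above, the homology filter and the read-ratio limit, could look roughly like the sketch below. Everything concrete here is an assumption made for illustration: the talk does not say how sequence similarity is measured or how the ratio limit is derived from library size and read length, so this sketch swaps in Python's `difflib.SequenceMatcher` as a stand-in similarity score and takes the ratio bounds as a plain parameter.

```python
from difflib import SequenceMatcher
from typing import Mapping, NamedTuple, Tuple


class Candidate(NamedTuple):
    gene_a: str
    gene_b: str
    n_discordant: int  # number of discordant read pairs
    n_junction: int    # number of fusion-spanning reads


def similarity(seq_a: str, seq_b: str) -> float:
    """Stand-in similarity score in [0, 1]; not the measure PRADA uses."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def passes_filters(
    cand: Candidate,
    sequences: Mapping[str, str],
    max_similarity: float = 0.8,                      # hypothetical cutoff
    ratio_bounds: Tuple[float, float] = (0.1, 20.0),  # hypothetical limit
) -> bool:
    # Homology filter: partner genes must not look alike.
    if similarity(sequences[cand.gene_a], sequences[cand.gene_b]) > max_similarity:
        return False
    # Ratio filter: spanning reads per discordant pair must stay within bounds.
    if cand.n_discordant == 0:
        return False
    ratio = cand.n_junction / cand.n_discordant
    low, high = ratio_bounds
    return low <= ratio <= high
```

In a real pipeline the bounds would be computed per sample from the library size and read length, as the talk indicates, rather than fixed as defaults.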
The TFG-GPR128 fusion is interesting because we found it in both kidney cancer and brain tumors. The EGFR fusions are certainly important, because the EGFR gene is important in glioblastoma.

Since we have data from both kidney cancer and glioblastoma, we thought we could do some pan-cancer analysis. The first thing we looked at is the fusion distance, the distance between the two fusion genes, in both cancer types. The first three bars show the intrachromosomal fusion distances, and the last bar shows the interchromosomal fusions. For both glioblastoma and kidney cancer, we actually find more short-distance fusions, where the two fusion partners are less than one megabase apart.

As for the TFG-GPR128 fusion, we found it in both kidney cancer and brain tumors, and then we went back to look at the copy number profiles of these samples. This figure shows an IGV screenshot of the copy number profiles. Each row here is one patient, and you can see that almost all of the patients with this fusion have a very focal amplification at this locus. So this fusion is actually caused by a focal amplification and an inversion. This pattern is not only present in kidney cancer and brain tumors; it was also found in some other cancer types. Most interestingly, it was found in healthy individuals, suggesting that this is a germline event.

This gave us a reason to look at this fusion in normal samples in TCGA. To do this, we developed a tool called GUESS-ft. As I mentioned, GUESS-ft is a method for supervised search of a fusion in RNA-seq data. We applied GUESS-ft to all cancer types, to all the tumors and the matched normals. In all cases, if we found this fusion in a tumor, we also found it in the matched normal. This result shows that the fusion is a germline event.
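
The tumor versus matched-normal reasoning above can be written down as a small check. This is only a sketch of the logic, not GUESS-ft itself: it assumes the per-sample supervised search has already produced a yes/no call for each tumor and each matched normal, and the function name, labels, and patient IDs are hypothetical.

```python
from typing import Mapping


def classify_origin(
    tumor_calls: Mapping[str, bool],
    normal_calls: Mapping[str, bool],
) -> str:
    """Summarize whether a fusion looks germline or somatic across patients.

    Both mappings go from patient ID to whether the fusion was detected.
    Only patients whose tumor carries the fusion and who have a matched
    normal are informative.
    """
    carriers = [
        patient
        for patient, found in tumor_calls.items()
        if found and patient in normal_calls
    ]
    if not carriers:
        return "undetermined"
    in_normal = [normal_calls[patient] for patient in carriers]
    if all(in_normal):
        # Every tumor carrier also shows the fusion in the matched normal.
        return "germline-consistent"
    if not any(in_normal):
        return "somatic-consistent"
    return "mixed"
```

With calls such as `{"P1": True, "P2": True}` for tumors and the same for normals, the function returns `"germline-consistent"`, which corresponds to the pattern reported for this fusion.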
Looking at the gene expression pattern, this fusion can activate the expression of GPR128.

I mentioned EGFR in glioblastoma a few slides ago. The most common mutation in EGFR in glioblastoma is the vIII mutation, where EGFR loses exons 2 to 7. Consequently, there is a rearrangement joining exon 1 and exon 8. This kind of rearrangement is different from a gene-gene fusion, because it is an intragenic event, and this kind of intragenic event cannot be detected by the common fusion detection tools. To find these events, we developed a tool called GUESS-ig, GUESS for intragenic rearrangement. We applied GUESS-ig to RNA-seq data in glioblastoma. We found vIII, and we found C-terminal deletions. Most surprisingly, we found an exon 12 to 13 deletion and an exon 14 to 15 deletion. Those two kinds of deletion have not been functionally characterized.

To summarize, PRADA has several highlights. It provides powerful functions for processing RNA-seq data. It can be used as a standalone version, or it can be used within a PBS or LSF system. It has a modular design, so users can use PRADA flexibly: you can pause and resume your analysis, or you can choose to run individual modules among the four. Considering the number of samples we are handling, we designed PRADA to run samples in batch. For example, we ran around 180 samples in two weeks. Right now, PRADA is available on SourceForge.

Finally, I want to thank all the people involved in this project. PRADA was developed in the Verhaak lab. In particular, I would like to thank Wandaliz Torres-Garcia and Rahul Vegesna for their enormous contribution to this project. I also want to thank our collaborators within and outside MD Anderson: Michael Berger from MSKCC, and Andrey Sivachenko and Gaddy Getz from the Broad Institute. This project also benefited from the TCGA kidney working group and GBM working group. Thank you.
We have time for one quick question, one quick answer.

Just wondering if you can comment more on the very interesting finding of the exon 12-13 and 14-15 deletions. Do you see both DNA-based and transcript-based evidence for these deletions?

Well, from our data, we see these deletions in the transcripts, because we use RNA-seq data. But I don't remember if we see this in DNA, because this is too short.

We can see these in sequencing. We've actually seen these in sequencing data. We have some single-cell sequencing data where we've seen them, and I would love to follow up with you on this.

Yeah, that would be great. Thank you.
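
As a note on the intragenic deletions discussed in the talk and in the question above: when a read joins one exon directly to a non-adjacent downstream exon of the same gene, the exons in between are missing from that transcript. The sketch below shows only that bookkeeping. It is not GUESS-ig, whose implementation the talk does not describe. The function name is hypothetical, and the choice of exons 11 and 14 as the flanking exons of a 12-13 deletion is an illustration of the arithmetic, not a reported junction.

```python
from typing import List


def skipped_exons(upstream_exon: int, downstream_exon: int) -> List[int]:
    """Exons missing when upstream_exon is spliced directly to downstream_exon.

    Exon numbers are 1-based positions within a single gene.
    """
    if downstream_exon <= upstream_exon:
        raise ValueError("downstream exon must come after upstream exon")
    return list(range(upstream_exon + 1, downstream_exon))
```

For the vIII event, a junction joining exon 1 to exon 8 gives `skipped_exons(1, 8)`, which returns `[2, 3, 4, 5, 6, 7]`. A junction joining exon 11 to exon 14 would correspond to an exon 12 to 13 deletion.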