...sought to find new drivers in cancer, and in particular kinase drivers, because we're mostly interested in kinases. As we've discussed several times already in this meeting, kinase fusions, or fusions in general, are a result of genomic instability in cancer, and we know of many examples of strong drivers, for example BCR-ABL or EML4-ALK, which are frequent in CML and lung adenocarcinoma respectively. We know these are very strong drivers because the clinical trials and approved therapies associated with these alterations are very successful at treating these cancers.

The RNA-seq data produced by the TCGA are a very powerful dataset for discovering new kinase fusions, and new fusions in general; to date more than 10,000 samples have been sequenced in TCGA across 33 tumor types. So we sought to find new kinase fusions across this dataset. Initially I used some of the algorithms that were publicly available at the time, in 2013. For many of them, either the run time was far too long, more than half a day or a day on eight cores per sample, which was really too much to run on the entire dataset, or they were not sensitive enough to discover all the fusions and were missing a lot of the known ones. So we sought to develop a new algorithm that would be both very sensitive and very fast, so that it could run across the entire dataset.

And so we ran it on this dataset. A few of these cancer types are still under embargo from the TCGA, so I'm not going to present any results about those; what I'm presenting is the data on all the remaining samples.

Briefly, what the algorithm does is really what all fusion detection algorithms do. First there is an alignment to the genome using our knowledge of the transcriptome, and this is done using the STAR aligner, which is a very fast aligner.
Conveniently, the STAR aligner separates the aligned reads into two different BAM files: one containing all the paired reads that align perfectly to the genome, and another that contains only the chimeric reads that could be supportive of a fusion. The fusion detection algorithm then goes through this smaller BAM file, which contains only reads that do not align properly to the genome, and finds whether there are any pairs that could be supportive of a fusion. So in a sense it can discover any fusion that is present in the data, and RNA-seq really is a very powerful dataset for discovering fusions because there are a lot of reads in the data that support them. In the final step of the pipeline there is a filtering stage that reduces the number of false positive and passenger events seen in the data, and I'm going to describe how that is done.

I want to spend a minute here because these post-processing tests that are applied to the results are very important for discovering what could be real driver events in those tumors, as opposed to passenger or false positive events. As I said, the first step in the pipeline is the fusion detection step, where all the supporting reads for all the fusions are assembled and counted. Then, in the post-processing step, there are heuristics to find, first, the passenger events. These are real fusions that exist in the data but that don't have the properties of functional fusions. For example, they don't have an exon-exon junction, or their coding sequence is not in frame, or they cut through protein domains, making the proteins unstable, or finally, in the case of kinase fusions, they do not contain a kinase domain. And I want to stress that there really are many kinase fusions in the data that do not contain a kinase domain.
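The passenger heuristics described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the `FusionCall` record, its field names, and the simplification that the kinase is always the 3' partner are all assumptions made for the example, and the gene names and coordinates are made up.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FusionCall:
    """Hypothetical record for one candidate fusion (kinase assumed to be the 3' partner)."""
    five_prime_gene: str
    kinase_gene: str
    at_exon_boundaries: bool        # breakpoint joins two annotated exon boundaries
    five_prime_cds_bases: int       # coding bases contributed by the 5' partner
    kinase_cds_offset: int          # coding bases of the kinase gene lost before the breakpoint
    kinase_domain: Tuple[int, int]  # (start, end) of the kinase domain in CDS coordinates


def is_in_frame(call: FusionCall) -> bool:
    # The kinase keeps its reading frame when both sides agree modulo 3.
    return call.five_prime_cds_bases % 3 == call.kinase_cds_offset % 3


def retains_kinase_domain(call: FusionCall) -> bool:
    # The whole domain must lie downstream of the breakpoint.
    domain_start, _domain_end = call.kinase_domain
    return call.kinase_cds_offset <= domain_start


def looks_functional(call: FusionCall) -> bool:
    # A call is kept only if it passes every heuristic.
    return (
        call.at_exon_boundaries
        and is_in_frame(call)
        and retains_kinase_domain(call)
    )


# Made-up examples: one plausible fusion and two passenger-like events.
kept = FusionCall("PARTNER_A", "KINASE_X", True, 300, 1200, (1500, 2400))
out_of_frame = FusionCall("PARTNER_A", "KINASE_X", True, 301, 1200, (1500, 2400))
domain_lost = FusionCall("PARTNER_A", "KINASE_X", True, 300, 1800, (1500, 2400))
```

A 5' UTR fusion that keeps the entire coding sequence of the kinase would have both counts at zero in this representation, so it passes the frame and domain checks.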
There are many examples of such recurrent pseudo-fusions that are a result of copy number amplification. RPS6KB1 in breast cancer, or other kinases that are known to be amplified, do have some fusions in RNA-seq, but either they do not contain a kinase domain or they result in unstable proteins. So it is really important to check whether the putative translated sequence could be supportive of a real protein-activating event.

In another step there are also heuristics to filter out false positives, and this is done using a large dataset of normal samples, both from the TCGA, where there are about 600 RNA-seq normal samples, and from GTEx, where 3,800 samples have been sequenced. I've run the same fusion detection pipeline on the union of these two datasets to discover what could be false positive events, and then filtered out anything that appears above a certain frequency in this dataset. So in the end there is a list of kinase fusions that we think could be functional.

I also want to mention one reason for some false positives, which is the very high expression of one of the partner genes in the fusion. That results in very nonspecific fusion events between two genes that occur without a clear breakpoint, so it is also a very frequent reason for which some genes might appear to be fused with others.

The output of the pipeline is presented here across several tumor types, and what I want to show is that, as expected, the frequency of recurrent kinase fusions varies greatly from one cancer to another. Thyroid cancer, as is known, contains more than 13% of such events, and on average across solid tumors there are about 2% of recurrent kinase fusions.
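The normal-sample filter described above can be sketched as a frequency lookup against a panel of normals. This is only an illustration under stated assumptions: the function names, the data layout (one set of gene pairs per normal sample), and the `max_normal_fraction` cutoff are hypothetical, since the talk does not give the actual threshold, and the gene names are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

GenePair = Tuple[str, str]  # (5' gene, 3' gene)


def normal_panel_frequencies(
    normal_calls: Dict[str, Set[GenePair]]
) -> Dict[GenePair, float]:
    """Fraction of normal samples in which each gene pair was called."""
    n_samples = len(normal_calls)
    counts = Counter(pair for pairs in normal_calls.values() for pair in pairs)
    return {pair: count / n_samples for pair, count in counts.items()}


def filter_against_normals(
    tumor_pairs: Iterable[GenePair],
    normal_calls: Dict[str, Set[GenePair]],
    max_normal_fraction: float = 0.01,  # hypothetical cutoff, not from the talk
) -> List[GenePair]:
    """Keep tumor calls whose frequency in normals does not exceed the cutoff."""
    freqs = normal_panel_frequencies(normal_calls)
    return [p for p in tumor_pairs if freqs.get(p, 0.0) <= max_normal_fraction]


# Toy panel of four normal samples with placeholder gene names.
normals = {
    "N1": {("GENE_A", "GENE_B")},
    "N2": {("GENE_A", "GENE_B")},
    "N3": set(),
    "N4": set(),
}
tumor = [("GENE_A", "GENE_B"), ("GENE_C", "KINASE_X")]
surviving = filter_against_normals(tumor, normals, max_normal_fraction=0.25)
```

Running the same detection code on tumors and normals, as described in the talk, is what makes this comparison meaningful: an artifact of the caller shows up in both sets and is removed.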
This is a plot that presents the thresholds that have been applied, in terms of the number of chimeric reads and split reads, to filter for fusions, and there are two things that I want to say here. One: the plot on the left presents the recurrent kinase fusions that have been discovered by the pipeline, and the color code separates the known kinase fusions from, in blue, the novel kinase fusions that we've discovered. There really is no clear bias in terms of the number of reads, so these novel events that we've discovered don't have less genomic evidence. The other thing I want to show is that the passengers and the singletons, that is, the non-recurrent kinase fusions that are present in the data, tend to have lower numbers of reads. So applying these thresholds really is necessary to filter out a lot of the false positives, but at the same time we don't want to miss novel events.

This is the result of the entire pipeline, presented in this matrix where the recurrent kinase events are on the left, across 26 tumor types at the bottom. There are roughly four types of fusions that I want to describe.

First, the pipeline was able to recapitulate all the known fusions that had been described in all of these cancers, many of which have already been described today, such as the ROS1 fusions or FGFR fusions. So we do find all of these events, and all the fusions that had been described in the TCGA papers and others have been recapitulated by the pipeline, with a few notable events. For example, for the PRKACA fusions that were discovered last year in fibrolamellar HCC, there are six examples in the TCGA data: two are annotated as fibrolamellar hepatocellular carcinoma, two are annotated as HCC, and two do not have clinical annotations in the TCGA. So there are many events that we were able to
recapitulate this way.

Second, there are fusions that involve known kinases but for which we found new partners, and that is really a recurrent theme for many of the fusion-finding efforts. There really are a lot of gene partners that tend to be fused with kinases, and with other genes in general. So that is really a message for diagnostic efforts to find those fusions: they cannot be specific for just one driver gene and its partner; they need to be really agnostic of the partner.

Third, and more importantly, there are some novel indications where we found some of the fusions that were known before. For example, the RET fusions could be found in colorectal adenocarcinoma or breast carcinoma. Another example is the FGFR3 fusions, which could be found in prostate cancer in one sample. So really there are several examples of these fusions at low frequency in other cancers.

And finally, and most importantly, there are some novel examples of kinase fusions with genes that were not known to be involved in fusions before. First, MET and PIK3CA, which are obviously known oncogenes, were not known to be involved in fusions before, and we found several fusions for these two genes: six fusions in MET and four in PIK3CA. But there are also some other kinase fusions that we believe are drivers in those samples and that involve kinases that were not known to be associated with cancer before. An example is FGR, for which I'm showing here four samples that harbor fusions of FGR; there is also a fifth one in a cancer type that has not been released by TCGA yet.

So quickly I want to go through a few of the examples. In RET we were able to describe some novel indications. These are plots that describe the putative sequence of the predicted protein. These are the known partner genes CCDC6 and ERC1, in colorectal adenocarcinoma and breast cancer respectively; they contain
the coiled-coil domain that causes RET to dimerize in order to activate. These others are examples of novel partners for RET fusions: they contain the kinase domain, and the partner gene contributes coiled-coil domains that cause RET to dimerize and activate. These are all in thyroid carcinoma. These are examples of the filters that we apply to verify that these fusions are functional.

I also want to describe an example of MET and PIK3CA fusions, starting with PIK3CA. These fusions are interesting because they are really 5' UTR fusions. In some examples TBL1XR1 contributes only its 5' UTR exon, and it is fused with exon 2 of PIK3CA, which is also 5' UTR. So really the entire coding sequence of PIK3CA is expressed in those fusions, and it has the effect of overexpressing PIK3CA, as is shown in these three plots here. In breast cancer, where PIK3CA is of course known to be mutated and amplified, this would be a third way of activating PIK3CA, by fusions, and we see that the two samples in red, also marked with the arrows, harboring the fusions are among the highest expressors of PIK3CA. The same is true for the two other tumor types.

With MET, we found two examples in kidney papillary carcinoma, where MET is known to be a driver and highly mutated. These examples of MET fusions contain a coiled-coil domain that causes MET to dimerize and activate. There are also four other cancer types in which we found MET fusions, but this one is particularly interesting because MET was already known to be a driver there.

This is additional evidence that the PIK3CA fusions are real. The four samples are shown on the left; these are only the reads that are involved in the fusion, so only chimeric and split reads involved in the PIK3CA fusions, showing here the 5' UTR of PIK3CA. So these are real promoter fusions causing
overexpression of PIK3CA.

And lastly I want to show an example of this novel fusion that we discovered between WASF2 and FGR. This is also an example of a 5' UTR fusion, akin to the PIK3CA fusions that I just described. FGR is a Src family kinase. It is known to be highly expressed in some hematopoietic cells and in some cancers as well, but it had not previously been implicated in cancer, so we don't really know its role in cancer. However, we know that it is a viral oncogene homolog, so it could potentially be oncogenic. The five fusions that we found in the TCGA dataset, across five different tumor types, are all in the form of WASF2 fused with FGR, so this is a strong argument that this very recurrent event could be oncogenic. Again, as I've shown with PIK3CA, the same is true with FGR: the samples in which the fusions are present overexpress FGR at almost the highest level, without a copy number amplification. This is the fifth tumor type that I was talking about.

Interestingly, we have also found this fusion in a cell line. And because we were talking earlier about the clinical implications of some of the TCGA work: we are collaborating with a large hospital that has included FGR in their clinical sequencing panel, and these fusions have also been found in patients. So we're hopeful that this could be discovered in additional tumor types, and maybe at higher frequencies than the frequencies that I've shown here.

And finally, because I'm over time: NTRK1, NTRK2 and NTRK3 fusions are recurrent across several cancers. This is the same theme as FGFR fusions, which are recurrent across cancers, so this really underlines the necessity of looking for these fusions in more than just specific tumor types for diagnostic purposes.

So finally I want to summarize this talk in two parts. One, there were updates to the
study we published last year that I presented today. First, there were additional tumor types that were analyzed and that I presented here. One thing that I didn't talk about: the FGFR2 fusions are found in cholangiocarcinoma at a frequency of about 10%, and cholangiocarcinoma is one of the six new cancer types presented here. JAK1 fusions are also novel; they were not described before, and there are two that we found, in two different cancer types, between IG 14 and JAK1. There are two novel FGFR fusions compared to the study we had published. And the PRKACA fusions in liver cancer are a very important feature of this analysis, because there are six of these fusions in the liver cancer dataset.

And finally, some broader implications of this work. This was the first pan-cancer fusion analysis. We focused on kinase events because these are strong drivers of the disease, and we believe that the discovery of novel fusions could have profound implications for diagnosis, treatment and drug discovery.

Finally I want to thank The Cancer Genome Atlas, because none of this would have been possible without these data being publicly available. I also want to thank my colleagues at Blueprint, starting with Christoph Lengauer, our CSO, and other colleagues who have participated in this work. Thank you.

Nico, great talk, and really a great illustration of how the TCGA data can be used beyond the network and in drug discovery, so thank you. I was really interested by this FGR fusion, because one of the things that's been really mysterious to me for the last 20 years or so of doing cancer genomics is: where are the activating Src family kinase alterations in the human cancer genome? This could be, to my knowledge, the first one. So what I'd like to know next is what's the evidence that these FGR fusions are functionally important, and if there is evidence, are you at liberty to share any of that?

So what I can say is
that we're working on it. The second thing I want to say is that some papers coming out of your lab have actually shown that FGR was one of the genes that, when overexpressed, could induce resistance to some therapies.

Absolutely, resistance to a lot of therapies, right. And if I recall correctly, it's in a region of amplification in some EGFR-mutant lung cancers as well.

Right. And so I think these cancer types are not the ones where we're going to find FGR fusions at high frequency. There might be a rare cancer type, just like FL-HCC, where FGR is a main driver event. There are no activating mutations that have been described in the TCGA; there are really very few events.

Yes, it is an interesting event.

Actually my question is related to this, and goes along with the idea of trying to infer which of these fusions are really driving and having a major impact on the tumors harbouring them. Have you tried to look at the local amplification status of the fusion genes? One of the things we always find is that the fusions that seem to have a dominant biological function clearly don't show these broad amplification events; they show extremely micro-focal amplification events that typically involve one, or sometimes both, of the regions implicated in the fusion genes. So have you tried to go to the high-density SNP array and see whether you can identify very focal amplification events on the fusion genes?

Right. For the known fusions I haven't done this work, but for the novel ones I've been looking at them particularly closely to know whether there were some amplifications that could maybe cause the fusions to be artifacts of the amplification, and it is not the case. There are no amplifications, or there are maybe broad amplifications around the gene, but no focal amplifications. However, the resolution of the SNP arrays is not
sufficient to see very small events. The other thing I can share is that one sample harboring an FGR fusion has whole genome sequencing. So for one of these five there is whole genome sequencing, and I was able to see the fusion in that sample as well, which seems to be just a chromosomal break without amplification.

Sometimes, even with the SNP 6.0, if you go to the single-probe level you can actually find those types of things, but it's true that if you do a standard segmentation it can be very hard to see.

Thanks everyone, enjoy your lunch, and please come back at 2:25. Thank you.