Okay, so this is quite a new module, so I erred on the side of having more content. I'm going to go through it quite quickly, so feel free to stop me if anything is not well presented or if you're not following something.

What we're going to try and learn in this module is the impact of gene fusions in cancer and the different types of evidence we can use to identify gene fusions. We're going to try and understand the available detection methods and tools and the basis for these methods. We're going to look at how we can identify common sources of false positives, and then we're going to try and get some understanding of the potential function of these fusions.

Just to begin with a quick definition: a gene fusion is a novel gene formed by the fusion of two distinct wild-type genes. The canonical example here is BCR-ABL1, which is formed when chromosome 9 and chromosome 22 undergo a translocation somatically, so that's occurring during someone's lifetime. And this is how most of the gene fusions we're interested in occur: some kind of translocation in the genome that brings together two genes.

In cancer, gene fusions are relevant clinically because there are a lot of examples of a gene fusion that clinically defines a subtype of cancer. For instance, BCR-ABL1, our example, is present in 90 to 95 percent of CML patients. These events are potentially targetable by drugs, and again the example here is BCR-ABL1, which is targetable by a drug called imatinib, which is very effective at treating CML patients. The other reason why gene fusions have become more of an interest lately is that we have new methods to detect them. One thing that's not listed here is that with the discovery of the ETS fusions in prostate cancer, people started to think that maybe these events had implications for solid tumors, which make up the vast majority of morbidity in terms of cancer.
And so solid tumors are far more important in terms of saving lives than leukemias, which is where, up until say 10 years ago, gene fusions were predominantly thought to occur.

So what is the evidence for gene fusions being initiators of carcinogenesis rather than just passenger mutations that occur and have no effect on the phenotype of the cancer? Well, they correlate very well with cancer phenotypes, so they define subtypes of cancers. A successful treatment often results in the eradication of the fusion product from someone's bloodstream, for instance in CML: if you successfully treat a CML patient, then the translocation is no longer detectable in their blood. Gene fusions also produce neoplastic disorders in mice, and if we perform experiments in models where we silence gene fusions, then we can reverse the tumorigenesis process.

There are two classes of gene fusions that we like to talk about. One is a class of gene fusions where we deregulate a proto-oncogene by translocating the promoter of a highly expressed gene next to a gene that is a driver of cancer. The example here is IGH being translocated next to MYC, which is a proto-oncogene, so that the IGH promoter drives the expression of MYC. In this figure we're just showing where the potential breakpoints can happen in MYC and IGH, and then at the bottom here we have the hybrid IGH-MYC gene. The other class is a fusion where more than just the promoter region of the 5-prime gene is involved in the function of the fusion, in that perhaps both genes are contributing functional domains. The example here is again BCR-ABL1, where BCR and ABL1 are both contributing functional domains to the end product, which is this fused gene.

So we can look at the gene fusions that we've already discovered and try to understand the content of this collection of genes that tends to form fusions with other genes.
And a predominant group is the tyrosine kinases. These are genes that transfer a phosphate group from ATP to a protein; they're involved in signaling, and they regulate complex processes in the cell. There are some examples on the right. Transcription factors are another large class of genes that form fusions. The ETS genes and MYC are examples of that, and then there are also the proto-oncogenes, such as BCL2 and BRAF.

If we draw this as a network where genes are connected when they form fusions, they form what we call a scale-free network, with the connectivity following a power law. Basically what this is saying is that there are a few guys at the party who know everyone, and then most people just came with a friend. And they form three big clusters, one centered around MLL, another centered around BCL6, but there are basically three highly connected clusters here.

The genomic effects of gene fusions are important if we want to detect them. The first obvious effect is chimeric DNA sequence at the translocation boundary: where we have, say, a translocation, the chimeric sequence is a sequence that's partially chromosome 9 and partially chromosome 17, say. That can also result in the expression of that particular breakpoint in an mRNA, although that doesn't always happen. Sometimes expression happens downstream of where the breakpoint actually is. And then we have expression changes if we have, say, an upregulated gene.

If we're thinking about expression, one way expression patterns have been used to detect gene fusions is that expression arrays were used to detect the ETS fusions in prostate cancer by just looking for upregulated genes. They found that ETV1 and ERG, both ETS transcription factors, were highly upregulated in prostate cancer and that they were upregulated in a mutually exclusive way. So either one was upregulated or the other in different samples.
And from this analysis, they were able to do some lab work and validate TMPRSS2-ERG in prostate cancer, which occurs in over 50% of those tumors. This technique was actually only used once, to find that fusion, and I don't think it's been applied since then.

Genome sequencing is another method. The downside here is that it's quite expensive, although it is very comprehensive and it'll find things where we don't necessarily have expression of a product that is partly one chromosome and partly another. It also doesn't give you expression information, but it was used to identify this one fusion in colorectal adenocarcinomas.

And finally, what we're going to focus on mostly is mRNA sequencing. This has the benefit of being quite inexpensive. It gives you information about the expression of the genes, and you can use it to identify an exact fusion transcript sequence, although it's not as comprehensive as genome sequencing. It was used to discover these MAST and NOTCH fusions in breast cancer. Although, interestingly enough, people have had trouble reproducing these results, and so this is kind of pointing to the fact that we're still in the infancy of identifying fusion transcripts from mRNA.

So how is this data generated, from the beginning where we have RNA molecules all the way to the end where we have sequences? First we isolate RNA by doing a pull-down on anything with a poly-A tail. That gives us mRNA. Then we do reverse transcription to get cDNA. At this stage, we lose the information about strand, because we go from single-stranded RNA to double-stranded DNA. Then we fragment into smaller segments, size-select those, and do sequencing on each end.

So is mRNA sequencing the ideal platform to do fusion discovery, or do you think there's a cheaper one, like exome sequencing?
So exome is definitely not a good platform, because it only samples a very small subset of the genome, and the chances are that your breakpoint is actually going to be in the introns of the two genes, because the introns are the largest part of those genes. And so you won't actually identify the breakpoint from exome sequencing. RNA-seq, I think, definitely has the most potential for identifying fusion transcripts. But again, it won't identify rearrangements that are still contributing some kind of gene fusion activity, like maybe translocating an enhancer next to a gene that is then upregulated. Something like that won't be found by RNA sequencing, because the translocation exists almost outside of the transcribed region of the gene that's being produced, if that makes sense.

Okay, so what does it look like when we sequence a gene fusion with RNA sequencing? At the top here we have our gene fusion, which results from a translocation between chromosome A and chromosome B, and it produces a fusion transcript that's annotated on the top, where we have three exons of gene X and two exons of gene Y. This is transcribed into this messenger RNA, and then from that we get a number of RNA-seq paired-end reads. These reads we can classify into different types. Reads that don't have any indication of the breakpoint in them, that are basically all one color in this figure, I call wild-type reads. Then we have spanning reads, which are reads for which one end is entirely from one gene and the other end is entirely from another gene. And then split reads are the reads where the actual read sequence that we've sequenced on the Illumina machine or other sequencer is partially from one chromosome and partially from another chromosome.
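The three read types just described can be written down as a small classification rule. The following is a minimal sketch rather than any particular tool's method: the input format (a list of gene names, one per aligned segment of each read end) and the function name `classify_pair` are assumptions made for illustration.

```python
from typing import List


def classify_pair(end1_genes: List[str], end2_genes: List[str]) -> str:
    """Classify a paired-end read from the genes each end aligns to.

    Each argument lists the gene of every aligned segment of that end
    (hypothetical input format). An end that is only split by an intron
    of one gene lists that gene more than once and is not a fusion split.
    """
    if len(set(end1_genes)) > 1 or len(set(end2_genes)) > 1:
        return "split"      # one sequenced end crosses the fusion boundary
    if set(end1_genes) != set(end2_genes):
        return "spanning"   # each end lies wholly in a different gene
    return "wild_type"      # no evidence of the breakpoint in this pair


pairs = [
    (["X"], ["X"]),        # both ends in gene X
    (["X"], ["Y"]),        # one end in X, the other in Y
    (["X", "Y"], ["Y"]),   # first end is part X, part Y
]
labels = [classify_pair(end1, end2) for end1, end2 in pairs]
```

In practice the gene assignment of each segment is itself the hard part, which is the alignment problem discussed next.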
So now the problem is to take a collection of reads that look like this and try to assign them to the original locations in the reference genome that we believe they originated from, in terms of a healthy human reference. I would call this the RNA-seq alignment problem. So the problem, say, for this read is to identify which location this green segment comes from compared to these other orange segments. You can see that this is also made difficult by the fact that the read on the right is split by this intron. So reads can be split because they come from multiple chromosomes due to a fusion, or from multiple regions within the same gene due to splicing.

Yeah? Sorry, I don't think it's in either of the cases. So yeah, that's basically the problem that I'm trying to define here. But that's a side of the alignment, so the alignment is... I guess determining that alignment is the problem, and that includes determining where the split is in the read. Does that answer your question?

But how do you decide where the split is?

Maybe I'll go into that in the next few slides, and if you don't understand it by then, then totally ask me again.

Okay, so alternatively, since we have a very good understanding of the genes and their splicing in the human genome, we can just align to the gene sequences, the mRNA sequences themselves. That makes this problem a little bit easier in that we don't have to worry about splicing, although in some cases a rearrangement can induce some novel splicing, so there's a reason why we wouldn't always want to do what we do on the bottom here.

At this point I'll mention that there are two paradigms that are used for RNA-seq. One is alignment first, and the other one is assembly first, I'll call them.
For alignment first, we first try to find the locations of all of these reads in the genome or transcriptome, independently for each read, and then after that we cluster the reads into contigs based on their alignment in the genome. Conversely, in assembly we do the clustering first: we try to cluster reads together that look like they're from the same messenger RNA, and then after that we align the result to the reference genome to determine whether or not it comes from two distinct genes. If you're not familiar with assembly, the clustering is basically about looking for a suffix of one read that matches a prefix of another read, and then stacking them up like this so that we can build a longer sequence out of a bunch of shorter sequences.

I'm mainly going to focus on alignment-based approaches because those are the predominant approaches being used for gene fusion discovery in RNA-seq. In essence, the problem gets harder as we consider more and more differences within the read with respect to the reference. If we have a read that exactly matches our reference, this is a fairly tractable problem that we can solve easily. If there are perhaps a few mismatches or indels, small insertions or deletions, this gets a little bit harder. And if we have non-contiguous alignments to separate chromosomes, then this gets a lot harder.

The general strategy is pretty much the same for a lot of these algorithms. It's basically to take the problem at the bottom here, the non-contiguous alignment problem, and massage it until it looks like the exact alignment problem. So we leverage the fact that we can solve the easier problem to solve the harder problem. One way we can do that is to do something with the read sequence, and the easiest thing to do is just to split it into sections.
Say we take this read and split it into three parts. If this read comes from a fusion transcript where there's a fusion boundary in the middle, then the middle section won't map very well, but the sections on the left and the right will map perfectly to the chromosomes from which they originated. After the fact, it's pretty easy to reconstruct the exact breakpoint.

The second thing we can do is, based on some prior information, leave the read sequence alone and do something with the chromosome sequences. Here, if we know a priori that there's a fusion that involves this part of the green chromosome and this part of the red chromosome, we can merge them together approximately, and then do a gapped alignment of our read sequence to this fabricated chromosome sequence that we've created with our prior knowledge. And this is an easier problem. The way we often do that is by using paired-end read information. If we see a read that maps with one end to one gene and the other end to another gene, then we can say maybe there's a fusion between these two genes. There are basically two ways of doing this. We can take the approximate region where we think the breakpoint is on either side, put those two regions together, and try to align reads to those regions in the way that we've described here. Or, another way: people assume that the fusion boundary only occurs at an exon boundary, and then they just concatenate all of the exons together and try to align to those concatenated exons.

So there are a lot of fusion discovery tools based on the alignment approach that have come out recently. A lot of them are quite similar, and they're named quite similarly.
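The read-splitting idea described above can be illustrated with a toy example. This is a sketch under strong simplifying assumptions, not how any of the published tools is implemented: the references are short strings, the outer sections are located by exact matching with `str.find` rather than an indexed aligner, both genes are on the forward strand, and the names `segment_read` and `locate_breakpoint` are invented here.

```python
def segment_read(read, n_parts=3):
    """Split a read into n_parts consecutive sections."""
    size = len(read) // n_parts
    parts = [read[i * size:(i + 1) * size] for i in range(n_parts - 1)]
    parts.append(read[(n_parts - 1) * size:])  # last part takes the remainder
    return parts


def locate_breakpoint(read, ref_a, ref_b, n_parts=3):
    """Return (ref_a_pos, ref_b_pos, read_pos) of the junction, or None.

    The left section is matched exactly in ref_a and the right section in
    ref_b; the left match is then extended until it stops agreeing with ref_a.
    """
    parts = segment_read(read, n_parts)
    left, right = parts[0], parts[-1]
    start_a = ref_a.find(left)
    start_right_b = ref_b.find(right)
    if start_a < 0 or start_right_b < 0:
        return None
    # Extend the left seed base by base along ref_a.
    k = len(left)
    while (k < len(read) and start_a + k < len(ref_a)
           and read[k] == ref_a[start_a + k]):
        k += 1
    if k == len(read):
        return None  # the whole read matches ref_a, so there is no fusion
    # The remainder of the read must run contiguously into the right seed.
    right_offset = len(read) - len(right)
    start_b = start_right_b - (right_offset - k)
    if start_b < 0 or ref_b[start_b:start_b + len(read) - k] != read[k:]:
        return None
    return start_a + k, start_b, k


# Toy references and a read built from 12 bases of each (made-up sequences).
ref_a = "ACGTACGTTAGCCGATCGGCTA"
ref_b = "TTGACCATGCAAGTCCGGATAC"
read = ref_a[4:16] + ref_b[6:18]
junction = locate_breakpoint(read, ref_a, ref_b)
```

Here `junction` is `(16, 6, 12)`: the read leaves `ref_a` at position 16 and enters `ref_b` at position 6, 12 bases into the read. When the bases flanking the junction happen to be identical in both references, the reported position can shift by a base or two, which is the same breakpoint homology that is discussed later as a sign of possible alignment artifacts.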
But basically they all do some variant of the stuff we've just covered. Most of them give you the exact sequence. A lot of them segment the reads as a way to deal with split reads. A lot of them leverage paired-end information. And then a subset of them use an approximate reference or an exon junction reference to realign and recover those split reads.

I'm not going to talk too much about assembly-based tools, although we are going to try and do some assembly of some fusion transcripts in the lab. There are a couple of good options, Trans-ABySS and Trinity. Trinity is pretty good because it gives you full-length fusion transcripts, whereas Trans-ABySS just gives you the sequence at the fusion boundary, which is often difficult to take and map back to the genome. After you have the contigs, you can use something like GMAP, which is a really nice contig aligner, or Dissect, also a contig aligner, or Barnacle, which does a little bit more in that it actually does the annotation for you.

Based on some evaluations, ChimeraScan, TopHat-Fusion, and deFuse are among the most sensitive ones, and we're going to run all three of those in the lab. deFuse is my own creation. The other thing I would mention is that results from simulations are not really very reliable at distinguishing these tools' performance on real data.

Yeah? Is there anything that's very different between... So are you talking about a novel insertion of completely unknown sequence? Are you talking about insertions where the inserted sequence is coming from somewhere? So is there anything specific that's easier or harder to detect than the general case?

Well, if the inserted product contains a gene, then I guess by semantics we call it a gene fusion, especially if it produces mRNA that is from two genes, one of them from the inserted sequence. So...
So is it then just on the functional side of it? Is it just because you take the two genes, you care about this as a particular class of insertions, or are there certain features that need...

So to answer your question, I think you have to recall that your starting material for this experiment is messenger RNA, an RNA population, and not DNA. So you're already looking at something that is moving forward to be made into a protein, and so it is falling into a different class. If you have a piece of a chromosome that maybe doesn't have any genes and it's inserted somewhere else in the genome, that happens a lot. And it happens a lot that that gets transcribed. Maybe there's a promoter that's inserted somewhere else next to just garbage sequence, intron sequence or something like this. So you'll get something that's expressed. Is that a gene fusion? I guess it's kind of semantics, right? It's not one in the strictest sense, in terms of creating either a highly expressed proto-oncogene or a chimeric functional gene.

So is it just a semantic difference then, and there are no features of the gene fusions that make them easier or harder to detect that we should be aware of?

No, not in terms of detection. I don't think there's that much difference. Because, in the example I just gave, we could have an insertion of, say, a promoter into a region that doesn't have any known gene sequence, but perhaps there is something there that's relevant to the cancer. Maybe it's a non-coding RNA that's unannotated or something. So, anyway.

So will your fusion program just call general insertions, and then you filter them to see if this is actually...

Some of them are based entirely on restricting to the known set of genes, and then some of them align to both the genome and the transcriptome. So there's the possibility of finding what I call expressed rearrangements, but...
Yeah, for lack of a better word.

Is there any program that discovered...

Yeah, so I guess I don't think of it in those terms generally. It's good to give an example in terms of different chromosomes, but really these can be any two regions of the genome that are somehow pulled together to bring two genes together; whether they're inverted or not doesn't matter. I would say it's more a type of rearrangement, and whether it produces a gene fusion depends on whether the translocation breakpoints occur within genes or close to genes.

What about the distance of the boundaries? If you're thinking about insertions, they tend to be pretty small; if they're very large, it's hard to detect them. Gene fusions, I guess, connect very distant parts of the genome. Is this correct?

So I guess it depends on your definition of insertion, right? I guess you're talking about small insertions, which couldn't be gene fusions. These have to be very long segments, like gene-sized segments, that are being moved around in the genome by rearrangement.

Most insertions are small, a couple of bases: 1, 2, 6... That's the difference between them. If you're thinking about insertions as things that are small, then something very, very large is completely different.

I think you can classify them, so we can just call them small insertions and big insertions, right? The TMPRSS2-ERG gene fusion is actually an insertion of one locus into another. But it's an insertion of a segment that's the size of a gene. So, yeah.

You mentioned before that this here is a fusion part. Is there any consistency in where the fusion junction occurs?

Actually, I have a slide on that. Do you want to wait until I get to it? Yeah, okay.
So, in terms of what kind of false positives we're going to see, there are two ways in which these are produced. One is the technical stuff like alignment artifacts, which are produced when we have homologous sequence and we can't precisely align these sequences to the reference genome. The other problem that occurs is very high expression of some genes, which is homoRNA. A small amount of read errors in the sequences that come from those highly expressed genes will mean that those reads might map to somewhere else in the genome and will produce artifacts.

We also get biological and technical artifacts like chimeric reads, which happen for two reasons. When we're doing the reverse transcriptase step, we can get a chimeric read produced from two different mRNAs, and then we can get ligation artifacts. These are usually randomly dispersed, which helps us in removing them. And then, in terms of biological artifacts, there are natural sources of rearrangement like germline rearrangements and immunoglobulin (IG) rearrangements. A very important one is transcription-induced chimeras. If you look at any of the results of these fusion discovery programs, you'll see a lot of these in some types of cancers, where basically you have what is pretty much alternative splicing from one gene to its neighboring gene.

The first problem, the alignment artifacts, is one we can try to solve computationally. What we did in the deFuse paper is we defined a set of features of these alignments and then trained a classifier. That's one approach, and another very common approach is just to apply filters. So I'll quickly go through the features that we found to be the most significant. The first one here refers to how well, once we've reconstructed our fusion transcript sequence, the reads distribute across the fusion boundary.
The shorter stretch of read sequence on one side of the boundary, here on the blue side of the fusion transcript, is often called an anchor because it anchors one side of the fusion to the other. If the reads all stack up together in the same location and contribute only a small anchor on one side, that's a pretty good sign it's a false positive. That's what we see in this histogram, where green are positives and red are negatives, and we can see that this feature of how well the reads distribute is separating the false positives from the true positives.

Another good feature, and this is one that helps us get rid of alignment artifacts and also the ligation and reverse transcriptase artifacts, is just to look at the boundary of the fusion transcript in one gene and the other and see if there's a canonical splicing signal, which generally applies to about 99% of genes. GT-AG is how most genes are spliced. So if we see that splicing signal, that's a good sign that the translocation happened in an intron of both genes, which is the most likely thing to have happened, and that the resulting translocated chromosome spliced out that whole new intron sequence. We can see here that this large mass of green positives on the right is showing that we have a match for the GT-AG signal, while the red negatives are just kind of a random match to GT-AG.

The next feature is just how well we are able to map each side of these paired-end reads. If we see maybe one or two alignments on either side for the paired-end reads, then that's okay. But if one side of this fusion transcript maps to many different places, then that's giving us information about how ambiguous this prediction is.
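The canonical splice signal feature described above comes down to inspecting two dinucleotides in the genome. The sketch below is only an illustration and is not taken from any tool: it assumes both genes lie on the forward strand of the supplied genomic strings, and the coordinate convention and the name `has_canonical_splice_signal` are invented here.

```python
def has_canonical_splice_signal(genome_5p, donor_end,
                                genome_3p, acceptor_start):
    """Check for the GT-AG signal at a predicted fusion boundary.

    donor_end: index just past the last transcribed base of the 5' gene.
    acceptor_start: index of the first transcribed base of the 3' gene.
    Both genes are assumed to be on the forward strand (a simplification).
    """
    donor = genome_5p[donor_end:donor_end + 2]                    # intron start
    acceptor = genome_3p[max(acceptor_start - 2, 0):acceptor_start]  # intron end
    return donor.upper() == "GT" and acceptor.upper() == "AG"


# Made-up sequences: the 5' exon ends after "CCATG" and the intron starts
# "GT"; the 3' intron ends "AG" and the exon starts at index 6.
genome_5p = "CCATGGTAAGT"
genome_3p = "TTTCAGGACCT"
canonical = has_canonical_splice_signal(genome_5p, 5, genome_3p, 6)
shifted = has_canonical_splice_signal(genome_5p, 5, genome_3p, 5)
```

With these inputs `canonical` is `True` and `shifted` is `False`, since moving the predicted boundary by one base destroys the match. A gene on the reverse strand would need its sequence reverse-complemented first, which this sketch leaves out.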
Another quite prevalent feature: if we reconstruct the fusion transcript and look at how the supporting reads align to that new fusion transcript, we can ask whether the reads that align to the fusion transcript look the same, in terms of their parameters, as the alignments to regular wild-type genes. For instance, one of the things we can look at is how far apart the two ends map from each other. If the distance is around the same as what we expect, given what we see when we align the wild-type reads to all of the wild-type genes, then we can say that we have confidence in those reads. Similarly to the previous feature for split reads, how well do the spanning reads align to the fusion transcript?

And then the last one is: do we have any homology in the region of the breakpoint? Here on the left in this part of the figure, if you can see where the mouse is pointing, each side of this predicted fusion transcript maps to the gene from which it's predicted to have come, and only that portion of the sequence does. Whereas on the right, the part that we predicted as coming from gene A, plus a little bit of the part that we predicted as coming from gene B, or maybe even more in especially bad fusion calls, can be mapped to gene B. So in essence we have this region at the fusion boundary that can be mapped equally well to gene A and gene B, and that's showing us that this is probably an alignment artifact.

In terms of biological artifacts, one of the things we can do is look for that GT-AG signal. We can also screen against existing databases such as RepeatMasker and NUMTs, the nuclear mitochondrial insertions. IG rearrangements are prevalent, so maybe we want to get rid of anything that is between two IG genes. And finally, read-throughs, or transcription-induced chimeras, are pretty easily identified because they're between adjacent genes.

Once we have a set of candidates we're confident in, how do we prioritize them?
Well, we can look for stuff that's highly expressed, where perhaps there's an interruption in the expression profile across the gene compared to what we expect the wild-type expression to be. We can look for recurrence of either the pair of genes or one of the genes, especially if it's been seen as rearranged in other cancers. We can look for a corroborating rearrangement if we have any information about the genome, such as whole genome sequencing, or if we have copy number data, we can look for a transition in copy number somewhere in the genes. We can also look at the gene's function. Is it a kinase? Is it previously implicated in cancer? Could it serve as a drug target? And we can look at whether the function is preserved by the fusion. One important aspect of this is whether or not the reading frame is preserved for both genes.

Here's an example where we identified two gene fusions that we have a lot of confidence in, just by looking at the position of the breakpoint relative to the expression across the gene. For this R3 C-rad fusion, the expression pretty much stops about 4,500 nucleotides into this mRNA sequence, and that happens to coincide exactly with the breakpoint. It's the same with this gene on the bottom here. This means that we can do an expression analysis on our fusion transcripts, for one as a way of calculating the expression of the fusion transcript and having more confidence in it, but also to eliminate false positives as ones that maybe have zero expression. The caveat here is of course if you have a balanced rearrangement: say you have CMKLR1-HNF1A and you also have HNF1A-CMKLR1, which does happen. If you have those two fusion transcripts, then it's unidentifiable where the reads are coming from.

Okay, so in terms of prioritizing, we can also look at whether or not our fusion partner is recurrent across multiple cancer types.
BRAF, for instance, is translocated in a lot of different tumor types, which would have evaded us if we had just looked at one cohort. What we also see here is that it's translocated in very similar ways, in ways that preserve this last protein kinase region at the end of the BRAF gene.

Another way of prioritizing these is to look at the relative gene positions and whether or not it could be a read-through. Read-throughs are quite prevalent, as we see here. On the left we're showing a number of inter- and intrachromosomal rearrangements and then a number of read-through events. This matrix shows tissue by rearrangement presence, and the histogram on the right shows the same information. It's just showing that the only thing that's really recurrent here out of all of these events is TMPRSS2-ERG, and then there's a whole number of gene fusions that are pretty much only in one sample. This can be contrasted with the read-throughs, which are often found in many, many samples. And I'm just going to skip this image on the right.

We don't want to completely rule out read-through events, because some of them have been found to be interesting. For instance, this particular read-through in prostate cancer has been found to be very important for the disease, as it regulates prostate cancer proliferation. What we also know about it is that it's tissue specific. It's very specific to prostate, and that tissue specificity basically comes from the fact that SLC45A3, the 5-prime gene and the gene that would be contributing the promoter, is itself prostate specific.

So finally we can look at where the fusion boundary sits in the gene model. On the left we have different possibilities for connecting these exons in the gene fusion.
The most common pattern for read-throughs is the second-to-last exon of the 5-prime gene joined to the second exon of the 3-prime gene, although a lot of read-throughs have other patterns. That's not the predominant pattern for interchromosomal and intrachromosomal rearrangements. They can have any distribution of exons that are fused from 5-prime to 3-prime.

The other thing in thinking about function is whether or not the mRNA sequence that's getting translated is actually going to produce a chimeric protein, where the 5-prime set of codons and the 3-prime set of codons are both going to be preserved in the resulting fusion protein. If the fusion boundary happens at a codon boundary in both genes, then the 5-prime codon sequence and the 3-prime codon sequence will both exist in the resulting protein. Even if they don't come together at a codon boundary, but they come together in a specific way, then maybe there will only be one nonsense codon. If they come together in any other way, then the downstream 3-prime protein sequence will just be nonsense. This is easily computable and can then be used to assess whether or not the 3-prime protein sequence is relevant.

Finally, this is from some of my own work, where we looked at the corresponding rearrangements that were producing gene fusions in prostate cancer. This is a particularly complex case where we had two fusion transcripts we were interested in. They're highlighted here on the right, this red and blue transcript and this yellow and green transcript. One involves SHANK2 and one involves MYC. We found that they were produced by this complex event where three chromosomes at four loci were brought together and simultaneously translocated to produce four breakpoints, and this simultaneously produced these two fusion transcripts.
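The reading-frame check mentioned above as "easily computable" comes down to comparing codon phases on the two sides of the junction. This is a minimal sketch of that arithmetic, assuming we already know how many coding bases the 5-prime gene contributes and the 0-based coding-sequence index of the first retained 3-prime base; the function name and inputs are hypothetical.

```python
def is_in_frame(n_5p_coding_bases, cds_index_3p):
    """Return True if the 3' gene keeps its original reading frame.

    n_5p_coding_bases: coding bases of the 5' gene upstream of the junction.
    cds_index_3p: 0-based position, within the 3' gene's coding sequence,
        of the first base retained in the fusion.
    The frame is kept when both sides are at the same codon phase (mod 3).
    """
    return n_5p_coding_bases % 3 == cds_index_3p % 3


# Junction at a codon boundary in both genes: every codon is preserved.
at_boundary = is_in_frame(6, 3)
# Junction mid-codon but at matching phase: one chimeric codon at the
# junction, with the downstream 3' codons unchanged.
matching_phase = is_in_frame(7, 4)
# Mismatched phase: the downstream 3' sequence is read out of frame.
frameshift = is_in_frame(7, 3)
```

The first two cases evaluate to `True` and the last to `False`. A fuller check would also translate across the junction to see whether the single chimeric codon is a stop codon, which this sketch does not do.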
The second thing we did in this paper is we looked at whether or not we could identify complex breakpoints, where it's not necessarily just one chromosomal locus translocated and fused to another chromosomal locus. In many cases in this particular cancer type the breakpoint was more complex than we expected. For example, here the breakpoint involved a one-kb insertion and another one-kb insertion at the breakpoint. If you're trying to discover gene fusions from whole genome sequencing, this has implications, because you would only be able to detect these independent breakpoints. You would basically detect all three of these breakpoints, and you wouldn't necessarily be able to piece together that the D12 was fused to this PHF20L1 gene, producing a fusion transcript.

So just to finish up, a few considerations when we're thinking about designing the experiment. With a large cohort size, sometimes the computational cost of applying some of these methods can be pretty prohibitive. But often you can leverage the fact that, with a large cohort, things that are highly recurrent, say in 90% of samples, can often be classified as artifacts. So even though in some cases we're looking for something that's highly recurrent, because we want something that defines the disease, most of those are actually going to be things that we can filter out. If we're smart about filtering things that are recurrent, then we can often use that to home in on things that are real.

Can you comment on the one shown? Is that recurrent?

Yeah, this is very unique, definitely. What we're finding in prostate cancer is that TMPRSS2-ERG and those fusions are in 50%, and then there's a whole catalog of fusions that happen in a very small subset of cases, that are unique to each specific tumor, and that may still have implications.
For instance, this translocation here produces an upregulated MYC gene in the same way that it does in Burkitt's lymphoma. It's actually in the same intron of MYC. So it's relevant to this particular patient, but not necessarily to the cancer in general, because we've only seen it once.

And so finally, if we are doing an experiment where we suspect that we have a set of fusion partners and we want to understand what they're fusing to, then we can do something like targeted capture instead of RNA-seq.

I'll finish up with this sort of good news story. Even though this problem is hard, there is a definite reason to continue to find these fusions. I think that in the future these fusions, as I kind of alluded to, will be quite rare but still relevant. This is an example where EML4-ALK was found in lung cancer, in only approximately 5% of lung cancer cases, so it's not very prevalent, but now we have a drug that has just completed phase 3 trials which can be used to treat it. And that's it for the lecture.