Okay, so first of all, we want to be able to identify driver mutations, and in particular we would like to know which mutations are gain of function and which are loss of function, in order to better understand disease mechanism and treatment. There are many mutations out there that we don't know much about. They tend to be lower-prevalence mutations, but they are very much recurrent. I'm going to talk about a method that I've been developing to infer gain of function and loss of function using an integrated pathway approach.
As a quick introduction to PARADIGM: what PARADIGM does is take in functional genomics data, such as copy number and expression, as well as sets of pathways, in order to infer gene-level activities. As a quick example, if we were to infer higher activity of, say, MDM2 from expression and copy number data, we would therefore predict less activity for p53.
In terms of pathways, what we would expect for a loss-of-function event is that the genes regulating the mutated gene might be active in trying to turn up its activity, but the gene itself is not functional, so the downstream targets are off. The opposite would be true in the case of a gain of function.
My approach is called discrepancy analysis for now. What it does is try to leverage the difference between the upstream and downstream signal in order to infer these mutations. The way I do that is I run PARADIGM in two modes, once based on the downstream information and once based on the upstream information, and then I infer a discrepancy between the two activities.
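As a minimal sketch of the discrepancy idea, assume each of the two PARADIGM runs yields one inferred activity per sample for the gene of interest, and that the discrepancy is the downstream value minus the upstream value (the definition given with the RB1 example). The function name, sample IDs, and activity values below are hypothetical and only illustrate the arithmetic; this is not PARADIGM's own interface.

```python
def discrepancy_scores(upstream, downstream):
    """Per-sample discrepancy: downstream activity minus upstream activity.

    Both arguments map sample ID -> inferred activity for one gene,
    one mapping per run (hypothetical input format).
    """
    if upstream.keys() != downstream.keys():
        raise ValueError("both runs must cover the same samples")
    return {sample: downstream[sample] - upstream[sample] for sample in upstream}


# Made-up activities for three samples (illustration only).
upstream = {"s1": 1.0, "s2": 0.5, "s3": -0.25}
downstream = {"s1": -1.0, "s2": 0.5, "s3": 0.75}

scores = discrepancy_scores(upstream, downstream)
# s1: regulators say "on" but targets are off -> negative (loss-of-function-like)
# s3: targets are on despite weak upstream signal -> positive (gain-of-function-like)
```

Under this sign convention, a negative score corresponds to the loss-of-function pattern and a positive score to the gain-of-function pattern.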
Now I'm going to give an example of a real mutation, RB1 in GBM. This is a loss-of-function mutation. This is a circle plot: the center ring shows, with black ticks, the samples in which RB1 is mutated, and the rings outside of that show expression, the upstream activity inferred by PARADIGM, the downstream activity inferred by PARADIGM, and then the discrepancy score, which is just the difference between the inferred downstream and upstream activity. What you will notice is that when RB1 is mutated, PARADIGM infers higher activity from the upstream information, but from the downstream information it infers that RB1 is inactive, and this leads to a negative discrepancy score tracking with the mutations.
Now, if you want to go back to the original data, I'm showing the network used to infer this RB1 mutation. If you look upstream, RB1's activators, such as CDKN2A through C, are more active in the mutant case and the inhibitors are less active. So you can see more red for CDKN2A tracking with the black ticks, and the opposite for, say, CCND2, and downstream we're inferring lower activity.
So now we have computed a discrepancy score for each sample, but we have two populations: the mutant samples and the non-mutant samples. Showing the discrepancy scores for the two groups, with the mutant samples in red, you can see there's a pretty significant difference between them. A more negative discrepancy score is more indicative of a loss of function. I then compute a t-statistic between the two distributions, and that's what I'm calling a signal score.
The next thing you want to ask is: now that I've called a possible loss of function, how significant is this signal score? To answer that question, I have a background model in which I keep the same network topology but permute the genes used to infer the mutation.
So the network is the same, except the data is permuted among all the 20,000 genes in the data set. Given this background model, shown here for 100 permutations, I have a background distribution of signal scores, and I compare my observed value against it, and it's pretty significant as a loss-of-function call.
Now I'm going to show another example, but this time a gain of function, of NFE2L2, or NRF2, in lung squamous cell cancer. This time you see that there's higher downstream activity tracking with the mutations, and that leads to a positive discrepancy score. Shown here is the NRF2 network. You can see KEAP1, which normally aids in the degradation of NFE2L2. It's more active in the mutant case, but it's not repressing NRF2, because you can see that downstream, for targets such as NQO1, the expression readout is quite strong, tracking with the mutations. Again, when you look at the distributions of discrepancy scores for the non-mutant and the mutant samples, there's a pretty significant difference between the two, and again, against the background model, it's very significant.
The next thing I want to do is show that my approach is specific as well as sensitive. In order to do this, I ran it on a set of potential passenger mutations from colorectal cancer. The way I selected those genes was to use genes that had MutSig q-values greater than 0.5. It turns out that, since those genes aren't that well annotated in our pathways, there were four genes that had enough pathway information to run my analysis on. Shown here is one of them, PRKDC, and you can see there isn't a significant difference between the discrepancy scores of the non-mutated and the mutated samples.
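The signal score and the comparison against the background distribution can be sketched roughly as follows. Two things here are assumptions, not statements from the talk: the t-statistic is taken to be Welch's unequal-variance form, and significance is summarized as a two-sided empirical p-value. The background scores are made-up numbers standing in for the signal scores you would get by re-running the analysis on permuted gene assignments, which this sketch does not do.

```python
import math
import statistics


def signal_score(mutant, non_mutant):
    """Welch t-statistic (assumed variant) of mutant vs. non-mutant discrepancy scores."""
    mean_diff = statistics.mean(mutant) - statistics.mean(non_mutant)
    # statistics.variance is the sample variance (n - 1 denominator).
    std_err = math.sqrt(
        statistics.variance(mutant) / len(mutant)
        + statistics.variance(non_mutant) / len(non_mutant)
    )
    return mean_diff / std_err


def empirical_p(observed, background):
    """Two-sided empirical p-value with the usual +1 correction (assumed summary)."""
    as_extreme = sum(1 for b in background if abs(b) >= abs(observed))
    return (as_extreme + 1) / (len(background) + 1)


# Made-up discrepancy scores (illustration only).
mutant = [-2.0, -1.5, -2.5, -1.0]
non_mutant = [0.5, 0.0, -0.5, 0.25, -0.25]
score = signal_score(mutant, non_mutant)  # negative: loss-of-function-like

# Made-up background signal scores; the talk uses 100 permutations.
background = [0.3, -0.8, 1.1, -0.2, 0.6, -1.4, 0.9, 0.1, -0.5]
p_value = empirical_p(score, background)
```

With only a handful of background scores the smallest achievable p-value is coarse, which is one reason to run many more permutations than this toy example does.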
So, to summarize my method: I've shown that discrepancy analysis can differentiate between gain-of-function and loss-of-function mutations using pathway information. I've shown that this works in the case of RB1 in GBM and NRF2 in lung squamous, and I've also shown a negative control. This approach has potential in that we can identify gain-of-function mutations by running my analysis on all mutations in a cancer cohort, and if we identify novel gain-of-function mutations, this could provide insight for drug treatment. And actually, that went pretty quick.
We have time for some questions. Chris.
Hi. There's gain of function and loss of function, and there's an interesting subclass, perhaps a minority, which are switch of function, right? IDH1 and IDH2 are actually a very nice example of that. Can you extend your method to get at those?
Potentially, that might be able to be done on a case-by-case basis. It would be difficult to do a switch of function, since we run our analysis on fixed pathways that we have annotations for, so predicting a switch of function would be difficult.
Please.
So, a very interesting approach. Did I catch you right: did you say that all but four of the genes that are annotated in your PARADIGM networks, or the networks that are inputs to PARADIGM, had a q-value of less than 0.5? Greater?
So, our pathways are fairly cancer-enriched, I mean well studied, so genes that do not have significant p-values in MutSig are less likely to be in our pathways. We're always expanding our pathways to include more genes, so later our coverage may be greater.
So you think that just represents some kind of literature bias, or is that actual biology that you're discovering?
Most likely literature bias. Our pathways currently cover about 25% of the genome.
Linda.
If I understand you correctly, you are collapsing all the mutations in one gene and considering them as a whole in assessing whether it's a gain-of-function, loss-of-function, or neutral event. Is that true?
What do you mean exactly by collapsing?
Well, most of the mutations we see are not hotspots, you know, always exactly the same mutation in the gene. So are you collecting all the mutations that occur in a gene?
Right.
And treating them as one entity in your analysis?
Right. So all the mutations that are not silent mutations, I'm collapsing them together in this analysis. But you could imagine that I could split up mutations by their different domains as well and just run them separately. I can treat them as separate mutations.
I think that would be useful, because there's certainly enough functional data to show that different amino acid changes in the same gene can have opposing effects.
Right. I completely agree.
Thank you.
Nice talk. So when you're calculating this discrepancy score, are you differentiating between the activating or inhibitory relationships that the gene might have with its downstream targets?
Right. So the PARADIGM model can handle activating and inhibiting links. I mean, the logic is just kind of switched. In the case of RB1, I was showing that RB1 had inhibitors, and you could see that the logic is switched for those genes: when RB1 was mutated, those repressors were less active.
Right.
Sam, I wonder if you could comment on why the distribution of the discrepancy scores in the mutated samples is bimodal.
In RB1, you noticed that?
I think in both. I think it was in both, but in RB1 it was more pronounced.
OK. One possible explanation is that the mutation could have been a neutral mutation, even though it was not silent.
Thank you.
Thanks a lot. Thank you.