Hi everyone. First, thank you to the organizers for giving me this opportunity to present our work on network-based stratification of tumor mutations. I'm a student in Trey Ideker's group, and I've just realized that I'm keeping all of you from lunch, so I'll try to keep it interesting and to the point. Let's start.
We've discussed stratification at length, so I'll say just one word: it is clearly important, as all of you have tried to do it in various forms. We feel it's a major milestone on the way to patient-tailored treatment, and there have been many successful attempts. We've seen a few today, and I'll remind us of some of the original work by Verhaak, who just preceded me, where they subtyped glioblastoma into four types, I guess five types now, and where some of these subtypes have a significant association with survival. In other cases, however, for example ovarian cancer, the subtypes that were defined by expression were not as successful at recapitulating a clinical phenotype. So we were asking ourselves whether there remains a type of data that might harbor information for subtyping and hasn't been used yet, which is somatic mutations.
And then the question arises: why are somatic mutations difficult, or why haven't they been used before for subtyping? Somebody already mentioned today on this podium that there's just not enough of them. It's a very sparse data type. Here I've plotted the patient-by-mutation matrix just for chromosome 17, and really the only feature that is apparent is this TP53 mutation, which is in the majority of the ovarian cancer cohort. This is just for ovarian cancer here. And to quantify this a bit more carefully, here on the bottom I show a histogram of the columns of this matrix, so basically the frequency of mutations for each patient, and here a histogram of the rows. And as we can see, most mutations occur in a very small fraction of the patients.
We could sort of discount these as passengers; however, in some sense these might be very important mutations for that specific patient. My lab does a lot of work from a network perspective, and we were curious whether it's possible to go from the clustering we see above, the stratification, which is not very meaningful, and use networks to provide something more meaningful. So we proposed a network-based stratification approach, which is based on consensus clustering. We start, as in consensus clustering, with a bootstrap of the data, so we draw a sample. Next we apply a network smoothing step, which basically starts from individual mutations and propagates them onto a network; I'll expand on this in a moment. On these propagated features, we next apply a network clustering approach, which is basically NMF with an added regularization layer for the network. For anybody not familiar with NMF, you should just think of it as a fancier K-means. So we basically do fancier K-means with a network. Finally, when we aggregate the results, we get something that actually seems to contain information, as opposed to the same data when we apply consensus clustering NMF out of the box.
So, an intuition for network smoothing, or why we think this captures something that makes sense. Here we see two virtual genotypes, genotype A and genotype B. As you can see at the bottom here, they're very sparse, with just a few ones and mostly zeros, and there is very little overlap between the two genotypes. Through network propagation, we are able to smooth across the network, basically allowing influence from the individual mutations to seep into their network neighborhoods, forming a much denser vector that now has a lot more to compare between these two genotypes. At the end, this forms areas of overlap between the individual genotypes.
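The smoothing intuition above can be sketched in a few lines of plain Python. This is only an illustration, assuming a random walk with restart of the form F(t+1) = alpha * F(t) * A + (1 - alpha) * F(0) on a row-normalized adjacency matrix A. The toy five-gene path network, the function names `row_normalize` and `propagate`, and the value `alpha=0.7` are all hypothetical and are not taken from the talk.

```python
def row_normalize(adj):
    """Divide each row of an adjacency matrix by its degree (row sum)."""
    normalized = []
    for row in adj:
        degree = sum(row)
        normalized.append([v / degree if degree else 0.0 for v in row])
    return normalized


def propagate(f0, adj_norm, alpha=0.7, tol=1e-6, max_iter=1000):
    """Random walk with restart: f <- alpha * f * A + (1 - alpha) * f0.

    alpha (an assumed value here) controls how far the signal spreads.
    """
    n = len(f0)
    f = list(f0)
    for _ in range(max_iter):
        # One walk step: each gene passes its score to its neighbours.
        walked = [sum(f[i] * adj_norm[i][j] for i in range(n)) for j in range(n)]
        # Restart: mix the walked scores with the original mutation profile.
        updated = [alpha * w + (1 - alpha) * s for w, s in zip(walked, f0)]
        change = max(abs(u - v) for u, v in zip(updated, f))
        f = updated
        if change < tol:
            break
    return f


def dot(u, v):
    """Overlap between two profiles, measured as a dot product."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy network: five genes connected in a path 0-1-2-3-4.
adjacency = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
adj_norm = row_normalize(adjacency)

# Two sparse genotypes with no mutated gene in common.
genotype_a = [0.0, 1.0, 0.0, 0.0, 0.0]
genotype_b = [0.0, 0.0, 0.0, 1.0, 0.0]

smooth_a = propagate(genotype_a, adj_norm)
smooth_b = propagate(genotype_b, adj_norm)

print("overlap before smoothing:", dot(genotype_a, genotype_b))
print("overlap after smoothing:", dot(smooth_a, smooth_b))
```

The two raw genotypes share no mutated gene, so their overlap is zero. After propagation both profiles put weight on the genes between them, which is the kind of overlap the clustering step can then work with.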
We started by testing this out in simulation, and we built a simple simulation framework. The variables we varied are the size of the pathway that we believe is implicated in cancer, or the amount of pathway information that is tied to the cancer, and, on the bottom, the frequency of drivers: how many of the mutations are part of the background, and how many are cancer drivers. When we compare the two approaches, without smoothing and with our network-based approach, we can see that without smoothing, if the mutations are very, very common and the pathways are small, we are able to capture the signal using standard methods. However, when we use the network-based approach, we can push this area of informative clustering much further into the space. And we actually believe that real cancer lies somewhere in this space. When we apply this to a random network as a sanity check, we see a degradation in signal, which we found very encouraging.
Of course, I would not have come here if we only had results in simulation. We applied this to the TCGA ovarian cancer cohort, where, as I've already mentioned, the subtyping results had a not quite as interesting association with biology. And here you see the conventional stratification, which really is not interesting. You get a monolithic cluster and something that looks like a few outliers. When we apply our approach, you get something that really looks like a meaningful signal. The question is, is it biological? And I'd like to argue that it is. Here I plot the association with patient survival against the number of clusters. The y-axis here is the survival log-likelihood ratio, so higher is better. And we see that for quite a range of cluster numbers, we get a significant association with survival compared to either a permuted control or the standard NMF.
If we drill down to four subtypes, which we find reasonable both in terms of the association result and due to other intrinsic measures of clustering performance, we can see that there is one subtype that does much worse than the rest in terms of mean survival. This is also recapitulated when we look at the probability of platinum resistance. So the acquisition of platinum resistance seems to possibly be an event driving this result. If we compare to other data types, which we thankfully could download from Firehose, we see that they also achieve some likelihood ratio performance. However, it is inferior to what we get when we use different networks. So our method both recovers different subtypes and recovers subtypes that are actually more predictive of survival.
Finally, we asked ourselves: since expression measurements are still much easier to come by than somatic mutations, can we use the TCGA data to transfer our results from the world of somatic mutations into expression subtypes? So we defined the subtypes as before, and we now used a supervised learning approach to predict these subtypes from expression data. The expression data are now used as biomarkers, basically, to predict the subtypes that were defined using somatic mutations. Now we can apply this to an independent data set and see how well we're performing. First of all, as a measure of how well this actually works, we test this out in cross-validation.
Here we see the performance for the somatic mutation subtypes, and here the performance for the expression subtypes. We see a degradation in performance, though it is still above what we'd get by random chance, and we are able to recapitulate some of the survival difference within the same dataset. When we apply this to an independent dataset, in this case the Tothill expression cohort, we still maintain that separation for three of the four subtypes; one subtype is lost due to a lack of patients. And it does maintain the same trend as we've seen before. Just as a comparison, this is from rerunning standard consensus clustering NMF, which results in a substantially different clustering that has a much less remarkable association with survival.
Just drilling a bit into the biology, we can define which genes in the network are different for subtype 1, the subtype that has lower survival, and we see a number of things that are very encouraging. The first thing that jumped out at us is this caspase pathway, which has quite a few hits and is quite widely reported as being tied to resistance to cisplatin and platinum drugs. A second result, which has been discussed here today, is this FGFR pathway. It has been discussed in the context of other cancers, but there is a significant amount of work showing that the FGFR pathway is tied to platinum resistance in human bladder cancer and, finally, here in ovarian cancer in two recent papers.
So really, to summarize, what we've shown is that we can use network-based stratification to recover what we believe are biologically relevant subtypes. We believe that somatic mutation subtypes are different from those recovered from other downstream molecular phenotypes.
These subtypes can be recapitulated using gene expression as a biomarker signature, and each of these subtypes seems to have specific affected subnetworks, which might explain why we are unable to find these specific genes as mutated over entire cohorts: they are specific to a certain rare subtype of the disease. So, a one-slide summary: we go from here to here using a network. That's the gist of this talk. And I should give some acknowledgments: first of all JP Shen, who helped me extensively and is here in the crowd; Janusz Dutkowski and Andy from my team, who also had a lot of insights during this work; and of course Trey for his help and support. Thank you all.
All right. Questions there?
That's a fascinating talk. Could you describe briefly your method for network propagation?
The method is described in detail in a paper by Vanunu et al. from Roded Sharan's group, but very briefly, we use the normalized adjacency matrix, and we start with the somatic mutation matrix, basically matrix-multiplying it by the adjacency matrix. The way to think about it is as a random walk model with restarts. You have a parameter that sets how far you want to propagate the signal.
Thank you. Question there?
Yeah. You showed that the network-based grouping correlates with clinical survival. I just wonder, given the known clinical feature variables, does it provide additional predictive value?
So the way we've explored this approach has been completely unsupervised, and so we have not included any clinical phenotypes, because I feel like in some sense that would make it more of a supervised approach, like using it just as a feature. We could take it offline if you have specific ideas.
Well, I think he was asking whether, if you did a multivariable analysis that had the clinical features and your stratification, your stratification would be predictive.
Then I didn't answer the question, I'm sorry. I do have a slide somewhere at the end; we do show that the clinical variables like stage, grade, and age are actually not correlated with the subtypes that we've derived, so these are independent of those variables.
But given those known clinical variables, does adding the network-based grouping provide a better prediction?
So we have not done the survival analysis in this way, but as a post-processing analysis we can show that these variables are independent across the different subtypes, if that answers your question.
One more question.
So we have done a very similar analysis based just on the gene expression, not the mutation part. What we see is very interesting: treatment type comes out as a confounding variable. The model works very well across platforms, across different data-generating laboratories and so forth, but if we translate it to another treatment it has no predictive power. Have you done something similar?
When you say treatment, you mean the kind of chemotherapy these patients receive?
Yes.
So in the case of TCGA, I think the vast majority of patients got a platinum-based treatment, so there wasn't any variability in the treatment types, and so we didn't really explore that sort of analysis. The results do transfer to the Tothill data set. I am unsure exactly what the specifics of the treatment were there.
Thank you.
Thank you. Thank the speakers again and we will.