So firstly, I would like to thank the organizers for giving me a chance to present our work. Today, I'll tell you a story about how we used the TCGA data to understand the role of a common but not well understood mutation in acute myeloid leukemia. First, a little bit of background on the disease. As many of you might be aware, acute myeloid leukemia is a disease characterized by accumulation of myeloid precursor cells in the bone marrow that are blocked in their ability to differentiate into mature blood cells. And like many other cancers, AML is associated with widespread deregulation of DNA methylation. So some work done early in 2010 comparing leukemic blasts to normal CD34+ cells showed both regions of aberrant hypermethylation, in red here, as well as regions of hypomethylation. And so what was most interesting for us was to see what the source of this aberrant methylation was. There are several possibilities. It could be stochastic, as cancer methylomes have been shown to be quite stochastic. It could be a cell of origin issue, where the methylation pattern that you're seeing is a result of the cell from which the leukemia originated. Or it could be genetic, so it could be a mutation that's actually driving the aberrant methylation. We were particularly interested in this question because mutations in the DNA methylation machinery had been identified in AML, but at the same time those mutations could not explain all of the aberrantly methylated samples. So we would actually have aberrantly hypermethylated samples that didn't have mutations in any one of these genes, as well as aberrantly hypomethylated samples. So, because the TCGA made available both mutation and methylation data for the same samples, we set out to systematically analyze this question, and we had two goals in mind. 
One was to see if we could identify any new genetic drivers of aberrant methylation, and the second was to see if we could actually use the methylation pattern to find leads for a mutation-specific therapy. So to get to question one, we used a tool that was developed by my PI's group several years ago. It's called Boolean Implications, and it was inspired by Boolean logic, where an implication is a pairwise operation. So in the context of data analysis, this enables us to look at pairs of attributes and at the relationships between them. It was developed for gene expression data, and when I started analyzing the TCGA data I extended it to consider mutations, copy number alterations and DNA methylation. So what does it look like? If you look at the scatter plot of the two attributes, visually it looks like an L-shaped relationship, but of course we have a mathematical way of extracting these relationships. So the first step to extract a Boolean implication is to discretize the data. We have a systematic algorithm that looks at the distribution of values for each attribute to come up with an attribute-specific threshold for deciding whether a sample has a high value or a low value. And for it to be a Boolean implication, when you look at the four quadrants in a pairwise plot, you need to have at least one sparse quadrant. By sparsity I mean that there should be very few samples within that quadrant. And again, we have a statistical test to actually identify sparsity. So as you can imagine, there are four different quadrants, and so you would have four different types of Boolean implications. The one that I'm showing you over here is what we call a high-high implication. So this says that if A is high, then B is high. Another way of thinking about it is that all samples that have attribute A high would almost always have attribute B high. But the reverse is not true. 
So you would have many samples where attribute B is high, but attribute A is not high. So as you can see, it's a way to derive asymmetrical relationships. The second kind of implication is a high-low implication. In that case, you're actually looking for sparsity in this quadrant over here, where both attributes are high. And the high-low implication states that if A is high, then B is low. That's typically what is known as mutual exclusion in this community. I'm not going to talk about the two other types of implications, because they're not necessary for this analysis. So what does our computational pipeline look like? We took all the TCGA AML samples that had overlapping mutation and methylation data, and that's about 191 samples. Then with the mutation data, we identified 17 recurrent mutations. So these are mutations that occurred in five or more samples. With the methylation data, we used the 450K arrays, so there are about 450,000 sites. We did some filtering to remove probes that had low dynamic range. And then we discretized the methylation values. So now I have Boolean data on both sides. The mutation data is Boolean to begin with, and I've discretized the methylation data. And then we generate these Boolean implications. And just to reiterate what I said in the previous slide, a Boolean implication between a mutation and methylation looks something like this. This is a high-high implication. So what this is saying is that if IDH2 is mutated, then CpG site A is always methylated. And this is a high-low implication. So what this is saying is that if DNMT3A is mutated, CpG site B is always unmethylated. So once we have the Boolean implications, we just count, for each mutation, the number of high-high and high-low implications with methylation. And when I started this analysis, I wanted to see if we would really see any differences between the different mutations. And to our surprise, the mutations actually fell into four nice categories. 
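The counting step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's actual pipeline: the function names are hypothetical, the inputs are assumed to be already discretized to Boolean values, and the sparsity check (an expected-versus-observed statistic plus an error rate, with cutoffs of 3.0 and 0.1) is one plausible choice, since the talk does not spell out the statistical test or its thresholds.

```python
from math import sqrt


def quadrant_counts(a, b):
    """Count samples in each quadrant; keys are (A is high, B is high)."""
    counts = {(x, y): 0 for x in (False, True) for y in (False, True)}
    for x, y in zip(a, b):
        counts[(bool(x), bool(y))] += 1
    return counts


def is_sparse(counts, quadrant, min_stat=3.0, max_error=0.1):
    """Return True if `quadrant` holds far fewer samples than expected.

    The statistic and both cutoffs are assumptions made for this sketch.
    """
    x, y = quadrant
    total = sum(counts.values())
    observed = counts[(x, y)]
    row = counts[(x, False)] + counts[(x, True)]   # samples with A == x
    col = counts[(False, y)] + counts[(True, y)]   # samples with B == y
    if total == 0 or row == 0 or col == 0:
        return False
    expected = row * col / total                   # count under independence
    stat = (expected - observed) / sqrt(expected)
    error = 0.5 * (observed / row + observed / col)
    return stat > min_stat and error < max_error


def implication_type(mutated, methylated):
    """Classify a mutation/CpG pair as 'high-high', 'high-low', or None."""
    counts = quadrant_counts(mutated, methylated)
    if is_sparse(counts, (True, False)):   # mutated but unmethylated is rare
        return "high-high"
    if is_sparse(counts, (True, True)):    # mutated and methylated is rare
        return "high-low"
    return None


def count_implications(mutated, methylation_by_site):
    """Tally high-high and high-low implications for one mutation."""
    tally = {"high-high": 0, "high-low": 0}
    for methylated in methylation_by_site.values():
        kind = implication_type(mutated, methylated)
        if kind is not None:
            tally[kind] += 1
    return tally
```

Calling `count_implications` once per recurrent mutation, with a dictionary mapping each retained CpG probe to its Boolean methylation calls, would give the two counts per mutation that the categories are based on.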
So the category on the right over here, these are mutations that have very few implications with methylation, both high-high and high-low. You can see KRAS and NRAS fall into this category, which makes sense, because they are not known to be associated with methylation. Then in the hypo category, these are mutations where you have many more high-low implications than high-high implications, suggesting that these have a predominantly hypomethylating effect. Again, DNMT3A falling into that category makes sense, because it's an inactivating mutation of a de novo DNA methyltransferase. So what was most exciting for us was this category of hypermethylation. And what stood out was WT1, because it's a transcription factor, and we weren't really expecting to find it there. So if you look at the data visually, because there's a whole bunch of numbers here, what I've plotted is the ratio of the number of high-high implications to high-low implications. Again, you see WT1 really stands out. So that's what we decided to pursue further to see what was going on. The first question was to look at the co-occurrence and mutual exclusion patterns of the three mutations in the hyper category, and what we found was that both in the TCGA data and in a separate cohort, the ECOG cohort, the mutations were almost mutually exclusive. And also, when we look at the CpG sites that are methylated in association with the three different mutations, or the genes that are associated with those CpG sites, we find only little to partial overlap. So this suggests that the WT1 mutation has a very unique hypermethylation signature, but it still does not tell us whether it's actively playing a role in causing the hypermethylation or whether it's just some kind of epiphenomenon that we are observing, such as a cell of origin effect. 
So to get to that question, we decided we really needed to do wet lab experiments. So we put a mutant form of the WT1 protein into THP1 cells, which is an AML cell line that is wild type for WT1, and then after 10 passages we measured the methylome using 450K arrays, which if you remember is the same array that was used for the TCGA patients. This heat map over here compares the controls to the WT1 mutants, and you can see an increase in DNA methylation going from the controls to the WT1 mutants. And furthermore, what was interesting was that when I compared the genes that were being methylated in the cell lines with the genes that were being methylated in the TCGA patients with the WT1 mutation, there was a very significant overlap, suggesting that a similar mechanism is probably at work in the patients as well. So now we have these genes, and the next question is, what are these genes that the WT1 mutation seems to be preferentially methylating? GSEA analysis shows, in both the TCGA patients and the THP1 cell lines, that they're extremely enriched for PRC2 targets, so these are genes with the H3K27 trimethylation mark. Now, it's possible that any set of methylated genes would be enriched for PRC2 targets. So what we decided to do was to compare the methylated genes associated with the different mutations, to see if this was unique to WT1, and while you do see PRC2 enrichment for the other mutations, the enrichment for WT1 is clearly off the charts. So just to give you a quick review of what the PRC2 complex is, it's known to be a master regulator of differentiation events throughout development, and it has three components, one of which is EZH2, the catalytic component, which has histone methyltransferase activity and is thought to deposit the H3K27 trimethylation mark, which is a known mark of transcriptional repression. 
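The talk does not say which test was behind the "very significant overlap" between the genes methylated in the cell line and those methylated in the WT1-mutant patients. One common way to assess a gene-set overlap like that is a one-sided hypergeometric test, sketched below using only the Python standard library. The function name, its parameters, and the choice of test are all assumptions made for illustration.

```python
from math import comb


def overlap_p_value(universe_size, set_a_size, set_b_size, overlap):
    """Hypergeometric tail probability of an overlap at least this large.

    Assumes set B is a random draw of `set_b_size` genes from a background
    of `universe_size` genes, `set_a_size` of which belong to set A.
    All names here are hypothetical; this is not the speaker's code.
    """
    if not 0 <= overlap <= min(set_a_size, set_b_size):
        raise ValueError("overlap must fit inside both sets")
    if max(set_a_size, set_b_size) > universe_size:
        raise ValueError("sets cannot be larger than the universe")
    total = comb(universe_size, set_b_size)
    # Sum the probability of every overlap from the observed value upward.
    tail = sum(
        comb(set_a_size, k) * comb(universe_size - set_a_size, set_b_size - k)
        for k in range(overlap, min(set_a_size, set_b_size) + 1)
    )
    return tail / total
```

In practice the universe would be the set of genes covered by the array after probe filtering, and the result is only meaningful if both gene lists are drawn from that same background.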
So is it possible that the WT1 mutation is actually causing dysregulation of PRC2 targets in AML? Our preliminary analysis suggests that's indeed the case. So what we did is we isolated a set of genes, we called them the PRC2-marked genes in adult hematopoiesis, and we used the ENCODE data to come up with that list. And when we compared gene expression of these genes in the WT1 mutant AMLs versus normal hematopoietic populations, we found a significant degree of repression in the WT1 mutant AMLs compared to mature monocytes, and the mutants were actually looking more like the normal progenitor populations, suggesting that the WT1 mutation could actually be inducing a myeloid differentiation block. So to test this, we used a well-established cell line assay in which TF-1 cells produce fetal hemoglobin in the presence of EPO, but when the mutant was introduced into the same cells, they failed to produce fetal hemoglobin, suggesting a differentiation block. So far, hopefully, I've shown you how the WT1 mutant seems to be producing dysregulation of PRC2 targets and also seems to be causing a differentiation block. So the next question from the clinical standpoint is, can you actually rescue the differentiation block? In order to test that, we used GSK126, which is an inhibitor of EZH2 and is already in clinical trials, I believe, for activating mutations of EZH2 in diffuse large B-cell lymphoma. And what we find is that, by treating primary AML samples carrying the WT1 mutation with GSK126, we see an upregulation of mature myeloid markers. This was not observed in AML samples that did not have the WT1 mutation, nor in APML samples, which responded to other treatments but not to this particular treatment. So we are hoping that this means that GSK126 could be used as a targeted therapy for AML patients with the WT1 mutation. So this brings me to my last slide. 
I've presented a story about how we think mutation in WT1 is a novel driver of DNA methylation in AML, and we have actually tried to use the methylation pattern to arrive at a potential therapy. So what's exciting from the clinical perspective is that EZH2 inhibitors have activity in WT1 mutant AML. And just a little plug about the computational pipeline: the initial pipeline that analyzed the mutation and methylation data is not AML specific at all, and it should be applicable to other cancer types, or even in a pan-cancer kind of analysis. So if there's interest in that, please talk to me, and I'd be happy to help you with that. So finally, I'd like to end with acknowledgments. I'd like to thank all my co-authors on this work, particularly Dan Thomas, who did the experimental work, our funding sources, and finally the TCGA for putting out this kind of data, because that helped people like me, with a pure computational background and an interest in biology, to come out and try my tool on this kind of data. Thank you.
Thank you very much. I'd like to ask a quick question if I may. So I'm a little puzzled by your rescue experiment, because generally when polycomb targets acquire abnormal DNA methylation, they lose the polycomb mark, and so I'm a little confused why your EZH2 inhibitor actually rescues the expression.
So actually, yes, we started out with the DNA methylation, and then we noticed that the signal for this PRC2 dysregulation was actually stronger in the WT1 mutants, and that's why we decided to use the EZH2 inhibitors.
So they're usually mutually exclusive.
Right, that's what we had read, but this is what we are seeing in the data from our own samples, so we're actually analyzing it further to see what's going on.
So how much do we know about CEBPA? Because for the IDH mutation, this is very clear.
I have seen a paper on that. The original work, I believe by Maria Figueroa et al., classified AML into 16 epigenetic clusters. One of the clusters actually had CEBPA mutants, and it was a hypermethylation cluster, but I do not believe that there's any work to actually validate that.
Thank you, Shabarna, and we're going to take a coffee break now, and we'll be back for session 4 at 11 a.m. Thank you.