Good morning. First, I want to thank the organizers for inviting me to speak here today. I'm going to talk about a project we've been working on that looks at pathways dysregulated via DNA methylation across different types of cancer, using a web-based program we just finished developing called LRpath. Okay, so over the last couple of years a number of new cancer driver genes have been identified, and several of these are known to play a major role in epigenetic mechanisms. However, the targets of these genes are not fully understood. So we're interested in taking a pathway-level approach to ask whether epigenetic mechanisms tend to target the same or overlapping genes and pathways as somatic mutations, or whether they tend to target different pathways. A platform that's been used for a couple of years now to assess DNA methylation genome-wide is the Illumina HumanMethylation27 BeadChip, which covers over 27,000 CpG sites. Several studies using this platform have been published over the past year and a half and are now publicly available, and interestingly, many of them study cancer. So we decided that, given the data available through TCGA and NCBI's Gene Expression Omnibus, the time was ripe for an integrative analysis of DNA methylation pathways across cancer types. Our hypothesis was that during carcinogenesis, certain pathways or biological gene groups would be commonly dysregulated by DNA methylation across cancers. Our approach was to use our newly developed web-based program, LRpath, along with the clustering analysis available on its website, to look for commonly altered pathways and other biological concepts across 10 cancer studies that we chose, all with DNA methylation profiled on the Illumina BeadChip.
First, I want to take a couple of minutes to introduce the LRpath method. The LR stands for logistic regression: a logistic regression model is used to analyze each gene set of interest. It turns out that logistic regression has some nice properties for gene set enrichment testing. As many of you may be aware, Fisher's exact test is probably the most widely used test across different programs. Logistic regression can be considered an extension of it that doesn't require choosing any significance cutoff for calling genes or sites differentially methylated or expressed. It also yields an odds ratio, as Fisher's exact test does, so users get a similar type of output. The original algorithm was published in Bioinformatics in 2009, and since then we have developed a user interface for it, accessible at lrpath.ncibi.org. It's a fairly simple interface: users choose a species and the annotation databases they want to test against, and they get tabular results for any single analysis. If they test multiple conditions, in this case multiple cancer types, they can obtain a heat map to visualize how the pathways change across conditions or over time; in other words, to visualize the pathway profiles. Some advantages of LRpath are that it performs well for data sets with both large and small sample sizes, which is nice when you're integrating many data sets with varying sample sizes. It can do a directional test, by which I mean one that takes into account the direction of change: up- and down-regulated genes, or in this case hyper- and hypomethylated concepts. Or you can do a non-directional test, where you look at what is differentially methylated or expressed overall, or at which gene sets actually have less differential methylation than you'd expect by chance.
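To make the logistic regression idea concrete, here is a minimal Python sketch of a cutoff-free enrichment test in the spirit of LRpath. As we understand the 2009 model, gene-set membership (1 or 0) is regressed on each gene's -log10 p-value, so exp(slope) is an odds ratio per unit of -log10 p; dichotomizing the score at a cutoff instead would give an odds ratio comparable to a Fisher's 2x2 table. The function names, the one-score-per-gene input, the plain Newton-Raphson fit, and the signed-score convention for the directional test are all assumptions for illustration, not the LRpath code; the published directional method may differ in detail.

```python
import math


def _sigmoid(eta):
    # Numerically stable logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def _info(x, y, b0, b1):
    """Score vector and Fisher information for logit(P(y=1)) = b0 + b1*x."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = _sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)


def lr_enrichment(scores, in_set, max_iter=100, tol=1e-10):
    """Regress gene-set membership (0/1) on a per-gene significance score.

    Hypothetical helper. Returns (odds_ratio, z, two_sided_p) for the slope;
    odds_ratio > 1 means set members tend to have larger scores.
    """
    y = [1.0 if m else 0.0 for m in in_set]
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = _info(scores, y, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton-Raphson step: (information matrix)^-1 times score vector.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Wald test for the slope, using the information at the final estimates.
    _, (h00, h01, h11) = _info(scores, y, b0, b1)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = b1 / se
    return math.exp(b1), z, math.erfc(abs(z) / math.sqrt(2.0))


def nondirectional_scores(pvals):
    # -log10(p): larger means stronger differential methylation; assumes p > 0.
    return [-math.log10(p) for p in pvals]


def directional_scores(pvals, deltas):
    # Assumed convention: sign the score by direction of change
    # (positive = hypermethylated, negative = hypomethylated).
    return [math.copysign(-math.log10(p), d) for p, d in zip(pvals, deltas)]
```

Because the predictor is continuous, every gene contributes in proportion to its evidence, which is why no site-level cutoff has to be chosen before the enrichment step.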
That actually became interesting in this study. Finally, LRpath has a random-sets interpretation: it essentially asks whether the overall amount of differential methylation or expression in a gene set is significantly greater than you would see in random sets of the same size drawn from the experiment. It has a number of other advantages as well, which I'm going to skip. Here are the different types of gene sets available to test with LRpath; they are all listed in the user interface, along with additional options that I'm not going to go through. For clustering, the user can choose among several clustering methods, as well as options for filtering which concepts to look at. Okay, now on to this specific study. All of the data we used was generated on the Illumina HumanMethylation27 BeadChip, which has already been discussed this morning. It assesses percent methylation at over 27,000 sites in over 14,000 genes, and most genes are represented by one or two sites on the array, as you can see in the table at the bottom left. These are the cancer types we studied. Half of them were from the Gene Expression Omnibus (GEO) and half from TCGA, downloaded from the TCGA Data Matrix. From GEO there were colon cancer, multiple myeloma, glioblastoma, prostate cancer, and breast cancer; from TCGA there were kidney cancer, lung adenocarcinoma (AC), lung squamous cell carcinoma (SCC), ovarian cancer, and stomach cancer. We downloaded all the beta values, which represent percent methylation, and tested for differential methylation between tumor and normal samples using an empirical Bayes method. As you can see in the bottom row of this table, every cancer type had at least 1,000 sites with a p-value below 0.01 and at least a 10% change in average methylation. But what's nice about LRpath is that we didn't have to choose any particular cutoff.
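The random-sets interpretation can be illustrated with a direct resampling check: compare a gene set's mean score (for example, -log10 p) with the means of many random gene sets of the same size drawn from the same experiment. This is a hypothetical sketch of the concept, not how LRpath computes its p-values; the function name, the mean-score statistic, and the number of draws are assumptions. The lower-tail value corresponds to the depletion case, a set with less differential methylation than expected by chance.

```python
import random
import statistics


def random_sets_pvalue(scores, set_idx, n_draws=10000, seed=0):
    """Compare a gene set's mean score with random gene sets of the same size.

    Hypothetical helper. scores: one score per gene in the experiment;
    set_idx: indices of the genes in the set.
    Returns (p_more, p_less): empirical upper- and lower-tail p-values.
    """
    rng = random.Random(seed)
    k = len(set_idx)
    observed = statistics.fmean(scores[i] for i in set_idx)
    n_ge = n_le = 0
    for _ in range(n_draws):
        draw = statistics.fmean(rng.sample(scores, k))
        n_ge += draw >= observed
        n_le += draw <= observed
    # Add-one correction so an empirical p-value is never exactly zero.
    return (n_ge + 1) / (n_draws + 1), (n_le + 1) / (n_draws + 1)
```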
We put all of the data into the enrichment analysis. What I'm going to show you here are some initial clustering results for these 10 cancer types. For this particular heat map, our criterion was that a biological concept or pathway had to have a p-value less than 10^-4 in at least 5 of the cancer types, that is, at least half of the cancers we tested. So here, of course, we're biasing ourselves toward pathways that are commonly altered by DNA methylation across cancers, and we saw quite a few. Each row represents a biological concept or pathway; red indicates hypomethylated genes or pathways, and green indicates hypermethylated ones. Among the hypomethylated pathways were many immune-related pathways and gene sets, including chemokine and cytokine activity and inflammation pathways. There were also receptor binding activities and peptidase activities, which are important for breaking through the extracellular matrix in cancer, as well as epidermis development. Since these were hypomethylated, we would expect them to have increased expression in cancer. Among the hypermethylated gene sets were nervous system development, embryonic development, and homeobox genes. These overlap substantially with PRC2 targets, which I'm glad Peter Laird introduced earlier this morning. Other hypermethylated sets included transcription factors, represented here by sequence-specific DNA binding, and voltage-gated potassium channels, which are also known to be important in cancer. Next, we wondered: if we expand the concepts we look at to those changed in at least one cancer, will we see many more cancer-specific pathways affected by DNA methylation? The heat map on the right represents concepts enriched in at least one cancer type.
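The heat map filtering rule (p < 10^-4 in at least 5 of the 10 cancer types, relaxed to at least one cancer for the second heat map) can be sketched as follows. The nested-dictionary layout of the results and the signed -log10 p encoding (negative for hypomethylated, shown red; positive for hypermethylated, shown green) are assumptions for illustration, not LRpath's output format.

```python
import math


def select_common_concepts(results, p_cut=1e-4, min_cancers=5):
    """Keep concepts significant in at least `min_cancers` cancer types.

    Hypothetical layout: results = {concept: {cancer: (p_value, direction)}},
    with direction "hyper" or "hypo".
    Returns {concept: {cancer: signed_value}} for heat-map rows, where
    hypomethylated values are negative (red) and hypermethylated positive (green).
    """
    selected = {}
    for concept, by_cancer in results.items():
        n_sig = sum(1 for p, _ in by_cancer.values() if p < p_cut)
        if n_sig >= min_cancers:
            selected[concept] = {
                # log10(p) < 0 for p < 1, so hypo rows come out negative.
                cancer: (math.log10(p) if d == "hypo" else -math.log10(p))
                for cancer, (p, d) in by_cancer.items()
            }
    return selected
```

Calling it with `min_cancers=1` gives the broader "at least one cancer" view.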
We saw surprisingly similar results in the two heat maps, indicating that most of the pathways affected by DNA methylation are common to multiple types of cancer. There were a couple of exceptions I want to point out. In the middle of the heat map on the right are chromosome X-related pathways and concepts, which were significantly hypermethylated in breast cancer only; this is an indication that the second X chromosome is gaining extra methylation in breast cancer. At the bottom, you can see that cell adhesion gene sets tended to be hypermethylated, but in prostate cancer only. So we saw that the same pathways tended to be affected by differential DNA methylation. Next, we wondered whether the same genes within those pathways drive the enrichment, or whether different genes can drive it. So we ran a number of Fisher's exact tests between pairs of cancer types for a handful of enriched concepts, and found that, in general, the same genes tended to drive the enrichment. Here I show the results for immune response and epidermis development, which both tended to be hypomethylated. You'll see that prostate cancer is not significant here, because prostate cancer was not enriched for these hypomethylated concepts. I also show the results for a hypermethylated concept, neurogenesis. The interesting result here is that multiple myeloma did not have many hypermethylated genes in neurogenesis. This is due to the overlap of this concept with PRC2 targets: myeloma is the only non-solid tumor we tested, and it did not show the same hypermethylation of PRC2 targets. So our conclusion is that the pathways affected by differential methylation were surprisingly concordant across cancer types.
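A minimal sketch of the pairwise overlap test: for one concept, a one-sided Fisher's exact test (the hypergeometric upper tail) asks whether the genes driving enrichment in cancer A overlap those in cancer B more than chance would predict. Unlike LRpath itself, this needs each gene to be called "driving" or not in each cancer, so the cutoff behind those calls is an assumption, as is the function name.

```python
from math import comb


def overlap_fisher_p(n_total, k_a, k_b, overlap):
    """One-sided Fisher's exact test (hypergeometric upper tail) for gene overlap.

    Hypothetical helper.
    n_total: genes in the concept measured in both studies
    k_a, k_b: genes called differentially methylated in cancer A and in cancer B
    overlap: genes called in both
    Returns P(X >= overlap) if cancer B's calls were a random draw of k_b genes.
    """
    denom = comb(n_total, k_b)
    upper = min(k_a, k_b)
    # comb(n, k) is 0 when k > n, so impossible configurations contribute nothing.
    tail = sum(
        comb(k_a, x) * comb(n_total - k_a, k_b - x)
        for x in range(overlap, upper + 1)
    )
    return tail / denom
```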
We saw voltage-gated potassium channels, homeobox genes, and embryonic and nervous system development hypermethylated, and epidermis development and keratinization hypomethylated, along with immune response genes. For most tumor types, similar genes were affected by changes in CpG methylation. I didn't show the heat map results for our non-directional test, which looks for pathways enriched or depleted for differential methylation. Interestingly, though, DNA repair, obviously a commonly affected pathway in cancer, actually had fewer genes changed by DNA methylation than you would expect by chance. So we hypothesize that DNA damage repair genes tend to be regulated by alternative mechanisms in cancer, such as genomic aberrations. By performing this integrative analysis across 10 cancer types, we identified concepts affected in multiple cancers that support previously known, biologically important findings. There are a lot of additional results hidden in this data that we're still examining, and I think they will give us several novel findings as well. Our overall conclusion is that a subset of the known cancer pathways appears to be commonly dysregulated via DNA methylation across cancers. I want to thank my colleagues at NCIBI at the University of Michigan: Brian Athey and Gil Omenn, the leaders, and Terry Weymouth and Vassour, the developers who worked on LRpath; my colleague Alla Karnovsky; my collaborators in the School of Public Health, Laura Rozek and Dana Dolinoy; and my postdoc, Julie Kim, who actually did most of this analysis. Thank you.
Thank you, Maureen. We have time for a question or two.
Great talk. I'm curious about the coordinated hypomethylated concepts you showed across tumor types. Have you drilled down within subtypes of each cancer to see whether there is a preference in a particular population?
In other words, do these hypomethylated concepts define subtypes? For example, immune response is one enriched concept; is there any relationship to a transcriptomically defined subtype that's enriched in inflammatory cells?
No, we haven't drilled down to that level yet. As I said, this project is ongoing, but those are excellent points, and we will definitely try to correlate some of these results with subtypes and with clinical variables.
Excellent talk; I certainly plan to look up the website and your methods. A generic question that relates not just to your talk, but to Peter's and to many that are coming up, is what "normal" means. It's a particular mix of tissue types, not the progenitor cells. Despite what Eric described, at the genomic level it's relatively obvious what normal means as a point of comparison; it's not so obvious from an epigenetic, transcriptomic, or proteomic point of view.
Peter might be better able to answer this question from the TCGA point of view.
Well, you're absolutely right, John. Obviously we would like to know the cell of origin and have it in a pure state in our hands so we could characterize its epigenome. We don't have that, so we make do, and sometimes it's really questionable. In the case of ovary, for example, the actual cell of origin is still under debate; we had full-thickness fallopian tube for comparison, which is probably not ideal, and others use ovarian surface epithelium. What I would say is that, as you might have noticed in my slides, the probes showing cancer-specific methylation tended to be fairly consistent across a wide range of normal tissues. So whatever the cell of origin is, we are relying on the unproven assumption that it had a profile similar to the broad spectrum we see across multiple cell types.
So, at least for the methylome, the differences among tissues are relatively small for these probes. You can do a supervised analysis to identify tissue-specific DNA methylation differences, but those tend not to be at true CpG islands that undergo strong, cancer-specific hypermethylation. Clearly the same question applies at the transcript and protein level.
I have a quick question; great talk. If I understand correctly, hypermethylation is more local, but hypomethylation could span larger regions; please correct me if I'm wrong. Do you see a signal from gene sets whose genes are simply close to each other and affected by the same hypomethylation event? That could skew your null hypothesis relative to the actual data.
Yes, that's a good point. Some of the immune response genes are close together, so that could be a valid concern. I don't think that means the functional result isn't meaningful, however, because on the array any one probe is linked to one gene only, so independent probes are driving this. But yes, if a wide region spanning multiple genes tends to be hypomethylated, you could pick up a whole gene set from that.
So one way to deal with it might be to take regions and shuffle them as blocks rather than as individual events.
Yes, and the two sets I'm aware of that are like that are the homeobox genes and the immune response genes.
Okay, thank you, Maureen. I think we need to move on. That was a great talk.
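The block-shuffling idea raised in the last question can be sketched as a block permutation test. Genes are assumed to be ordered by genomic position, and the block size, function name, and mean-score statistic are hypothetical choices; a real analysis would also respect chromosome boundaries, which this sketch ignores.

```python
import random
import statistics


def block_permutation_pvalue(scores, members, block_size=50, n_perm=2000, seed=0):
    """Empirical upper-tail p-value for a gene set's mean score via block permutation.

    Hypothetical helper. scores and members are per-gene lists assumed to be
    ordered by genomic position; members[i] is True if gene i is in the set.
    Contiguous blocks of scores are shuffled as units, so neighbouring genes
    that share one regional methylation change stay together under the null.
    """
    rng = random.Random(seed)
    idx = [i for i, m in enumerate(members) if m]
    observed = statistics.fmean(scores[i] for i in idx)
    blocks = [scores[i:i + block_size] for i in range(0, len(scores), block_size)]
    n_ge = 0
    for _ in range(n_perm):
        rng.shuffle(blocks)  # permute block order; within-block order is kept
        shuffled = [s for block in blocks for s in block]
        n_ge += statistics.fmean(shuffled[i] for i in idx) >= observed
    return (n_ge + 1) / (n_perm + 1)
```

With this null, a set whose genes all sit in one hypomethylated region can reach a p-value of only about one over the number of blocks, whereas a gene-by-gene random-sets test would treat the same clustered genes as independent evidence.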