So there's been a program change. I've been asked to give an update on the integrated subtype pan-cancer manuscript, and, exercising my ultimate power as chair for the session, I bumped my own graduate student, Adrian, who was going to talk in this slot. He'll be giving his presentation at the Prostate AWG meeting later today, so I apologize and say thank you to Adrian. So I'm going to give an update on this work, which was done by the Pan-Can subtype working group chaired by Chuck Perou, Chris Benz, and myself. I don't need to say much about where the data came from; for the pan-cancer analysis we obviously stood on the shoulders of that work. A lot went into it, the dataset we used was coordinated by the same group, and we stuck to using those results so that we could contribute to that results archive. So we're talking about 3,500 samples that we're going to try to place into molecular classifications, potentially across tumor types, and ask: when we use these multiple different datasets, how do the tumors fall out? In other words, do cancers fall along tissue-of-origin boundaries, or do they tend to form clusters that reflect pathways or other types of molecular features? That's what we set out to do. I'm also going to put in a shameless plug for a resource we're trying to put together to get an overview of this data.
Most of you know that I really hate heat maps, so we have a tool that creates a browser for looking at this data all in one place in a slightly different way. Just to give you a little primer on that: the maps I'm going to show you for the subtype results display each sample at a particular address on a 2D plane. This is sort of the ultimate, I guess, socialistic neighborhood: everybody gets to occupy a hexagon. Each sample is assigned an address that maps to a hexagon, and we lay out the samples based on their molecular similarity, so that samples with similar, let's say, mRNA profiles end up as neighbors on the map, using techniques similar to the force-directed layouts people work with. That helps you, because now you can look up a particular sample you're interested in, look in the zip code where that sample lives, and hopefully find samples nearby that share molecular similarities with it. So it makes working with this data, I think, a lot nicer in some respects. Just to show you an example: on the left is the map that I'll show you a bigger picture of, zoomed in on the breast cancer samples. You can see that the basals separate well from the luminals, and the HER2s cluster in this mRNA space with the Luminal B tumors, shown in purple there. Then there's another layout we get by arranging samples based on their similarity in pathway space, so we can build a number of these maps, one for each platform, and you can consult a map to see whether the AWG subtypes, which we've loaded in, match up with how the map is laid out. So there are the endometrial subtypes on the mRNA layout, and so on: GBM, here are the colorectal subtypes, and then you can consult other maps to see whether a different map separates them better. In this case, this is the methylation map.
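As a rough illustration of the "one sample per hexagon" idea, here is a minimal Python sketch. It assumes each sample already has 2D coordinates from some similarity-based layout (the layout step itself is not shown), snaps each point to a hexagon on a pointy-top grid using axial coordinates, and, when that hexagon is already taken, searches outward to the nearest free one. The function names and the greedy, first-come placement order are illustrative assumptions, not the tumor map's actual implementation.

```python
import math
from collections import deque

# Axial neighbor offsets for a pointy-top hexagonal grid.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def point_to_hex(x, y, size=1.0):
    """Return the axial (q, r) address of the hexagon containing point (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Round in cube coordinates (q + r + s == 0), then repair the component
    # with the largest rounding error so the constraint still holds.
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return (rq, rr)


def assign_hexagons(coords, size=1.0):
    """Give every sample its own hexagon, as close as possible to its 2D position.

    coords: dict sample id -> (x, y) from a similarity-based layout (assumed).
    Samples are placed in input order. If a sample's natural hexagon is taken,
    a breadth-first search over neighboring hexagons finds a free one at the
    smallest hex distance.
    """
    occupied = set()
    addresses = {}
    for sample, (x, y) in coords.items():
        start = point_to_hex(x, y, size)
        queue = deque([start])
        seen = {start}
        while queue:
            cell = queue.popleft()
            if cell not in occupied:
                break  # the grid is unbounded, so a free cell is always found
            for step_q, step_r in HEX_DIRECTIONS:
                nxt = (cell[0] + step_q, cell[1] + step_r)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        occupied.add(cell)
        addresses[sample] = cell
    return addresses
```

Because collisions are resolved by the nearest free hexagon, samples that land on the same spot in the layout end up as immediate neighbors, which is what makes the "look in the same zip code" lookup meaningful.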
You can see that the colorectal subtypes actually fall out maybe better here, and this split corresponds to the MSI-high versus microsatellite-stable tumors, which makes sense because we know MSI status is correlated with methylation status. So the working group took six different platforms. Originally it was five, but a referee asked us to also incorporate mutations. We generated subtypes from each of these platforms on its own and then set out to do analyses across them. The number of clusters from each separate platform ranged from eight for copy number, the smallest, all the way up to 19 for DNA methylation. This is what it looks like on the tumor map I showed you. Each color is now a different subtype from the mRNA platform; it's not tissue I'm showing here but the cluster. And here are the subtypes we got for five of the platforms; we don't have a layout for the mutations yet. They correspond pretty well: the tumor map largely recapitulates the subtypes defined by the working groups. I think we might have some work to do for some of these subtypes, and we're still working on how to process some of those data platforms. So the biggest news first, and the biggest signal: tissue of origin does in fact correlate with the clusters you get from each separate platform. Each color now shows you the tissue type, and you can see that they all pretty much correspond. You see large swaths of the same color, which means the clusters are enriched for tissue. The one that goes somewhat against the rule, although you can still see big tissue-of-origin swaths, is mutations, which may have more of a tissue-orthogonal axis. And this is the tumor map, now showing colors for the tissues. Of course, we recapitulate that the maps are driven by tissue, so your eyes should just see nice separation of colors, which you do. So how do you cluster this data?
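One standard way to put a number on "the clusters are enriched for tissue" is a one-sided hypergeometric test for each (cluster, tissue) pair. The talk doesn't say which statistic the working group used, so treat this as an assumed, illustrative choice; `enrichment_pvalue` is a hypothetical helper built only on the Python standard library.

```python
from math import comb


def enrichment_pvalue(k, cluster_size, tissue_size, total):
    """Upper-tail hypergeometric p-value P(X >= k).

    X is the number of samples from one tissue that land in a cluster of
    `cluster_size` samples, drawn without replacement from `total` samples
    of which `tissue_size` belong to that tissue.
    """
    upper = min(cluster_size, tissue_size)
    denom = comb(total, cluster_size)
    tail = sum(
        comb(tissue_size, i) * comb(total - tissue_size, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return tail / denom
```

In practice you would run this for every cluster/tissue pair and correct the resulting p-values for multiple testing.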
So Katie Hoadley set out to take these six different sets of subtypes and come up with one subtype to rule them all. The answer we came up with was, well, we'll just let them all vote. How do you let them all vote? We said, let's use something like the US electoral vote system, where the number of electoral votes a platform gets basically equals the number of clusters it came up with, which roughly captures how much distinct information that data type carries. So she made a new, big binary matrix, with all the copy number subtypes, methylation subtypes, and so on along one axis and the samples along the other, and clustered this matrix. She calls this the Cluster of Cluster Assignments, so when I say COCA, that's the integrated subtypes. We found 13 of them; a couple were tiny, so we analyzed the 11 bigger ones, which gives us 11 main subtypes. When you view them in the context of all the platform-specific subtypes, this is what you get, and you can see the nice correlation across all the platforms for these subtypes. And of course, as you'd expect, tissue of origin dominates the enrichment for these clusters. I can move on here; this is just the subtype map on the tumor map image I showed you. So what kinds of patterns do we see among these cancers? I already told you that the dominant pattern is that each tissue of origin goes into one specific cluster. There are some exceptions to the rule, though. These are the ones that fall into single-tissue clusters, but then there are some that show different patterns. The head and neck and lung squamous tumors get co-clustered into this squamous-like subtype. Colon and rectum, as the working group saw for that paper, cluster nicely together.
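The "electoral vote" construction can be sketched in a few lines of Python. Assuming each platform's subtype calls are available as a sample-to-label mapping, each platform contributes one binary indicator column per cluster it produced, so a platform with more clusters casts more "votes". The helper name `coca_matrix` and the input layout are hypothetical, and the step that actually clusters the resulting matrix is not shown.

```python
def coca_matrix(platform_labels):
    """Build a cluster-of-cluster-assignments style binary matrix.

    platform_labels: dict platform -> dict sample -> cluster label.
    A sample missing from a platform gets zeros in that platform's columns.
    Returns (samples, columns, matrix): rows are samples, and each column is a
    (platform, cluster) indicator, so each platform contributes as many
    columns as it has clusters.
    """
    samples = sorted({s for labels in platform_labels.values() for s in labels})
    columns = [
        (platform, cluster)
        for platform, labels in sorted(platform_labels.items())
        for cluster in sorted(set(labels.values()))
    ]
    matrix = [
        [1 if platform_labels[p].get(s) == c else 0 for (p, c) in columns]
        for s in samples
    ]
    return samples, columns, matrix
```

Each row sums to the number of platforms that called the sample, and two samples share a 1 in a column exactly when one platform put them in the same cluster.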
Bladder has an interesting story: it gets divergently clustered, into one cluster of its own, which is mostly bladder cases, and also in with the lung adenos and the squamous-like cluster. We'll take a closer look at that. Then there's a divergent pattern with breast. We all knew that luminals and basals were very different, but put in the context of a whole bunch of other tumor types, they actually look like separate tissues. So it gives us a guideline for interpreting just how different the basals are. Li Ding's group looked at the mutation frequencies across these subtypes, and basically, as we expected, there are only three genes, I guess, mutated at more than 10% frequency. But others were identified, and the list keeps going from there. The chromatin remodelers, if you add them all up, account for plenty of the samples. Then Andy Cherniack derived a copy number map ordered by the COCA subtypes. You can see clear, distinct copy number patterns that define these tumor types, especially for the mixed types, and he could create a new clustering from that copy number map. Ben Raphael's group then used HotNet2 (there's a poster upstairs), which uses protein-protein interactions to find mutations that are either specific to a particular subtype or, in this case, form a core network that links together lots of different tumors across multiple subtypes. Sorry for the resolution there. You can see that the frequent ones, TP53, PIK3CA, and PTEN, are shared across lots of different tumor types, and he has also laid out what defines the squamous-like subtype I mentioned before: some of those chromatin remodelers, like MLL2, MLL3, and EP300, EGFR mutations, and so on. Now, you could say: so what, who cares about the subtypes?
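The per-subtype frequency screen can be sketched as follows, assuming mutation calls are available as a collection of mutated genes per sample. The default cutoff mirrors the 10% threshold mentioned in the talk; the function name and data layout are hypothetical.

```python
from collections import Counter, defaultdict


def frequent_mutations(subtype_of, mutated_genes, threshold=0.10):
    """Per subtype, return genes mutated in more than `threshold` of its samples.

    subtype_of: dict sample -> subtype label (e.g. a COCA cluster).
    mutated_genes: dict sample -> iterable of mutated gene symbols; samples
    missing here are treated as having no mutations.
    Returns dict subtype -> {gene: fraction of that subtype's samples mutated}.
    """
    members = defaultdict(list)
    for sample, subtype in subtype_of.items():
        members[subtype].append(sample)

    result = {}
    for subtype, samples in members.items():
        # set() so a gene with several mutations in one sample counts once.
        counts = Counter(
            gene for s in samples for gene in set(mutated_genes.get(s, ()))
        )
        n = len(samples)
        result[subtype] = {
            gene: c / n for gene, c in counts.items() if c / n > threshold
        }
    return result
```

Passing every sample with the same subtype label gives the overall, pan-cancer version of the same screen.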
Well, we looked at that. We already knew that tissue of origin defines different prognostic profiles: you can look at overall survival, and there's a clear difference by tissue of origin. If you take the COCA subtypes you also see that, but you wouldn't be surprised, because I already told you that they follow tissue boundaries. So Katie did an analysis where she stacked up the features. She first used the clinical data on its own and showed how predictive it was for overall survival (this blue bar), and then you gain a lot more when you add either the tissue type or the COCA subtype as a predictive feature. What was cool to see was that, on top of knowing the tissue and the clinical data, the COCA subtype actually gives you some small but significant, independent information about how to define these survival groups. So let's look at the bladder cases really quickly. I told you that they diverge into three different subtypes on the tumor map. This is what they look like: they fall into the squamous-like subtype, into a cluster of their own, and in with the lung adenos. These bladder groups are also differentiated by overall survival, so the integrated subtype has something to say about these bladder cases. The lung-adeno-like and squamous-like bladder cancers have poorer survival outcomes than the bladder-only samples that go into the C8 subtype. There are also genomic determinants that define these bladder groups, such as copy number loss on chromosome 3, and Nuria Lopez-Bigas's group looked at which mutations define the bladder cases; again, chromatin remodeler mutations tend to occur in the squamous-like bladder cases. So I'm having trouble.
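The talk doesn't name the metric behind the bar chart, so as one assumed example: a common way to score how well a set of features predicts overall survival is Harrell's concordance index (C-index), computed from each model's predicted risk scores. Comparing it for clinical-only, clinical plus tissue, and clinical plus tissue plus COCA models is one way to look at incremental value. The pure-Python function below is a sketch; fitting the underlying survival models (for example, Cox regression) and testing whether a gain is significant are not shown.

```python
def concordance_index(times, events, risk):
    """Harrell's C-index for right-censored survival data.

    times: follow-up times; events: 1 if death observed, 0 if censored;
    risk: predicted risk scores (higher means expected to die sooner).
    A pair is comparable when the sample with the strictly shorter time had
    an observed event; it is concordant when that sample also has the higher
    risk. Tied risks count as half. Pairs with tied times are skipped in this
    simplified version.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored samples cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 means the risk scores order patients no better than chance, and 1.0 means every comparable pair is ordered correctly.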
Then John Weinstein's group, led by Rehan Akbani, looked at what defines the expression profiles in proteomic space, and consistent with what the working group was seeing for the bladder samples, we also find that HER2 and the RAB25 protein are more highly expressed in the non-squamous bladder cases. For the squamous-like cases, we're seeing an epithelial-to-mesenchymal transition theme: the cadherins, beta-catenin, and so on define the protein levels of the squamous-like bladder cancers. I haven't spent a lot of time talking about this analysis, but I'll quickly show it and end with it. Denise Wolf from UCSF defined 22 different gene programs, as we call them: sets of genes known to be functionally related. She found the ones that are co-regulated across the Pan-Cancer-12 dataset, clustered them, and came up with these 22 roughly orthogonal gene programs that we use. You can look at all the COCA subtypes and see how they correlate in this space. Just to get your bearings, these are the breast luminals, and obviously they are high in estrogen signaling. On the tumor map you can look at this by pulling up what we call a weather map for ER signaling, and you can see the breast luminals light up with this estrogen signaling program really nicely. The kidney cancers, which primarily have VHL mutations driving the signal, have an up-regulation of HIF1-alpha. You can look at that on the map: here are the kidney cancer tumors, and you can see that the hypoxia gene program Denise defined is well lit up and distinguishes those tumor types. The squamous set I mentioned before was defined, not surprisingly, by squamous differentiation, and then by this other group, basal signaling, and a MAP kinase signaling cascade. You can take a look at all of these on the tumor map.
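To give a feel for what a per-sample "weather map" value could be, here is a sketch that scores one gene program per sample as the mean z-score of its member genes. This scoring rule is an assumption for illustration, not necessarily how the 22 programs were scored, and the gene names in the test are placeholders.

```python
from statistics import mean, pstdev


def program_scores(expression, program_genes):
    """Score one gene program per sample as the mean z-score of its genes.

    expression: dict gene -> list of values, one per sample, same sample order.
    program_genes: genes in the program. Genes absent from `expression` or
    with zero variance across samples are skipped.
    Returns a list with one score per sample.
    """
    zscored = []
    for gene in program_genes:
        values = expression.get(gene)
        if not values:
            continue
        sd = pstdev(values)
        if sd == 0:
            continue  # a constant gene carries no signal
        mu = mean(values)
        zscored.append([(v - mu) / sd for v in values])
    if not zscored:
        raise ValueError("no usable genes in program")
    # Average the z-scores gene-by-gene for each sample (column).
    return [mean(col) for col in zip(*zscored)]
```

Coloring each hexagon by its sample's score would then give a map where samples with high program activity "light up".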
If you're interested in MYC amplification targets and their activity, or in this basal signaling program, you can pull them up, see which tumor types have higher activity in those profiles, and even do an overlay to find ones that go against the general grain. I'm going to quickly conclude here because I'm already over time. We found an interesting story with p53: the p53 homologs may be compensating for p53 in the squamous set. That was led by Zhong Chen and Carter Van Waes here. I don't know if I mentioned it, but the paper is now in press at Cell. We're looking for covers, and we've got a little bit of competition going on in the group: Katie Hoadley and Julia Zhang have been making a cover like this, and Zhong Chen has been making a maybe more serious one. We've had multiple versions of this, so I guess you'll have to come and vote on it tomorrow morning at the pancake breakfast, the second annual one. Since I'm out of time, I'll just leave up the acknowledgements and take maybe one question. Thanks for your attention. Yes, the question is about how we handled batch effects. We have a whole batch effects committee run by MD Anderson, and they use principal component analysis-based methods or ComBat to check whether the batch effects are worrisome. They give us a report before we do any serious analyses. We did find some batch effect when the RNA-seq platform changed midway through. We had some samples measured on both platforms, and we tried to mitigate the effect by identifying which genes actually showed this batch effect and subtracting it away. Maybe Katie can remind me exactly what we did. I think we threw out the genes that we found had that batch effect. I'm not sure if she's in the audience. Otherwise, we can follow up with you.
So basically, we used the 19 samples that were run on both the GA and the HiSeq to figure out the difference between the platforms, and applied that adjustment to all of the GAII data, so that we put them all in HiSeq space. One small issue is that we're limited to just commonly expressed genes, but it was the best we could do, and we're hoping to get more samples on both platforms to help with that adjustment. Thanks for the help answering the question. Okay, so I'm going to introduce the next speaker: Samir Armin from the University of Texas MD Anderson Cancer Center. His title is Profiling Long Intergenic Non-Coding RNA Interactions in the Cancer Genome.