Okay. Great. Thank you very much. It's great to be here and talk about the work of our double normal committee. Our goal was to do some genomic characterization of cancer adjacent breast tissue. And what we're looking at is both field effects and expression subtypes. So to get started with this, I wanted to give a bit of the translational relevance for looking at adjacent normal tissue. This is a curve from a randomized trial of breast conserving therapy versus radical mastectomy. And what you see on the y-axis is the probability of recurrence and on the x-axis is the number of years. And even 15 to 20 years after surgery, you see that individuals having breast conserving therapy have a much higher probability of recurrence. Overall survival is the same for the two groups because an additional resection can occur. But this difference in the recurrence rates has been the subject of some interest, specifically in relation to this concept of field effects. The other thing we know is that when local recurrence does occur, it commonly occurs in the lumpectomy bed, and that the rates tend to be higher in some subtypes of breast cancer, specifically basal-like breast cancers.
So to try to understand this better, my research group has been doing some work looking at field effects and stromal microenvironments surrounding tumors. And then the TCGA had done this extensive characterization of many different data types in that adjacent normal tissue. And so I'm going to talk to you a little bit about that today. But to start out with, I think it's important to have a clear definition of what we mean by a field effect, or what is a field carcinogenic effect. And this visual really nicely shows what we're thinking about when we're thinking about field effects. On the left, you see a patch, which is defined as a small region that is showing high expression of a particular mutant protein. In this case, it's mutant p53.
Then as you move to the right, you see progression of this patch to form a field, which ultimately then persists in the tumor that forms. So this is histologically normal tissue at the left, but it harbors this defect that then ultimately manifests in the cancer. And this is what we refer to as a field effect. Usually we're referring to epithelial changes and we're referring to genetic changes that create an area that looks normal, but that if left behind during resection could result in a second primary. And this was first described in 1953 in oral squamous cell carcinoma, but has subsequently been shown to occur in a number of different cancer types.
The other thing that happens in the adjacent normal tissue is that we can get a stromal response. So it's well known that there can be changes in the stromal tissue immediately adjacent to a tumor and that these changes can be detected by gene expression profiling or other genomic assays. So here I'm showing you a heat map from mRNA microarrays, or mRNA data from a DNA microarray, and you can see on the left that the cancer adjacent tissue has a very different gene expression profile than reduction mammoplasty tissue does. And in this paper, we showed that some of the gene expression changes that were most detectable seemed to be consistent with evidence of a wound response or a strong stromal reaction to the presence of the tumor.
So with that background in mind, I'm now going to turn to what we've been doing with the TCGA double normal project. And here I'm representing the data from a number of different groups. And I'll start first by talking about the DNA analysis. And this is data where we had triplets. So we had 40 triplets where we had normal breast tissue adjacent to the tumor, we also had the tumor, and we had blood. And exome sequencing on this tissue was performed by the WashU group, with Dan Koboldt and Li Ding doing the analyses that I present here today.
And then we also had copy number alteration data, which Andy Cherniack did for our group. And then we also have methylation data. Now the difference with the methylation data is that we don't have a blood normal standard. We can't look at the blood level methylation as a standard for comparison. But our key question across all three of these DNA data types is, are there detectable field effects and/or are there detectable tumor cells? And the idea was that perhaps some of this adjacent normal tissue could be used as control tissue for other types of TCGA analyses. And so what I'm going to show you here is the limitations and the advantages of potentially using the data in that way.
So this slide is courtesy of Andy Cherniack and it just shows us how we're using the SNP array data to find copy number alterations. So here, using the blood normal as a comparison, Andy has identified the copy number alterations in tumors. You can see that there's a MYC amplification present in this particular tumor. Also, lined up right above that is the normal adjacent. You can see that there are also focal areas of MYC amplification in that normal tissue. So this would be what we would consider a field effect. Or it could also potentially be evidence of tumor cells. And it's difficult to distinguish that just from looking at this data. But this is the kind of thing that Andy Cherniack did to assess the copy number alterations in the adjacent normal tissue.
Here is a map showing you all 40 of the samples that we analyzed. And they're just color coded here by tumor subtype. And the reason I did that is because I wanted you to see that these field effects seem to be occurring in all different types of tumors. So we have a basal-like and two luminal As that showed some sort of field effect or tumor contamination based on copy number data. And it was a total of about 7% of the samples (3 of 40) that had evidence of some kind of copy number alteration in the adjacent normal.
We then turned to analyses by Li Ding and Dan Koboldt, where variant allele frequencies were detected, again, using the blood normal as the standard. And what you can see is there's quite a lot of variation in variant allele frequency in the tumor tissue. And the scale here goes from 0 to 100%. And there are quite a few samples that show high variant allele frequencies in the tumors. Then if you look along the x-axis, you see the variant allele frequency in the adjacent normal. And you can see that the axis is much shorter. So we see lower variant allele frequencies in the normal tissue. But many of the variants that have high variant allele frequencies in tumors are also showing up in the normals. There are also many variants that are present in tumor that don't show up at all in the normal. So again, here we're trying to distinguish between what might be a field effect and what might be tumor cell contamination. But the bottom line with this data is that it tends to be much more sensitive for picking up these mutations. So 10 cases out of the 40, or 25%, had some evidence of a field effect by mutation analysis. So here we're adding an additional column to the data set. And you can see that the exome-seq data identified a much larger number of samples with some sort of alteration in the adjacent normal tissue.
Okay, now the methylation data is divided into two data sets. The first is the Illumina 27K platform, and that's shown here. On the left are the tumors corresponding to the adjacent normals on the right. And you can see that there are quite distinct patterns between tumor and normal for methylation. But there was one adjacent normal sample that shows a methylation pattern very similar to what's observed in tumor. So this one was picked up as having a tumor-like methylation profile. Here is the Illumina 450K data. And again, there were three samples that showed up as having this tumor-like methylation profile.
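The variant allele frequency comparison described above can be sketched in a few lines. This is a minimal illustration only, not the WashU pipeline: the read counts, the site names, and the `min_vaf` and `min_alt_reads` thresholds are all hypothetical, and the sites are assumed to be somatic calls already made in the tumor against the blood normal.

```python
def vaf(alt_reads, total_reads):
    """Variant allele frequency: fraction of reads at a site that carry the variant."""
    if total_reads == 0:
        return 0.0
    return alt_reads / total_reads


def flag_adjacent_normal(sites, min_vaf=0.02, min_alt_reads=3):
    """Return tumor somatic sites that are also detectable in the adjacent normal.

    Each site is a dict of read counts. The thresholds are illustrative
    assumptions, not values taken from the talk.
    """
    flagged = []
    for site in sites:
        normal_vaf = vaf(site["normal_alt"], site["normal_depth"])
        if site["normal_alt"] >= min_alt_reads and normal_vaf >= min_vaf:
            tumor_vaf = vaf(site["tumor_alt"], site["tumor_depth"])
            flagged.append((site["name"], tumor_vaf, normal_vaf))
    return flagged


# Hypothetical read counts for three somatic sites in one triplet.
example_sites = [
    {"name": "site_A", "tumor_alt": 20, "tumor_depth": 50, "normal_alt": 4, "normal_depth": 80},
    {"name": "site_B", "tumor_alt": 30, "tumor_depth": 60, "normal_alt": 0, "normal_depth": 70},
    {"name": "site_C", "tumor_alt": 5, "tumor_depth": 40, "normal_alt": 1, "normal_depth": 100},
]

for name, tumor_vaf, normal_vaf in flag_adjacent_normal(example_sites):
    print(f"{name}: tumor VAF {tumor_vaf:.2f}, adjacent normal VAF {normal_vaf:.2f}")
```

A flagged site on its own does not say whether the signal is a field effect or tumor cell contamination; it only marks the sample for the kind of follow-up discussed in the talk.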
So all of these were flagged. And again, I'm going to show you where they came out in terms of these 40 samples. And what you can see is that there's one normal sample here that was consistently picked up by all three platforms, but that some of the platforms are picking up different samples. So there's some heterogeneity in terms of which samples are being identified depending on which data type is being used. With the methylation, again, it was a 7% to 10% rate, more similar to the copy number alterations in terms of how many samples were showing these kinds of effects.
So I keep referring to this idea that we can't distinguish between a field effect and tumor cell contamination. And so our next thought was, how would we actually go about doing that? And our strategy was to use the gold standard, which is to get a pathologist to do a very careful review of all of these tissues to see if there was actually any evidence of tumor cells present. And what we found was that, unbeknownst to all of the analysts, there was a positive control in our data set. In retroactively going through the data collection and the QC procedures, we noticed that one cancer adjacent sample had gotten in that was later found to have been taken very close to a tumor specimen. And so this one sample that was detected by all three of the platforms did in fact have clear evidence of tumor cell contamination. Unfortunately, as luck would have it, for two of the others, there were actually no good quality sections. So we concluded from this that if you do have good evidence that there's no tumor in your normal, then that might be a pretty good indicator that you won't pick up these field effects.
But for cases where that data was missing or where there was clear evidence of tumor, we were finding that the genomic data was also uncovering some evidence of field effects there.
Okay, so looking for malignant cells is not all that we were capable of doing with this. And we are very aware that breast tissue in particular is very stroma rich. And so we were aware of some challenges in analyzing this data, even to the extent of what kind of coverage you get when you have 40x read depth but only 5% of your tissue is epithelium. And we're really looking for epithelial-based field effects. So we were thinking about all of these issues and we decided that it was really important to have good characterization of the stromal and the epithelial content for our tissues. So a postdoc in my lab went through all of these samples using Aperio scanning and manually annotated them, very carefully denoting where the epithelial cells are, where the stroma is, and where the fat is. And we were able to come up with composition estimates for every sample. Then Andy Beck at Harvard trained an algorithm that can do this in an automated fashion, and he can now do this very rapidly on all of our samples for the TCGA. And he's also able to collect other sorts of morphometric data from these tissues at the same time.
And using this data we were then able to go back and look at whether there were particular marks, methylation marks in the case I'm showing you, that were correlated with the stromal or epithelial content of the tissue. And what we found is that on the 450K platform there were a huge number of probes correlated with epithelial content, about 1,300 probes positively correlated and about 1,250 probes negatively correlated, at an FDR of about 5%. With the stromal content there was also a large number of genes that were detected.
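The probe selection described above, keeping probes whose correlation with epithelial content passes a false discovery rate of about 5%, can be sketched as follows. The talk does not say which FDR procedure was used, so the Benjamini-Hochberg step-up procedure here is an assumption, and the probe IDs, correlation values, and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking p-values significant at FDR level alpha.

    Step-up rule: find the largest rank k with p_(k) <= alpha * k / m,
    then call every p-value of rank <= k significant.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


def split_by_direction(probes, alpha=0.05):
    """Split FDR-significant probes into positively and negatively correlated sets.

    Each probe is (probe_id, correlation_with_epithelial_content, p_value).
    """
    keep = benjamini_hochberg([p for _, _, p in probes], alpha)
    positive = [pid for (pid, r, _), k in zip(probes, keep) if k and r > 0]
    negative = [pid for (pid, r, _), k in zip(probes, keep) if k and r < 0]
    return positive, negative


# Hypothetical probes: (id, correlation with epithelial fraction, p-value).
example_probes = [
    ("probe_1", 0.62, 0.01),
    ("probe_2", -0.55, 0.02),
    ("probe_3", 0.10, 0.50),
    ("probe_4", -0.48, 0.03),
]

positive, negative = split_by_direction(example_probes)
print("positively correlated:", positive)
print("negatively correlated:", negative)
```

With roughly 450,000 probes tested, the threshold for each individual probe is far stricter than 0.05, which is why a multiple-testing correction of some kind is needed before reading anything into the resulting gene sets.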
So being able to look at these gene sets might help us to interpret methylation profiles where we don't have that data, and it might also give us some clues to the sort of changes that we're seeing. There's a lot more that can be done here to mine the biological significance of these gene sets.
Okay, so I started out talking primarily about DNA, and the reason the DNA was so appealing for us in trying to address this question is that we have this really great sequence standard which is coming from that triplet, that blood normal. As I turn to the mRNA and the microRNA data, you're going to see that we don't have that same facility to make comparisons because we simply don't have the right tissue-based control. We don't have patient-matched, truly normal tissue. All the tissue that we're getting could be showing effects of response or stromal reaction to the tumor. However, we're interested to see what the variation in RNA is. So I had shown you earlier that there's this widespread wound response. Well, doing unsupervised analysis on the cancer adjacent tissue only, we first discovered that there were two very distinct groups in terms of expression profile. And here you're seeing that there are two main clusters, one we dubbed active and the other inactive, based on the fact that the active group showed increased levels of cellular movement, inflammation, fibrosis and chemotaxis genes. So this is truly just based on ontology. But we were interested to say, okay, if there is this heterogeneity in the normal, does it have something to do with the tumor types that are formed or does it have something to do with risk of recurrence? And we looked to see whether these two subtypes were correlated with ER status and there was no significant association there. And they also were not correlated with tumor subtype. The white samples here are ductal carcinoma in situ samples.
So this is from some previously published work, but what we had identified in that paper is that there were two subtypes of cancer adjacent normal tissue and they seemed to be independent of tumor subtype. But interestingly, they appeared to predict prognosis in ER positive tumors. It's very difficult to predict late survival outcomes for ER positive tumors. And our thought was that perhaps something about the way that microenvironment responds to the tumor might give us some clues to understand the progression of those ER positive tumors.
So when we turned to looking at the RNA data in the TCGA samples, we had this unsupervised clustering in the back of our minds and we wanted to see if we were going to recapitulate these same subtypes. Now the TCGA data does not have the maturity in terms of the survival outcomes. It's only a few years in, so we couldn't evaluate the survival probabilities for these two groups. But we were able to show that there were two distinct clusters also present in the TCGA data. And actually what I'm showing you here is the microRNA data, because the exact same pattern of expression was observed in the microRNA as was present in the mRNA. And I understand this is distinct from what was seen in the breast tumor tissue, where microRNA and mRNA profiles were not strongly correlated. In the normal tissue, they are almost identical. So you can see here there are two clusters. And this is data from Gordon Robertson, where he has many, many samples of normal tissue that were grouped by consensus clustering here on microRNA arrays. And then down at the bottom you can see the identity based on the mRNA cluster. So the white is active and the black is inactive, and you can see that they're almost perfectly concordant with the two microRNA clusters.
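One simple way to put a number on "almost perfectly concordant" is to cross-tabulate the two cluster assignments and take the best agreement over the two possible label matchings. This is a minimal sketch with hypothetical sample labels, not the analysis behind the slide.

```python
from collections import Counter


def two_cluster_concordance(labels_a, labels_b):
    """Fraction of samples on which two 2-cluster assignments agree.

    Cluster names are arbitrary, so agreement is computed for both possible
    matchings of the labels and the larger value is returned.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same samples")
    cats_a = sorted(set(labels_a))
    cats_b = sorted(set(labels_b))
    if len(cats_a) != 2 or len(cats_b) != 2:
        raise ValueError("each labeling must contain exactly two clusters")
    table = Counter(zip(labels_a, labels_b))  # 2x2 contingency table
    diagonal = table[(cats_a[0], cats_b[0])] + table[(cats_a[1], cats_b[1])]
    off_diagonal = table[(cats_a[0], cats_b[1])] + table[(cats_a[1], cats_b[0])]
    return max(diagonal, off_diagonal) / len(labels_a)


# Hypothetical assignments for five adjacent normal samples.
mrna_clusters = ["active", "active", "inactive", "inactive", "active"]
mirna_clusters = [1, 1, 2, 2, 2]

print(two_cluster_concordance(mrna_clusters, mirna_clusters))
```

For more than two clusters, or to correct for chance agreement, a measure such as the adjusted Rand index would be the more usual choice.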
Interestingly, we tried to figure out what kinds of biological themes might be represented here, and one of the strong factors distinguishing these two clusters was the microRNA 200 family genes, which are indicative of a more mesenchymal character. So going back again to that pathologic data and that data about the composition of the tissue, we then wanted to ask, well, is there something about the composition that we can learn from these expression subtypes? But the other thing, before I turn to that data, is I just wanted to point out that what you don't see here are those few samples sticking out like you did in the methylation data, where we could say, okay, there's clear evidence of tumor contamination here. We are not picking up tumor contamination readily in these mRNA and microRNA data the way we were with the DNA data. But what we are seeing is that these mRNA and microRNA subtypes seem to be pretty strongly correlated with the composition of the tissue. There's quite a bit of variation here, but that active subtype seems to be associated with high levels of adipose-rich stroma, whereas the inactive subtype is associated with high levels of non-fatty stroma or fibroblastic tissue. And then the epithelial percentages were not significantly associated with either subtype, either the active or the inactive.
So in conclusion, the DNA results from our analysis of the TCGA double normal data show that there is strong evidence for either field effects or tumor contamination. And our team has some work to do in trying to figure out exactly how to distinguish those two, or whether there are other ways that we can take advantage of this very rich data to try to tease those two things apart. The pathologic evaluation can get us partly there, but I think there's a lot more we can do, particularly in those triplets, to try to understand where some of this genetic heterogeneity is coming from.
And in contrast, the RNA really shows us the expression subtypes and is not strongly indicative of tumor contamination. So I'd like to thank the team that contributed all of this data. I'm presenting information on behalf of many people. Specifically I wanted to note Chris Benz, because while he didn't generate one of the particular figures here, he's been a co-leader of this group and has provided a lot of really great insights, as well as the hard work by Andy Cherniack, Dan Koboldt, Li Ding, and Swapna Mahurkar from USC, who did all the methylation analyses.
Beautiful work. Tom Giordano at Michigan. I spend part of my year signing out breast cancers. Have you tried to incorporate fibrocystic changes, hyperplasia, atypical hyperplasia, columnar cell change? So pathologists recognize a lot of changes that occur in benign breast tissue. Maybe that's driving some of this. Have you guys tried to address that? It's a hard thing to address.
It is. We have. In addition to just looking at composition changes, we had the pathologist score for any type of benign conditions that they saw present in that tissue. So we can go back and analyze it, although those events were somewhat rare in those adjacent normals.
Because, you know, a lot of breast cancers arise in fibrocystic changes, and you can divide them broadly into a proliferative type, where they have more epithelial hyperplasia, and a non-proliferative type, and maybe that'll be useful.
Yeah, and also, Andy Beck, I don't know if Andy's here, but he has a really nice DTF signature which represents some fibrocystic changes, and that's another thing we're analyzing in this data to see if we can capture some of that on the RNA level.
Good question. Have you actually looked at menopausal status with respect to your two different subtypes of normal tissue?
On which status?
Menopausal status.
Menopausal status.
Are you simply looking at cycling versus...
We have, actually. We've been studying age in relation to those two characteristics, and young age seems to be associated with the more active phenotype. But for menopausal status, the association's a bit weaker than it is for age.
Thank you, Melissa. Thank you.