All right. Thank you, Lou, for that very kind introduction. It's a real honor to be here at this symposium and to be part of this wonderful explosion of data generation and data analysis that has been TCGA over the last several years, really coming to more and more fruition this year. So I'm going to be talking on behalf, first of all, of my co-chairs, Ramaswamy Govindan and Steve Baylin, both of whom are also here at this meeting, and I'm trying to get to the next slide here. There we go. And on behalf of all of our colleagues in the Cancer Genome Atlas Lung Cancer Analysis Working Group, many of the prominent members of whom are named here on this slide in front of us, and I'll be thanking various people individually as we go through the presentation. As I think all of you are very familiar with, lung cancer accounts for over 25% of cancer deaths in the United States each year and is the leading cause of cancer death in both men and women in the United States. In total, lung cancer kills more than 150,000 Americans each year and more than 1 million people worldwide. The major lung cancer histologies, as again I think most of you are very familiar with, are adenocarcinoma of the lung, squamous cell carcinoma of the lung, and small cell lung carcinoma. We now have projects in each of these areas as part of TCGA, and as most of you know, we recently published the first squamous cell lung carcinoma marker paper from TCGA in Nature earlier this fall. Among these histologies, lung adenocarcinoma accounts for on the order of 40% of lung cancer diagnoses within the United States and about 65,000 deaths per year in the United States, and the percentage may be even a little bit higher worldwide. And so the number of deaths per year from lung adenocarcinoma worldwide is probably over 500,000. And while lung cancer is generally associated with smoking, lung adenocarcinoma, uniquely among lung cancer histologies, does often occur in nonsmokers.
And lung adenocarcinoma in nonsmokers is especially prevalent in women, in younger patients, and in patients from East Asia or of East Asian origin. Lung adenocarcinoma has really become a paradigm for molecular subtyping, as in recent years the treatments for lung adenocarcinoma have shifted from histology-based strategies to molecular-based strategies. And we've really made major advances in treatment for lung adenocarcinoma with targeted inhibitors of both EGFR, such as gefitinib and erlotinib, and ALK, such as crizotinib, thanks to genomic discoveries. My group was fortunate to be able to participate, together with Bill Sellers and Bruce Johnson, in the discovery of EGFR mutations in 2004, along with the Pao and Varmus group and the groups of Tom Lynch and Daniel Haber. And this is shown here as just one example, a patient with lung adenocarcinoma with a somatic EGFR deletion mutation in exon 19. You can see here multiple nodules in the right side of the lung in this transverse CT section, and then complete clearing of these nodules after two months of erlotinib treatment. The slide is from our colleague Bruce Johnson at Dana-Farber Cancer Institute. There have been a number of previous comprehensive genomic studies of lung adenocarcinoma, and I'll just mention a few of these here. A copy number study by Weir and colleagues, and a mutational study by Li Ding and Gaddy Getz and colleagues, both part of the Tumor Sequencing Project spearheaded by the NHGRI as a kind of precursor to TCGA, identifying amplifications of genes such as NKX2-1 and TERT and mutations of NF1, ATM, and APC, among others.
There is also the NCI Director's Challenge expression classification project, and recent reports by Ramaswamy Govindan and colleagues with Rick Wilson at Washington University on whole genome sequencing, identifying smoking and non-smoking signatures, and then the work of Marcin Imielinski, Alice Berger, and colleagues, working together with my group at Dana-Farber and the Broad, using both whole exome and whole genome sequencing, identifying recurrent mutations in RNA-splicing genes, including RBM10 and U2AF1. And then finally, a recent report from Jeong-Sun Seo's group at Seoul National University identifying transcriptome alterations, including recurrent MET splicing alterations. So this has led to the definition of a large number of potential therapeutic targets, and this is a pie chart adapted from a paper by Pao and Hutchinson showing KRAS mutations, EGFR alterations, ALK fusions, and a number of other alterations, ERBB2 mutations, and most recently the identification of ROS1 fusions and the KIF5B-RET fusion reported by numerous groups earlier this year. But it also says to us that the leading driver in many lung adenocarcinomas remains unknown and still remains to be uncovered, and this is one of the major goals of our TCGA effort. If we look at our current project status, we've actually reached full accrual of the estimated 500 cases to the BCR, but there were 303 samples for which there were comprehensive molecular data at the time of our data freeze, October 2. Working very closely with Bill Travis at MSKCC, who led histological confirmation, we excluded many samples after pathology review and were left with 230 samples for the lung adenocarcinoma data freeze. The remaining cases that were excluded as not being adenocarcinoma will nevertheless be included in a subsequent pan-non-small cell lung cancer study from TCGA that will also include cases initially reported as squamous cell lung carcinoma and excluded.
So we have high-quality data across multiple platforms for all of these samples, including DNA sequencing, RNA sequencing, SNP array-based copy number, methylation array data, proteomic analysis, and fusion discovery by low-pass whole genome sequencing and RNA sequencing. We expect to have 38 sample pairs with whole genome sequencing data, which I won't be discussing today. Our first face-to-face meeting is tomorrow, and our goal is to prepare data for a manuscript submission sometime early next year. First, I want to speak about copy number analysis of lung adenocarcinoma, led by Andy Cherniack and Gaddy Getz as part of the Broad Institute Genome Characterization Center. So I just want to point out, first of all, that overall here red is copy number gain, blue is copy number loss, and white is neutral. With samples in this dimension and chromosomal position in this dimension, you can see many overall similarities between lung squamous carcinomas and lung adenocarcinomas: gain of chromosome 1q, gain of 7, loss of 8p, gain of 8q, loss of 9, et cetera. Probably the most striking difference is that chromosome 3q is almost never gained in lung adenocarcinoma and is frequently gained in squamous cell lung carcinomas. If we look at focal alterations, first I'll speak about the deletions. The predominant focal homozygous deletion in lung adenocarcinoma is at the CDKN2A cyclin-dependent kinase inhibitor gene locus. And if we look at focal amplifications, these include a number of cyclin genes, cyclin D1 and cyclin D3, as well as the cyclin-dependent kinase gene CDK4 itself. They include telomerase genes, the catalytic subunit TERT and the RNA subunit TERC. They include receptor tyrosine kinase genes, EGFR, MET, and ERBB2, the downstream signal transduction gene KRAS, and the NKX2-1 lineage-specific transcription factor. And finally, the MYC transcription factor.
If we look at exome and RNA sequencing analysis, led by Juliann Chmielecki and Mara Rosenberg, working at Dana-Farber and the Broad, Matt Wilkerson at UNC, Marcin Imielinski, Bryan Hernandez, Mike Lawrence, Neil Hayes, and Gaddy Getz, we see first the famous slide that Gaddy and Mike Lawrence frequently show. This is a plot showing the variance in mutation frequency, with different types of cancer across this dimension and mutation frequency on a log scale in this dimension here. And you can see the tumors with the highest mutation rates are the carcinogen-induced tumors: melanomas, squamous cell lung carcinomas, and lung adenocarcinomas. This very high somatic mutation rate poses a major problem in identifying significantly mutated genes. We heard from Petar Stojanov at the Broad yesterday about approaches to really identify significantly mutated genes and overcome these challenges. And similarly, we heard today from Nickolay Khazanov from Compendia about other approaches for this. So just some issues that we see. First of all, known recurrently mutated genes such as ERBB2 or beta-catenin do not show up as significant in this data set, at least today, regardless of the method that we've used. We see a lot of spuriously mutated genes, genes like olfactory receptors. These can be eliminated by a number of signal-to-noise analyses as well as expression filtering. I think we'll still need to consider a variety of alternative approaches, including functional significance analysis and two-stage statistical analyses. Furthermore, in the end, a much larger sample size may be required to elucidate the full population of lung adenocarcinoma-causative mutations. This is a list of the top 21 mutated genes in lung adenocarcinoma, expression filtered, generated by Juliann Chmielecki and Mara Rosenberg and their colleagues.
And I just want to point out at the top a number of known genes, most of these present previously in the Ding, Getz, et al. manuscript: TP53, STK11, KRAS, EGFR, RB1, BRAF. So, again, the recurrent drivers from that pie chart, KRAS, EGFR, and BRAF, are all here. Others were identified by more recent papers, such as the Imielinski et al. paper: RBM10, ARID1A, and U2AF1. But I want to highlight here with stars some of the candidate novel genes and just point out a couple of these. BCL9L is homologous to the B-cell lymphoma translocated gene, BCL9, and is reported to encode a protein interacting with beta-catenin, which is also frequently mutated in lung and other cancers. MGA encodes a reported suppressor of the MYC pathway, and this gene has recently been reported to be subject to inactivating mutations in B-cell leukemias and lymphomas. And MKI67IP encodes a protein that interacts with Ki-67, the well-known histological proliferation marker, which is encoded by MKI67, which has been found in the TCGA study to be recurrently mutated in endometrial cancer. And this is a kind of diagram of the correlation of gene mutations, with just a couple of features that jump out here. First of all, EGFR mutations, frequently insertion-deletion mutations, shown in yellow here, occur frequently in samples with low overall mutation rates. TP53 is, in lung adenocarcinoma as in squamous cell lung carcinoma and in small cell lung carcinoma, the most frequently mutated gene. And the other point that I want to make on this slide, though it's a little bit difficult to see here, is the mutual exclusion between mutations in KRAS, EGFR, and BRAF. We also see recurrent loss-of-function mutations in SWI/SNF complex chromatin remodeling genes, most notably ARID1A and SMARCA4. SMARCA4 was originally reported to be mutated in lung adenocarcinoma by Montse Sanchez-Cespedes from Barcelona.
Expression-based classification of lung adenocarcinoma is really some very elegant work done by Matt Wilkerson and Neil Hayes at the University of North Carolina. They did expression clustering showing reproducible classes, the bronchioid, magnoid, and squamoid classes, using a subtype predictor based on their previous study of over 1,000 lung adenocarcinoma expression profiles. They identified these same subtypes in the TCGA dataset and then went on to perform integrated analyses. You can see that the bronchioid subtype is most enriched for those patients who are non-smokers. It is most enriched for EGFR mutation, as well as ALK, RET, and ROS1 fusions, which are also occasionally seen in the squamoid subtype. In addition, you can see that SMARCA4 mutations and KEAP1 mutations are both enriched in the magnoid subtype. Next, low-pass whole genome sequencing analysis from Raju Kucherlapati's group at Brigham and Women's Hospital and Harvard Medical School, work led by Angela Hadjipanayis in cooperation with Matt Wilkerson and Neil Hayes at UNC, and I should also mention here the fusion analysis of Xiaoping Su at the MD Anderson Cancer Center. First, from RNA sequencing, they were able to identify known fusions, ALK, ROS1, and RET fusions, in about 4% of cases, as I mentioned predominantly in the bronchioid subtype. One of the more intriguing novel fusions identified by Angela Hadjipanayis's analysis is a fusion between VMP1 and the ribosomal protein S6 kinase gene, RPS6KB1, in several percent of cases, and these fusions generally preserve the bulk of the catalytic domain of the protein kinase and are potentially activating. In addition, there are some intriguing peptidase fusions as well that also bear further exploration. I want to briefly touch on DNA methylation analysis led by Leslie Cope, Ludmila Danilova, Steve Baylin, and Jim Herman at Johns Hopkins, and I'll mention one of the findings to date.
We saw that CDKN2A, and this is similar again to squamous cell lung carcinoma, is one of the genes most frequently inactivated by mutation, and one of the most frequently inactivated by copy number alteration and by methylation. So we see, again, multiple means of inactivation of CDKN2A. Work of Gordon Robertson and Andy Chu and colleagues at the British Columbia Cancer Agency has identified microRNA signatures, and in particular, expression of miR-21 defines a large subset of lung adenocarcinoma. Finally, a group led by Alice Berger, Eric Collisson, who's really been playing a major role in the entire project, William Lee, and Marc Ladanyi has examined mutations in those tumors that lack receptor tyrosine kinase and downstream signaling events: RAS mutations, EGFR, ERBB2, and BRAF mutations, and the ALK, RET, and ROS1 fusions. What they found, looking at genes mutated in the two groups, is that uniquely enriched in the oncogene-positive group so far was RBM10, and uniquely enriched in the oncogene-negative group, interestingly, was NF1, suggesting that NF1 loss of function potentially could substitute for the receptor tyrosine kinase pathway signaling oncogene gain of function. And I think this is the first analysis that's really been powered sufficiently and has comprehensive enough data to address this question. Finally, in terms of integrative cross-platform analysis, I'm just going to show one slide out of many. This is the work of a number of people, including Chad Creighton, Eric Collisson, Ron Bose, Niki Schultz, Ted Goldstein, and Sam Ng. I just want to point out the well-known, and I apologize a little bit for the fuzziness of the slide, significant deregulation of multiple genes in the RTK, RAS-RAF, and PI3 kinase pathways, including, in addition to the genes that we've shown, mutations of MET and of CBL.
Finally, Gordon Mills, Lauren Byers, and Lixia Diao have been doing a reverse phase protein array analysis, which has given groups that we'll be able to compare with the mutational analysis, in particular groups with a very strong signature of receptor tyrosine kinase activity, MAP kinase pathway activity, and DNA repair pathway activity. So just conclusions thus far. Lung adenocarcinoma and squamous cell lung carcinoma have very similar copy number profiles. There's a very high mutation rate, which makes it a real challenge to identify significantly mutated genes, but there are novel candidate mutated genes, including MGA. Three distinct expression subtypes are identified from RNA sequencing data, with interesting mutational correlations. Multiple fusions, including a number of novel fusions, are expressed in lung adenocarcinoma. There are multiple mechanisms for CDKN2A inactivation, and distinct microRNA and proteomic clusters that we will be working to integrate with the expression as well as mutational and copy number-based clusters. And finally, there are mutational differences between the oncogene-negative and oncogene-positive subtypes, including the enrichment of NF1 mutation in the oncogene-negative group. There's an enormous number of people to really thank for their work in this project. I just want to call out a few people. Again, my co-chairs, Ramaswamy Govindan and Steve Baylin, Angela Hadjipanayis doing the fusion analysis, Juliann Chmielecki, who's led the exome sequence analysis, Carrie Sougnez, who's put together the data freeze, Neil Hayes and Matt Wilkerson leading the RNA sequencing analysis, and spectacular contributions from Bill Travis on the histopathology side. Thank you very much for your attention.
Questions for Matt.
Yeah, there's been a number of groups interested in the possible ER positivity of some subsets of lung adeno. Is there any signal of this in the TCGA data set?
That's a great question.
I mean, there's evidence from Jill Siegfried's group and others of a role for ER signaling in lung adenocarcinoma. We haven't asked that question, but we have our face-to-face meeting tomorrow and we will put that on our list of topics to examine in the RNA sequencing data.
I had a question, Matt. So the expression subgroups, do they indicate anything about different types of lung cells that they may have come from? Are these embryonic signatures? What do they look like?
Lou, I think that's a great question. The bronchioid subgroup, I think, has a lot of expression features of alveolar type 2 pneumocytes and is likely to represent growth from that lineage. I think I would turn to Matt Wilkerson for comments on the other subtypes if he wants to add anything. If he does, he can come forward.
I have a quick question. So in your previous paper, you reported the difference in mutation spectrum between smokers and non-smokers. I wonder if you also see a difference in expression signature or expression subtype for smokers?
I think that's a great question as well. Roughly, and I don't have the numbers at my fingertips, 15% of these cases are from never smokers. We're just starting to do analysis between the never smokers and the smokers, and I think expression differences as well as mutational differences will be important to look at.
Just a follow-up question. Do you see any difference between smokers and non-smokers regarding fusions?
I'm sorry, regarding mutations?
Fusions.
Regarding fusions. The well-known recurrent fusions, the ALK, RET, and ROS1 fusions, generally arise in tumors from non-smokers, but we haven't systematically queried the incidence of fusions between the two populations.