Thank you, Raju. Thanks to the organizers for the opportunity to talk to you about squamous cell carcinoma of the head and neck from The Cancer Genome Atlas. I'm giving this presentation on behalf of many collaborators, the most notable of whom are Dr. Adel El-Naggar from MD Anderson and Dr. Jennifer Grandis from the University of Pittsburgh, who are the co-chairs of the head and neck cancer disease working group along with myself, and many members of the analysis working group. I'm going to try to give credit where credit is due as I go through the talk, but that may not be possible in every case just because there are so many contributors. And I've actually shown here our participants at the face-to-face meeting that took place at UNC Chapel Hill in September of this year, just to point out that folks are willing to contribute two or three days of their time to come and work on this data, and I greatly appreciate that.
Okay, so head and neck cancer is an important cancer. It's the fifth most common cancer worldwide, with 500,000 cases per year and 200,000 deaths. There are parts of Asia where it's the most common cancer; usually that's in the case of nasopharyngeal carcinoma. In the United States it's the sixth most common cancer, with 45,000 cases per year and approximately 20,000 deaths per year. There are two common risk factors: smoking, with about 80% attributable risk, so 80% of head and neck cancers are attributable to smoking, but there is a rising and well-described epidemic of human papillomavirus-associated carcinomas as well. With that in mind I've included a cartoon on the slide, not so much to go through the details of how HPV causes head and neck cancer, but just to get some vocabulary out there, because I'm going to refer to these markers many times. For the most part we're talking about HPV type 16. HPV type 16 makes two oncoproteins, E6 and E7, with E6 targeting p53 and E7 targeting RB.
If you look again at the cartoon you'll get a sense of some of the other important players in HPV infection, and you'll also see these emerge as important players in head and neck cancer. I'll point particularly to the cyclins, especially cyclin D1, and again p16. It's probably worth mentioning that p16 plays an important role here both as a biomarker and as part of the pathophysiology. The biomarker role is due to the fact that HPV-infected cells express high levels of p16 because of reciprocal signaling, so immunohistochemistry for p16 is one of the most common, if not the most common, clinical diagnostic tests for HPV infection.
The data that I'm talking about today are the 279 samples that are part of the data freeze. We have a data freeze so that we can actually do the analyses. There will ultimately be 500 cases of head and neck cancer included. To be a case, the sample had to have exome sequencing, tumor SNP chips, RNA sequencing, methylation, and microRNA sequencing. I will say that there's a lot of other data included in the data freeze that will get used eventually, including RPPA data, so protein expression data, but it's not available in absolutely every sample.
Let me describe the demographics of the patient population. The median age of our patients was 61. This is a little bit older than the SEER median age in the United States of 57. 10% of the patients are minorities, mostly African Americans. 20% of the patients are never smokers, which seems a little bit high. That may be some missing data, but in any case, that's the data that we've got. 73% of the patients are male. That's about right for the United States. In Europe, you'll see up to 90% of head and neck cancer in males. 11% of the patients are positive for HPV as defined by sequencing, and I'll get into that in a couple of minutes.
62% of the cases are from the oral cavity, 26% larynx, 11% oropharynx, and 1% hypopharynx. Most of the patients are advanced stage, with 57% having stage 4A disease. Head and neck cancer is a little unusual in its staging in that 4A does not mean incurable. It just means there's a large tumor with a large lymph node or multiple lymph nodes. Stage 4C is metastatic disease, and about 40% of the patients were alive at the time of the last follow-up.
I will mention one challenge that we've struggled with, which is HPV status. Here on the screen, I'm showing seven different ways that a patient could potentially be identified as HPV positive, and so we're wrestling in the data set right now with actually defining which cases are positive, based on RNA sequencing, DNA sequencing, clinical history, and other factors. This is important for reasons that will become clear later.
But I'll just offer some conclusions on the cohort that we have. I think it needs to be emphasized that the current data freeze, which is only half of the head and neck cancer samples that will be available, is already the largest genomic data set for head and neck cancer that I'm aware of, larger than anything that has ever been assembled by a factor of two, even for the individual components. So these 279 cases are twice the number of expression profiles that are available through any other source, more than twice the number of copy number arrays, et cetera. There are clinical data available as well, and again the data are all integrated. This is an unbelievable resource. I think of all the TCGA tumors so far, this is probably the tumor that was most in need of this contribution, and we will be hearing about this for a decade or more. This is an incredible resource.
There are some limitations, however. This is a surgical cohort, so there are relatively few oropharynx cases, relatively few HPV-positive cases, and few smaller tumors, so relatively few of the lower-risk tumors. So there are some limitations, but nonetheless it's a data set to be quite excited about.
Now moving on to the DNA data, this is the famous Gaddy Getz figure from the Broad, and probably everyone at the Broad can make this now, but it makes an important point, which is that head and neck cancer has a very high mutation rate. There are between 1 and 10 mutations per megabase of sequencing. That's not quite as high as lung squamous cell carcinoma, but it's probably dragged down a little bit by the fact that HPV-positive tumors have lower mutation rates.
This is a fairly mature version of a figure that's really a key deliverable in the marker paper, and actually I'm going to go back to something that Matthew Meyerson said this morning, which is that at this point, for this group, the disease working group, there is no way that we can even begin to scratch the surface of this data. Our goal is to move the TCGA data forward into the community, to present the data, to introduce the data, to show what can be done with the data. Now, we are going to make some novel observations, but I think the main point is to not get in the way of others analyzing this data, and so that's our goal.
But looking at the significantly mutated gene list, you'll see many expected players. Number one is CDKN2A, the gene that generates the p16 protein product, so that's already interesting because I've already brought in the HPV story with p16, and I'm going to get back to this a couple more times. There's p53, as expected, and some perhaps unexpected genes. CASP8 is an interesting target, and I'll talk about this in a minute as well, and another interesting target is HLA-A, one of the MHC class I proteins. I'll get back to that as well.
Anyone who's spent much time involved in these large sequencing projects will know that the significantly mutated gene list is a highly parameterized analysis, which means that if you tweak the parameters a little bit, you can generate vastly different lists of mutations, and many of those tweaks can be very reasonable. If you do that and you go through the list a few times and consider some different ways to look at the mutated gene list, one of the observations you'll see is that the list of significantly mutated genes overlaps heavily with lung squamous cell carcinoma. In fact, of the top mutated genes in lung squamous cell carcinoma, only PTEN and KEAP1 failed to commonly emerge on the significantly mutated gene list from head and neck cancer, although there are KEAP1 and PTEN mutations; they just never rise to the level of significantly mutated.
So I'm going to pause on one of the early key observations, which is that HPV-negative head and neck cancer looks a lot like lung squamous cell carcinoma, and that's in terms of its mutational landscape, its copy number landscape, expression patterns, and pathway activations. Data that I'm not going to show, mostly because it's not my data, but which because of TCGA we've been able to get some early looks at, indicate that HPV-positive head and neck cancer looks a lot like other HPV-positive tumors. I think we'll see a little bit of that through the meeting here. But it does justify even more that we need to start thinking about these tumors in different ways.
And so I've just highlighted one of these thoughts here, which is the idea that some of the key mutations might be different between HPV-positive and HPV-negative tumors. Here's one example with PIK3CA, showing a mutation rate of 35% in HPV-positive samples and 19% in HPV-negative samples. This is assuming that 34 of the tumors were HPV positive and 254 negative, and you're starting to see why it's so important that we get our HPV calls correct.
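To make concrete why the HPV calls matter for the mutation-rate comparison described above, here is a minimal Python sketch. The group sizes (34 and 254) and the quoted rates (35% and 19%) come from the talk; the mutated-tumor counts of 12 and 48 are back-calculated from those percentages and are an assumption, and the function names and the reclassification scenario are hypothetical illustrations, not the working group's analysis.

```python
def mutation_rate(mutated, total):
    """Fraction of tumors in a group that carry the mutation."""
    if total <= 0:
        raise ValueError("group must contain at least one tumor")
    if not 0 <= mutated <= total:
        raise ValueError("mutated count must be between 0 and total")
    return mutated / total


def reclassify_to_positive(pos, neg, moved_mutated, moved_wildtype):
    """Move tumors from the HPV-negative group to the HPV-positive group.

    pos and neg are (mutated, total) tuples. Returns the two updated
    tuples, so the effect of changing HPV calls on each rate can be seen.
    """
    moved = moved_mutated + moved_wildtype
    new_pos = (pos[0] + moved_mutated, pos[1] + moved)
    new_neg = (neg[0] - moved_mutated, neg[1] - moved)
    return new_pos, new_neg


# Assumed counts, chosen only to reproduce the quoted 35% and 19%.
hpv_positive = (12, 34)    # (PIK3CA-mutated, total)
hpv_negative = (48, 254)

print(f"HPV+ rate: {mutation_rate(*hpv_positive):.1%}")
print(f"HPV- rate: {mutation_rate(*hpv_negative):.1%}")

# Hypothetical scenario: 10 wild-type tumors first called HPV-negative
# are re-called as HPV-positive under a different definition.
new_pos, new_neg = reclassify_to_positive(hpv_positive, hpv_negative, 0, 10)
print(f"HPV+ rate after re-calling: {mutation_rate(*new_pos):.1%}")
print(f"HPV- rate after re-calling: {mutation_rate(*new_neg):.1%}")
```

Because the HPV-positive group is small, moving even a handful of tumors between groups shifts its rate noticeably, while the much larger HPV-negative group barely moves.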
I'm also going to show you later why it's challenging to get these calls correct. Just one slide on the whole genome analysis, to remind us that we have it. We've got some very interesting cases, but we really haven't had the time yet to develop the whole genome story, so I'm not going to talk about these further today. But there are approximately 30 whole genomes that have been done for head and neck cancer.
Going into the copy number landscape a little bit, I think this is one of the key observations, and this I think will be a figure in the marker paper, showing the lung squamous cell carcinoma copy number landscape alongside ours. So this is the genome from chromosome one through all the autosomes, then HPV-negative tumors, then HPV-positive tumors, and from 10,000 feet you can clearly see that these tumors share many of the same copy number alterations: universal losses of chromosome 3p, gains of 3q, alterations in chromosome 8. But there are some differences, and I'll go through these in some of the subsequent slides.
Looking at the focal amplifications between head and neck cancer and lung cancer, there are really very similar patterns of focal gains, with a couple of exceptions. PDGFRA, for example: the peak for PDGFRA on chromosome 4 is completely absent in head and neck cancer. But otherwise the lists are largely very similar.
Comparing HPV-positive tumors to HPV-negative tumors, and here I should have already given some credit to Andy Cherniack, who generated a lot of these figures and has been a great collaborator on this project, the observation is that in the HPV-positive tumors there is really a striking lack of oncogenes other than PIK3CA. There's perhaps a little bit in terms of some CCND1, cyclin D1, amplifications, but it's overwhelmingly PIK3CA, compared to a much deeper selection of oncogenes in the HPV-negative tumors, and I think this is a novel observation. And again it gets back to the importance of looking at that mutation rate of PIK3CA in HPV-positive tumors.
In terms of focal deletions between HPV-positive and HPV-negative tumors, one in particular is striking, and this is a deletion on chromosome 11. So this reminds me, and I'm going to make this conclusion a couple of times, that the copy number landscape in head and neck cancer appears to be very rich in terms of defining its biology, perhaps more so than the mutation spectrum in some cases. One of the challenges with copy number alterations, even focal events, is that sometimes three or four or 10 or 20 potential oncogenes occur within the amplicon, and so one of our challenges is defining the key gene within the amplicon. I will point out that for TRAF3, gene expression and copy number track together in this deletion. The red samples are the HPV-positive samples, so it's certainly quite intriguing that this could be the target of the chromosome 11 deletion.
So you saw this morning that Chad Creighton showed clearly that they spent time in the renal cell carcinoma paper validating the mutations, or coming up with a list of the most credible mutations. In the head and neck cancer project, we've taken a somewhat different approach, moving from having multiple centers call the mutations to using the RNA-seq data to validate the mutations. This is a very powerful technique because you have an independent sample, independent sequencing, and independent alignment, and then you're checking the mutations. The way to read the figure is that every column is a sample. The height of the blue bar is the total number of mutations from that sample. The height of the yellow bar is the number of those mutations that actually had any coverage in RNA. So was the mutant base covered with any RNA whatsoever, even a single read? The red bar is the portion of those covered mutations that were actually validated in RNA.
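As a rough illustration of how the three bars described above relate to one another, here is a minimal Python sketch. The class name, field names, and the example counts are all hypothetical; they are not TCGA data, and the actual pipeline used to check mutations against RNA-seq is not described in the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleCounts:
    """Per-sample mutation counts, one instance per column of the figure."""
    called: int     # blue bar: total mutations called from DNA
    covered: int    # yellow bar: mutations whose base has >= 1 RNA read
    validated: int  # red bar: covered mutations also seen in the RNA

    def __post_init__(self):
        # Each bar is a subset of the one before it.
        if not 0 <= self.validated <= self.covered <= self.called:
            raise ValueError("expected 0 <= validated <= covered <= called")


def coverage_rate(s: SampleCounts) -> Optional[float]:
    """Fraction of called mutations with any RNA coverage at the site."""
    return s.covered / s.called if s.called else None


def validation_rate(s: SampleCounts) -> Optional[float]:
    """Fraction of RNA-covered mutations confirmed in RNA.

    Uncovered mutations are excluded from the denominator, since RNA
    cannot confirm or refute a site that has no reads.
    """
    return s.validated / s.covered if s.covered else None


# Hypothetical sample, for illustration only.
example = SampleCounts(called=200, covered=120, validated=102)
print(f"covered:   {coverage_rate(example):.0%}")
print(f"validated: {validation_rate(example):.0%}")
```

The design point is the denominator: the validation rate is conditioned on coverage, which is why a sample can have a high validation rate even when many of its mutations fall in genes that are not expressed.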
And if you think back to Chad's figure this morning, where essentially all that was happening was folks using the same DNA to call mutations, I think you'll see that the RNA confirmation rate, which comes from independent sequencing reactions, compares very favorably. Here we're seeing greater than 80% validation if the base was covered.
The RNA-seq is an incredibly rich source for structural variants in the transcriptome. I'm really not going to have time to get into this much today other than to give you a couple of conclusions. One is that at this early point, and this is up for debate and needs to be validated, there's really not any convincing evidence that there are recurrent in-frame gene fusion events, so these would be the sort of in-frame fusion oncogenes. This is similar to what we saw with lung squamous cell carcinoma. However, there's quite convincing evidence that structural rearrangements in the DNA and the resulting transcripts are functional, more likely in terms of tumor suppressor gene inactivation and loss. And I think this is a novel observation that we're going to try to make.
Shifting gears a little bit and thinking about some of the patterns that Lou Staudt showed us this morning, let's think about the use of expression analysis to identify molecular subtypes of head and neck cancer, or of any tumor type. I'm going to start with the example from lung squamous cell carcinoma, the manuscript that was published in September this year in Nature, where we described four subtypes of lung squamous cell carcinoma: classical, primitive, basal, and secretory. There are many stories in these data, but I'll just pull out one for illustration today, which is that the classical subtype of lung squamous cell carcinoma is associated with near-universal alterations of KEAP1 and NRF2. And one of the ways that can be identified is by high expression of NFE2L2 in all of the classical subtype.
We've performed a similar analysis in head and neck cancer in samples that were available from UNC, then validated it in TCGA data, and I'll just tell you that we've borrowed some of the names and generated some new names. The names in this case are atypical, classical, mesenchymal, and basal. Here I'm showing independent validation of the patterns in samples from UNC and in independent TCGA samples. Here I'm showing a centroid validation of these four subtypes against what's really the marker paper for head and neck cancer subtypes, published by Christine Chung in 2004. And this analysis, performed by Vonn Walter, shows that the subtypes of head and neck cancer correlate strongly with the corresponding subtypes from lung cancer. So for the basal subtype of head and neck cancer and lung cancer there's a single unified node, the mesenchymal and the secretory subtypes correspond, and the classical subtypes from the two groups correspond.
Having an expression subtype is certainly interesting, but it's just a novelty until you can propose a model for that subtype or for what the genomic alteration might be. This is a particularly exciting one, in the atypical subtype, and for the sake of time I'll also point out that this is the subtype that's associated with HPV-positive infection. The HPV-positive patients almost all fall in the atypical subtype, and they have completely absent amplification of chromosome 7, and most notably no instances of the focal high-level amplification at the EGFR locus. This is true both in data from UNC and in The Cancer Genome Atlas data, again suggesting that PIK3CA may be the relevant oncogene in these samples.
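For readers unfamiliar with centroid-based subtype validation of the kind mentioned above, here is a minimal Python sketch of one common variant: assigning a sample to the subtype whose centroid its expression profile correlates with best. The talk does not specify the exact method, so the use of Pearson correlation is an assumption, and the four-gene centroids and the sample profile are made-up toy values, not the published centroids.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def assign_subtype(profile, centroids):
    """Return the subtype whose centroid correlates best with the profile.

    centroids maps subtype name -> centroid vector over the same ordered
    gene list as the profile (for example, median-centered expression).
    """
    return max(centroids, key=lambda name: pearson(profile, centroids[name]))


# Toy centroids over four hypothetical genes; the values are invented.
toy_centroids = {
    "atypical":    [-1.0,  1.0,  1.0, -1.0],
    "classical":   [ 1.0, -1.0,  1.0, -1.0],
    "mesenchymal": [-1.0, -1.0,  1.0,  1.0],
    "basal":       [ 1.0,  1.0, -1.0, -1.0],
}

sample = [2.0, 1.5, -1.0, -2.0]  # hypothetical tumor profile
print(assign_subtype(sample, toy_centroids))
```

The same function can be run in the other direction, correlating one cohort's centroids against another's, which is the sense in which subtypes from two data sets can be said to correspond.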
Again thinking in a pathway manner, looking at expression of NFE2L2, and again you'll see samples from UNC as well as from The Cancer Genome Atlas, there is universal expression of NFE2L2 in the classical subtype, as well as the atypical subtype, that is absent in the basal and the mesenchymal, so this is the same story in head and neck cancer.
I mentioned early on mutations of HLA-A, which were reported in lung squamous cell carcinoma and which we are also seeing in head and neck cancer. It's a very interesting mutation. It was probably the most unexpected mutation, which is one reason why we didn't comment on it very much in the lung squamous paper. So in this instance, what I'm doing is using the tumor subtypes to explore this mutation, which is otherwise sort of a curious event. Let me walk you through the figure. In the top of the figure, what's being represented is gene expression. It turns out that HLA-A, B, and C are all right next to each other on chromosome 6, and they share very coordinated gene expression, so I've just collapsed them for the sake of display. The same thing is true for copy number alteration, and TAP1 and TAP2 are also on chromosome 6 right next to each other, so they show sort of a coordinated pattern. In the middle of the figure, I'm showing the copy number. Here I'm showing mutations of HLA-A, B, and C, and TAP1 and TAP2. And here I'm showing DNA and RNA detection of HPV.
So in the interest of time, I'll just point out a couple of the patterns. Oh, and one other thing: as a proxy for lymphocyte infiltrates, we've got expression of CD3 and CD8 as markers of infiltration into the tumors. What you'll see in the classical subtype is a universal lack of expression of HLA-A, B, and C, in large part due to deletions of the genes, but not universally. And for the most part, the HLA-A, B, and C mutations occur in the basal subtype, and these are mutually exclusive.
So I don't have time, I guess, to dwell on the figure too much, but this is one of the early views helping us to understand a mutation which was otherwise quite curious, and we're now starting to see some signals that there might actually be pathway activation and signaling in a coordinated way.
Speaking of pathways, those of us who have been involved in TCGA and other large sequencing projects for the last five to seven years have spent a lot of time thinking about RAS signaling, AKT, PTEN. Well, one of the great pleasures of working with the current group is that not only do we have new faces, we've also got new expertise, and so we've really expanded our thought process in terms of some of the pathways and the targets that we should be looking at. This is a figure generated by Carter Van Waes, I'm not sure if he's here today, who has really contributed greatly to this project, pulling out survival and death pathways, which we have not looked at in our sequencing projects before. And again, I'm going to go back to Lou Staudt's story this morning, thinking about coordinated events, those mutations that occur together or in an anti-correlated manner.
There's a lot going on in this slide, so I'm only going to talk you through one of the stories, but I mentioned earlier mutations of caspase-8 and HRAS. One of the very curious findings is that caspase-8 mutations occur only in the basal and mesenchymal subtypes, and frequently in conjunction with HRAS mutations. When there is a mutation of HRAS or caspase-8, there is never an amplification of CCND1, which is the 11q amplicon, and which happens to also be right next to FADD. It's unclear which of those two genes might be the true target of the 11q13 amplicon, but the pattern is unmistakable.
Those patients that have amplifications of CCND1 or FADD and expression of those oncogenes have universally low expression of genes from a second amplicon on 11q, at 11q22, with additional death-related oncogenes, YAP1 and BIRC2. So some patterns are emerging, and I think these are some of the patterns that we're going to be evaluating as we move forward with this manuscript.
In the interest of time, this is just not the time to talk about all the data types. I will say we've seen some amazing contributions from British Columbia, as we have in other tumors, with identification of tumor subtypes based on microRNA and some of the earliest looks at differential clinical outcomes within these data sets. Similarly, if you have time to come by the poster, I'll show you some great examples of coordinated methylation and gene expression data, particularly for p16, a very interesting story, and also the description of methylation subtypes by the methylation genome characterization centers.
Finally, I think this is my last slide. One of the most exciting observations is that through unbiased sequencing we are able, for the first time, to detect DNA and RNA that we weren't looking for. And in this case, it's viral RNA. So what I'm showing here, and this is data that's described in detail in a poster by Matt Wilkerson in the poster session, is, here on the top row, the fraction of patients that express some HPV type 16 RNA. What's interesting about this is that this rate, approximately 20%, is far higher than the number that have a clinical diagnosis of HPV infection. And it's also far higher than you would expect based on the fact that only 11% of these patients have oropharynx tumors. In addition, there are other viruses in the tumors that are also detected at high levels. And again, I'll refer you to Matt's poster.
But most prominently there's herpesvirus, and we have near-universal coverage of the herpes genome in at least two of the samples. We'll get some more insight into viral sequencing in a talk that's being given tomorrow from Raju's group. So, final word, thanks to the contributors. And we look forward to getting the data out into the public. Thank you.
One question.
Yeah. This is Yudin Huang from Arkansas. So, very nice results related to the molecular subtypes. I'm very interested in that, and I'd like to talk with you more later on this topic. I have a question related to the TCGA samples. On our computational side, when we do further computational analysis, there are always new technologies being developed. So is the same TCGA sample available later on, for further verification or for new technology? I mean, when TCGA prepares the samples, do they save extra sample for later verification?
I think the short answer is sometimes. Sometimes there's extra sample available. Kenna is shaking her head, yes. And when there is, and it's an important question, the program team has made those samples available, but the samples are ultimately limited.
Thank you.
Thank you.
Thank you, Neil.