 Questions. Our next speaker is going to be talking about functionalizing the cancer genome, and Lynda Chin from Harvard Medical School is going to be the speaker. Thank you, Mark. Thank you to the organizers for the opportunity; it is an honor to speak. I want to talk a little bit about the effort in TCGA and what I believe are important next steps to understand and translate the information into an impact on medicine. To start, as Dr. Lifton did, I'd like to remind all of us why we are here, of our goals in cancer medicine: to prevent cancer, to detect it early, because there's no question that that's where we have the biggest impact on survival, and when that fails, to intervene appropriately. We have heard today that cancer, ultimately, is a disease of the genome, and therefore it makes sense that if we could understand what the genome is telling us, we could do a better job of managing the patient. That, in my view, is what personalized medicine is. Now, there is evidence that genomics already impacts medicine, and I want to highlight a few examples here. Certainly we can start with the example of BCR-ABL, the Philadelphia chromosome, which taught us the power of targeted therapy. And the Herceptin example, which showed us that we really need a biomarker that allows us to select the right patient population for a drug to be effective. And certainly the poster child, the BRAF mutation, which was discovered by probably the first systematic cancer genomics effort, leading to an effective drug that's likely to be approved this year. That's merely eight years from target discovery to a likely approved drug. Compare that to the example of Gleevec, which took 41 years from discovery of a target to a drug in the clinic. So given that, I think it's clear that cancer genomics can impact medicine dramatically. It will enable a more rational and effective approach to prevention, which is targeting the underlying etiology.
It will certainly help us detect cancer early by targeting, in a rational way, the known alleles that occur early, and by applying technologies such as serum proteomics or imaging. And when that fails and the patient comes in with cancer, we focus a lot of our effort on using the genome information to guide us to new therapeutic targets and to biomarkers that allow us to select the right population of patients for drugs against those targets. And ultimately, it's quite clear that monotherapy is not going to be effective; it's not going to give us long-term cures. Therefore, we need to begin to think about combination, co-extinction strategies, and learning from the genome will give us a rational way to go about doing that. So it is with that hope and expectation that the NIH, through NHGRI and NCI, came together and launched the pioneering effort, TCGA. The pilot project ran from 2006 to 2009, and I would say it was definitely the first coordinated effort with the intention to characterize cancer not just along one dimension but along multiple dimensions: not just looking at copy number changes, or just at sequence alterations, but really trying to understand what the expression is, both coding and non-coding, and also trying to map the promoter methylation patterns in the same samples. And that project has moved on. I got this recent update from, I believe, the NHGRI council. The pilot ended in 2009, and based on its success it was continued into phase two. This is the plot of the samples entering TCGA, provided by the NCI office, and we're right on track to complete the 3,000 cases to be studied in the phase two project, which will be cases from these 20 tumor types from diverse organ systems. And there is a very aggressive and ambitious plan to complete the comprehensive analysis of each of these 20 tumor projects by 2014, I guess. That's what it says.
And the current study design of TCGA is that every one of these tumor samples and its matched normal will get whole-exome sequencing, and a good percentage of them whole-genome sequencing, in addition to sequencing of the transcriptome and the microRNAs and mapping of the methylation patterns. We're also beginning to layer in proteomic analysis on some portion of these samples. So we are really trying to build a comprehensive map of what the genome is and what the transcriptome is, and hopefully to link that to what the proteome is. Now, how is this all possible? As we heard earlier today, it is being made possible by a transforming technology, so-called next-gen sequencing, or massively parallel sequencing, which I'm not going to go into since you heard a lot about it already. But one thing that hasn't been mentioned much, beyond the cost, is the fact that this technology doesn't just give us cheaper sequence; it gives us a lot more information, and much more accurate information. This is a slide I borrowed from Gad Getz at the Broad, and it really highlights what next-gen sequencing technology can do. It certainly can identify point mutations and small insertions and deletions, but it can do so much more sensitively and accurately than the old capillary-based sequencing methods. It can also do copy number, like the array-based technologies, but in contrast to those, it does it with digital quantitation: it will tell you precisely what the copy number of a particular sequence is, which array-based technology is not able to do. And importantly, it gives us one new dimension of information, which is rearrangement, something we were completely blind to with the previous technologies. This is the first time we'll be able to map rearrangements down to base-pair accuracy in a high-throughput manner.
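To make the "digital quantitation" point concrete: copy number falls straight out of read counting, because in each genomic bin the tumor-to-normal depth ratio scales with copy number. Here is a minimal sketch of the idea (hypothetical function name, not from any TCGA pipeline; it assumes tumor and normal were sequenced to comparable total depth, and it ignores real-world corrections for GC content, mappability, and tumor purity):

```python
def copy_number_estimate(tumor_counts, normal_counts, ploidy=2):
    """Estimate an integer copy number per genomic bin from read counts.

    Toy sketch: the read-depth ratio between tumor and matched normal
    in a bin, scaled by the normal ploidy, gives a digital copy-number
    call. Assumes equal total sequencing depth in the two libraries.
    """
    estimates = []
    for t, n in zip(tumor_counts, normal_counts):
        estimates.append(round(ploidy * t / n))  # depth ratio -> copies
    return estimates

# A bin with twice the normal depth reads out as 4 copies,
# half the normal depth as 1 copy.
print(copy_number_estimate([100, 200, 50], [100, 100, 100]))  # [2, 4, 1]
```

An array platform would instead report a fluorescence-intensity ratio, which saturates and cannot resolve exact integer copies; counting reads is what makes the estimate digital.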
And then beyond that, we can begin to interrogate evidence for non-human pathogen sequences and their roles in human disease; there's certainly a lot of evidence in clinical medicine that infectious etiology plays a role in human disease. So we have already heard about how much sequencing costs have come down, and Eric Lander showed how much the production of sequencing data has changed at the Broad Institute, but I like to use this slide, which I got from NHGRI, to show how this is also changing the way we can think about what cancer genomics can do. This is a little piece of advertisement I pulled out from a company, Illumina, which advertised that their machine, the HiSeq 2000, in one single run, which takes eight days, can generate 200 gigabases of sequence. And this is the plot from NHGRI reporting on their annual sequencing production for fiscal 2007, from all of their sequencing centers, I believe, not just cancer sequencing; all of their sequencing output in 2007. The total is 140 gigabases. That level of growth has continued, and I believe this projection is way off, probably 10-fold less than what actually happened in 2010, because it was made earlier, in the second quarter of 2010. This growth has changed the way we can think about what we can do with cancer genomics. We are no longer limited, I think, by the real estate that we can sequence. We don't have to choose between the number of samples we need for statistical power and how many genes we can sequence; we can do it all with this technology. And I think we just heard from Brad Bernstein that we can apply something similar to the epigenome. So it really has transformed the way we can think, and we can look at the genome at a level we never imagined possible. But it has also created a lot of challenges: certainly storing the data and transferring it, not to mention mapping it.
And then you have to analyze it: make sequence variant calls, determine translocations and rearrangements. None of this was easy, it is still not easy, and it is still an evolving science. And on top of that, the rate of growth is enormous. I guess I forgot to show this: this is the latest accounting from TCGA alone, where the sequencing output per month is about 17 terabytes. So this is an enormous challenge, not just to deal with the data, process it, and map it, but also to make sense of it. Someone earlier this morning already asked a question about the cost of analysis, and that clearly is a major bottleneck. Recognizing that, phase two of TCGA at its inception anticipated some of this, although I don't think it anticipated the scale of the challenge. It certainly differs from the pilot in that it has built and funded dedicated genome data analysis centers. So it's no longer just a collection of centers that generate data; there are centers really devoted to analysis. And moreover, the centers together are really trying to think of ways to accelerate the analysis and sense-making, such as building automated analysis pipelines.
And here's an example, a TCGA analysis pipeline based at the Broad, where we try to automate predefined analyses with fast turnaround. It is able to ingest all the available data, over 2,000 data files, into a pipeline that analyzes each data type, generating results for every single data type as well as correlated analyses across data types, such as which mutations are significantly correlated with survival in that particular cohort. It does so in a predefined, automated way, providing very fast turnaround with results that are human-readable, so that they can serve as a companion to the raw data released to the public, because the majority of the community, which I'll come back to, needs to use these data, validate them, and functionalize them, but it is a challenge for them to make sense of the enormous raw data. So the analysis is important. Moreover, I think that by automating this and putting it in a pipeline, it provides reproducible and uniform intermediate data files that free up the analysts inside and outside TCGA to do higher-level analysis, such as trying to figure out the meaning of a mutation and its relationship to genes known to be in the same complex that might be altered by methylation or by genomic alteration. They can focus more on the type of analysis that's really aimed at understanding a biological question or answering a clinical need, so that we can accelerate the process. So with that, and with all the talks you have heard earlier today, I think it's safe to say that with the effort of TCGA, along with our international colleagues in the ICGC, in the next five to ten years we will have a complete atlas of all the somatic genome and epigenome alterations in all major cancer types.
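As a toy illustration of the kind of predefined, automated correlation such a pipeline runs, for instance asking whether a mutation is associated with survival in a cohort, here is a minimal, hypothetical sketch. It compares median survival between mutant and wild-type samples with a permutation test; a real pipeline would use censoring-aware statistics such as the log-rank test, which this sketch deliberately ignores:

```python
import random
import statistics

def survival_association(surv_mut, surv_wt, n_perm=2000, seed=0):
    """Permutation p-value for a difference in median survival
    between mutation-positive and mutation-negative cohorts.

    Toy sketch only: real survival analysis must handle censored
    follow-up (e.g. log-rank / Cox models), which this does not.
    """
    rng = random.Random(seed)
    observed = abs(statistics.median(surv_mut) - statistics.median(surv_wt))
    pooled = list(surv_mut) + list(surv_wt)
    k = len(surv_mut)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of mutation status
        diff = abs(statistics.median(pooled[:k]) - statistics.median(pooled[k:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # permutation p-value
```

Running every such test automatically, on every data type and gene, is what turns raw files into the human-readable companion results described above.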
But then the challenge is how we go from here to there. It's obviously not a single step; what it takes is a lot of work, and it has been called the Valley of Death, or the Big Gap. I think we need to begin to think about how to take all this information, in an effective and efficient way, and make that knowledge actionable and translatable in the clinic. One example I'd like to remind you of: I mentioned at the beginning that the BRAF mutation and the development of BRAF inhibitors are certainly the poster child that we like to think about, and we'd like to see many more like it, because of the short time lag between discovery and a drug that impacts patient survival. But I think we need to remember that that process would not have happened in eight years if we didn't have the prior knowledge that BRAF is a kinase, that BRAF signals in the MAP kinase signaling pathway, and that phospho-MEK is a good downstream reporter of BRAF activity, one that can be used as a target-engagement and response readout throughout the entire process of drug development. Without that knowledge, we wouldn't have been able to develop the drug. But for many of the genes that we are identifying now, we really don't have a clue as to how they function, what they do, or whether there is even an enzymatic activity there. So we need to get that knowledge. There is no easy way to do it, and I'm not sure there's a high-throughput way of doing it, but I want to give an example of a study we have been working on that I hope highlights the value of taking the next step, functionalizing: not just showing activity, but actually understanding how a genetic element functions in cancer, and how that understanding is necessary to make the finding translatable.
So this is a paper published by TCGA a couple of years ago, using these data to define that glioblastoma really represents at least four major molecular subtypes, such as proneural, classical, and mesenchymal, and they are each enriched for specific genotypes. For example, the classic EGFRvIII mutation that we know to be associated with glioblastoma is really only seen in the classical subtype, the IDH1 mutation that was mentioned earlier is almost exclusively observed in the proneural subtype, and so on. Now, one of the things we were interested in, at least one person in the lab, Johnny, a postdoctoral fellow, was interested in, was: what is driving the molecular differences between these subtypes? If you look at the transcriptomes of all these GBMs using the TCGA data, interestingly, you find that the major difference at the transcriptome level really exists between the mesenchymal and proneural subtypes. So the hypothesis was that maybe some microRNAs are regulating a collection of genes, and that this really underlies the molecular differences between proneural and mesenchymal. Those are the hypotheses, but how do you go about testing them? Well, we have these very complex data from TCGA and we need to make sense of them, and this is where computational modeling becomes valuable. We went to our collaborator, Jim Collins at BU, and asked him to help us develop and apply his CLR network modeling algorithm, which I'm not going to go into, to try to build a regulatory map of microRNA and mRNA in glioblastoma. We did this with his help using about 200 samples with matched mRNA and microRNA data, and here is the value of doing integrated genomics in TCGA: you cannot do this analysis if your mRNA and microRNA data are generated on different sets of samples. With the network algorithm we generated the hairball that represents the network, which by itself is not very useful.
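For readers curious about the core of the Collins lab's CLR (Context Likelihood of Relatedness) approach: the algorithm scores each regulator–target pair by mutual information across samples, then corrects each score against the background distribution of scores seen by both partners, so only edges that stand out from both backgrounds survive. A minimal sketch of that background-correction step, assuming a mutual-information matrix has already been computed (hypothetical function name; rows must be non-constant for the standard deviations to exist):

```python
import statistics
from math import sqrt

def clr_scores(mi):
    """CLR-style background correction of a symmetric
    mutual-information matrix: z-score each pair's MI against the
    MI distributions of both partners, clip negative z-scores to
    zero, and combine the two into one edge score."""
    n = len(mi)
    means = [statistics.mean(row) for row in mi]
    sds = [statistics.stdev(row) for row in mi]  # rows must vary
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            zi = max(0.0, (mi[i][j] - means[i]) / sds[i])
            zj = max(0.0, (mi[i][j] - means[j]) / sds[j])
            scores[i][j] = sqrt(zi * zi + zj * zj)
    return scores
```

The point of the correction is exactly the "hairball" problem: raw mutual information leaves thousands of weak edges, while the background z-scoring keeps only pairs whose association is unusual for both the microRNA and the mRNA involved.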
There are 29,000 edges among several hundred microRNAs and thousands of messenger RNAs, but we had a question that we wanted to use this for, so we knew how to use this information. We took these network relationships and asked how they differ between proneural and mesenchymal, and without going into a lot of detail, we identified 70 microRNAs that really drive the separation between proneural and mesenchymal. You can see that the messenger-RNA edges connected to these 70 microRNAs account for 85 to 90% of the signature genes reported in the TCGA paper to define the proneural versus mesenchymal subtypes. So now you can go in and ask what these 70 microRNAs are doing, and most of them are unknown, but I'll give you one example. We decided to focus on one of them, miR-34a, with the hypothesis, well, because it's a microRNA for which there is evidence that in some tumor types it is deleted, it sits in a region of loss, it is expressed at a very low level in the proneural subtype, which fit our hypothesis, and its CLR edges are enriched for the proneural signature, which I'm not showing you. That led us to hypothesize that miR-34a could be a candidate determinant of the proneural molecular subtype. So the first thing we had to do was prove that miR-34a actually does something in glioblastoma, and we could do that very quickly by proving that it is a tumor suppressor microRNA. We can do functional studies in human glioma cells, where we overexpress the miR and show that we reduce the tumorigenicity of these cells in vivo, and we can do the converse, which is to use decoys to knock down miR-34a expression in immortalized human astrocytes and show that they now become transformed and tumorigenic. And we can do this in multiple cell systems, human and mouse. So we proved that miR-34a is a tumor suppressor in glioblastoma, but that information by itself is not very helpful.
There's not much more we can do with that information alone. So we decided to go a little deeper and ask: how does miR-34a contribute to the proneural versus mesenchymal signatures? Now, this is a simplistic way to think about regulation between microRNAs and mRNAs, but we can certainly imagine a direct interaction, where the microRNA directly binds the target gene and negatively regulates its expression, and it could also act through an intermediate, such as a transcription factor, where you get a very different relationship: still a relationship you can see in the network model, but not the same kind of negative correlation. I'm not going to talk about the latter; I'll just talk about our effort to focus on the putative direct targets of miR-34a. Using sequence analysis and correlations, we identified two putative direct targets of miR-34a, and the criteria we used are down here. One of the things we are aware of is that sequence-based prediction has a very high false-positive rate, so we only focused on targets predicted by all three algorithms. We then went on to prove, and I'm not going to show you the data, by luciferase reporter assay, that DLL1 and PDGFRA are indeed direct targets regulated by miR-34a through direct binding, and we can show regulation at the protein level by miR-34a of these two target genes in human and mouse cells. So they are clearly novel direct targets of miR-34a. What does that mean? Is it relevant? Well, the reason we were interested in this is because of what we know about glioblastoma. This is an old review, from 2007; it really needs an update now, but it makes a point I want to make. Glioblastoma is clinically defined as two types: primary, or de novo, glioblastoma versus secondary, which progresses from an antecedent low-grade glioma over five to ten years and becomes a similarly aggressive glioblastoma multiforme.
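The filtering step described above, keeping only targets called by all three prediction algorithms to suppress the high false-positive rate, amounts to a set intersection. A minimal sketch with hypothetical example inputs (the predictor names and the genes other than DLL1 and PDGFRA are illustrative, not taken from the study):

```python
def consensus_targets(*predictions):
    """Return only the targets called by every prediction algorithm.

    Each argument is one algorithm's predicted target set; requiring
    the intersection trades sensitivity for a lower false-positive
    rate, the same logic as the three-algorithm filter in the talk.
    """
    sets = [set(p) for p in predictions]
    return set.intersection(*sets)

# Hypothetical predicted-target sets from three algorithms:
algo_a = {"PDGFRA", "DLL1", "NOTCH1", "MET"}
algo_b = {"PDGFRA", "DLL1", "CDK6"}
algo_c = {"PDGFRA", "DLL1", "MET", "MYC"}
print(sorted(consensus_targets(algo_a, algo_b, algo_c)))  # ['DLL1', 'PDGFRA']
```

Only candidates that every predictor agrees on survive, which is why the final list here is short, two genes, rather than the hundreds a single algorithm would call.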
The classic signature gene associated with low-grade glioma that progresses to secondary GBM is PDGFRA. And what's not shown here is that, through the work of Vogelstein's group and then the molecular classification paper from TCGA, we know that these secondary GBMs are the GBMs that have the IDH1 mutation; they are of the proneural subtype. In other words, PDGFRA is expected to be, and is, a signature gene of proneural-subtype GBM. With respect to Notch signaling, which is what DLL1 regulates, Heidi Phillips' paper, which I don't show here, actually shows that Notch is activated in both classical and proneural-subtype GBM, and we can see that signature in the TCGA data as well. So we know that miR-34a is low in proneural-subtype GBM, and because it's low, Notch and PDGFRA signaling are up-regulated in proneural GBM, which we do see in human tumors. So we can see the human relevance. But there's another important point. How do we know that this relationship, which we can see in silico and through somewhat artificial assays, where we do reporter assays to show they truly interact, and artificial overexpression or knockdown studies to show that one can regulate the other, I would say all of these are artificial, how do we know that these things really happen in vivo? Well, it's not that easy to test this in vivo, because not only can we not do it in cell lines, but the cells themselves are limited. For proneural GBM this is particularly problematic, because there are no known proneural cell lines; I think after many years of searching, there's probably one cell line with the IDH1 mutation out there. So the majority of cell model systems are not proneural. And that's where genetically engineered mouse models become useful. The DePinho lab had published, two years ago, a p53/PTEN genetically engineered model that leads to spontaneous high-grade glioma; I'm not going to go into the details.
So we took these mouse tumors and profiled them. p53 and PTEN alterations are seen in human proneural-subtype GBM based on the TCGA data, and when we take the mouse tumors, profile them, and ask what type of tumors they are, you can see that they are significantly enriched for proneural signature genes. In other words, the p53/PTEN model is a proneural model of GBM. Consistent with that, miR-34a expression is very low in these tumors and PDGFRA is overexpressed. In fact, this is an IHC from the paper showing that in the tumor part of the brain you see high-level activation of PDGFRA, which you don't see in normal brain, and we can show in this system that miR-34a does regulate PDGFRA. The same goes for Notch signaling: we can use the GEM model, the p53/PTEN cells, and show that when miR-34a is knocked down and they form tumors, those tumors show evidence of high-level activation of Notch signaling. So what this means is that by using the model systems and the in vitro systems we have, based on in silico predictions, starting with the multi-dimensional data from TCGA and using network modeling, we formulated a hypothesis: that miR-34a defines a subset of tumors, that this subset of tumors will look proneural, and that they have concurrent activation of both Notch and PDGFRA. And we can show in the in vivo model, which resembles human proneural GBM, that this does happen in a real tumor. So that gives us a framework for understanding how miR-34a may contribute to the molecular signature of the proneural subtype. And importantly, it gives us a hypothesis that we can test: perhaps miR-34a defines a subtype of glioblastoma that's sensitive to combined inactivation of Notch and PDGFRA, and there are actually drugs targeting both of these pathways in the clinic. So this is an example of starting with a biological question, leveraging TCGA data, and really using high-level computational mining to develop a framework to test a hypothesis.
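A claim like "significantly enriched for proneural signature genes" is typically backed by an upper-tail hypergeometric (one-sided Fisher) test: given how many signature genes exist in the gene universe, how surprising is the observed overlap with the genes deregulated in the mouse tumors? A minimal sketch (hypothetical function name and toy numbers, not the study's actual counts):

```python
from math import comb

def enrichment_p(overlap, list_size, sig_size, universe):
    """Upper-tail hypergeometric p-value: the chance of drawing at
    least `overlap` signature genes when picking `list_size` genes
    at random from a universe of `universe` genes, of which
    `sig_size` belong to the signature."""
    total = comb(universe, list_size)
    return sum(
        comb(sig_size, k) * comb(universe - sig_size, list_size - k)
        for k in range(overlap, min(list_size, sig_size) + 1)
    ) / total

# Toy numbers: 5 of 5 sampled genes hitting a 5-gene signature in a
# 10-gene universe is unlikely by chance (p = 1/252).
print(enrichment_p(5, 5, 5, 10))
```

A small p-value says the mouse tumor's expression profile overlaps the human proneural signature far more than random gene lists would, which is the basis for calling the model proneural.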
But in the end it is understanding how the event contributes to the tumor that leads to a hypothesis that's translatable. Now, this remains to be tested, but it is a testable hypothesis, and it may lead to translational impact. So what I hope to have shown with that one example is that we can do cancer genomics; the technology is here, the capability is here, but let's not underestimate the challenge of analysis, not just bioinformatic analysis but the higher-level computational analysis that formulates hypotheses and provides a framework for experimental testing and understanding of mechanism, because ultimately it is that understanding that leads to actionable information we can translate. Now, translation is not easy, but we need to focus on doing that part as well. In the last minute, I want to come back to this little box here and say something really quickly. Yes, we can get samples, we can profile and sequence the tumors and generate cancer genomic data, but what type of samples we sequence, how we collect them, and what information we know about them are also important. It's not easy to do, but we need to think about it, and I will give you one example. Right now, a lot of the community's effort is really focused on identifying new targets and new biomarkers, but we have to remember there's another really important impact genomics can have, which is improving the way we manage patients diagnosed at an early stage. Why is this important? Because that's the majority: the majority of our patients are diagnosed at an early stage. They are currently treated by surgery and triaged based on pathology and clinical staging, while much of our cutting-edge effort focuses on the tip of the iceberg. Not that that isn't important, but we shouldn't forget the rest. And the reason is that we need to do a better job. I think this slide was supposed to be animated, and I forgot, but what we know clinically is that pathologic and clinical staging cannot identify all the patients who can be cured by surgery alone.
Ten to fifteen percent have inherently poor prognosis and need additional therapy in the adjuvant setting, but we have no way of identifying those patients. What we need is molecular characteristics to identify them for us, such as, and I'm going to skip this part, a paper in the same issue of Nature this week identifying a prognostic marker in patients diagnosed with prostate cancer and their risk of recurrence, so that we can enroll them more appropriately into therapy, or not. That has tremendous healthcare-economic as well as quality-of-care impact. So how does genomics help here? I think we need to think about the evolution of the cancer genome differently. A lot of data now show that it doesn't happen as a series of stochastic events, where you pick up one event and become an early-stage cancer, then pick up another mutation and become more aggressive, and so on. In fact, we know through genomic studies and some mechanistic studies that at the transition point from benign to malignant, these tumors already have numerous alterations in their genomes, and depending on the hand they are dealt, they are either inherently very aggressive or inherently benign. That's not to say they couldn't acquire more events, but they are predestined at the beginning to behave in a certain way. This means that if we can get at these early-stage cancers, understand what their genomes are, and identify the genomic events that predict poor outcome or aggressive behavior, we can identify the high-risk patients among the early-diagnosed, provide them with appropriate adjuvant therapy, and spare the ones who can be cured from toxic downstream therapy. Now, one other point that I was going to take out but do want to mention, and that Dr. Lifton mentioned: in order to do this, we could sequence many early-stage cancers from human patients to find what's different between the aggressive and the benign, but these are early-stage tumors.
There are many times more of them than late-stage disease, and you have to have very long follow-up to know whether they had good or poor prognosis. This is where the extreme case becomes really helpful, and we can get to the extreme case using genetically engineered models, when we engineer them to have black-and-white outcomes. That can serve as a starting point, leveraging evolutionary conservation as another way to get to a shorter list of candidates, which we can then take into functional studies to identify the ones that can truly drive metastasis. These then turn out to be oncogenic by themselves, so they are really true therapeutic targets as well. And importantly, we have shown, at least in our studies, that, unexpectedly, these are not just prognostic in a particular lineage; they can be prognostic across lineages. This is an example of metastasis genes that we identified in early-stage melanoma that are prognostic in melanoma, but turn out to also be prognostic in three different cohorts of breast cancer, suggesting there is some fundamental process such that, if early-stage cancer cells carry those genetic events, they are wired to behave more aggressively whether they sit in the skin or in the breast microenvironment. So in the end, I want to say: let's not forget that cancer genomics can have an impact here, identifying molecularly based prognostic markers to complement our standard of care, which is pathologic and clinical staging. Since these genes are deregulated in early-stage cancers, they can serve as both prognostic biomarkers and therapeutic targets, and importantly, since we can identify the functionally active ones through mechanistic study, we can also predict the right therapy that the patient would most likely benefit from in an adjuvant setting.
So I'll end here by saying that I hope we all believe, and hope to see, that cancer genomics will impact and lead us to genomic medicine, or personalized medicine, on different levels: not just in therapeutic targets, but ultimately in prevention, early detection, and the management of early-stage patients. I just want to say one more thing in terms of acknowledgements. The microRNA miR-34a study was started by Johnny, with Ayla from Jim Collins's lab, aided by Saatchi, our computational biologist, and also by our team at the Broad who really worked on the analysis pipeline. Thank you.