Well, thank you very much, Linda. It's really great that TCGA has a scientific symposium, and it's wonderful that so many people have signed up, as I understand it, from outside the narrow TCGA community, to really open up the doors of this project in a very physical way, just as putting the data freely available on the web has been opening it up in an electronic way. So what the organizers asked me to do is give kind of a broad overview to kick off a first symposium about fulfilling the promise of cancer genomics, and I've chosen to organize it around a past, present, and future theme, particularly focusing on what lies ahead, which means what's not accomplished, what we don't know how to do and have to know how to do, what we haven't organized yet to do. So for those of you who are new to the TCGA project, I think this is the sign of a good project: whatever we've accomplished, that's all fun, but really the question is what lies ahead. That is our work, and so that'll be the theme. So let me just kick off here with the past. Well, TCGA goes back a long way. It goes back to 1914, not actually as a working project, but to Boveri's proposal that chromosome defects cause cancer. Now, this was proposed quite a long time ago, almost a century ago, and yet it lost ground considerably over the course of the 20th century to the idea that viruses cause cancer, so that by 1970 you could find a statement in a scientific paper, made with great certainty: "the viral origin of the majority of malignant tumors is now documented beyond any reasonable doubt." One should always be suspicious about statements like that, because that pretty much was the opening bell for finding out that the vast majority of tumors are not caused by viral changes. Really, with the seminal work of Bishop and Varmus, with the discovery of the fact that viral oncogenes are related to cellular genes, and that many of the defects, most of the defects, in cancer are indeed due to defects in
cellular genes, and Bob Weinberg's discovery of a first proto-oncogene mutated in the human. By 1986, all of the major mechanisms of screwing up a gene were known: point mutations in RAS, translocations in ABL, amplifications of MYC, deletions of RB. They were all known; you could in principle do those things. But it was also clear that it was going to be unbelievably painful to find these things one at a time through transformation assays or through walking along the chromosome to map genes in different ways. And so Renato Dulbecco did a pretty important thing. It was around the time people were beginning to talk about a human genome project, the idea of having the complete sequence of the human, and a number of arguments were given, but Renato made a strong argument in the pages of Science magazine, saying that sequencing the human genome was especially important because it would provide a turning point in cancer research. He said we have two options: either we're gonna do it piecemeal or we're gonna do it wholesale.
We're gonna have the whole genome available to us. So these arguments in the late 1980s eventually prevailed, and the Human Genome Project got started, as everybody knows, and by 2001 there's a draft sequence, in 2003 a finished sequence, and so by about the year 2000 people could begin to take genomic approaches, systematic approaches. The early couple of years of the genome era saw the uses of microarrays for measuring DNA copy numbers and RNA expression. It saw early efforts using the DNA sequencing technology that was available at the time. It saw the discoveries of RNAi and the recognition of the hints that one might be able to apply it systematically, though it was still early for doing such things. And people, notably the Sanger Center with its leadership, Johns Hopkins, folks in Boston at the Farber and the Broad, began doing systematic projects, discovering mutations in classes of kinases primarily at the beginning, BRAF, PIK3CA, EGFR, all of which have important therapeutic implications, and studies integrating genomic information of different sorts, pointing by a variety of methods to new kinds of genes. Well, again, the thing to do is be unsatisfied and impatient with how things are going, and by 2004 it was clear that the idea that a human genome sequence would be a good thing for cancer was true, but we needed to think more systematically. We needed to think in a more organized fashion, and the National Cancer Institute's National Cancer Advisory Board put together a working group with the harmless name Working Group on Biomedical Technology, whose official mission was to figure out ways in which biomedical technology could be used to accelerate progress in cancer research, and it very quickly focused on two things, one of which was the idea of what was then called a human cancer genome project. This committee was put together of a number of people, and it got the involvement of another 50 or 60 people, and it produced a report in February of 2005. The report made the
following arguments. It said: well, cancer is a genetic disease; cancer is a highly heterogeneous disease; cancer is an understandable disease. Systematic understanding would have major implications for the identification of cellular pathways that underlie cancer, selection of therapeutic targets, resolving cancer into homogeneous groups, faster and more efficient clinical trials, improved applications of drugs, epidemiological study design, identification of markers for early detection. It said we're still ignorant, and it said that a systematic understanding of the cancer genome would be technologically feasible within the next decade and that its cost would be reasonable; it would take about a 3% increase in the budget of the NCI to cover such a project. By the way, the slides I'm using for this little segment are actually the slides I used in the spring of 2005, or so, reporting this report out, so I didn't change any of these here. And I think those still remain a good set of reasons. Now, the argument that was made there was that the Human Genome Project was a good analogy. When the Human Genome Project was proposed, it was in prehistoric days, when ancient cavemen were running around without PCR technology and other things, and, actually before automated sequencers were available, the thought of doing a human genome project was absurd. But by 1986 it was clear to folks that there were going to be ways to do it, and that those ways would appear before us only if we started down the path, that there is a virtuous cycle: you start a project, that creates demand, people produce technologies, and onward and onward. And indeed the real cost of the Human Genome Project, although nobody wrote it down in 1985, would have been 20 billion dollars in 1985 technology. They said a dollar a base, but in fact they were completely nuts; when you go back and cost it, it was five or six dollars a base. And by the time of 2005 the cost of doing another human genome had plummeted to 20
million dollars from 20 billion dollars. Very good. And they pointed out that the plan was to be flexible and incremental and let it evolve and learn. So the same sort of lessons were proposed for what should be done with a human cancer genome project: that we should start with what you could do. You could do genomic loss and amplification with SNP chips and other copy-number arrays; get going on stuff like that. You could do point mutations in coding regions of some genes, and we set the goal of point mutations in all coding regions. Nobody breathed a word about sequencing the entire genome, because that would have been considered so nuts as to put people off. So the report was very clear that we'd sequence all the coding regions, although there was a clear understanding that one would go further, but there was only so much you could say at the time. And there were no really good ways to do chromosomal rearrangements and epigenetic changes at the time, but people could imagine them, people could actually propose them; there were ideas for how to do it. None had been implemented in 2004, 2005 yet, but it was clear it could happen. So the goal was set out of identifying all genomic alterations significantly associated with all major types of cancer by creating a large collection of samples and completely characterizing them with respect to chromosomal loss and amplification, rearrangements, mutations, aberrant methylation, and gene expression, pretty much the plan that is there today. As usually happens in these cases, many people hate it. It's a natural human tendency. The objections were: (a) all cancer genes are already known; (b) there are so many cancer genes that it's hopeless to ever find them all. One wished you could get those two arguments together in a room so they could annihilate each other, rather than have to answer each separately. But oh well. And then: this is so expensive.
It will destroy all scientific research and suck all the money out of the NIH. That's a usual argument that gets made in these cases. The argument was made perhaps most colorfully in this Nature Biotechnology article called "The Human Cancer Genome Project: One More Misstep in the War on Cancer," which goes on to explain that this is a proposal to spend $12.5 billion, because they multiplied 12,500 cancers by what they thought the cost of sequencing a genome was then, rather than what the report said, which is that we thought it would fall by more than a hundredfold. But anyway, these arguments are always fun at the time; people get their blood boiling, and eventually it turns out that you start down the road and things get sorted out, and there you go. So what present did this lead us to? It led us to a present where by 2006 a pilot project had gotten kicked off under the name The Cancer Genome Atlas. That was the name that was eventually decided, and this logo was eventually drawn, and the project got off the ground in a very slow, rumbling kind of way, recognizing that we didn't really have the samples we thought we had and we didn't really quite know how to do the things. And that's just fine for a project; the early stages of a pilot project are to figure out all of the warts involved in this stuff and admit them. And then interest began to grow around the world. Obviously the Sanger Center played a major role in cancer genome sequencing from the beginning, but that was a project at the Sanger. Now other countries, whole countries, began to get involved and say, we want to get involved in this cancer genome work, and there is an International Cancer Genome Consortium that now has more than a dozen, a dozen and a half, countries involved in this work. As for the cost of sequencing, and this 12 billion dollar figure that was proposed in this critical editorial: well, the prediction of the committee in 2004 sort of was borne out. If you look at that graph, where
2004 is, that's about a year or two before the line starts plummeting on a log scale. And that was because the folks on the committee had a pretty good idea that this line was going to go down. The only mistake that was made is that it actually went down more than we thought it was going to go down. We said we'd get a factor of a hundred within five years; we probably got a factor more like a thousand and change in those five years. But no apologies there. So it got cheaper. That's good. So what's it gotten us to right now? There are a lot of cancer genome papers from efforts, and here I'm thinking about whole-genome efforts or whole-exome efforts, but whole-something efforts, that are either published or that I know to be in press, and I do not pretend that this list is complete. So if your cancer is not up there and it's published or in press, don't yell at me; just tell me and I'll put it on the slide. I did what I could in the past several days, with the help of my colleagues, to try to fill out a list here, and I did a very rough count of how many samples were involved. Now, many of these projects actually are smallish. They might have, in some cases, a few samples or 10 samples that were looked at genome- or exome-wide, with discoveries followed up in larger sample collections. Some of them involve a couple hundred samples. All told, I get that about 600 samples are reported in these papers, of either genomes or exomes. But we know that there's an awful lot more, because we sent out some emails to the larger centers and got people's rough counts of how many whole genomes and how many whole exomes, and these are patients, so this is times two because there's a tumor and a normal. My best count from the rough numbers that came back is that more than a thousand cancer genome pairs have been done and more than 9,000 cancer exome pairs have been done. There's an awful lot of data sitting around here and coming into papers and going to be coming out. So that's very exciting.
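The cost argument above is simple arithmetic, and a minimal sketch makes the disagreement explicit. The 12,500 cancers, the $12.5 billion total, and the hundredfold and thousandfold drops are the figures quoted in the talk; the implied per-genome cost is derived from them, and the variable and function names are illustrative only.

```python
# Figures quoted in the talk.
N_CANCERS = 12_500            # tumors the editorial multiplied by
EDITORIAL_TOTAL_USD = 12.5e9  # the editorial's headline project cost

# Derived, not quoted: the per-genome cost the editorial must have assumed.
implied_cost_per_genome = EDITORIAL_TOTAL_USD / N_CANCERS


def project_cost(fold_drop):
    """Total project cost if per-genome cost falls by `fold_drop`."""
    return N_CANCERS * implied_cost_per_genome / fold_drop


for label, fold in [("editorial (no drop)", 1),
                    ("committee prediction (100-fold)", 100),
                    ("roughly what happened (1000-fold)", 1000)]:
    print(f"{label}: ${project_cost(fold):,.0f}")
```

The point of the sketch is only that the headline number depends entirely on freezing the per-genome cost at its value on the day the editorial was written.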
What have been the implications of this scientifically? Well, insights from genomic approaches have emerged. The notion that we knew all cancer genes may have been overstated; we didn't really know all cancer genes. We didn't even know all classes of cancer genes. A lot more has been found out about protein and lipid kinases: BRAF and PIK3CA and PIK3R1, the regulatory partner of PIK3CA, EGFR, FGFR2, JAK2. But new things too: lineage survival genes, transcription factors essential to the survival of certain lineages, the first being MITF in melanoma, NKX2-1 in lung, SOX9, SOX2. Epigenomic regulators, a growing list of epigenomic regulators, pointing us to the critical role of epigenomics in cancer. Metabolic enzymes: the important discovery of IDH1 from the Hopkins group, producing a neometabolite. RNA splicing factors, emerging in recent years, a whole list of RNA splicing factors. Important translocations: I remember when translocations were a feature of blood cancers but not solid tumors, until that turned out to be false with the demonstrations by Arul Chinnaiyan of TMPRSS2-ERG translocations, these ETS family translocations in prostate cancer, and the incredibly important ALK translocations in lung cancer. Interesting surprises even within the past year: Notch, known to be an oncogene in T-cell ALL, is now also a tumor suppressor gene; that is, loss of function of Notch also causes cancer, but different kinds of cancer, squamous cell cancers. This is actually pretty important, because gamma-secretase inhibitors, which inhibit the Notch pathway, were being tried in Alzheimer's disease, and they were noticing patients were getting squamous skin cancers, and there was some mouse connection. This strongly suggests that you'd better watch out about inhibitors of the Notch pathway, and it won't be enough, as has been suggested, to just look for skin cancers, because these are also clearly involved in head and neck cancer. So you're gonna have to be monitoring
patients on any Notch pathway inhibitor for head and neck cancers as well. So all sorts of insights emerge, and of course therapeutics are emerging: BRAF, a fantastic story of the past year; EGFR, of course, with EGFR inhibitors; FGFR2, PIK3CA, ALK. And then there are things on the list there that we know pharmaceutical companies are very actively involved in; many pharmaceutical companies have IDH1 inhibitors in the works, etc. So a lot of work is being driven out of the pharmaceutical industry, and I can think of no pharmaceutical company out there that works in cancer that is not being driven right now by cancer genomics. This notwithstanding the fact that the New York Times likes to report that nothing much has come out of genomics, as it did. Therapeutically, one would say the entire therapeutic industry and the entire pharmaceutical industry with a relation to cancer is now completely revolving around cancer genomics. But that's where we are. That's not enough. We've got to do much more. So what I want to turn to a bit is the future, and if we have time I want to even throw it open to a broad discussion about the future a bit. So what are the goals? What are the challenges ahead? I'm gonna try to sketch out a vision for the next five to ten years of what things have to be done. Many of them, I think, can be done pretty soon, within a couple of years. Some of them will take longer; some of them we actually still have no idea how to do. But I'll sketch it out, and the goal is not to say this is the right vision. The goal is to say we should all together be thinking about that vision. We now finally have the TCGA humming. That's just the time to start throwing monkey wrenches in and saying, okay, but what else should we be thinking about?
So let me start by breaking it up into two pieces: one, the goal of a comprehensive catalog of the cancer genome, structural information; and two, the goal of a cancer therapeutic roadmap, is what I'll call it, functional information. I'll talk mostly about number one, but then I'll turn to number two. Under a comprehensive catalog of the cancer genome, we need to know all driver genes in all types of cancers, and we need to know how they correlate with clinical phenotype. By driver gene, I mean any gene in which mutations can occur that propel the development of cancer. Well, how are we doing, and what are the challenges? Here are the items that I want to touch on under the cancer genome catalog: our ability to interpret events. We're doing pretty well at finding events, but can we interpret events? Genes with significant mutation rates, focal amplifications and deletions, arm-level events, methylation events, translocations, integrating different kinds of events, non-genic targets, germline events. How are we doing? Let me dwell on this first one, finding significantly mutated genes. It is now possible to find mutations. There's no doubt the sequencing is good; the mutation callers have become good. We can find mutations. Finding mutations is not the same thing as finding significantly mutated genes. As we look at the projects that come out, there's this tail of genes that are important. You know there are important genes in this tail here. But have we got it right? Do we have the right genes being found? Are we finding too few genes or too many genes? This incredibly important paper from the Johns Hopkins group on breast cancer, where they looked at 11 breast cancers and then did a follow-up on 24 breast cancers, contained.
Well, I think, scientifically very important work, but statistically an erroneous conclusion, which was a calculation and inference of how many significant genes there were. It estimated that, based on these 11 samples, there really must be about 122 significant genes in breast cancer, and that had to do with delicate issues of statistics. It's a minor point, but it was an important early indicator that getting this right is very important. How are we doing? I think we're not doing as well as we think, and there's gonna be a talk by Michael Lawrence this afternoon at three o'clock that will go into this in much more detail, and I'm going to channel a bit of Michael Lawrence's point, if I may, here. We are not so good at getting significance just right, but I think Mike has made a lot of progress on it. The lists as currently produced have too many significant genes, and yet they're also not complete because we haven't gone far enough. Our tails are contaminated by stuff. How do we know? Mike Lawrence and Gaddy took 457 lung cancer samples, squamous plus adenocarcinoma, and put them through the official standard analytical pipeline, now called Firehose, assuming a uniform mutation rate of 10 mutations per megabase, and now, because the sample is so large, they had the power to detect 843 significant genes. That should bother you. I don't think there should be 843 significant genes, but that's the official TCGA analysis as of today. It also includes in those 843 genes 146 olfactory receptors. I am personally dubious.
I do not mean to be prejudiced against smell, but I am dubious that 146 olfactory receptors contribute to cancer. The CUB and Sushi protein, which TCGA reported is involved in ovarian cancer, comes up again in lung cancer and, frankly, in almost every cancer; seems fishy. Ryanodine receptors, which are very important to cardiac calcium channels; seems fishy. Titin, the largest protein in the human genome, a hundred thousand nucleotides there, comes up all the time. The statistical folks know there's a problem here. So you've got to go back to first principles. Defining significantly mutated genes: you find the mutations, you pile them up, you see how many there are, and you assign them a significance score. You do a significance score by comparing what you see to what you expect based on the background model. Pretty easy. What's the background model? Well, the simplest background model would be a constant mutation rate across the whole genome, every site, every gene, and across every patient. It's easy to write down the math, a very simple program, and there you go, and you can look at the distribution of how many things you'll expect per gene, or per sets of genes, across patients. You'll get some nice Poisson distribution. You'll set your threshold in order to have a genome-wide significance level, and a little green line tells you anything above that is significant. But what if in fact it's heterogeneous? What if some genes are mutated at a higher rate and some are mutated at a lower rate? The tail is fatter. That's all you really have to know: if you have the mixture of two distributions, the tail is fatter than you expect. If three quarters of genes are at two per megabase and a quarter at six per megabase, and you put the green line in the same place, the tail's too fat. You'll declare way too many significant genes. So is there non-uniformity? Is there heterogeneity?
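The fat-tail argument just made can be sketched numerically. This is a minimal illustration, not the Firehose pipeline or Mike Lawrence's method: the two rates (2 and 6 per megabase) and the three-quarters/one-quarter split come from the talk, while the gene count, coding length, cohort size, and the Bonferroni-style threshold are assumptions chosen only to make the example concrete.

```python
import math

# From the talk: 3/4 of genes at 2 mutations/Mb, 1/4 at 6 mutations/Mb.
RATE_LOW, RATE_HIGH = 2.0, 6.0
FRAC_LOW, FRAC_HIGH = 0.75, 0.25

# Assumptions for illustration only (not from the talk).
N_GENES = 20_000      # hypothetical number of genes tested
CODING_BP = 1_500     # hypothetical coding length of every gene
N_PATIENTS = 500      # hypothetical cohort size
ALPHA = 0.05          # genome-wide false-positive budget


def expected_count(rate_per_mb):
    """Expected mutations in one gene summed over the whole cohort."""
    return rate_per_mb * 1e-6 * CODING_BP * N_PATIENTS


def poisson_tail(k, lam, terms=200):
    """P(X >= k) for X ~ Poisson(lam), summed directly from the upper tail."""
    return sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
               for i in range(k, k + terms))


def threshold(lam, alpha_per_gene):
    """Smallest count k whose tail probability is within the per-gene budget."""
    k = 0
    while poisson_tail(k, lam) > alpha_per_gene:
        k += 1
    return k


# The uniform model uses the average rate, here 0.75*2 + 0.25*6 = 3 per Mb.
mean_rate = FRAC_LOW * RATE_LOW + FRAC_HIGH * RATE_HIGH
lam_uniform = expected_count(mean_rate)
k_threshold = threshold(lam_uniform, ALPHA / N_GENES)  # the "green line"

# Expected number of genes called significant by chance alone.
fp_uniform = N_GENES * poisson_tail(k_threshold, lam_uniform)
fp_mixture = N_GENES * (
    FRAC_LOW * poisson_tail(k_threshold, expected_count(RATE_LOW))
    + FRAC_HIGH * poisson_tail(k_threshold, expected_count(RATE_HIGH))
)

print("threshold count:", k_threshold)
print("expected false positives if rates really are uniform:", fp_uniform)
print("expected false positives under the two-rate mixture:", fp_mixture)
```

With the same green line, the chance-only hits under the mixture come almost entirely from the high-rate quarter of genes, which is the sense in which the tail is fatter than the uniform model expects.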
Yes, the data are now in from the TCGA when we look. And, oh, by the way, a symptom of this: when you have the wrong model, the problem gets worse and worse and worse the more data you have. Eventually lots of things become significant. So cancer types show very distinctive mutation rates across the 3,000 samples that we now have in our database, largely samples done at the Broad as well as some other published samples. Here we've got very different mutation rates. They vary 20-fold in their means across cancer types, and individual tumors from one type to another here can differ by a thousandfold in their mutation rates. That's a big deal. The melanomas and lung cancers, a lot of things associated with lots of mutagens, are over there on the right, and such. Not only do they have different rates, they have different patterns. Those pretty colors down at the bottom indicate the frequencies, the mutational spectrum, of different cancers, and they're different. If you make a radial plot, where the distance from the center is the mutation rate and the angle reflects some aspect of the mutational distribution, you get this really pretty picture where the lungs are over there and the melanomas are down there and the GBMs are up there, in terms of having very different spectra. I actually think it's really interesting that the lung has these two stripes there. Lung might have two meaningful subtypes there that have somewhat different processes, and we ought to understand what they are. Melanoma has two stripes. Very interesting: head and neck cancer that has HPV involvement and cervical cancer that has HPV involvement are clustered up there with bladder cancer, suggesting perhaps that it too has a viral involvement in many or most of its cases. You can read fun things off these pictures, but for the moment what I want to read is heterogeneity. So there's another thing. Not only are the sites different, the genes themselves have inherently different mutation rates. One reason is that genes that are more highly
expressed have lower mutation rates due to transcription-coupled repair. And the simple bar chart tells you that, and it's a non-trivial difference here. That's like a 1.3-, 1.4-fold difference, which matters, between the most expressed and the least expressed. But that alone is not enough. If you take that out of the picture, you still see, based on whole-genome samples in cancer, tremendous regional differences in mutation rates. What's going on? We now have enough data to see that they differ. Well, Shamil Sunyaev suggested that we should look to the people who are doing germline human genetics, the Thousand Genomes Project. They too have been noticing funny differences in mutation rates across the genome, and with regard to the germline and germline cells, not somatic mutations, they found that there was a significant correlation with replication timing: late-replicating regions in the germline have higher mutation rates than early-replicating regions. You can tell yourself the story that maybe the pools of nucleotides are used up, or pools of some factors are used up. Well, is that true for cancer somatic mutation as well? And the answer is, let's draw in the replication time in black. Yes, looks pretty good, and below, going from blue to red, early to late replication, the mutation rate is rising. Does it matter? Yes. Here's a region of very high mutation rate; the black line goes up. Now, by the way, these replication time measurements saturate at a certain point. That little black thing over CSMD3 should be going up; probably you can project that it's going up, and this is just a saturated assay for replication time. What is that CSMD3? That's the CUB and Sushi domain protein that we've been finding in all these cancers. It's in a region of very late replication, and its mutation rate is not surprising given that, especially when you recognize that it's truncated at the top. What about those olfactory receptors?
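The CSMD3 observation above, that a gene can look wildly significant against a genome-wide average rate and unremarkable against its own local background, can be sketched as follows. Every number here is hypothetical and chosen for illustration: the observed count, the coding length, the cohort size, and in particular the threefold local elevation standing in for a late-replicating region are assumptions, not values from the talk.

```python
import math


def poisson_tail(k, lam, terms=400):
    """P(X >= k) for X ~ Poisson(lam), summed directly from the upper tail."""
    return sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
               for i in range(k, k + terms))


def gene_p_value(observed, rate_per_mb, coding_bp, n_patients):
    """Chance of seeing at least `observed` mutations at the given background rate."""
    expected = rate_per_mb * 1e-6 * coding_bp * n_patients
    return poisson_tail(observed, expected)


# Hypothetical long gene sitting in a late-replicating region.
OBSERVED = 40             # hypothetical mutations seen across the cohort
CODING_BP = 11_000        # hypothetical coding length
N_PATIENTS = 450          # hypothetical cohort size
GENOME_WIDE_RATE = 3.0    # hypothetical average rate, mutations per Mb
LOCAL_RATE = 9.0          # hypothetical local rate, e.g. estimated from flanking DNA
PER_GENE_ALPHA = 0.05 / 20_000  # hypothetical Bonferroni-style cutoff

p_uniform = gene_p_value(OBSERVED, GENOME_WIDE_RATE, CODING_BP, N_PATIENTS)
p_local = gene_p_value(OBSERVED, LOCAL_RATE, CODING_BP, N_PATIENTS)

print("p-value against genome-wide rate:", p_uniform)
print("p-value against local rate:     ", p_local)
print("significant under uniform model:", p_uniform < PER_GENE_ALPHA)
print("significant under local model:  ", p_local < PER_GENE_ALPHA)
```

The same observed count flips from a genome-wide hit to nothing at all once the expected count reflects the region the gene sits in, which is why the background rate has to be estimated per gene or per region rather than once for the genome.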
There's a region of 16 olfactory receptors there on chromosome 1: high mutation rate locally in the surrounding DNA, non-genic DNA, and late replication time. Moreover, if you just make a histogram of early versus late replication time, olfactory receptors as a whole are fashionably late in the genome, and so there you go. So now you can start correcting for heterogeneity across sites, heterogeneity across genes and regions, and heterogeneity across patient samples, and when you do that, what Mike Lawrence and Gaddy find is that, depending on where you set your FDR level, before you had 588 genes and now there are 61; you had 108 olfactory receptors, now there are three. If you use a harder-nosed cutoff there, you started with 261 and you get down to 18, no olfactory receptors. You do this across a bunch of cancers. The olfactory receptors are your canaries in the mine; they go away. You feel good when your olfactory receptors go away there on your lists, and I think there's obviously more work to do, because I have reasons to think it's not perfect yet, but this is a really important thing. In the end, we need to know the background rate. We need to use our data to bootstrap up the background rate across the genome in order to call the significance correctly. Michael will talk about this more, and you can talk to Gaddy about this. That's just one example of the care that has to go into getting this stuff right. Now, focal amplifications and deletions. In each of these cancers, using either sequencing or arrays or something like that, you can look at the focal amplifications and the focal deletions across the genome, and you often find something like 10% of the genome in a typical cancer is engaged in either focal amplifications or focal deletions. In the ovarian cancer paper for TCGA, the high-grade serous ovarian, there were 63 significant deletions and 50 significant amplifications. In the global cancer map with 3,000 samples, a paper that was published a little while ago, across 26 different types, you see things like
this. How are we really doing? Well, the amplifications, I think, are mostly right. The challenge with the amplifications is finding the driver gene in the amplification. That's a question of numbers. The more and more samples you get, the more you should be able to home in on it, and that's beginning to happen, but we need large numbers to know which gene in the amplification. There may be functional ways to complement it as well, with RNAi, if you have the right cell lines, and we'll come back to that a bit. The deletions still have some real issues attached to them, because how we count whether a deletion is significant depends on what the background deletion rate is in cancer cells. We have no way of measuring the background deletion rate. It's the foreground and the background; there is no null model for what deletions should be happening. And so it's not obvious. We can say this is significant relative to other regions of the genome, but if a region was deletion-prone but not cancer-related, just deletion-prone, we could make a mistake. And so, as has been pointed out by Mike Stratton and folks at the Sanger, you should feel much better if you see homozygous deletion, because then you had two events, if you see a significant rate of that, or a deletion in trans to a mutation. We have to get a little better, I think, at figuring out what the right gene is and which of these deletions truly are significant. I think many of them will be, but there's work to be done. Arm-level events: almost the entire genome is either significantly gained or significantly lost in some cancer.
There are a few chromosomes that don't do much, but for the most part, for an average cancer, something like 25% of the genome is engaged in an arm-level amplification, two copies, or deletion of a copy or two. We can document all of that. It's easy, it's beautiful. It's also almost entirely cryptic. For the most part, we should be honest and say we haven't a clue what the driver gene is. Why does the cell want to duplicate that? Is it even one thing? It could be that there are three things that it wants two copies of, and the best way to do it is to duplicate the arm. We don't know, and we don't have good tools. There are some suggestions that, you know, for chromosome 7, the amplification of 7 is not necessarily driven by EGFR but maybe by MET; there are some others, but we don't have any systematic ways. We need to figure out the arm-level events, because 25% of the genome is involved in these things. Methylation has been triumphant as part of TCGA. There's just gorgeous ability to study methylation of promoters across the genome, and beautiful work, such as this one here in glioma, identifying distinct CpG island methylator phenotypes that define distinct subgroups of glioma. Beautiful, beautiful work. And yet we have no way to find what the driver gene is in there. There are a lot of things that have been methylated, but we don't have an easy way to say who matters. The methylation is often going to be that all genes with a given chromatin structure get methylated in a certain circumstance. A hundred things may get methylated; one of them or four of them are the critical targets. We have to start getting to document what are all of the methylated targets that matter. Now, there are important ideas here.
There's been interesting progress here. But for systematic approaches to figure out which methylated targets truly matter in a meaningful way, we have far to go. Translocations: there are these important translocations that have been found, of course, in blood cancers, known for a long time, BCR-ABL; of course the ERG translocations, the ALK translocations. Lots of cool biology. Amazing progress has been emerging from the TCGA. In glioblastoma, there's an example of this beautiful chromothripsis, a chromosome going through the shredder all at once in some catastrophic event. In prostate cancer, a beautiful example of a daisy chain of eight successive events, of A's attached to B, B's attached to C, C's attached to D, and back up to the front, involving multiple cancer genes, and surely it means that these events occurred simultaneously. So there are catastrophes of different sorts that are happening. There have even been some very nice demonstrations that they are related to the 3D structure of the genome. As people begin to build 3D maps of the genome, things that get broken and rearranged next to each other can be shown to be physically, in 3D space, on average much closer to each other, and the distributions of lengths of somatic copy-number alterations that you see in cancer match pretty well with the contact distances that you see in these 3D maps. But we're nowhere near done with enumerating the translocations, because we don't have that much whole-genome data and because translocation rates and patterns vary across cancer types. So we need to deal with the significance of these events, and we need a lot more data to reach completeness. They're very important, but we need more data. Putting together these events can be very powerful. In the TCGA ovarian paper, it's noted that BRCA1 or BRCA2 can be inactivated either germline or somatically, but you don't see both. That's kind of nice. The mutual exclusivity is a very helpful clue, the
fact that they cover more of the space. Or it can be inactivated by epigenetic silencing. Now, here you feel good that the epigenetics is directed at BRCA. But when it's these other loci that are getting methylated, you don't know what to say about them. Beautiful work of Leslie Cope in lung has produced this picture here, which Leslie has given me permission to show, of CDKN2A, where you have samples that are methylated, mutated, or deleted, and you have a mutually exclusive distribution that covers essentially everything. You feel really good then. But how often do things look like that? So, integrating events: we need a lot of data to get this right, and this will help us see the significance of many events when we see multiple independent ways of hitting them in a systematic fashion.

Non-genic targets. We are still focusing on the coding region, even though we're sequencing the whole genome. We can't really point to an awful lot of non-genic space that we can say has been importantly mutated in cancer. Oh, there are a couple of places in whole-genome papers where we say, you know, there's a statistically significant pile-up of a few more mutations than you'd expect in some non-genic region, some non-coding region. But it's hard to know what to make of those things unless you know, for example, that that little region under the question mark was an important element. It's really tough to make a significance call around these sorts of things. You know, we talk a lot about the whole genome, but we really have very little to say about the whole rest of the genome in cancer. We need to admit it and figure out ways to do it.

Microbes. Well, the idea that viruses cause cancer is not entirely wrong. Of course viruses do cause some cancers, just not most cancers, and the list is not done yet. 
There's a beautiful paper from Feng et al. showing the presence of a new kind of polyomavirus in Merkel cell carcinoma. It is a beautiful example that's come through sequencing, transcriptome sequencing in this case. Matt Meyerson's group has demonstrated the presence of a particular class of bacteria, Fusobacterium, in colon cancer. Whether this is a cause or an effect remains to be demonstrated, and that's really a critical thing: one can find correlation, but causation is more of a challenge here. But once you know it, you can at least begin to look.

Germline mutations. Well, we know germline mutations that cause cancer: BRCA1, p53 in Li-Fraumeni syndrome. But you know there's more heritability for cancer than is accounted for by the handful of genes we know. Two sisters: one gets breast cancer, and the chance the other gets breast cancer is considerably higher than we can account for. What's going on? Common variants of small effect? Rare variants of large effect? Genome-wide association studies have been pointing to loci that are involved. Those are very interesting. Rare variant studies: well, we've got to do them. So far there's remarkably little data. We could be mining our TCGA data for rare variants, but we have to get serious about the numbers involved. It takes goodly numbers. If you want to detect rare variants that collectively, summed together, are present at a frequency of about 1%, and if you want ones that, say, increase cancer risk by fivefold, you need about three or four thousand tumors to do something like that. No, I'm sorry. 
I'm misreading the graph here: 1,500 tumors to do something like that. That's not impossible to do, but we have to get to those numbers to be able to see it. That would be to recognize a totally new gene we weren't expecting. Of course, if we just want to do it in the three or four hundred already existing cancer genes, and we know that that gene is already mutated in that cancer type somatically, then it becomes statistically much more favorable to be looking, and we should be doing that kind of looking. So we ought to be mining TCGA more for the germline information about cancer, because that could be of important predictive value for patients, certainly for early screening. So we just need sample size to do that.

So that's kind of our tick list. We're doing great; a tremendous amount of data has been generated. We can describe events. We can't fully interpret those events. We have to get better at it, and we have to begin to integrate across tumor types: not just integrate across samples, but begin to pile together, as I know is a theme of this meeting, all the tumor types and say, what emerges when we look across many tumors?

Importantly, a bottom line here is we have barely begun. There's a lot more that's got to be done to accomplish these goals. I anticipate that pretty soon people will start saying, oh, we've sequenced the cancer genome, we're done with that, we should move on. We have not yet begun in terms of what we really need to extract from it. And so I suggest one holds the line against these early declarations of complete victory.

How are we going to achieve completeness? Well, sample size. Fact one: the original report from the NCI called for 250 samples per tumor type, based on a calculation of an expected background mutation rate. We now know that background mutation rate is wrong for many cancers. Redo the calculation: you need more samples. 
It's dependent on the mutation rate. It's probably gonna be the case that for many of these tumor types we need several thousand samples in order to get the desired sensitivity. That's not terrible. It doesn't scare us anymore to do that.

Sequencing depth. There are some pretty unfavorable cases: only five percent of the reads from pancreatic tumors come from the pancreatic cancer cells, as opposed to the surrounding cells. Oh well, one can probably brute-force that.

Scope. So far TCGA has focused on large tumors, primary tumors, surgically treatable tumors, pretreatment tumors. We focus on surgically treatable tumors because, if the tumor is not surgically treatable, people say there's a problem with the ethics of biopsying that tumor; maybe there's a problem fundamentally with the reimbursement of biopsying that tumor. Whatever it is, we have to get past that and understand the wide range of tumors that are out there. We need to understand intratumor diversity: diversity within a given tumor, different cells within that tumor. We have to understand non-resectable cancers and metastases, and I know work is going on to do that. We have to think systematically about it. We need to understand pre-cancer, polyps for example. That is more and more feasible as the requirements for nucleic acids are dropping. And we ought to be accessing circulating tumor cells. These are all part of our future. I don't think there's anything controversial here. 
We just have to do it.

If we're gonna correlate this with clinical information, though, we're gonna have to take another step. We're gonna need to create something which I'll call a global cancer alliance: a shared knowledge base where patients around the world can choose to contribute their genomic data and their clinical data so it can be used to improve the understanding and treatment of cancer. Ideally, we're gonna be having drugs coming out over the next decade, many, many different types of drugs emerging, being used in different kinds of combinations. How are we gonna capture that knowledge? We're gonna need the cancer genome. We're gonna need clinical records. And you can say as much as you want about the problems with this, the privacy challenges and the computing challenges, but go explain to our children why we didn't do this because we couldn't figure out how to do this and why people couldn't consent. The vast majority of cancer patients, when given a chance to make their experience even more meaningful by making a gift to future people who might face cancer, will just say yes. We ought to start from that proposition. Every one of us in this room would say yes, and so if everybody would say yes, there's a problem with us wringing our hands about the fact that everybody will be worried about that. They're mostly gonna be worried about the fact that they have cancer and somebody else in the future might have cancer. We need to think about how to do this.

Now, that's the comprehensive catalog of cancer. I'll be much more brief on this question of a cancer therapeutic roadmap and then just throw it open. For a cancer therapeutic roadmap we have to go to function. We need to be able to recognize the functional pathways and the mechanisms in which targets act. We need to know, for every cancer, as a function of its genome, what its vulnerabilities are. We need to know, as a function of its genome, for any given treatment you're gonna give it, what ways it has 
to escape that treatment, the ways it's gonna become resistant. That is what I mean by cancer therapeutic roadmap: we know the pathways, we know the vulnerabilities, and we know the mechanisms of resistance in advance, so we can design therapy to be maximally likely to succeed.

Some genes, we know what they do; they should be put in pathways. But genes are emerging, for example from the multiple myeloma study: FAM46C. I didn't know what FAM46C was. It's called FAM46C because it's in family 46, letter C in that family. Its function wasn't known. But you can take the expression pattern of FAM46C and compare it across many other cells, in particular across many multiple myeloma samples, and darned if it isn't perfectly correlated with ribosomal proteins. I mean perfectly correlated with ribosomal proteins. It's involved in the translation of proteins, et cetera. We need the systematic knowledge to rapidly attach function.

We also need to know where to intervene. The mutated gene may not be the best target, especially if it's a loss of function: inhibiting something that's lost its function is not really a smart idea. Even inhibiting something that has gained a function, when we don't know how to inhibit it, may not be the best place. We need to plan vulnerabilities.

So I'll just mention projects that I know. These are my favorite projects, but there should be many of these things. Something that Todd Golub and Justin Lamb at the Broad have built is the Connectivity Map, which is kind of a Google for biological function. You want to connect up the effects of diseases and genes and drugs. One way to do it is to put them in a common language, say, gene expression. For any given gene or drug, just get the expression pattern and put it in a big compendium. Then when you have a new gene, gene A, and you knock it out, you see what the expression consequences of knocking it out are, look it up, and say, golly, knocking out gene A looks a lot like 
knocking out gene C and applying drug F. That's good. And you can do that. If you treat a rat with estrogen, one of the early examples they showed, and you look at gene expression in the uterus and you look it up, the same signature comes up as treating cells with estrogen. And if you apply minus one to that signature, the anti-signature comes up with tamoxifen and selective estrogen receptor modulators. So you can look it up; if you didn't know, you could find it. Many examples have been shown now where you can look stuff up.

So what they wanted to do is expand it. In GEO, the Gene Expression Omnibus, there are about 300,000 expression profiles, but they're scattered all over the place, from many, many different experiments. They're not comprehensive. They were expensive to generate. What they wanted to do was simply generate one big compendium of knocking out genes in many different cell types and studying the expression pattern. And because spending a couple hundred bucks on an Affy chip for every expression profile is kind of pricey, they worked out a way to do it for three bucks. Basically, you don't need to look at all 20,000 genes; a thousand genes will serve as a proxy, and you can impute the rest. You can do the thousand genes by using beads with appropriate tags in the right way. Anyway, they made it work. They so far have 300,000 profiles that they themselves have generated, with 4,000 different compounds and 2,000 genes that have been knocked out or added back, across 10 different cell lines. This has got to grow, and this is like Google: the more data that's in there, the more powerful it gets. It's a network effect. The more we populate it, the more conclusions get drawn. The NIH has come to support these projects under the name of LINCS, which I forget what it stands for, but anyway, it's LINCS, the library of something. Anyway, these kinds of things are good. 
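The look-it-up idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the signature names and numbers are invented, the function names `pearson` and `rank_matches` are hypothetical, and plain Pearson correlation is used as a simple stand-in for whatever scoring the actual Connectivity Map uses. It shows a query signature matching its closest compendium entry, and the negated query (the "apply minus one" step) matching an opposing one.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression signatures."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def rank_matches(query, compendium):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = {name: pearson(query, sig) for name, sig in compendium.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical compendium: made-up expression changes over five genes.
compendium = {
    "estrogen_on_cells": [2.0, 1.5, -1.0, 0.5, -2.0],
    "tamoxifen_on_cells": [-1.8, -1.6, 1.1, -0.4, 2.1],
    "unrelated_compound": [0.1, -2.0, 0.3, 1.9, 0.2],
}

# Hypothetical query: made-up signature from an estrogen-treated tissue.
query = [1.9, 1.4, -0.8, 0.6, -2.2]

ranked = rank_matches(query, compendium)
# "Apply minus one" to the query to look for the anti-signature.
anti_ranked = rank_matches([-v for v in query], compendium)

print("closest to query:", ranked[0][0])
print("closest to anti-signature:", anti_ranked[0][0])
```

With these invented numbers the query lines up with the estrogen entry and its negation lines up with the tamoxifen entry, which is the shape of the estrogen example in the talk, not a reproduction of its data.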
You can use them to test function. Folks who are studying inherited disease, like Crohn's disease, find GWAS hits in the gene ATG16L1. You look at the perturbation of that gene; it's related to the perturbation of another gene. This other guy's involved in autophagy, therefore this guy's involved in autophagy, and it turns out it is involved in autophagy. You get a whole bunch of hits in a study like this Crohn's GWAS, you look at all of them, and you say, all right, I don't know what they do, but these seven all produce very similar signatures when you perturb them. That's a good thing. And onward: these kinds of tools are the things we need. We need them not just for RNA. We need them for: what are the epigenomic consequences of doing things? What are the proteomic consequences of doing things? What are the metabolomic consequences of doing things? It is Google for biological function.

We also need Google for cancer function in particular. So there's a project that a number of my colleagues, Levi Garraway and others, have going, called the Cancer Cell Line Encyclopedia, being supported as a public effort between Broad and Novartis. I'll give that as an example, and again, there are other projects like this and they all should go forward. Basically, take a thousand cancer cell lines, study their complete genomes, study their properties in a wide variety of assays, and start looking stuff up. A thousand cancer genomes, sorry, a thousand cancer cell lines here. 
You need to grow cell lines. A thousand cancer cell lines: know their genomes, know all the aberrations. Knock out every gene with RNAi in a pooled fashion. Which RNAis make which cell lines drop dead? And then you can start doing Google. You can say, aha, cancers with amplification in gene A are selectively killed by inhibition of gene D; cancers with mutations in gene B are selectively killed by drug E. Many such things are emerging out of this study. Bill Hahn had a beautiful paper on systematic investigation of genomic vulnerabilities, with early reports from something called Project Achilles.

Resistance catalogs. Take a thousand cancer cell lines. Take treatments that you might want to do, BRAF inhibition for example, and ask: how could I overcome BRAF inhibition? Well, I could go to the clinic and try it on a patient, but why not do it in a dish first? What genes, when you add them back or knock them out, will overcome BRAF inhibition? What stromal cells will overcome BRAF inhibition? Maybe they're giving off an agent. And again, you can answer those questions. These kinds of systematic functional databases will empower a community to go beyond the descriptive aspects of TCGA. Levi Garraway's work is a beautiful example here: specifically with regard to BRAF inhibition, he was able to show that COT amplification confers resistance to RAF inhibition, and then, after seeing this in cell lines, demonstrated that it's also true in patients. That is a mechanism that's used in patients.

So, all right, the goals. A comprehensive catalog of the cancer genome. That's pretty much the same goal. 
It was laid down some time ago. It's just that we have to get serious about completing that goal across all the different types, truly understanding those events. And a cancer therapeutic roadmap, this functional roadmap. Now, all of this is the kind of large-scale comprehensive view. It is also going to take an entire community of cancer investigators around the world to truly translate this into therapy. So the goal of projects like this is to make sure that all the data, all the tools, all the computational tools are out there. But it's the goal of the whole world to use this information, and to use it to drive cancer treatment. So the data has to be open and it has to be usable. Very important. It's not enough to say it does exist on the web, good luck. You really have to make it usable, and we have to engage an entire community in doing it.

Anyway, I was asked to give some kind of an overview. That was some kind of an overview. I would like to acknowledge the fact that this is not a talk, obviously, from my group or from the Broad or anything like that. This is a talk of what we as a TCGA community have been doing, what we as an ICGC community have been doing, and it's a remarkable thing what we have been doing to get to this point. To think back to those early days when we imagined what might be possible, and to see where we've gotten, is just stunning. I do want to make special acknowledgements to a couple of my colleagues at the Broad who helped in thinking about this talk. I'm channeling ideas and things from a number of people: Gaddy Getz and Mike Lawrence and Todd Golub and Stacey Gabriel and Levi Garraway and Matt Meyerson, Bill Hahn, Justin Lamb, Lynda Chin, who has been a member of the Broad community and still is even though she's in Texas right now, and our sequencing platform. I shall stop there. 
Thanks.

I think we have time for some questions, and I'd like to start. In your overview talk, thinking about the next steps, the cancer therapeutic roadmap challenge that you put on the table: mirroring what we're doing in the cancer genomics area, do you see value in approaches to coordinate and centralize such an effort, to minimize redundancy and accelerate the progress in that? And if you were to do it, how would you do it? And then what do we have to do to get that going?

So, I'm always suspicious of words like centralized. The TCGA is strong because it's not exactly centralized. Wrong words, sorry. No, no, but what I wanted to say was: to coordinate is good, and mostly what it is is taking it on at the necessary scale. If every investigator in the world takes their favorite gene and knocks it out with RNAi in their favorite cell type, and we attempt to aggregate those data, they will be useless, because there will be no quality control on them. We won't have all the stuff filled in. So there's always a right scale for doing any effort. It has to be a scale that's going to be able to get high-quality data that people can count on, just as our sequence has to be good sequence. Collectively, it's got to be able to be comprehensive enough. So I don't want to centralize the approach, but I do think we need significant-scale efforts in places that are willing to do heavy lifting, and we need to coordinate those. And the things that ought to come out of them are what the community needs. Many of these ideas start off as pilot projects, you know, what might be useful, and they've got to prove themselves as useful. I've tossed out that it could be useful to have proteomic signatures, and you can say, oh, it's infinitely expensive. On the other hand, there are cheap methods now coming along to make antibodies that are pan-phosphoproteome, that you can fly in a mass spec, and maybe you can do that cheaply. So there's always a kind of game of: is 
it pretty cheap to get that data? If it costs a buck to get the data, maybe so. Yes, we should have a discussion about creating a cancer therapeutic roadmap, vaguely along the lines I've described. I do not mean to be didactic, that what I put up there is what it should be. But it should not be the case that every investigator out there must discover all those facts him or herself. We collectively should be responsible for making sure that those facts are collected, done at high quality, and openly, freely available.

Hi. You touched on this, but maybe you could expand further on the topic of systematically integrating non-coding effects into interpreting cancer genomes. Is there anything going on as part of the TCGA?

At the moment, I think it consists of smart people at different centers trying to figure out how we're going to take these non-coding regions. I can't say anything vaguely intelligent other than: correct for the background mutation rate and look for an excess. The problem is, many of these non-coding regions are small, and so you're not going to get an excess in a small little region. We get excesses often across a large coding region. So maybe you have to aggregate some of them. Maybe you have to take all the regulatory regions for a gene, and that makes a big enough target that you can detect it. That would be good if we knew what all the regulatory regions for a gene were. So that's the challenge right now, and I'd be very happy to see three great examples. That would teach us a lot. 
So I put it out as a challenge to the students. I don't actually have great examples. Well, you know, I'm a co-author on a paper that points to a region that does actually reach statistical significance, but that isn't the same thing as actually being deeply meaningful.

Well, there is some effort as part of ENCODE, but it seems like this could be expanded quite a bit.

Oh, ENCODE is trying to find important regions across the genome. But integrating that with cancer mutation data and figuring out which one it should be is another matter. I don't mean to say that people aren't aware that there are such regions, and they're being well characterized, but ENCODE and TCGA do not yet have a fruitful love child.

One more question.

I would like to ask about the heterogeneity of disease and how we can approach that through targeted therapy. If you just take as a number, say, 25: let's say a certain carcinoma may have 25 possible mutations, any combination of three of which would create a tumor. 25 choose 3, without replacement, gives you 2,300 possible combinations of three. Yet when we treat patients, we start with a single targeted therapy and wait for it to fail. When are we going to address this problem of combinatorics and multiple targeted therapies? 
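The questioner's count can be checked directly. The short sketch below computes 25 choose 3 with the standard library and, as a cross-check, enumerates the unordered triples explicitly. The variable names are illustrative, and 25 and 3 are the questioner's hypothetical figures, not measured values.

```python
from itertools import combinations
from math import comb

n_mutations = 25  # hypothetical number of candidate mutations (from the question)
k = 3             # mutations assumed sufficient, in combination, to create a tumor

# Closed form: 25! / (3! * 22!)
n_triples = comb(n_mutations, k)

# Cross-check by listing every unordered triple without replacement.
enumerated = sum(1 for _ in combinations(range(n_mutations), k))

print(n_triples, enumerated)
```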
So I'll offer my personal opinion, and I don't think this is so controversial. The only way we're going to effectively treat cancer is not serially but in combination, at the same time. The idea of using one agent and waiting till it fails, then using the next agent until it fails, and then the next agent, is stupid. It comes from the old days of agents that were just incredibly toxic, so you couldn't combine things. But in point of fact the answer is clear from HIV. In HIV we do triple-drug therapy. Each one of those drugs will fail if you use them in serial. If you use them in parallel, you multiply the probabilities: P is the probability of escape from the first drug; P cubed is the probability of escape from all three drugs. With P cubed you win, because it's a very small number. With P, you don't win. The question for us in cancer is: what exponent do we have to raise P to in order to overcome the number of cells that are there?

Now, I'm hoping it's not going to be a gazillion different drugs, because it'll be a certain number of pathways, and you're going to figure out where to inhibit each pathway. The point about finding these multiple points of inhibition by these Project Achilles methods is that, if we can make targeted agents with big therapeutic windows, safe agents without lots of side effects, the right cancer therapy had better be to use them all together. Now, it raises a lot of questions. Drug companies don't like to develop three things at the same time. The FDA is still getting its mind wrapped around how to run approval for two or three things simultaneously; they'd rather approve each one separately. But you know, we're seeing it with BRAF. The BRAF inhibitor is a miracle drug, except it fails within a year. Now people are thinking about what they're going to add to it. They're not going to run that serially. 
They're eventually going to run that in parallel. The whole point of my plea for systematic approaches is precisely to address your question. When we have a cancer therapeutic roadmap that says, here are multiple places to interdict, and here are the ways cancer will get around it, by amplifying that, so let's be prepared to be able to cut that off: that is when we will have really rational cancer therapy. We're lucky in some cases with Gleevec, and we're lucky in some cases with other things. The goal of TCGA and its successors in creating a cancer therapeutic roadmap is not to have to be lucky. Anyway, that's some thoughts.
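The "what exponent do we have to raise P to" question from the answer above can be made concrete with a small sketch. It assumes, as the HIV argument does, that escape from each drug is independent with the same per-cell probability p, so the expected number of cells escaping k drugs at once is roughly N times p to the k. The function name `drugs_needed` and the example values of p and N are hypothetical, chosen only to illustrate the arithmetic.

```python
def drugs_needed(p_escape, n_cells, max_drugs=50):
    """Smallest k such that n_cells * p_escape**k < 1.

    Assumes independent escape from each drug with equal probability
    p_escape per cell; returns None if max_drugs is not enough.
    """
    if not 0 < p_escape < 1:
        raise ValueError("p_escape must be strictly between 0 and 1")
    for k in range(1, max_drugs + 1):
        expected_escapers = n_cells * p_escape ** k
        if expected_escapers < 1:
            return k
    return None

# Illustrative numbers only: a one-in-ten-thousand chance that a cell escapes
# any single drug, and a tumor burden of a billion cells.
p = 1e-4
n = 1e9
for k in (1, 2, 3):
    print(k, "drug(s): expected escaping cells about", n * p ** k)
print("drugs needed under these assumptions:", drugs_needed(p, n))
```

Under these made-up values one drug leaves on the order of a hundred thousand expected escapers, two leave about ten, and three push the expectation below one, which is the sense in which "P cubed, you win." If escape mechanisms are shared between drugs, the independence assumption fails and the required number would be larger.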