All right. Hi. I'm Dan Kastner. I'm the scientific director of the National Human Genome Research Institute, and it's my enormous pleasure to welcome you this afternoon to the Jeffrey M. Trent Lecture. Jeff Trent, as many of you know, was the founding scientific director of the NHGRI, starting in 1993, and over the course of a nine-year span Jeff built the intramural research program of the NHGRI into, I think, a real powerhouse of human genetics and genomics. After his very successful scientific directorship, Jeff went on to found TGen, the Translational Genomics Research Institute in Phoenix, Arizona. Jeff couldn't be with us here today, but he is still very actively involved in his chosen field of cancer genetics. In 2003, the Jeffrey M. Trent Lecture in Cancer Research was initiated by the intramural program of the NHGRI in order to invite leading figures in cancer genetics who embody some of the same ideals of energy, enthusiasm, imagination, and really raw intelligence. And so it's my enormous pleasure to introduce this year's Jeffrey M. Trent lecturer, Dr. Stephen Chanock, who is the director of the Division of Cancer Epidemiology and Genetics of the National Cancer Institute. Steve truly fits the mold of the Jeffrey M. Trent lectureship. For a number of years he has been a leading figure in the study of susceptibility loci for human cancers and a leader of the efforts of the Cancer Institute, and he is also, I would say, a Renaissance man and a real mensch. Dr. Chanock got his undergraduate degree at Princeton University in 1978, and his AB was actually in music. I understand that he may be breaking into song at some point in the course of his presentation.
In any case, he went on to Harvard Medical School, receiving his MD in 1983, and then did his further clinical training in Boston in pediatrics, pediatric infectious diseases, and pediatric hematology-oncology. He did research at Boston Children's Hospital and the Dana-Farber Cancer Institute with the legendary and still active Stuart Orkin, studying molecular biology and human genetics at the inception of the modern era. Dr. Chanock then came to the NIH in 1991 as a medical staff fellow, or senior staff fellow I guess it was, in the Pediatric Oncology Branch. He went on to the tenure track and received tenure in 2001. Over the subsequent years he advanced in his career at the National Cancer Institute, and in August 2013 he was named to his current position as director of DCEG. As I mentioned, he has done enormously well both in his science and in his administrative roles, but something that really stands out in my mind as a measure of the man is that since 1995 he has been the medical director of Camp Fantastic, a camp for children with various pediatric cancers that's held every summer for a week and is run by the National Cancer Institute and Special Love, Inc. I think that's a telling sign of the complete man, Dr. Steve Chanock. So Steve, take it away.
Thank you very much for that lovely introduction. I just want to assure you I will not break into song, I do not play the guitar, and I will leave it at that.
It is really an honor to be the recipient of this lectureship named in honor of Jeff Trent, someone I know very well, a very dynamic and lively character whom I've worked with on and off over the years. Energy and passion are two words that certainly come to mind when thinking about Jeff, and his vision for where and how cancer research should go forward has always been infused with those two very important qualities, along with scientific rigor. I'm sorry that he's not here, but I have corresponded with him quite a bit by email in the last few days, and he was very happy to know that someone interested primarily in the germline would be speaking, because that's a part of Jeff's portfolio. Today I will speak almost exclusively about the germline, but in the context of how the germline informs our understanding of the different kinds of cancer that we encounter. So let me start with the question of heritability in cancer. Going back to 1866, Paul Broca, the famous French anatomist, observed the heritability of breast cancer in his wife's family, with a number of women affected, sisters, aunts, mothers and the like, and published a description of this familial clustering. In the interim there have been legions of twin, family, and sibling studies that began to really assess the risk that, if one family member had a particular cancer, another family member would have it too, and that is still an important bulk of the work on how we prosecute the question of the genetic basis of cancer.
In 1969, Joe Fraumeni, with Fred Li, observed the familial clustering of multiple cancers in families, not just one cancer but a number of them, and it was subsequently found that mutations in the TP53 gene are responsible for a high fraction of Li-Fraumeni patients, who were ascertained through these familial studies. I'm going to come back to looking at Li-Fraumeni-like mutations in the population, particularly in osteosarcoma. Then Al Knudson postulated the two-hit hypothesis for retinoblastoma, really a central tenet of how we look at germline genetics, with the diploid genome as the model; but, as I'll talk about a little later, the genome does come apart, and we certainly know that there are many different ways in which copy number states can vary. And finally, the chase using the technologies of the 80s and 90s led to the localization of the familial breast cancer gene by linkage in 1990, and by 1994 it had been cloned and described as BRCA1. This is set against the background of looking at heritability from an epidemiologic point of view, where twin registries have been quite valuable, and this is an important table that I will come back to when I show you how far genome-wide association studies go in explaining a fraction of what we think the heritability is. The key issue here is the heritable factors for prostate, colon, bladder, breast, and lung, five of the major cancers that we face. The issue of shared versus non-shared environment is an important question, which underlines, we think, the importance of germline genetic susceptibility in the context of different kinds of exposures, and I will come back to that. So at this point, why do we study germline susceptibility? Well, we can try to explain the heritability of cancer.
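Knudson's two-hit logic can be made concrete with a toy calculation. The sketch below is an illustrative assumption, not Knudson's original analysis: somatic hits are modeled as a Poisson process at a constant, hypothetical rate per cell per year; a germline carrier needs one further hit in any target cell, while a non-carrier needs two hits in the same cell. The parameter values (`N_CELLS`, `HIT_RATE`) are made up purely to show the contrast.

```python
import math


def prob_tumor_by_age(age: float, n_cells: float, hit_rate: float, carrier: bool) -> float:
    """Toy probability that at least one target cell has lost both gene copies by `age`.

    Assumption for illustration: somatic hits arrive as a Poisson process at a
    constant `hit_rate` per cell per year. A germline carrier already has one hit
    in every cell, so one more suffices; a non-carrier needs two in the same cell.
    """
    m = hit_rate * age  # expected number of somatic hits per cell by this age
    if carrier:
        p_cell = -math.expm1(-m)  # P(at least 1 hit) = 1 - e^(-m)
    else:
        p_cell = -math.expm1(-m) - m * math.exp(-m)  # P(at least 2 hits) = 1 - e^(-m)(1 + m)
    # P(at least one of n_cells is fully hit), computed stably when p_cell is tiny.
    return -math.expm1(n_cells * math.log1p(-p_cell))


# Hypothetical parameters, chosen only to show the qualitative contrast.
N_CELLS, HIT_RATE = 1e6, 2e-7
for age in (1, 3, 5):
    p_carrier = prob_tumor_by_age(age, N_CELLS, HIT_RATE, carrier=True)
    p_non_carrier = prob_tumor_by_age(age, N_CELLS, HIT_RATE, carrier=False)
    print(f"age {age}: carrier {p_carrier:.3f}, non-carrier {p_non_carrier:.1e}")
```

With these made-up parameters the carrier probability is many orders of magnitude higher than the non-carrier probability, which is the qualitative point of the two-hit model; the absolute numbers carry no meaning.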
We certainly know about its clustering in families and in distinct populations. But now, with the Genome Project behind us, the annotation of HapMap, and knowing what common genetic variation looks like, we have the tools to really ask the question: in sporadic cancers, which represent probably 90 to 93% of cases depending on your definition, how can we explain genetic predisposition? We know that within families there are increased risks for breast and colon cancer, on the order of a one-and-a-half to two-fold increase over the general population. I think the ability to look at genetic susceptibility is crucial to begin to pull apart the many, many things that contribute to cancer risk. Then, of course, there is the value of this for individual risk assessment, which I would say we are very far from being able to do outside of the familial setting, and I think we have to be very careful not to oversell precision prevention and precision medicine at this time with respect to predicting individuals' specific risks for cancer. That's a place where we all want to go, but we have a lot of distance to traverse. The population-based screening issue, however, becomes very important: how we use the information we have now, and are about to have, to think about stratification that may have real public health implications for screening trials, screening tools, and the like. Genetic variation is helpful for that. We also get tremendous insights into the etiology of cancer, the opportunity to look at gene-environment interactions, and, as I mentioned at the beginning, how the germline informs somatic alterations. And then, of course, everyone is excited about pharmacogenomics, but this is a very difficult thing to pursue.
And we're sort of at odds with most of the industry, because it's not in their best interest for us to identify the 30% of women who should get Herceptin and exclude the 70% from being prescribed that drug. So this is something that, at a cultural level as well as a scientific level, is really lagging behind. There are some very exciting examples, but I'm not going to focus my talk on that today. I do want to start my talk by separating the spaces. When we think about cancer genetics, we have at least four different spaces, as demarcated here. One is the germline, which is where I'm going to spend most of my time, but we also have the somatic: the alterations in the actual tumors that we see in the NCI's sequencing through TCGA, where we see all of the large-scale events that have taken place. We know that we're heavily in the range of discovery, with very little clinical action at this time, and I think we have to be very careful not to oversell what we can do, particularly with germline information, other than in very select circumstances. We all want to get to the next stage, but we still have quite a ways to go. So, as an outline of what I'm going to talk about today on the germline side: we know of at least 110 to maybe 115 cancer syndromes in which a very important germline mutation explains the familial clustering of cancers in a family or set of families. These give us very good insights into cancer biology and cancer drivers, and I'll come back to that. Through genome-wide association studies, which I'll talk about, we know of some 470 regions, and there are literally thousands more to be discovered as we supersize; I'll try to make the argument for why we should continue doing that.
On the somatic side, The Cancer Genome Atlas (TCGA), the joint NHGRI and NCI effort, together with the ICGC, has been a resounding success in giving a portrait of the landscape of genetic alterations. We see that there are drivers, those things that we think are very important, which we identify using both frequency and biologic investigation in the laboratory, but we also recognize that heterogeneity and metastases are real challenges. Now, if we go to the clinically actionable space, of all the things that we see in the upper left-hand corner, only for a small fraction can we really go into the clinic and advise someone from what we think is a sustainable and supportable position. In the same way, only a small number of agents have really come to market; it's probably larger than shown here. This is an example of targeted therapy, where I think the Precision Medicine Initiative that Dr. Varmus has been talking about is very important in targeting alterations in tumors so that we can intercede and either stop or slow down the growth of those particular cancers. We do this in the context of TCGA, where we've had extraordinary lessons from lining up all the different mutations: we can see literally a four-order-of-magnitude difference in the number of mutations across tumors, and differences in the types of mutations. How many are real drivers versus how many are passengers? This is an important element that's come out of somatic sequencing, and I'll come back to it in the germline in a minute. For instance, consider lung adenocarcinoma, one of the most common cancers, which is heavily driven by smoking; the attributable risk is somewhere between 75 and 85%, depending on who you're talking to.
And we can explain a fair number of the cases that we see as having mutations in genes that are quite disruptive, where we understand something about the biology or where the frequency tells us that this is an important event. We have a number of very important agents for lung adenocarcinoma that are going into clinical trials right now and are very exciting, but this is probably further ahead than just about any other cancer. When we then ask how this lines up with what we understand about the genetic architecture, I want to take a step back and first talk about this space here: the rare alleles that cause Mendelian diseases or familial clusterings of cancers, the BRCAs, TP53, PTCH, and the like. Currently we know of about 115; they're scattered all across the genome, and the interesting thing is that they were almost all ascertained through families. They're rare mutations with very strong effects, and from an evolutionary point of view it's very hard to sustain them in a population. There's a lot of very interesting population genetics here, and there's a whole other lecture on the BRCA founder mutations that we certainly see. We can also see both oncogenes and tumor suppressor genes among them, which matters with respect to the classical genetic models of whether you're looking at an autosomal dominant or an autosomal recessive pattern. If we look here at a classic BRCA1 pedigree, we know that in most cases having a BRCA mutation does not mean a 100% likelihood that a woman is going to develop breast or ovarian cancer. The risk depends on the specific mutation; where in the gene the mutation takes place is a very important point.
And then there are 24,990 other genes, not to mention environmental exposures. So there are large consortia identifying the important clinical as well as biologic modifiers of BRCA1 and BRCA2. I think this is a very exciting direction, and we're just beginning to identify maybe the first 10 or 12 regions of the genome that interact with BRCA1 and may be very important in contributing to the risk for breast cancer. But then the question is: what type of breast cancer? We know there's heterogeneity; there are different types of breast cancers that track with BRCA1 and BRCA2, but not perfectly. Again, we have to get past this notion of genetic determinism; it's a more complex scenario. The further we get into this, the more we recognize the factors that we need to identify and assemble into critical compendiums or catalogs that we can then examine in larger studies. So if we look at these 110 genes: interestingly enough, about a year ago Nazneen Rahman in the UK published a very nice paper in Nature in which she asked what fraction of the genes identified in the germline as explaining a familial cancer had already been identified in a somatic setting as an important driver, not necessarily in the cancer seen in that family, but in some cancer. Interestingly enough, about 50% had at that time, and if you revisited this now it's probably up to about 65% of these 115 familial cancer genes: the mutations lie in genes that we know somatically are very important in one or more cancers.
So these genes carry somatic mutations at high frequencies, and those somatic mutations are important drivers. But I want to keep separate the idea of what drives the cancer once it starts from where the germline acts as a susceptibility factor, and this is where we invoke Knudson's two-hit hypothesis and the retinoblastoma model: you can start with a germline mutation, a somatic event then happens on top of it in certain key genes, and there is a very high risk for cancer. When we looked at this particular list, we in DCEG have certainly had a longstanding interest in TP53 and the characterization of Li-Fraumeni syndrome. We had done a large genome-wide association study of osteosarcoma and found a few regions that looked to be very important for risk, but one of our junior faculty, Lisa Mirabello, had a terrific idea: take those same samples and sequence TP53 with next-generation sequencing, not Illumina but Ion Torrent. For the young people out there, there are other technologies that do work and can be published; it's important to recognize that, hard as it is to get past. The goal was to look at our distribution of osteosarcoma, and I show this because the punchline is that there's a divide in where we think genetic susceptibility with respect to TP53 is acting. We also know that osteosarcoma can arise in other settings, such as Werner syndrome, Rothmund-Thomson syndrome, and hereditary retinoblastoma. So we sequenced all of the TP53 exons, the UTRs, and the intronic flanking regions, and then classified the variants: first, Li-Fraumeni syndrome mutations already catalogued in the IARC TP53 database; then variants likely to be deleterious; and then rare exonic variants with very low minor allele frequency (MAF) in the public databases.
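The tiered classification described above can be sketched as a first-match rule. This is a minimal illustration, not the study's pipeline: the `Variant` fields are hypothetical inputs (catalogue membership, an in-silico deleteriousness flag, and a population MAF), and `RARE_MAF_CUTOFF = 0.001` is an assumed value, since the talk only says "very low MAF".

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cutoff: the talk only says "very low MAF in the public database".
RARE_MAF_CUTOFF = 0.001


@dataclass
class Variant:
    name: str                        # e.g. a protein change such as "p.R248Q"
    exonic: bool                     # lies in a coding exon
    known_lfs_mutation: bool         # already catalogued as a Li-Fraumeni mutation
    predicted_deleterious: bool      # flag from some in-silico predictor (hypothetical input)
    population_maf: Optional[float]  # MAF in a public database; None if never observed


def classify(v: Variant) -> str:
    """Return the first matching tier, from most to least established."""
    if v.known_lfs_mutation:
        return "known LFS mutation"
    if v.predicted_deleterious:
        return "likely deleterious"
    # A variant absent from the public database is treated as rare (MAF 0).
    maf = 0.0 if v.population_maf is None else v.population_maf
    if v.exonic and maf < RARE_MAF_CUTOFF:
        return "rare exonic variant"
    return "not classified as pathogenic"


print(classify(Variant("p.R248Q", True, True, False, None)))
print(classify(Variant("hypothetical exonic variant", True, False, False, 0.0002)))
```

Ordering the checks from most to least established means a variant lands in only one tier, which mirrors the order in which the categories are listed in the talk.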
When we looked at this data, what was interesting was that roughly 10% of the children and young adults with osteosarcoma, none of whom had been ascertained through family studies and none of whom, among these 765, had any evidence of family history, were harboring one of these TP53 mutations. Looking at it a different way, with TP53 mutations by age overall as opposed to by age group, 0 to 10, 10 to 19, and 20 to 30, we could see that there was a break by age 30. In other words, for TP53 mutations and osteosarcoma, if the cancer is going to happen, it happens earlier in life, and this gives us an important handle: we can really start to layer on age, asking when the risk is important and when it goes away, so that you can say someone is pretty much out of the woods. We think these findings are very important, and they're being published in JNCI in the next week or two, with kudos to Lisa for really pushing this through. Young-onset osteosarcoma has a distinct germline component, which raises the question: should genetic counseling and TP53 mutation testing be introduced in the Children's Oncology Group?
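To see how precise "roughly 10%" is with 765 patients, one can put a binomial confidence interval around the carrier fraction. The sketch below uses the Wilson score interval (our choice of method, not necessarily what the study reported) and assumes 76 carriers as a stand-in for "roughly 10%"; the exact count is not given here.

```python
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * (p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) ** 0.5
    return center - half_width, center + half_width


# Assumption: "roughly 10%" of 765 patients is taken as 76 carriers.
carriers, patients = 76, 765
low, high = wilson_interval(carriers, patients)
print(f"carrier fraction {carriers / patients:.3f} (95% CI {low:.3f} to {high:.3f})")
```

With these assumed counts the interval runs from roughly 8% to 12%, so the overall estimate is reasonably tight at this sample size; the age-specific strata would be smaller and their intervals wider.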
So there's an active discussion now; we've shown this data in Europe, and the Europeans are looking at it as well, asking whether, in the pediatric oncology clinic where you're seeing children and young adults with osteosarcoma, you should be considering a more formal type of genetic counseling. One is not saying that every osteosarcoma patient should get screened for TP53 tomorrow; we have to think hard about how we're going to study and validate this. But these are the kinds of exciting things we're getting in this discovery series, and how we move to the next level is clearly very important, because the other question is whether there is more to find, and the answer is of course yes. So we are moving down the road toward exome sequencing now, and potentially whole-genome sequencing. It's important to recognize that there is a whole spectrum of how we're doing these analyses. If we come back to this figure, we see the very rare, highly penetrant mutations; as we move down into lower-frequency variants with moderate effects, we are beginning to identify some; and more recently the genome-wide association era has focused on common variants. I do want to talk about these, because they are very important in building a polygenic model, in which many, many small effects contribute to the risk of both common and uncommon cancers. So what is a genome-wide association study? For those with a weak stomach, it's not dissimilar from a vaccine study: you start, you go a long period of time, and then there's a p-value or a set of p-values that you're either really happy with or really depressed about. So going forward with these large-scale studies is not for the faint of heart.
But I think the key issue here is starting with case-control studies, whether or not they are nested in cohorts. As we've learned, very good case-control studies with large numbers of cases and adequate controls can identify the big top signals, but we don't yet have the full panoply of polygenic signals. The workflow goes from the cases and their DNA, to scanning them, to the QC steps and the agnostic analysis, and that agnostic analysis is what's really different about genome-wide association studies. It is not "I always thought TP53 was interesting in osteosarcoma, so I'm going to look at that"; that is a very dangerous thing to do. Then we have our Manhattan plots and our replication. These stages take a number of years, but they are now part of the fabric of genetic susceptibility studies. As you can see here, as of December the field has about 475 regions published across the different cancers, and interestingly only about 8% of them are shared, which raises the question of how much reflects bias in how the studies have been conducted versus true shared heritability. I'll come back to shared heritability when we look at the large set of GWAS we have and compare them in some very sophisticated analyses. The other very interesting thing is that there are a number of regions for risk of developing the cancer, but whether you go on to develop an aggressive form of prostate cancer, or whether you survive breast cancer or ovarian cancer, the regions are not the same. That tells us about the complexity of what goes on over an extended period of time. As you can see here, there are some hundred different regions; it's just like a shotgun blast across the genome.
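At the heart of the agnostic analysis step is a per-SNP association test repeated across the genome. The sketch below shows one common version, an allelic 2x2 chi-square test on case and control allele counts, together with the conventional genome-wide significance threshold of 5 x 10^-8 (a Bonferroni correction of 0.05 for roughly one million independent tests). The counts in the usage lines are hypothetical, and real pipelines typically use logistic regression with covariates such as ancestry principal components rather than this bare test.

```python
import math

# Conventional genome-wide threshold: Bonferroni correction of 0.05 for
# roughly one million independent common-variant tests (about 5e-8).
GENOME_WIDE_ALPHA = 0.05 / 1_000_000


def allelic_test(case_risk: int, case_other: int, ctrl_risk: int, ctrl_other: int):
    """Allelic 2x2 chi-square test for one SNP; returns (odds_ratio, chi2, p_value).

    Counts are alleles (two per person), not people. The Pearson statistic has
    1 degree of freedom, whose upper-tail probability is erfc(sqrt(chi2 / 2)).
    """
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    return odds_ratio, chi2, p_value


# Hypothetical SNP: 10,000 cases and 10,000 controls (20,000 alleles each),
# risk-allele frequency 0.33 in cases versus 0.30 in controls.
or_, chi2, p = allelic_test(6600, 13400, 6000, 14000)
print(f"OR={or_:.3f}, chi2={chi2:.2f}, p={p:.2e}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

Even a modest odds ratio of about 1.15 clears the genome-wide threshold at this sample size, which is one reason the field keeps pushing toward larger studies.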
They're scattered, but very few differentiate between aggressive and non-aggressive disease. A number have been published, and most of them die on the altar of attempted replication. There may be two or three that actually survive stringent statistical significance, which is very important for research on those regions: you don't want to send people off to work on false positives, so there is value in genome-wide significance. So for prostate we really don't see a whole lot coming together. Whereas in testicular cancer, which has the highest familial risk, what's really interesting is that the hits are coming at a much faster rate, they're getting ready to be published from combining the studies, and you can see just about all of them localize to genes that are important in telomerase regulation, germ cell development, and sex differentiation, which has been quite striking. It's the cancer with the highest heritability, as we know from twin studies of monozygotic twins, so there's clearly a very strong genetic component; but it's a very rare cancer, so it raises the issue of absolute risk, and we'll come back to that. It also has the one genome-wide association hit with a large enough effect size, at KIT ligand (KITLG), an interesting gene under selection that is also associated with hair color. But there have been some issues with that: would I want to give someone advice on the basis of being homozygous? Again, I don't think at this point it's ready to go into the clinic, and most genetic counselors would not jump on this and say it's something we're going to do today. But this is where the discovery engine is now pushing, and the question is how and in what way we are going to be doing this.
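The absolute-risk point for a rare cancer such as testicular cancer comes down to simple arithmetic. Assuming a multiplicative model in which carrier risk equals non-carrier risk times a relative risk, the same relative risk moves absolute risk far less when the baseline risk is small. The baseline risks in the usage lines are hypothetical values chosen only to contrast a rare and a common cancer.

```python
def absolute_risk_shift(noncarrier_risk: float, relative_risk: float) -> float:
    """Extra absolute risk for carriers under a simple multiplicative model.

    Assumes carrier risk = noncarrier_risk * relative_risk, capped at 1; the
    multiplicative form is a reasonable approximation mainly when risks are small.
    """
    carrier_risk = min(1.0, noncarrier_risk * relative_risk)
    return carrier_risk - noncarrier_risk


# Hypothetical lifetime risks, chosen only to contrast a rare and a common cancer.
for label, baseline in (("rare cancer", 0.004), ("common cancer", 0.12)):
    shift = absolute_risk_shift(baseline, relative_risk=1.5)
    print(f"{label}: +{shift * 100:.1f} percentage points per carrier")
```

The same 1.5-fold relative risk adds 0.2 percentage points in the rare case and 6 points in the common one; per 10,000 carriers that is 20 versus 600 extra cases.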
And the interesting thing that Aloysia and Jaumeng had really identified is that in each one of these regions we saw a kind of pleiotropy: all six of the independent regions have some alleles that are protective and some that confer susceptibility. In other words, the exact same allele can be protective in one cancer and confer susceptibility in another, which really underscores the importance of looking at other regions. If you look at skin cancer, basal cell carcinoma and melanoma go in opposite directions, as do pancreatic and lung cancer, so this is a very interesting region, and we know that there are also highly penetrant mutations there. So when we assess these GWAS regions, there is a tension between wanting to look at all of them for risk purposes versus wanting to dig into each one in depth. One region becomes five; often several independent signals are hidden in there, and this helps explain a little bit more of the heritability. Some regions harbor alleles for many different cancers, and then we have to start asking about accounting for exposures, like smoking and lung cancer or smoking and bladder cancer, and I'll come back to bladder cancer in terms of public health. At this point it's very clear that these variants are not ready for use in individuals, even though 23andMe, deCODEme, and a number of groups have tried to sell them over a period of time. It's genetic snake oil, and enough said on that. So when we look at the GWAS cancer susceptibility hits themselves, these 475, we see a very different kind of biology in particular regions. There are 470 distinct regions, only about 25 of which have been explained, and 20% are in regions where there's no gene anywhere near any of the correlated variants. In other words, the variant is doing something through some unusual RNA, or something that's important in genomic regulation.
When we've looked at this, less than 5% of the genes that fall under the peaks of these genome-wide signals are known somatic drivers, compared with the highly penetrant mutations I showed you, where 50% was reported and it looks like it's moving closer to 65%. Nearly all the GWAS hits are perturbations: they change pathways, but they don't necessarily alter a particular gene, and very few are coding once we really explain them. Mitch Machiela in the lab spent a lot of time very carefully looking at all the genes in COSMIC, and there was really no difference in the number and types of mutations between genes under GWAS peaks and randomly permuted gene sets matched on genomic location, GC content, and the other features we think of in terms of locale. So we come back to the architecture of genetic susceptibility to cancer: common variants perturb key pathways, and each makes a small dent that is neither sufficient nor required for developing the cancer, unlike the familial settings, where we really do have damaging drivers. As we look at this map, with GWAS we're filling in this part of the space for any given cancer, and with family studies we're filling in this part, but in between, the low-frequency variants that are part of an oligogenic model are very tough to get at, and this is where next-generation sequencing and laboratory investigation really have to hit the road. So what I would now submit is that each major cancer has a unique underlying genetic architecture: what contributes to breast cancer is different from what contributes to prostate cancer.
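The comparison against randomly permuted, location- and GC-matched genes is an instance of a stratified permutation test. The sketch below is a generic version under stated assumptions, not the lab's actual code: each gene is assigned to a hypothetical matching stratum (standing in for GC content, genomic location, or gene length), random gene sets are drawn with the same per-stratum composition as the GWAS genes, and the p-value is the fraction of draws carrying at least as many somatic mutations. Real mutation counts would come from a catalogue such as COSMIC; the toy data here are invented.

```python
import random
from collections import defaultdict


def matched_permutation_pvalue(mutation_counts, gene_bin, test_genes, n_perm=10_000, seed=0):
    """Empirical p-value for an excess of somatic mutations in `test_genes`.

    mutation_counts: dict gene -> somatic mutation count (e.g. from a catalogue)
    gene_bin:        dict gene -> matching stratum (hypothetical stand-in for
                     GC content, genomic location, or gene length)
    Random gene sets keep the same number of genes per stratum as `test_genes`.
    """
    rng = random.Random(seed)
    genes_in_bin = defaultdict(list)
    for gene, stratum in gene_bin.items():
        genes_in_bin[stratum].append(gene)
    needed = defaultdict(int)
    for gene in test_genes:
        needed[gene_bin[gene]] += 1

    observed = sum(mutation_counts[g] for g in test_genes)
    as_extreme = 0
    for _ in range(n_perm):
        drawn = [g for stratum, k in needed.items() for g in rng.sample(genes_in_bin[stratum], k)]
        if sum(mutation_counts[g] for g in drawn) >= observed:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (as_extreme + 1) / (n_perm + 1)


# Invented toy data: 100 genes in 4 strata; the first 8 genes have no built-in excess.
genes = [f"gene{i}" for i in range(100)]
strata = {g: i % 4 for i, g in enumerate(genes)}
counts = {g: 1 + (i % 3) for i, g in enumerate(genes)}
print(matched_permutation_pvalue(counts, strata, genes[:8], n_perm=2000))
```

Matching within strata is the design choice that matters: without it, GWAS genes that happen to be long or GC-rich could look enriched for mutations simply because such genes accumulate more of them.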
There may be a few shared things, but nonetheless, in our minds, it is very important to build these models, because as we build these catalogs there may be increasing clinical utility, as well as the obvious value of using these catalogs, particularly in common cancers, where being able to stratify by changes in absolute risk may have real public health implications. For a very rare cancer, a shift in absolute risk is a harder sell at this point. So, with limited money, I would submit that if we are going to make an effort to build a complete catalog of what genetic variation looks like, one could argue that the common cancers are where we should put our money, and in fact we're doing that with prostate, lung, colon, and ovarian cancer. If we look at genetic predisposition to breast cancer, I mentioned the cloning of BRCA1 in 1994 and of BRCA2 shortly thereafter; we can now see a sweep of a number of genes that are highly penetrant but very rare, and then the GWAS era has added a number of common variants, but we have very little in this space here, which we are now going after with exome sequencing, where we have to try to put together very, very clever stories.
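The limits on SNP-based prediction discussed next can be illustrated with a common textbook model; this is an assumption for illustration, not necessarily the exact model behind the estimates in the talk. Under a multiplicative per-allele model, a SNP with risk-allele frequency p and per-allele relative risk r contributes a first-degree familial relative risk of λ = (p r² + q)/(p r + q)², and contributions add on the log scale. If the polygenic log-risk is normally distributed and first-degree relatives share half of its variance, then λ = exp(σ²/2), and for a rare disease the AUC is Φ(σ/√2), which caps what SNPs can achieve when the total familial relative risk is about 2.

```python
import math
from statistics import NormalDist


def familial_rr_single_snp(p: float, r: float) -> float:
    """First-degree familial relative risk due to one SNP.

    Assumes a multiplicative per-allele model with risk-allele frequency p and
    per-allele relative risk r (approximated by the odds ratio).
    """
    q = 1.0 - p
    return (p * r * r + q) / (p * r + q) ** 2


def fraction_familial_risk_explained(snps, total_familial_rr: float) -> float:
    """Share of the familial relative risk explained, summed on the log scale."""
    explained = sum(math.log(familial_rr_single_snp(p, r)) for p, r in snps)
    return explained / math.log(total_familial_rr)


def auc_from_familial_risk(total_familial_rr: float, fraction_explained: float) -> float:
    """AUC of a polygenic score explaining `fraction_explained` of the familial risk.

    Assumes normally distributed log-risk with variance sigma^2, relatives sharing
    half of it (so lambda = exp(sigma^2 / 2)), and a rare disease, for which
    AUC = Phi(sigma / sqrt(2)).
    """
    sigma2 = 2.0 * math.log(total_familial_rr) * fraction_explained
    return NormalDist().cdf(math.sqrt(sigma2 / 2.0))


print(f"{fraction_familial_risk_explained([(0.3, 1.1)], 2.0):.4f}")  # one modest SNP
for share in (0.35, 0.40, 1.0):
    print(f"explain {share:.0%} of a two-fold familial risk -> AUC {auc_from_familial_risk(2.0, share):.2f}")
```

Under these assumptions, a SNP with risk-allele frequency 0.3 and odds ratio 1.1 explains under 0.3% of a two-fold familial risk, which is why thousands of such variants are needed; explaining 35 to 40% gives an AUC of roughly 0.69 to 0.70, and even explaining all of it gives only about 0.80.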
Similarly, the doubling of the number of hits tells us that we can explain about 35 to 40% of the familial risk. Familial risk here is, we think, a one-and-a-half to two-fold increase in these common cancers: if someone in a family has a prostate cancer or breast cancer diagnosis, there's an increased risk that someone else in that family will have it. This statistic tries to capture how much of that familial risk the SNPs, the space of common variants, can explain, because therein lies the opportunity for stratification based on what we would see as a useful shift in risk. Prostate looks very different here: as you can see, there's virtually nothing in the high-risk space, and for the SNPs it's been very, very exciting. For quite some time Nilanjan Chatterjee, a colleague in DCEG, and others have been looking at the limits of using SNPs, because at one point there was a lot of excitement, and people were jumping all over it and suggesting that, as in Crohn's disease, where we have a very high sibling relative risk, you could use SNPs to move the AUC, the area under the curve, close enough to a place where it could be clinically implemented. For the common cancers of breast, prostate, and colon, Nilanjan and Park made some very, very important estimates based on empirical information, now being borne out by the larger consortia, showing that there are clearly limits given a sibling relative risk of about two for breast and prostate. In fact, just yesterday he was very kind to provide an important new slide here: we know we can explain 35 to 40% of the familial risk of breast cancer; the issue again is absolute risk for screening, and how we do that is still very much on the table. But as you can see in these curves, as
we go from what's empirically known to what could potentially be learned by looking at as many as 500,000 cases and 500,000 controls, which is something the world is moving towards right now, with roughly 150,000 breast cancer cases and 150,000 controls, and that will continue. That would allow us to identify what we estimate to be on the order of 3,000 or more different SNPs. Each of these may have its own story, and it's going to be decades before we understand what those stories are, but using this information for risk assessment is a different question, and I think we will be most effective with as complete a set as possible. Even if we just look at prostate cancer risk here, from Antonis's group in the UK, with the 76 SNPs that have been identified, when you look at the risk factors and the risk of developing disease, age is very important. And again, in our genome-wide association studies and many of our familial studies, we've been really two-dimensional, not really able to assess secondary factors, whether environmental or other genetic effects. So with that, I think it's an important story to delve into for a minute with respect to smoking. Out of the genome-wide association studies of bladder cancer, interestingly enough, came a gene that had been identified in the candidate gene world, one of only four or five out of some 10,000 candidate gene publications that survived replication: the NAT2 slow acetylation gene. And interestingly enough, you only see the effect in individuals who smoke. This really led us, particularly Nilanjan, Nat Rothman and Montse Garcia-Closas, to look very closely at the cumulative 30-year absolute risk in some of the cohorts available to us, to ask the question of what would be the risk for a
50-year-old male smoker in the US, based on 12 SNPs including NAT2. As you can see, the absolute risk separates quite nicely between the low-risk and the high-risk groups. This is not to say that we should start screening every smoker for bladder cancer; there are many other comorbid conditions and the like. But let's just do the thought experiment: if we had 100,000 smokers with high genetic risk who stopped smoking, that is, with effective cessation, then for bladder cancer, where enough is known about the genetic susceptibility, we would eliminate 5,400 cases, and that is bladder cancer alone. If we then look at 100,000 smokers with low genetic risk who stopped, we would eliminate 1,500 cases. So it raises the question of how and where we would apply these kinds of risk reduction strategies. It's a possible example of how genetic and environmental risk stratification may translate into targeted prevention, so-called precision prevention, but we are a long way from doing that. I think the next sets of studies need to assess and confirm this, and ask what all the cofactors and comorbidities are: lung cancer, cardiovascular disease and the like. So it's really an important question, but it's that sort of exciting opportunity that tells us we need to look very, very closely, and why we need these very large compendiums. So really, what fraction of the polygenic component contributes to each cancer? We've done scanning of over 100,000 individuals in 15 different cancers, and Josh Sampson in our program, one of our young, terrific biostatisticians, looked at 13 cancers and used the genotyped SNPs to explain anywhere from 10 to 50% of the variability on the liability scale. Going back to the kind of twin study I showed at the beginning, you can see that at this point, with GWAS, we can explain some fraction, and depending on the cancer it can be anywhere
from 15 to 50% of the familial risk, with the large number of SNPs available at this time. On shared heritability, when you start lining these cancers up, some very interesting things have started to come out. You can see very strong correlations where there's overlap between cancers you would expect, like testicular and kidney cancer, and CLL and diffuse large B-cell lymphoma, but interestingly enough, DLBCL, that type of lymphoma, and osteosarcoma share a heritability that was not anticipated. So again, this element of discovery, giving us new clues and new ways of thinking about diseases, is very important, not unlike how the TCGA analyses of certain mutation signatures have told us things about bladder cancer and potential viral pathogens, and about gastric cancer and EBV. Looking across these large sweeps gives us new places to go. So lastly, when we talk about GWAS, what's in a GWAS region? We know that it can inform our understanding of the somatic changes, particularly in that region. We have many correlated SNPs that have to be mapped, choosing the best variants for laboratory evaluation; of those 475 regions, only about 25 are explained right now, and a small fraction of them may allow us to look at environmental or other genetic alterations. We have a wonderful resource, for which NHGRI was really the driving force, and it has, I think, transformed how we can prosecute these regions; it gives us the opportunity to map and go after each one. So let me just give you an example where there's a potential clinical implication. Ludmila Prokunina-Olsson identified one of the bladder cancer signals on chromosome 8, at the PSCA gene. She mapped it, figured out the very best SNP statistically, and in the laboratory could see that the functional SNP changed the expression of this particular gene,
and it just so happened that there's a humanized antibody against this gene's product being tested in other cancers. So we've been trying, unsuccessfully but still pushing hard, to ask whether this is the disease in which that humanized antibody should be tested. Here is a potential translational application of a particular biologic story that comes out of looking at a particular GWAS region. This is not to promise that all GWAS regions will look like that, but I think there's interesting biology in terms of regulation, and changes in very important pathways and particular genes that are critical for cancer. So we come back to the architecture of genetic susceptibility to cancer, and this middle space here of low-frequency variants with intermediate effects. Actually, Jeff Trent was central to one of the examples I'm going to show in a minute: Kevin Brown, whom we recruited from Jeff's group, is an intramural investigator who is able to do this, and it's very important that the laboratory activity is there, because it really does help us get past the signal-to-noise problem. For those who have looked at exome sequencing and seen 2,000, 5,000, 10,000 interesting-looking variants, it's very hard to know which ones are the right ones, and the agnostic search for rare variants is very difficult; Eric Lander and others have suggested that we're going to need 20,000 to 30,000 cases just to begin to get the right signals out of this, and this gets at some very profound questions. So Kevin looked at the MITF gene, the E318K mutation, which did not segregate perfectly in families with melanoma, nor was it that strong an effect in the population, but it was clearly there. The interesting thing is that when they went into the laboratory, as did a group in France in parallel, they published in Nature a very interesting study with a lot of biology on the SUMOylation of that particular site on MITF, sort of giving the
scientific laboratory corroboration that this is an important risk factor. Similarly, Maria Teresa Landi and Jianxin Shi in our program looked at POT1, an important gene in telomere stability, and identified, very similarly, families and populations where the effects were not segregating like our classical Mendelian families and weren't quite strong enough to be seen in genome-wide association studies, but there were signals in both, and when we went to the laboratory we were able to do very important experiments that really sealed the deal, so to speak. I'm afraid this space is going to be a very difficult one; it's going to be much slower, and we're going to have to use laboratory evaluation as well as statistical approaches to get there. So we know that there are many difficult regions, and we won't necessarily get to all of them individually in this low region here, but the polygenic models of SNPs will allow us to do that, and then occasionally we have these very rare variants. So I think the report card on cancer genomics in 2015 is that it's really just the start. We have a ways to go; it's been a very exciting period of discovery, but the battle is clearly not over. We know that discovery is in progress and that we're defining very different structures for the underlying genetic architecture. We know that the current profiles are better suited for risk stratification, a form of precision prevention, but we haven't yet figured out how to design the studies to confirm that. The discovery of new biologic insights into cancer is clearly very important, but the next real front here, I think, is to get to the environment. So with that, I'd like to acknowledge my colleagues; I'm going to go on for about five more minutes, Dan, because I want to talk about how the genome is falling apart, but I wanted to particularly call out the
remarkable colleagues I've had the pleasure of working with, particularly Joe Fraumeni, Bob Hoover and Peggy Tucker, and certainly a number of other individuals within DCEG. The slides would roll on like the credits of a Hollywood movie for three minutes of music if I were to show the 400 collaborators and all their affiliations, but this is really the value of team science in a very special way. So let me just say a word or two about genome screening: is it promising, is it daunting, or is it dangerous? It's probably some of each at this point. As we go forward with next-generation sequencing of tumors, we're also sequencing normal tissues, and ethically it's a very hard issue how not to look at that information. When we start looking at that information, we ask very important questions about what's going on with our genome: are we going to have to sequence our genome more than once? I would say we may be faced with having to make those decisions, because aging is tough. It really is. We fall apart with arthritis, obesity, heart disease, cognitive changes, decreased oxygen consumption, immunosenescence, fewer neurons, telomere attrition, and genetic mosaicism. One of the things we noticed, as we were doing six years of GWAS, having looked at over 100,000 samples in 45 different population studies, was that we kept seeing these sort of funky QC failures, and we saw enough of them that a couple of people in the laboratory said, hmm, there may be something there. What we unexpectedly found was large chromosomal abnormalities: about one and a half to two percent of the population over age 50 is walking around with a subpopulation of cells in their blood or buccal samples carrying these very ugly-looking mosaic events. I think this is something that triggered this whole world of asking questions about the dynamic genome. We can see this in many different populations, and we've now extended this to over
125,000 individuals. We know that the phenotypic expression of mosaicism has certainly been around: classical genetics told us about eye color, calico cats, neurofibromatosis and the like, and we know that in the extreme it's an age-old explanation for a subset of neurofibromatosis, trisomy 21 and Turner syndrome. We know that it's also very important for rare, highly penetrant mutations that lead to mosaic variegated aneuploidy, which tell us a lot about chromosome stability, such as the CEP57 families, as rare as they are. And we know about the complex syndromes, where some of the leading investigators in NHGRI have identified some very perplexing situations: what we would call in the cancer world an AKT1 mutation, in lung cancer or neuroblastoma, present as a mosaic in only particular tissues, giving rise to Proteus syndrome. It's really quite striking. So we clearly know about this. But when we start looking at the large events, here is the chromosome homunculus, you know, chromosomes 1 to 22, the autosomes, and if we look at 127,000 individuals (this is a paper that Mitch Machiela has led that will be published very shortly), we can see that there are different kinds of events, gains or copy-neutral changes, very large in scale, and we know that this is really the tip of the iceberg. We know, unfortunately, that this increases with aging: if we look at all of our cohorts from age 50 to 75, these events increase in frequency, and the question is whether they are harbingers of neurodegenerative disease, diabetes, cancer and the like. This is really a very difficult question; we don't have evidence at this point that they are strong risk factors for developing cancer per se, or for complications of cardiovascular disease or diabetes, but again, this is still the start of it. If we look by chromosome, we can see that the X chromosome is hit very hard, and interestingly enough, the Y has the largest number of mosaic events, so for men, 15 to 20%
of the men at age 60 are missing a good part of their Y chromosome, which is very interesting; I'll show you data on that. But here is why the X chromosome is very interesting, and there are also regions on chromosomes 13 and 20 that are very important in the hematologic cancers, and you see normal individuals walking around with those. Hypermutation of the inactive X is a very active field in the somatic world: in the ICGC and TCGA data it's telling us something about replication, with the inactive X being the last chromosome to be replicated, so that more errors take place. So we're asking whether this could be restricted to the inactive X; we're not sure, but we're moving forward on that right now. There have been a couple of papers looking at Y loss as a cause or consequence of aging and smoking, and the Swedish group has suggested that it's important for susceptibility to cancer. They've actually started a company. You may as well flush your money down the toilet, as best we can tell, because the data are not very strong: when we look in our large cohorts, we see absolutely no effect on susceptibility, but we do see age, and here you can see that by the time people are in their 60s to late 70s, 20% of the men have lost almost all or all of their Y chromosome, or a mosaic thereof. Similarly, an effect on survival has been suggested, but we don't see any effect whatsoever in the large cohorts we've followed for a number of years. Interestingly enough, the probability of the event is lower without smoking, so you have less risk of that Y chromosome mosaicism. So in our minds, these large-scale events are the tip of the iceberg, and a number of other groups have started to look at this with sequencing of favorite hematologic genes: age-related mutations associated with clonal hematopoietic expansion and malignancies,
looking at the TCGA data, and age-related clonal hematopoiesis, again looking at individual sequenced base changes. We've seen two populations of cells. So, to take a step back, the hematologic cancers are very different in our minds. We've looked very closely in our scanning of our cohorts and, as has the GENEVA consortium, we see an increase in the number of these kinds of events, particularly in individuals who go on to develop a cancer later. In other words, is this a potential biomarker? Could it be exploited to screen people who would be at high risk? We don't know what all those risk factors are, but we can certainly see that untreated leukemias in particular have an increased rate of these events on chromosome 13 or 20. So when we looked at the CLL GWAS that we did, particularly at individuals who had given blood anywhere from 2 to 10 years or more before their diagnosis in these prospective cohorts, we could identify a series of events seen in mosaic states in these individuals as long as 14 years before their actual frank diagnosis of CLL. We don't have the full karyotyping of all of the events, but we can see that these are all events reported in CLL, and the ones in red are associated with poor prognosis. So it does raise the question of being able to see, as much as 10 or 12 years in advance of a diagnosis of CLL, that somebody has one or more clones at a high enough fraction to be detected by our current technologies, which is about 5% of the circulating cells; that's our limit of detection. So the implications for aging are, in our minds, very important for thinking about the global concept of genomic instability that has been described by others, and it gives us clues to what can be tolerated or potentially selected in cancer. In thinking about precision prevention, I think this notion of mosaicism is going to be
particularly important in monitoring individuals. Up to a certain point, are those changes specific to the cancer, or are they part of a global system that's sort of falling apart? This is a very important hypothesis that I think large population studies really need to look at, and we certainly know the role of this in non-cancer diseases that track particularly with age; there have been some reports recently, and I think we'll see more and more. So let me just end by saying that, to be able to do this work, it's really a pleasure to work with so many people. To paraphrase a great Princetonian president who happened to then move to the White House, Woodrow Wilson: we use not only all the brains we have but all that we can borrow. In doing that, for instance with the mosaicism, we have these huge consortia with hundreds of individuals, and again, these are the kinds of things that are really critical to making these large, population-based observations that drive us back to the laboratory and make us think hard about the next public health questions and where we want to implement these particular observations. So with that, I will stop, thank you again, and I very much appreciate the honor of giving the Trent lecture. Thank you. Steve, thank you very, very much for that spectacular lecture, and limited as we are in terms of the kinds of gifts we can bear, at least publicly, I have a small token of appreciation from the National Human Genome Research Institute that commemorates this wonderful lecture.
We have a little bit of time for some questions, and of course there is a wonderful spread of food out in the library awaiting us after the last question is asked, so anyway. Just two brief questions. As a pathologist, I have seen problems with prostate and breast in particular. Prostate carcinoma is a natural occurrence in older males, and all through the spectrum, when you find a low Gleason grade prostate carcinoma, or what appears to be one, you don't know if it really is something that is just going to be there for the rest of the guy's life or what is going to happen. So I suggest to you that the definition of what is a prostate carcinoma is important in terms of biological behavior. The second thing is that in the breast you get what is called atypical ductal hyperplasia, and nobody knows what it is: is it going to become cancer or isn't it? I understand your markers, but if a pathologist is going to start calling these cancer once they reach a certain degree of atypia, that's not going to be helpful to you in defining cancer risk. So I think definitions by pathologists for those two cancers specifically are needed before you can move forward in defining cancer risk: you have to define, biologically, what your definition of a prostate cancer is, and what your definition of breast cancer is versus atypical ductal hyperplasia. Well, thank you, your point is very well taken, and I would like to assure you that there is a tremendous amount of effort on this. In a number of the scans that have been done, both in breast and prostate, there's the issue of limiting them to individuals with Gleason scores of 7 and above, and a tremendous amount of effort has gone into trying to standardize this in a lot of these international studies. This has been one of the challenges. What's very interesting about prostate cancer is that when you look at the regions you find from scanning or analyzing Gleason 7, or even 8 and above, and then you bring back in the 5s, the rare 5s, and the 6s, you see
basically the same regions lighting up, and this has been a big disappointment. This has been a hard thing: the idea that there are genetic determinants or genetic factors that contribute to the risk of very aggressive types of prostate cancer. We've been very hard pressed to find them. We have one or two that are just going into press, and another group has one or two that have stood the test of time, replicated over a number of studies that spend a lot of time going through this issue of pathologic review and the comparability of pathology from one study to the next. For breast cancer, the breast cancer consortia have been very careful in trying to separate out the early lesions, carcinoma in situ, from what would be considered frank carcinoma, and a number of the studies are now moving on to molecular characterization, saying luminal A, luminal B, triple-negative, HER2, those kinds of molecularly defined types, to better refine the analyses and to be sure that we're looking at comparable things, because a disease like breast cancer may be as many as 9 or 10 different types of breast cancer. So I think as we accumulate these very large studies, we do have opportunities to look at subtype-specific effects, and this epidemiologic world works very closely with scores or hundreds of pathologists worldwide to address these questions. Just one brief comment: your aging hypothesis has been given credibility by another pathologist here, who gave a lecture last week saying that BCL2 is known to be elevated with age, BCL2 being the B-cell lymphoma 2 gene, which is known to confer immortality on cells and is anti-apoptotic. Thank you very much. So what is it about the Y chromosome, and why is it disappearing faster?
Good. My wife's here; you can ask her. But, well, there are two aspects of the Y chromosome that are very interesting. One is that if you actually look at the topography of the Y, there's a lot that is pseudoautosomal and shared with the X chromosome, so the amount of real estate that we consider to be unique is much smaller, and there are relatively few genes on the Y chromosome involved in things other than sex determination and fertility. So there's a lot of hand-waving, but I don't think anyone has a very concrete explanation, whether GC content or something about the actual structure of where those breaks would take place. It is a remarkable phenomenon that's tenfold more common in men than what you see in women; remember I showed you that the women have higher rates of these larger events, but again, that's still down in the weeds compared to where the men are. So what happens to the function of people who are losing most of the Y chromosome? Well, many of them, you know, go on and maybe become President of the United States, or do whatever. I mean, it's a high enough fraction. If we look at the PLCO, for instance, a cohort with a lot of very high-performing individuals, we haven't done the mapping against their professional and personal outcomes, and I wouldn't care to do that, but it is an interesting evolutionary question: at what point does the Y chromosome become less important biologically? I think the mosaicism certainly does press that question. Thank you. Good luck in exploring. All right, well, I think we probably ought to call it to a close, and please do join us for the reception over in the NIH library, sponsored by our friends at the FAES. Thank you very much for coming. This was really