Good morning. Well, thank you very much for the invitation to come to this very exciting meeting. It's a very important meeting in my mind, and one that I'm sorry I was not able to be at much of, but I know people from my laboratory and other laboratories at the NCI have been here and have found this to be extremely informative and, most importantly, provocative and stimulating. Because I think the mission and, more importantly, the generation of data by ENCODE has really been the start and not the finish of a whole series of very important questions. So I wanted today, in the next 25 minutes or so, to talk about ENCODE, particularly in the study of trying to get at the complexity of cancer susceptibility. I'm not going to give you 50 examples and 500 different levels of looking at methylation marks, right, left, up, down, inside out and beyond. I'm going to try and put it more in the context of how we're using it and where it's been helpful, but at the same time try and stimulate some of you who may not be thinking about cancer susceptibility to start thinking about it, and how and in what way to utilize the kinds of opportunities that I think ENCODE has really put in front of us. So really, when we think about the etiology of cancer, it is really complex. We think of Mendelian diseases as a particular mutation that drives a disruption of a critical gene or pathway, and there are important consequences, and there are another 24,999 genes and their environment and the long non-coding RNAs and all the things that we add to that picture. But I think particularly when we think about cancer, which is something that involves an extended period of time for the development thereof, we have to think very hard about the role of environment and lifestyle.
As we all know, BMI, smoking, all sorts of chemicals and the like are very important carcinogens, and they're part of this equation. There is this other side of genetic susceptibility, and there's an argument about how much it is of each, and there are some of us who think that it's probably a hundred percent of both. In other words, what's the genetic susceptibility? What's the setting of your carburetor as you start out in life and you expose yourself, either willingly or unwillingly, to lots of environmental challenges, and then the lifestyle changes that you make, determining your weight, exercise? And by the way, in our institute the view is that physical activity is the next smoking, and those who lack physical activity or do the wrong things are putting themselves at higher risk for diabetes, lung cancer, breast cancer, prostate cancer and the like. But really we think of the environment as the triggers and the genetics as a set point. Now, I'll spend pretty much the rest of the time talking about our understanding of what the set point is, because we have very little understanding of how the triggers actually interact with the genetic makeup that we have. We have a few examples, but very few, and I think this is where, in my mind, ENCODE needs to get environmental, so to speak: thinking not about a model system, but about how and in what way you take that information and plug it into trying to understand how critical changes take place that would lead to very important diseases. I happen to be interested in cancer, but the same thing can apply to diabetes, coronary artery disease, neurodegenerative disorders, arthritis and the like. So I think it's really critical. And then of course we also know that there are stochastic events. We know that there are errors in the program of DNA repair, which are very important, and then all the chance issues, and how important chance is. Well, there's a lot of debate about this.
Some of us are a little bit skeptical of the statement of how important chance is. I think that's more an attribution of what we don't understand, as opposed to necessarily saying that it's truly chance from a probabilistic point of view, but we can debate that later. So when we think about this, we have to now move into cancer, and in cancer there are really four spaces that we live in. As you can see above the line, there is the germline, where I'll spend pretty much most of my time talking. We know that there is a whole series of cancer syndromes with important mutations in the germline that put someone at very high risk for developing cancer. There are moderate-penetrance genes that are part of an oligogenic model. And then GWAS has been very successful in cancer, and I'll talk a fair amount about that. Only a small fraction of what we have there is actually actionable: maybe 26 or 27 of the mutations we really know what to do with, for another 50 we have an idea, and for another hundred we think we could surmise, but we really don't have the evidence. And then underneath there are the second or the third or the fourth or the fifth genomes, the cancer genomes, which are the somatic alterations that live in the world of drivers and passengers. We know that heterogeneity is very important, and we've accumulated an extraordinary catalog with TCGA and the COSMIC data. The excitement in cancer research is all about targeted therapy, but targeted therapy is a very early concept and it's not going to solve everything overnight. It is a very difficult thing to do. And then we have all the tests that are out there. So this is the space that we live in, going from research to actually trying to figure out what's actionable. Frequency and/or penetrance is not the only way to move from discovery to clinically actionable; we need to understand the functional consequences. And again, this is where ENCODE is a very useful tool to look at many of
those things, particularly when we look at the moderately penetrant genes and some of the GWAS hits, and we'll come back to that. So what happens when there's more than one genome? Well, we can look and see a wide panoply, a landscape of four orders of magnitude difference in the burden of genetic changes observed with whole-genome or exome sequencing, going from pediatric tumors that are fast and furious, so to speak (rhabdoid, Ewing sarcoma, AML, acute myelogenous leukemia), to those that are environmentally driven, like lung cancers and melanoma, where smoking and UV light are driving them. You can see as many as four orders of magnitude more mutational load. Now, the problem is that we don't have large enough numbers so that we can use frequency to pull out what we think the drivers are. We have a small smidge of information that suggests that there are a few of them. But again, I think these snapshots of the forensic picture of a cancer are very different from how we actually get there. So we've known for some time that cancer is a heritable condition.
We go back to Paul Broca, the wonderful French neurobiologist, who in the 1860s noted in his own family an extraordinary clustering of breast cancer, with sisters, mother, grandmothers, aunts and the like, and actually reported this. We didn't know what genetics was about in those days per se, but the astute intellect had clearly pointed that out. We had ages of twin, family and sibling studies. We saw familial clustering, such as what Joe Fraumeni had pointed out, not necessarily for just one cancer but for sets of cancers. And then certainly the Knudson hypothesis of knocking out both the germline and the somatic: you could get to cancer by having a germline inherited mutation and then having something show up somatically. And then of course the positional cloning of a familial breast cancer gene in '91, which by '93-'94 was really much better annotated, from Mary-Claire King and on into the world of BRCA. So we spent a lot of time trying to do positional cloning and identifying mutated genes in cancer susceptibility syndromes, and there are about a hundred and fifteen or a hundred and twenty, depending on your definition, right now, and this continues to evolve. Exome sequencing will continue to put more spots on the map. And one of the remarkable things is that we really don't see, as we've seen in the infectious disease world, i.e.
HLA, anything in terms of a concentration in a cancer region. And I think that really bespeaks how complex cancer is and the multiple different pathways that lead to it. These are ascertained in families with rare mutations, and they have been instrumental in helping us to identify the concepts of both oncogenes, where a single mutation would drive something, and tumor suppressors, where you remove those things that are protecting you, letting the cell loose, so to speak. We also know that even in the world of BRCA1, the penetrance, in other words the risk that we would see, is not identical for each individual, again underscoring both the genetics and the environment. And we've seen this more in TCGA, where we clearly can see the impact of both germline and somatic mutations. When we look at the nearly 500 women with ovarian cancer, we could see a substantial fraction who had germline mutations that were very important and had an impact on survival, but we could also see silencing of the gene and, most importantly, mutations and rearrangements somatically. So this gets us to this question of these high-penetrance mutations and somatic alterations. When we start mapping them, as Nazneen Rahman did about a year ago, against the emerging databases like COSMIC and TCGA, nearly 50 percent of what we are assuming are the susceptibility genes are already in COSMIC, and with a frequency that would suggest that they really are drivers. So these are the unfortunate errors, and this is the world that's harder to understand.
I think, using the world of ENCODE in terms of regulation, these are knocking out or doing something, creating a dominant negative in vivo, so to speak. And I think that's a very different paradigm from where we go when we start our search for common variants in complex diseases, because we're going to build up to how we see the architecture of genetic susceptibility. So we have very nice, reproducible technologies that start out very easily when we put large collections together, and these chips give us a multiple testing problem that we've all railed at and had difficulty trying to figure out. We've come to this quasi-conclusion of genome-wide significance, and that's been very helpful when we start to look at, for instance, cancer. So now there are some 490 separate loci that have been identified in more than two dozen cancers. There are another 120 sitting out there from the OncoArray, for which we've seen the data but which have not yet made it to publication. So we can see this across the world of cancer susceptibility, going from rare cancers, where some of the pediatric cancers like Ewing sarcoma or osteosarcoma clearly have these common variants with small effects, to prostate and breast cancer, where we're now able to explain a large fraction of the familial risk, and I'll come back to that in a minute. Interestingly enough, for all the excitement about CNVs back about seven years ago, it's sort of the Kohoutek of genetic susceptibility with respect to common variants: we've only seen one of a common nature that's really been reproducible. Now, when you go back to your germline susceptibility alleles that are important in these highly penetrant mutations, copy number becomes very important, but those are rare events. So if we're thinking of frequency against effect size, we see very little with respect to CNVs. Interestingly enough, just shy of 10% are shared between cancers.
So again, we don't see these soft spots where we could say, other than the TERT region on chromosome 5, 8q24, and HLA for the viral-driven cancers, that there are regions where eight, nine or ten cancers are all lining up together. There seems to be, again, the suggestion of perturbation of literally redundant pathways. Interestingly enough, and partly due to the way in which these studies are conducted and have collected samples, but also I think as a question of the heterogeneity, almost none of these, only two or three, are associated with outcome. So that raises the question: what's important for getting cancer may not be as important, once you have it, for your specific outcome. That's a whole other world of how the germline is functioning, and that's a very important question in thinking about pharmacogenomics, for which I think the ENCODE resource is really terrific. And then I mentioned that there are more than a hundred, actually about a hundred and twenty, that will be reported very soon. Now, we've done some work on this in our own lab. Mitch Machiela, sitting here in the audience, has spent a fair amount of time trying to look at the same question that I showed you, that over 50% of the known susceptibility genes with highly penetrant mutations are actually in the COSMIC database, suggesting that either germline or somatic alteration takes place. Well, Mitch looked at just shy of 300 of the regions, mapped them, looked at the genes, and used a series of different approaches to ask the question: is there a relationship between these common variants and somatic mutations? In other words, are they pointing towards specific genes being important that we would identify through our landscape sequencing of somatic alterations? And we really don't see that. In other words, as you can see here, these two circles look basically identical between the GWAS genes that we've mapped in the intervals in and around the GWAS
hits, of nearly 300, against a permutation analysis of genes that are not in the GWAS regions that have anything to do with cancer. And it's very different if we were to go to those 115 genes that I referred to before, where over 50% of them we know are heavily mutated and are critical, as identified in the COSMIC database. So the interpretation is that we're not necessarily looking at sufficient or required elements for developing cancer for any single one of those hits. So correlation does not necessarily imply causation. Treating it as if it does is a dangerous view in the GWAS world, one that some have continued to propagate, and some of us feel quite strongly that it's not so, partly because we know that we have primarily indirect associations. We have markers, and those markers are very important, but they give us that question of how we then prioritize. And this is where there has been a whole score of papers put out in the last few years using the ENCODE resources in RegulomeDB and the like to try and figure out specifically how and in what way we can prioritize these. And here are just a couple of things: the statistical approach to prioritizing based on pleiotropy and annotation from ENCODE, versus groups that have actually tried to do this for the known loci to see if we can see patterns. And the problem with what we've seen in the genome-wide association studies is that we don't really see patterns. We can't say anything beyond the very generic statement that most of them look to be regulatory. We can't say: are they really all in enhancers? Are they in silencers? Are they important binding sites for transcription factors, for open or closed chromatin and the like? So I think we have to be really careful about that. So really, can we use this? Yes and no, in my mind.
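The comparison described above, genes in GWAS intervals set against permuted gene sets, can be sketched as a simple overlap permutation test. This is only an illustration of the general idea, not the analysis that was actually run: the function name, the gene lists and the number of permutations are all hypothetical, and the toy data carry no biological meaning.

```python
import random

def overlap_permutation_test(gwas_genes, somatic_genes, background_genes,
                             n_permutations=1000, seed=0):
    """Empirical test of whether GWAS-region genes overlap somatically
    mutated genes more often than random gene sets of the same size."""
    gwas = set(gwas_genes)
    somatic = set(somatic_genes)
    background = sorted(set(background_genes))  # sorted for reproducible sampling
    observed = len(gwas & somatic)

    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Draw a random gene set the same size as the GWAS gene set.
        random_set = rng.sample(background, len(gwas))
        if len(somatic.intersection(random_set)) >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Toy data only: placeholder gene names, not real results.
background = [f"gene{i}" for i in range(1000)]
somatic = background[:20]
observed, p_value = overlap_permutation_test(background[:20], somatic, background,
                                             n_permutations=200)
```

A large p-value here would correspond to the two circles looking basically identical: the GWAS-region genes overlap the somatic catalog no more than random genes do.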
We can use it to start the discussion, but we know that each one of these GWAS signals has to go to one-by-one investigation. It's the old Smith Barney television commercial, for the older people: one SNP at a time. You can't do it genome-wide and get the answer to what the functional complement is. I realize that's a little bit of a dangerous thing to say in front of the ENCODE audience, but I've got to say it. And we know that these things are giving us important insights into the biology. They're not necessarily causal, but they have a functional contribution. And where I think ENCODE really allows us to be more effectively smart is in integrating the non-coding and regulatory information, the eQTLs and the like, to prioritize which variants we are going to take into the lab and actually try and make some sense of. So here's an example. Ludmila Prokunina-Olsson in our program did a beautiful job in taking one of the bladder cancer hits, for the prostate stem cell antigen. It's just by chance that that's its name. She mapped this and imputed it, and then, using ENCODE, was able to look specifically at all the correlated variants and see some of the activity in and around the promoter. And the risk allele turns out to be very important for the expression, showing a difference in both mRNA and protein, as shown with immunohistochemistry. So this highlighted something we really hadn't thought about in bladder cancer per se, and here is a very good example of a translational application, because this SNP actually predicts the degree of expression.
That's measurable. It's quite significant, and the pharmaceutical industry was trying to develop an anti-PSCA monoclonal antibody, but for different types of cancer. So now there's been this discussion to try and bring this in, and the regulatory issues have held this up for a bit. But here is a possible clinical trial, taking something where you've gone from mapping it to the functional analysis, again using ENCODE as part of that analytic strategy, to be able to say: here we can show what we think are the functional underpinnings of this particular association. And this is the real lucky one. This is the one in four hundred seventy-five that has jumped off the pages as clinically translatable. But hopefully the other four hundred and seventy-four or whatever are out there and ripe for the taking at this time. So when we start thinking about this, we come back to how and in what way we use this to really look at what we know is the sweep, or the architecture, of genetic susceptibility. We clearly know that in the GWAS world we've been able to find common variants that fit this polygenic model of SNPs and SNPs and more SNPs, hundreds to thousands of SNPs, to explain this part of the common disease paradigm. But we also know that there are these rare damaging drivers that I've made the case for: BRCA1, TP53, RB, PTCH and the like. So really the question is: what fraction of the polygenic component contributes to each cancer?
So we've done an awful lot of scanning in our world at the NCI, and we decided to look at 13 different cancers and then try to use some of the newer approaches to take genotyped SNPs and explain anywhere between 10 and 50 percent of the variability on the liability scale. In other words, what fraction of the genetic contribution to that particular type of common, or not so common, disease can we explain by the GWAS component? And this is done knowing that it's more than just the SNPs that have hit genome-wide significance; rather, there are a lot more underneath that curve. When we look across those cancers, seen here, we can clearly see that we can explain some fraction, and that it does begin to approach what we've seen from the familial, the twin and the parent-child studies that have been done epidemiologically over the last 30 years. The shared heritability, interestingly enough, does bring to bear some very interesting questions. We can see strong shared factors between things that have an embryologic association, testes and kidneys, or chronic lymphocytic leukemia and diffuse large B-cell lymphoma, but we could also see things such as adult lymphoma and bone tumors in children. So again, we're using this as a way to try and put our hands around the things that are there that we haven't appreciated as models. And so, going forward, as we know, all models are wrong, but some are useful. And we have to start to think: how can we use this to think about how we would predict disease?
Well, prediction is difficult, especially about the future. This was said by a hero of mine, Yogi Berra. It was also said by not a hero of mine, Dan Quayle. But the person who really said this first was Niels Bohr, an absolutely brilliant person. And I think in doing this we now have, with our GWAS world as well as the highly penetrant mutations, the ability to start to map and look at what that genetic architecture looks like. So here, looking at breast cancer, we now know that we can explain 35 to 40 percent of the excess familial risk with these 160 SNPs, and we know that the highly penetrant mutations explain about 10 to 15 percent. So at this point we can see that more than 50 percent of the risk of breast cancer in a family can be explained with the variants that we already have in hand, and the polygenic models, if we keep pushing them, have the potential to add more to that. Exactly what that limit is is a very important question. And if we start looking at the area under the curve, Ju-Hyun Park and Nilanjan Chatterjee have spent a lot of time modeling this, and we can see very interesting things: the total heritability corresponds to about a two-fold sibling relative risk, and at the moment we've been able to explain 1.4 of that 2. As we continue to put that catalog together, we think that we top out at an AUC of about 80 percent. This may not be good enough for an individual patient, but for public health measures it could be very important in discriminating who would get earlier mammograms, who would get earlier interventions or preventive therapies.
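The "1.4 of that 2" figure above can be turned into a fraction of familial risk explained. A minimal sketch, assuming the common convention that sibling relative risks combine multiplicatively, so that the fraction is computed on the log scale; that convention and the function name are assumptions made for illustration, not something stated in the talk.

```python
import math

def fraction_familial_risk_explained(lambda_known: float, lambda_total: float) -> float:
    """Share of the sibling relative risk accounted for by known variants,
    computed on the log scale (assumes risks combine multiplicatively)."""
    return math.log(lambda_known) / math.log(lambda_total)

# Sibling relative risk of 1.4 explained, out of a total of about 2.
fraction = fraction_familial_risk_explained(1.4, 2.0)
print(f"{fraction:.1%}")  # prints 48.5%
```

Under that assumption the two numbers correspond to a little under half of the familial risk, which is in the same range as the roughly 50 percent quoted for breast cancer.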
So again, how and in what way we use this information in the public health venue is, I think, coming up on the horizon. For individually counseling people on the basis of SNPs, I think we still have a long way to go, despite what 23andMe and deCODEme and others have wanted us to think. We also have to take advantage of the most important risk factor for most of our cancers, which is age. So if you look at prostate cancer, if you take the 76 SNPs, you can get a substantial separation between the first and 99th percentile if you look at the distribution of those SNPs. It doesn't necessarily mean you're going to get the cancer, but it's important in our minds for public health implementation. Now, while we did these GWAS, we kept seeing unexpected findings in the genome-wide association studies of large chromosomal abnormalities that turned out to be somatic mosaicism, part of this dynamic genome, where we know that there's a subpopulation that is clonally expanded and stable within either blood or buccal cells. We can clearly see this, and as we've now done over 127,000 individuals, we're able to see the distribution of these events hitting all the different chromosomes, but particularly the X; the Y is even higher, and that's a more complicated story for another day. We used our GWAS chips to look at these greater-than-two-megabase events, and Mitch Machiela again had done some very nice work in putting this together and looking at that landscape and being able to see that there were some recurrent events, which really raised this very important question in my mind: why are people walking around with these? Half of these are healthy controls who have not developed cancer per se. And what we can tolerate is an important question, related ultimately to what may be a question of genomic stability in the large, as opposed to thinking in the more classical Lynch colon
cancer model. So we thought that it would be very interesting to ask this question: as we see some of these events on chromosome 13 or 12 or 20 that are recurrent, could we use the ENCODE data to do breakpoint analysis, realizing that this is the roughest first cut, so to speak? Is this going to be helpful for us to hone in on regions that may be more amenable to these kinds of events taking place? Because we know these events take place at single bases. There have been a number of nice papers in the New England Journal of Medicine and Nature Medicine in the last year showing that single-base point mutations are clearly there. And then we of course know all the classical ones, neurofibromatosis and Turner syndrome and the like. So we took 688 interstitial events and 543 telomeric events, looked at the 200 kb windows of the SNPs around them, and looked at permutations both with respect to the region and with respect to other regions of the genome. And here is how we looked at each of the different elements: genome-wide, looking at the recombination rate versus the permutation distributions with 95% differences. The first thing we saw was that open chromatin looked fairly interesting to us, and you see where these recombination values are moving over towards when we start looking at both telomeric copy-neutral events and interstitial losses. We also saw that repetitive elements did have some bearing on this. So again, where and why these events are occurring in these kinds of places is a very difficult question. But instead of 37,000 feet, we maybe move down to 25,000 feet in thinking about what's going on in the genome. And then, interestingly enough, with the gene-rich regions we could particularly see this with respect to the telomeric events, and not as well with the interstitial losses. But again, in our minds this has raised a fundamental question of really how and in what
way we could look at the different elements, including the fragile sites, that may tell us whether there are some regions of the genome that are going to be more sensitive and more likely to have these kinds of events taking place. Because we think that what is detectable is really the tip of the iceberg. We've clearly seen it in large and small events, and we know it's a U-shaped curve: it's seen in the very young with catastrophic diseases as well as now in the aging population, and we see it lining up for neurodegenerative disorders and the like. So in closing, to me the current challenge is thinking about the relationship between germline and somatic, knowing that the germline itself is eroding and falling apart. The hardest thing that we have to explain, I think for both the highly penetrant mutations and the common variants, is this question of tissue specificity, the origin. Could it be that the effects are really mediated through the adjacent cells or through immunologic modulation? Could these SNPs, for instance, be modulating the immune system?
And we clearly can see selective success of immune blockade, but not total. There is the question of timing effects, and the hardest thing in my mind, going back to that first slide, is the interaction with the environmental stimuli. So let me end by saying, again, kudos to the gang that envisioned ENCODE and kept it alive despite all the different assaults and questions. I think it's a spectacular resource for the functional basis of susceptibility. It's an opportunity to explore many novel elements, both individually and with their interactions. But I also have to give a call-out to, I think, the value of team science, both in the short and long term, and the establishment of all of those extraordinary thresholds and standards that many of us can apply in different places. So there are obviously many people to acknowledge, particularly the wisdom of Joe Fraumeni and Bob Hoover, who over 40 years saw the value of biospecimens and of having exquisitely well-done studies, which has given us that opportunity to really go into the cancer susceptibility world. And certainly the acknowledgments of all those who are part of the OncoArray consortium that I've alluded to. So why don't I stop there and see if there's time for questions. Thank you.
I've always been curious about the occurrence of variations at these sites and the high correlation within the specific tissue to a particular transformed state. It seems logical, and we've heard a lot about it. But in tissues where there is no evidence of a disease state and the mutation is still there, do we know, has that been studied, whether there are other compensating factors that mitigate any misregulation going on there?
It's a terrific question, and I think it raises two critical questions. One is how and in what way do we actually protect? And why is it that, for instance, with BRCA1 it is breast and ovarian cancer
and not any number of other tissues per se? This is where the question of environmental and secondary effects is very important as well, other genes that do interact or don't interact. But why we see that tissue specificity is still very much an open question. And I think we look at some cancers related to that and we say, as Harold Varmus pointed out in the Provocative Questions, you have some tissues where you have extraordinary turnover, for instance like the heart, and you have the small intestine, and you see virtually no cancer whatsoever. And then you see other cancers, of the skin, where you see such a wide variety of responses to UV light, and particularly the protection thereof in the repair mechanisms, that it really raises this question of the timing and the extent to which those damages are exposed or not exposed. The 3D culture people are showing very interesting data that if you take a mutation, put it in a particular 3D culture and let it grow to a certain point, there are intrinsic environmental properties, not necessarily determined by, quote-unquote, that genetic mutation, and not linear, that allow the growth to begin to change in either its nature or its trajectory. So what we really are missing is a three-dimensional character, I think, when we look at so much of this genetic information and think of it as a flat mutation. I wish I could answer your question better, but it's really to me the $64,000 question in cancer susceptibility: why is it that we're not getting cancer all the time in all of our tissues?
One question: independently of breast cancer or prostate cancer, the obvious sex-specific ones, to what degree are you stratifying the risk by sex? Because there are a lot of other cancers that have sex-specific differences.
What's interesting, for instance, is that if you take the top five or six cancers and you get beyond the sex-specific ones, breast, prostate, ovarian, and you go to colon, you go to lung, you go to bladder, you go to pancreas, you really don't see a huge difference between the sexes, and you don't see that the variants or the incidence are really substantially different between them. So it is an interesting question. And then you go to some others where, as you get rarer and go down the line, you clearly see those kinds of changes. As for how and in what way we can explain that, with lung cancer we used to do it with smoking behavior, but now that men and women smoke at comparable levels, we've seen pretty much comparable lung adenocarcinoma, both incidence and survival, in the West at least.
You're encouraging us to think about the role of environment in cancer. Is something known at this point about how much of this would be shared environment, private environment, say within a family, as opposed to substantial variation across individuals? I'm curious because I'm wondering to what extent this environmental effect would be captured in family history, and to what extent it would be outside.
Well, it's an excellent question. As you know, in the history of linkage, when we tried to do a structured analysis, before we started sequencing, and in looking at the twin studies, we've always had these two categories of the heritable versus the environmental. The environmental, in most of those studies, has been relatively small in what could actually be characterized, and then there is the large "we just don't know". I think shared environment is clearly important, but again it comes back to this question of the individual exposure of each person, what their carburetors are set at. And you look at BRCA1, for instance, and there we already know that some fraction of the GWAS hits have very important modifying effects, as we start to explain the differences in penetrance, both by mutation and within families. So I think, again, the question of environment is very important in trying to understand it as a model. As for how you apply it specifically, we still have a long way to go to be able to quantify that and put it into something that would be a suitable individual predictive model. That's where I still worry that we have a little bit of a naive sense that we're going to be able to explain cancer risk to people individually, as opposed to at a population level, or perhaps at a familial level, in terms of where you would fall in the distribution, as opposed to actually being able to say deterministically that you're going to get it or not get it.