Okay, well, thanks very much, Adam, and thanks to the organizers for the opportunity to be here today and speak with you. The topic that I was given was discovering variants conferring risk for common diseases. Certainly a central goal of genomics is to understand the genetic basis of human disease and to use this knowledge to improve human health, and I'd like to argue that serious advance toward this goal is a reasonable aim for the next five years.

Mendelian diseases obviously are important, and Rod McInnes will be talking about these. They're of great importance in their own right, they can be immensely instructive for understanding the genetic basis of common diseases, and in a real sense Mendelian and common diseases are just two ends of a continuum. But my charge is to talk about common diseases, and it's certainly fair to say that common diseases are responsible for the large majority of human morbidity and mortality. I think now is also a good time to be talking about planning, because we have substantial experience at this point with common variant GWAS, and with the results of large-scale sequencing studies beginning to emerge we have some information for rare variant association studies as well.

So the questions posed to me by the organizers were: what are the big problems that can be solved? What will it take to solve these problems comprehensively? And what will happen if NHGRI decides not to pursue this area?

So, what are the big problems that can be solved? This slide is really a proxy: a list of the 10 most common causes of death worldwide. It's certainly not to imply that these are the diseases we should focus on or that we should do ten, and I absolutely would not favor road injury as one of the diseases on which to focus. But it is to emphasize that there is a broad range of common, complex human disease, really all of which deserves, and in truth demands, our attention.

So what will it take to do this comprehensively? Well, we really need to explore the full allele frequency spectrum of human variation for all common human diseases: to provide better understanding of human biology and disease etiology, to set targets for therapies and allow better targeting of therapies, and in some cases at least to improve risk prediction.

It's easy to think about where we are right now and to forget where we've been. It's really quite interesting, quite amazing, for someone who's been working on common diseases for a long time to think how far we've come in a short period of time. In 2006, the number of successes we had had in identifying genetic variants associated with common diseases was remarkably small. This slide, a very exciting slide back in 2007 from my good friend David Altshuler, is now a little quaint in the sense that we've come so much further; even this slide is a year and a half out of date, but we're at the point now of having identified thousands of associations for hundreds of different diseases and traits. However, in terms of exploring the full allele frequency spectrum, we've really only made a start, and there's much more to be done in discovery genomics and then in taking that information from discovery genomics further. Common variants explain only a portion of disease heritability. The heritability explained so far tends to be quite small relative to what we estimate the total to be based on the data we have, typically of the order of 50% or less.
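As a point of reference for what "explained heritability" means in this kind of accounting, a standard back-of-the-envelope approximation, not anything specific to the studies mentioned here, is to sum the trait variance attributable to each associated variant under an additive model:

```latex
h^2_{\text{explained}} \;\approx\; \sum_{j} 2\,p_j\,(1-p_j)\,\beta_j^{2}
```

where p_j is the allele frequency and beta_j is the per-allele effect of the j-th associated variant on the standardized trait (or its liability, for disease); the "missing heritability" is then the gap between this sum and family- or twin-based estimates of total heritability.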
This talk is not meant to be a charge to explain all the heritability of human disease, but only to point out that, with this information, it's clear that other variants, less common and rare variants, are also important. Not a surprise, but certainly clear. Low frequency and rare variants can help us understand many of the loci we've already identified through common variant GWAS, and also help us understand the impact of the remainder of the genome on human health and disease. The extent and effect size distribution of these less common variants is only now beginning to be revealed. But these variants certainly have the potential to suggest function, to lead more directly to druggable targets, and to lead to clinical action. So exploring this range of the allele frequency spectrum clearly is important.

Okay, I'll be the next one to put up this slide, and I actually have a point to make with it beyond just doing it for old time's sake. Certainly one of the reasons why we can talk about the kinds of studies we're talking about is this remarkable drop in sequencing costs. One of the things that's been a little sobering, though, is that for the last few years this curve has been largely flat. I mean, it's not literally flat, because that is a log scale, so there has been some decline, and I think that last little blip up is probably just noise. But what is clear is that with new technology coming along, with these X Ten machines and I hope others as well, that curve is going to start going down again, and how quickly and how far it goes down is actually quite important to us in terms of what is feasible. If we could encourage competition so that there was more than one key vendor, that would be a good thing.

It's always worthwhile to think about what we've learned so far, and after some 20 years of working on common human disease, the key lesson from my perspective is sample size. As in real estate, where location, location, and location are the three most important things, sample size, sample size, and sample size are really important when it comes to complex diseases. That's not to say you should do a terribly designed big study, but sample size really matters.

As an example, and I should say most of my examples for this talk are drawn from type 2 diabetes, which reflects what I do, a certain amount of parochialism, and not wanting to spend too much time developing a talk, here's what we've done with GWAS for type 2 diabetes. When about five of us were getting started on the first GWAS for type 2 diabetes, our own FUSION study, which I lead, had a GWAS with about 1,200 cases and about 1,200 controls. We then followed up some of the most interesting findings in similar numbers, and if that had been the end of our study design we would have found exactly one locus for type 2 diabetes, and in fact one that was already well known. We had a little more foresight than that: even before we got these data, our FUSION group had agreed with David Altshuler's group and Mark McCarthy's group to combine primary data right from the start. A couple of years later we had a joint publication with our three studies, now with 4,500 cases and 4,500 controls plus follow-up in more samples, where we identified 17 genome-wide significant loci in total. By adding more studies, in 2010 we were up to 31; by adding more studies again, in 2012, 49; and at this point we're close to 100.
That reflects, at some level, better methods and better technology, and those were important advances, but sample size was really important in making this happen, and this is not in any way unique to type 2 diabetes; it is more generally true across a lot of different diseases and traits. One of the advantages of getting to larger numbers is more possible targets, but having more findings also allows you to look for patterns and pathways that might be relevant.

Okay, so if a key lesson is sample size, I just want to re-emphasize that technology and analysis tools are also crucial. In our case, having informative, low-cost genotyping arrays was incredibly important, and genotype imputation made it possible to combine data across multiple studies when different arrays were used. At least as important to talk about, and to be thinking about in terms of its implications going forward, is collaboration, echoing something Eric and Adam were talking about: the importance of consortial efforts. That's certainly been true as we've been dealing with common diseases in the context of common variant GWAS, and it's every bit as true, and even more so, as we deal with rare variants, where collaboration allows joint and meta-analysis across multiple studies.

It's worth thinking about the implications of sample size as we go to less common and rare variants, because if you think about it, the sample size required to detect association at some specific level of power, keeping everything else the same, increases roughly with one over the minor allele frequency. So going from something with a 30% allele frequency to three tenths of a percent, everything else being equal, requires roughly a hundred times larger sample. Now, what we hope will balance that is the possibility, at least, of somewhat larger effect sizes, which gives a scaling in the opposite direction. A lot of our common variant findings have been with odds ratios of 1.2 or 1.1 or even less; if you go from 1.2 on the odds ratio scale to 3, you get roughly a 36-fold advantage in the other direction. So, some compensation. (A rough sketch of this arithmetic is shown below.)

All right, so beyond sample size, study design matters as well. Most typically we talk about cohort studies and case-control studies. Both of these have significant advantages depending on the trait and the question, and in fact they're not even mutually exclusive. Case-control studies are in general more powerful for genetic discovery for diseases, and since we're not doing too many overpowered studies in this field, thinking about what is most powerful is important. At the same time, cohorts can be useful for estimating effect size and population impact. They can be particularly useful for discovery genomics for quantitative traits, and also for very common diseases like type 2 diabetes, where a substantial fraction of individuals may be affected, and for obesity as well. Cohorts are also useful for selecting extremes for quantitative traits, and you can think of those extremes as being analogous to cases and controls for a dichotomous trait like disease.

Beyond the general strategies, there's the issue of general preferences, whatever study design you choose. I'd name three, and I think deep phenotyping is extremely important.
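To put rough numbers on the sample-size scaling described above, here is a minimal back-of-the-envelope sketch, my own illustration rather than a calculation presented in the talk, using the standard approximation that the sample size required for a fixed power and significance level under an additive log-odds model is proportional to 1 / [2p(1-p)(ln OR)^2]; for rare alleles this reduces to the 1/MAF rule of thumb quoted above.

```python
import math

def relative_n(maf: float, odds_ratio: float) -> float:
    """Required sample size, up to a constant, for fixed power and alpha
    under an additive log-odds model: N is proportional to
    1 / (2 * p * (1 - p) * (ln OR)^2)."""
    return 1.0 / (2.0 * maf * (1.0 - maf) * math.log(odds_ratio) ** 2)

# Frequency penalty: the same modest effect (OR 1.2) at 0.3% vs 30% frequency.
freq_penalty = relative_n(0.003, 1.2) / relative_n(0.30, 1.2)   # ~70x (about 100x by the simpler 1/MAF rule)

# Effect-size compensation: the same rare allele with OR 3.0 instead of OR 1.2.
effect_gain = relative_n(0.003, 1.2) / relative_n(0.003, 3.0)   # ~36x

print(f"rare vs common at OR 1.2:      {freq_penalty:.0f}x larger sample needed")
print(f"OR 3.0 vs OR 1.2 at MAF 0.3%:  {effect_gain:.0f}x smaller sample needed")
```

The two ratios illustrate the trade-off in the talk: roughly a hundred-fold penalty for rarity, partially offset by a roughly 36-fold gain if effect sizes turn out to be larger.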
To return to deep phenotyping: the more phenotype information you have on the people you're going to sequence or do genetic studies on, the better, both to help interpret associations for your primary trait of interest and to give you more traits for which you might identify interesting associations. Also important, particularly when we're spending large amounts of money on sequencing, is broad consent, to allow us to maximize the value of the samples we sequence, not just for the primary purpose we're undertaking but to make those data more generally useful. And something that becomes particularly important now that we're looking at the impact of rare variants is callback based on genotype: having samples that are appropriately consented, and people who are conveniently available, for bringing them back in for more extensive phenotyping or for bringing in their family members, because if we're dealing with variants that are rare enough, the only real way to get a decent number of people carrying those rare variants is by looking at their family members.

Okay, careful study design and analysis can weight the dice towards increased power. There's no silver bullet; there's not one thing you can say that's always going to make things better, because there are different questions, different genetic architectures, and different situations, but we can do a variety of things that help. One of these is to assay multiple populations. We can do this for a variety of reasons, but one of the most important is that variants that are rare in some populations may be more common, or even quite common, in other populations. An example of that is a variant in TBC1D4 and type 2 diabetes in Greenland, where something that is quite uncommon elsewhere is very common in the Greenland population. And then there's our own work, where we sequenced 13,000 individuals between T2D-GENES and GoT2D, two studies of the genetics of type 2 diabetes: we sequenced 5,000 Europeans, 2,000 East Asians, 2,000 South Asians, 2,000 African Americans, and 2,000 Hispanics, and we found a variant in PAX4, R192H, that was specifically associated with type 2 diabetes in East Asians. It had a frequency of about 8% in our 1,000 Koreans and about 13% in our Singapore Chinese, and meta-analysis of those two groups gave clear evidence for an association between that variant and type 2 diabetes. Interestingly enough, if we hadn't studied East Asians we wouldn't have found this at all, because in the other roughly 10,000, almost 11,000, individuals included in our exome sequencing project there were only three copies present. I talked before about Greenland; well, that's a special population, likely a founder effect. Here these are just large cosmopolitan populations, and there are still quite different frequencies for these variants.

Another thing we can do, and must do, is to group variants within functional units. The example I'll use here comes from Jason Flannick in David Altshuler's lab, looking at type 2 diabetes and SLC30A8. SLC30A8 was one of the original GWAS findings we had for type 2 diabetes, a common variant in SLC30A8. Altshuler's group decided: let's take some individuals from Finland and Sweden, look at regions that have been identified as associated with type 2 diabetes, look at genes within those regions, and see whether we can find loss of function variants that might be of interest to us.
When they did that, they identified a nonsense variant in SLC30A8, which they then genotyped in a substantial number of individuals, about 54,000 Europeans, and had some really interesting results, but not in any way genome-wide significant. They then looked up the gene SLC30A8 in Iceland, and looked it up in our T2D-GENES and GoT2D sets, to find other variants in that gene, because that particular variant was not common in those groups. They identified other loss of function variants within that same gene, and putting all of this information together, they ended up with an impressive p-value of 2 x 10^-6 and a roughly three-fold protection against type 2 diabetes: a potentially attractive drug target. Now, to get there took the analysis of 12 different variants in 150,000 individuals across 22 studies. It's a really interesting result that demonstrates the importance of being able to combine data across multiple variants, to effectively increase allele frequency, and across multiple studies, to increase sample size, enabled by analysis of data from multiple ancestries. (A toy sketch of this kind of collapsing analysis is shown below.) But what a job. Jason Flannick is a really good guy and worked really hard, and it took a lot of time to put this together. It would be really great if data like these were in one place in a readily usable form.

And so that brings me to my next point: the importance of data aggregation. I would claim that rapid data sharing is, well, after the human genome sequence itself, the most important legacy of the Human Genome Project. But I think we want to do more than just deposit data, as important as that is. We should aggregate data to maximize its utility, making it interoperable and harmonized, enabling more powerful and much more efficient science and inference, efficient in the sense of just being able to do it at all. If we have broad consent, we can take this beyond the realm of one particular disease area and do it more generally. And if we want to go to the next step, we can take the information we bring together and operate on it in such a way that not just specialists but others can take advantage of the data being there, to query it, either with standard queries or with queries from individuals, and so provide results and information to a much broader audience than just the folks who are sitting here.

The next question is, how large a sample is required? Well, large samples are required. The actual numbers will differ based on genetic architecture, which is largely, though not entirely, unknown. Lander's group, Zuk et al. in a PNAS paper earlier this year, went through a lot of interesting modeling to suggest 25,000 cases and 25,000 controls per disease, if you're doing exome analysis, as a reasonable starting point. We can argue about individual assumptions here and there, but it's certainly a reasonable starting point. The obvious question is, are these kinds of large samples available? And we have the happy fact that we're starting in a situation where a lot of GWAS has already happened. This slide, courtesy of Eric Lander, I guess from February, brings together the numbers of genome-wide association study cases across 18 different diseases: cardiometabolic, cancer, psychiatric, neurologic, and autoimmune. Not all of those reach 25,000 cases, but a lot of them are there or beyond, particularly if one were to say we want to do an effort that focuses on a number of diseases with these kinds of numbers.
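Returning to the idea of grouping variants within a gene and combining evidence across studies: the following is a minimal sketch of that kind of analysis, purely illustrative and with made-up counts, not the actual method or data from the SLC30A8 work. It collapses carriers of any loss of function variant in a gene into a single burden genotype per study and then combines the per-study odds ratios with an inverse-variance fixed-effect meta-analysis.

```python
import math

# Hypothetical per-study counts: (LoF carriers among cases, total cases,
#                                 LoF carriers among controls, total controls)
studies = [
    (4, 5000, 19, 5000),
    (2, 3500, 11, 4200),
    (6, 10000, 27, 12000),
]

weights, weighted_betas = [], []
for a, n_cases, c, n_controls in studies:
    b, d = n_cases - a, n_controls - c
    # Log odds ratio for carrying any LoF allele, with a 0.5 continuity correction.
    beta = math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))
    se = math.sqrt(1 / (a + 0.5) + 1 / (b + 0.5) + 1 / (c + 0.5) + 1 / (d + 0.5))
    w = 1.0 / se ** 2                      # inverse-variance weight
    weights.append(w)
    weighted_betas.append(w * beta)

# Fixed-effect meta-analysis across studies.
beta_meta = sum(weighted_betas) / sum(weights)
se_meta = math.sqrt(1.0 / sum(weights))
z = beta_meta / se_meta
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))

print(f"combined OR = {math.exp(beta_meta):.2f}, z = {z:.2f}, p = {p_two_sided:.2g}")
```

In real analyses one would use harmonized genotypes and covariate-adjusted models, but the collapsing step, turning many individually rare variants into one more common "carrier" genotype, and the aggregation across studies are the two ingredients being emphasized here.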
On that question of readiness, some communities might be ready already, and some would have the impetus to do more to bring larger numbers together. So which diseases? I claim we ought to focus on some first, develop our strategies, develop our methods, cut our teeth, make our mistakes, and come to clearer choices about how we want to go forward. Given the many common diseases, I think we should be opportunistic and take advantage of diseases that already have advantages: in particular, diseases where we have large numbers of well-phenotyped, broadly consented, callback-eligible samples; situations where we have investigator groups that are well organized, collegial, and have a clear and strong record of data sharing; and ideally, and actually importantly, speaking to something Eric said earlier, significant financial support from a categorical institute or some other funder. There are no perfect predictors of success, but one predictor, I think, for choosing a disease and choosing a group is to look at the success of the relevant GWAS consortium: how well did those groups work together?

OK. We have some key strategic issues or choices: NHGRI versus the categorical institutes, large centers versus distributed capacity, common versus Mendelian diseases, discovery versus translation, exomes versus genomes. I don't like any of those. I think those are all false dichotomies, and I don't think we can afford to deal with them as dichotomies; on all of those issues we need both.

We need NHGRI and the categorical institutes to work together. The role of NHGRI is to advance paradigms, to develop, evaluate, and harden methods and tools. Many, indeed most, of the methods and tools for this kind of work are general, and NHGRI is an established and logical leader. NHGRI should provide scale, infrastructure, and capacity; it is really good at imagining and developing foundational resources, from the Human Genome Project to other genomes, to HapMap, to 1000 Genomes, and the list is very long. In fact, I should have been looking at Eric's slide; I would have left some of this out. But one of the critical things to remember here is that NHGRI's budget is small, and other ICs are much bigger.

Large centers and distributed capacity: we need both of these things. Large centers set standards, develop analysis paradigms and infrastructure, industrialize genomics, and enable large studies. Small centers give us increased opportunity, provide competition, and enable a broader range of studies, and we want a broader range of studies. Both sorts of centers are important for training, for innovation, and for expanding capacity, and we should be expanding capacity.

Discovery and translation: genetic discovery for common diseases has really only begun, and for rare variants it's barely started. So translation for common diseases, to my mind, requires much more discovery before it's fully practical at large scale. Translation now can take advantage of what we already know from Mendelian diseases, and in some cases from common diseases, and prepare us for when we know more. And we can build a virtuous circle here if we take what we have from discovery, use it in translation, and then take what we do in translation, in terms of gathering data, hypotheses, and samples, back to discovery, going back and forth.

Exomes and genomes: exomes are great from the perspective of cost, relatively speaking,
and of sample size and interpretability, but they're inherently limited. Genomes cost more now, but I would predict that the cost of genomes will come down faster than the cost of exomes, making that comparison less unfavorable to genomes, and it's surely the direction we'll be going eventually. It's time, I think, for more significant investment so we can learn what we can with genomes and prepare for when we're ready to do them at larger scale. And genomes also provide better exomes.

So I almost didn't bother with this slide, because I suspect I'm right at about time, but what will happen if NHGRI decides not to pursue this area? It's hard for me to imagine that NHGRI would not pursue it. But if NHGRI does not, then for science and biomedicine it would mean a more fragmented effort that will be slower, more costly, less efficient, and will result in less interoperable data, and for NHGRI it would be a huge lost opportunity.

And so, my last two slides. We were asked to give a set of opportunities, which I guess will then be gone over this afternoon, and these are the ones I've put down, over two slides. First, we can focus on some exemplar diseases. Now, whether we should be selecting diseases as a group, or whether we should have groups propose their disease, how they would go about it, what the advantages are, and how they can make it happen, I would probably favor the latter. Samples: despite my comment and Eric's nice slide on samples available from GWAS, we really want to encourage identification and aggregation of large, well-phenotyped, broadly consented samples, because we will need more, and many of the existing samples are not ideal in terms of all of those characteristics, in particular for callback. In terms of a resource, a set of recallable, sequenced genomes, for example loss of function carriers for every human gene, would be a cool thing to have. Technology: we cannot do anything but continue to focus on sequencing technology and on statistical and computational methods and tools; that is what we are about at this institute. Whole genome sequencing: as I said a moment ago, I think it's time to do more. For information, more active data aggregation and sharing, and knowledge sharing, are important. Discovery and translation: a virtuous circle, if we take advantage of it. The last three I really haven't spoken about. Functional characterization of variants right now is ad hoc, post hoc, and slow; we need something that's more prospective and high throughput. Training: we need to invest more in genome science. I think it's great that we're investing in more training for translation, but we need more for genome science and statistical and computational science; this is a really inexpensive investment, but a very important one. And then, because every speaker is allowed to put one thing at the end of a talk that they like: until genome sequencing becomes incredibly cheap, genotyping huge numbers of samples on genotyping arrays is, I think, a really good idea, not necessarily for this initiative or this institute, but something we shouldn't lose focus on, because there's a lot to be gained by doing it.

And I think I will end there. I will thank the many people who gave me comments in preparation for this talk, and I look forward to hearing the discussion. Thank you.

Thanks, Mike. I think we have time for about three or four questions.

Yes, this is actually a technical question. On the issue of sample size, you brought up the point of 25,000 cases and 25,000 controls,
and I think I saw that that was for exome sequencing. If you went ahead and thought about it for whole genome sequencing, do you need more? How does that affect it?

So you do need more, and the Zuk et al. paper did a nice job of looking at that. You can view genomes initially as your opportunity to do the exomes really well, and think of the 25,000 and 25,000 as a good way to go while having the information that allows us to go further across the genome. But realistically, we will need more. Richard?

Thanks, Mike. Can you make some high level comments, please, about family studies, very large family studies, as we move ahead in design and different collection modes?

Yeah. So, as I mentioned briefly, family studies are one way to allow us to go after really rare variants and find multiple people carrying them, in a way that, if we do case-control or cohort studies, we're just not going to get very many people carrying those variants. Even so, you need really big families to get very many carriers, and so unless the effects are really strong, they have to be truly large families. I really believe in the importance of family studies; I mean, we still have Mendelian diseases we're working on, and they're surely good there. But for rare variants in common diseases, you really need very large families for it to work. And there was... please.

So I don't want what I'm going to say to be misinterpreted. I think that NHGRI has to invest heavily in common diseases and in the discovery of loci that influence them. But I do think I'm missing, and have trouble seeing, the translation into the clinic, especially in the midterm. We wave our hands and we say drug discovery, and we all really know how hard drug discovery is, and drug targets. So I guess I'm saying this for two reasons. One is I think we have to remind ourselves that common diseases have many causes; genetics is simply one cause, and a minority one, and we shouldn't kid ourselves about, say, a five year time horizon. And I guess the other question, for the group as a whole, is that I think we need to think hard about what the ways are to translate these productively and realistically into the clinic. The only two things, again, that we're used to hearing are drug targets and kind of this nebulous incentivizing of people who are at risk, which frankly is not much of a magic bullet. So I'm just interested in your...

I'd agree with both comments. Is there time for one more?

Hi, thanks. Carol Bult, Jackson Lab. Your comment about data aggregation and interoperability really resonated with me, and NHGRI has also really been a leader in developing ontologies and semantic standards that make that possible. So I think part of the discussion over the next two days also needs to be how we support, and what the role is of, those projects that enable that kind of data integration and aggregation in an ongoing manner.

No, I totally agree with that, and we're working hard on it in the context of type 2 diabetes and hoping to work on it also in the context of psychiatric disorders.

I'm sorry to interrupt. I think we are going to have to move on to the next speaker, but please reserve some of those questions for the discussion following lunch.