No, I always joke that they can't use video of me unless they use the Michelle Pfeiffer filter. But geez, Mary Claire did great, Helen Hunt, hats off. So inspired by the picture motif, I'm actually going to consider art for some metaphors for really looking at the big picture progress that we've made over the past 10 years. I was fortunate enough to be part of the celebration 10 years ago for the sequencing and it's just an incredible time to be in science. I just feel so lucky to be part of this now, where we're able to witness these discoveries as we have over the past 10 years. I don't think any of us at the celebration 10 years ago would have believed where we'd be even today, knowing how fast things were moving. So I am going to be talking, at sort of a 35,000-foot overview, about some of the things that have happened. Association studies via high throughput genotyping, for which I'll make the analogy to cave painting, since people think that that's sort of where genotyping is now. Then exome and whole genome sequencing, maybe minimalism, still at this point in time. Omics beyond DNA sequence: abstraction, a word that has many meanings, and getting functional information using additional omics technology I think is a very important part of where we are now. Prediction, causality, heritability I'm going to talk briefly about; modernism is the metaphor I'll use there. And for the more complete synthesis, mockingly, I've used the metaphor of postmodernism, because of course we never really get there. We're always trying to get a real synthesis of the information that's being created, to really use the discoveries to get new knowledge, and it's always imperfect and we're always trying. So of course we spend a lot of our time now, people like me, doing the analysis of all the genome data, trying to relate phenotype to genotype, and we started out of course with a lot of high throughput genotyping, now very old hat.
The sequencing is of course a much sexier way of generating a lot of this data, but I think we really have to move away from thinking of these as separate things. We need to think of this as just genome interrogation, and at some point this will probably be done exclusively by sequencing, but right now we have to keep in mind that for the DNA variants we already know about, genotyping is still a very cost effective way of getting that information. Particularly from the informatics perspective, the data storage, data management, and information extraction, genotyping is still very inexpensive in comparison with sequencing for the things that we know about. Sequencing offers us the opportunity to get at the things that we don't even know about yet, and there's substantial value added in that. As for our cave painting, we have learned a tremendous amount from the era of genome-wide association studies. As Eric mentioned this morning, we have a thousand fold more discovery for common diseases and related complex human traits. For those of us wandering in the wilderness before this era, to actually start making discoveries has been an entirely new experience, a tremendous experience. But the fact is that we haven't learned as much novel biology as we might have expected, given the number of discoveries that have unquestionably been made, and part of that is because the lead signals that have come out of genome-wide association studies tend to be non-coding sequence variation, and we're often unsure even what gene is being implicated by that very reproducible and highly significant association. So when you don't have a handle on the gene, it's hard to get a handle on the biology. Moreover, of course, we've spent a lot of time picking the cherries, and that's the right thing to do.
I mean, it's the low hanging fruit; we need to grab every bit of it that we can. But it's by really mining the data that we'll start to get a better understanding of the biology underlying some of these discoveries, and the fact is, as you'll see, we've clearly got orchards of cherry trees to pick. The things that have been highly significantly and reproducibly associated with common diseases and complex human traits to date really are the tip of the iceberg of what there is in even the genotype-level surveys that have been done. So on now to minimalism: whole genome and exome sequencing has really given us an unprecedented understanding of human variation. It's been dazzling to see this unfold, the things that Sarah Tishkoff talked about this morning. Our understanding of human population evolution and the importance of recent demographic events in shaping genome biology has really been a product of the ability to get an unbiased view of human genome variation, and it's really been fantastic. As Eric mentioned, it's enabled a successful assault on the remainder of Mendelian disorders not yet ascribed to genes, and that's ongoing and continues very productively. But of course, we're learning lessons that we weren't necessarily keen to learn about the effect sizes of rare variants contributing to common diseases and their associated quantitative traits. We might have hoped for larger effect sizes even for the rare variants, and it's important to note that with sort of all first generation technologies, we tend not to have adequately addressed power issues, because we don't know where to pitch the first generation of studies. And so you inevitably pitch too low; we don't have the sample sizes in the first generation of studies to make all the discoveries that will eventually come when we can put larger sample sizes into the studies.
But the fact is, with thousands to tens of thousands of genomes, at least exome sequenced, for some common diseases, there are not rare variants with huge effect sizes routinely falling out and generating novel biology. When it does occur, we get disproportionate biological information, because that information is coming at the level of genes, and genes are where we learn biology. And so whatever proportion of human disease turns out to be driven by rare variants, there is huge value in getting that information, because it drives disproportionate knowledge of biology. But of course exome sequencing, for example, has not enabled gene discovery for all Mendelian phenotypes or pedigrees attempted. And that's a message that hasn't necessarily come through in the literature but comes through in the meetings. So people will talk about the 50 pedigrees that they ascertained in the same way for a rare Mendelian disorder, and they've got discoveries in 20, or sometimes 30, or sometimes only 10. Part of that may be the capture, and we're learning how much more we get with deeper sequencing. But part of that may be more interesting and complex models than we've appreciated, even for disorders that can look Mendelian. There may be multiple contributing loci in some cases. And of course, non-coding sequence variation can have big consequences and can lead to effectively Mendelian disorders. So we will be learning more about that as we move from exome to whole genome sequencing on a routine basis. And as I say, the exome and whole genome sequencing studies done to date have not been adequately powered to enable discovery of novel contributing genes in applications to complex disorders at thousands to even 10,000 or so cases and controls, although in the follow-ups, novel discoveries are starting to be made. So again, we need to move beyond this first generation.
The very first studies are partly underpowered because the technology is very expensive at the very beginning. As costs drop, sample sizes increase, and discoveries go up exponentially. So now our abstraction, the abstract art analogy: the advances in sequencing technologies are enabling a lot of additional omics technology. People talked about epigenomics today. It's going to be an incredibly important part of future omics efforts in understanding and pinning down the relationship between gene and environment; it's one of the ways that the environment talks directly to the genome. Protein omics (not proteomics, but protein omics, as Rich Jones at the University of Chicago likes to call it) is assayed through micro-western arrays. In very high throughput, he can actually get protein levels measured in many different tissues. And the ability to really look at genetic variants associated with protein levels I think is a very cool thing, telling us a lot of new biology that we hadn't fully appreciated from just the transcriptome biology, which everybody is more familiar with, and which I think is hugely important in allowing us to draw more nuanced inferences about the associations that we have made for common human diseases. And here I think the foresight of NHGRI to invest in GTEx, the Genotype-Tissue Expression project, is extraordinary. This is an NHGRI-led effort, including many of the other institutes, to really develop tissues from many different parts of the body and then make those available for research. A lot of the initial studies have been on the transcriptome, but there was just an RFA recently to enrich GTEx with other kinds of technologies to really get more information from these tissues. For the individuals in the study, high throughput genotyping and exome sequencing have already been done, and I wouldn't be surprised to eventually see whole genome sequencing of the DNA samples from these subjects.
But having many different tissues really provides a lot of additional information. So just a teeny bit of some of the results. These are QQ plots from the Wellcome Trust Case Control Consortium genome-wide association studies, one of the first generation large-scale genome-wide association studies, published on seven different human diseases. What I'm showing here are the QQ plots, so this is the expected number of P values at any given threshold versus the observed number. And what you see is that instead of using all of the SNPs in the genome-wide association study for this effort, they're using just the eQTLs, just the SNPs that are highly significantly associated in cis with transcripts, where this has been estimated from whole blood. And so you can see that for the autoimmune disorders, if you look at the scale here, you're talking about P values of 10 to the minus 100, 10 to the minus 200 for Crohn's disease, type 1 diabetes, rheumatoid arthritis; that's a huge signal from cis-acting eQTLs identified in whole blood. And you can take chromosome 6 out of this, you can take HLA out of it, and while the scale will come down some from the 50s and 100s that you're seeing, it doesn't come down much. There's huge remaining signal that you can pick up by looking exclusively at these highly significant eQTLs from whole blood. And nobody would be surprised at that for autoimmune disorders; they have done disproportionately well in genome-wide association studies. So consider hypertension, which is much less tractable in the context of common variant associations. In blue, we show the QQ plot for all SNPs, and in red, the QQ plot for only those SNPs that are characterized as eQTLs in adipose tissue. And you see a much stronger signal coming off the line fairly early. So you're talking about hundreds of contributory signals here for a phenotype that has otherwise been hardly tractable in common variant associations.
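The QQ-plot construction described above, observed p-value quantiles against uniform expectations, computed for all SNPs and again for an eQTL subset, can be sketched in a few lines. The simulated p-values below are purely illustrative stand-ins (not the WTCCC or GTEx data), and all variable names are my own:

```python
import numpy as np

def qq_points(pvals):
    """Return (expected, observed) -log10(p) pairs for a QQ plot:
    sorted observed p-values paired with uniform order-statistic
    expectations, most significant first."""
    p = np.sort(np.asarray(pvals, dtype=float))
    n = len(p)
    expected = -np.log10((np.arange(1, n + 1) - 0.5) / n)  # descending
    observed = -np.log10(p)                                # descending
    return expected, observed

rng = np.random.default_rng(1)

# All SNPs: overwhelmingly null, so the QQ curve hugs the diagonal.
all_snps = rng.uniform(size=100_000)

# An eQTL subset enriched for true associations: a fifth of its
# p-values are drawn from a distribution skewed toward zero.
eqtl_subset = np.concatenate([rng.uniform(size=8_000),
                              rng.beta(0.5, 10.0, size=2_000)])

exp_all, obs_all = qq_points(all_snps)
exp_eqtl, obs_eqtl = qq_points(eqtl_subset)
```

Plotting `obs_eqtl` against `exp_eqtl` (e.g. with matplotlib) would show the curve lifting off the diagonal early, as in the adipose-eQTL hypertension example, while `obs_all` versus `exp_all` stays near the line.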
So I think the opportunity to look at these functional variants across many different tissues is going to inform us about biology in ways we otherwise wouldn't have been able to imagine. This is not true for all eQTLs. Adipose eQTLs are working unusually well in hypertension, and that may be telling us something important about the biology of hypertension. Causality, the polygenic model, and prediction I had also put under the rubric of Impressionism. I think one of the things that we have been able to learn comes from stepping away a bit from looking directly and only at the highly significant and reproducible common variant associations. So the work of Peter Visscher and colleagues says, okay, let's look at the totality of what we are learning, not just the highly significant associations, but everything that we measure in large-scale genome studies: what proportion of the inter-individual variability in liability to disease is accounted for by all of the variation that we measure? Looking in this way allows us to get a completely different view of how the genome is contributing to disease and quantitative traits, and makes clear that part of the heritability that we were worried about having missed is there. It's just hard to see. We just haven't pulled enough of the signal away from the noise of the very large volumes of data that we're generating. So now for many traits, we have not only the previous estimates of heritability from family and twin studies that we can consider, but what's sometimes called chip-based heritability, the proportion of the phenotypic variance attributable to all of the additive effects of the genome variation that is interrogated in the large-scale genome studies. Take height, for example; height did better than many traits because they could put very large sample sizes to it.
For the first 180 or so loci identified as contributing to the inter-individual variability in height, those still accounted for only a small proportion of heritability. Everything that you interrogate in the genome when you look at height accounts for closer to half of what we expected to find, a little more than half actually. And so the next generation of these studies I think will be really interesting. I wanted to show a few examples. Of course heritability is a very old concept, and we have estimated it frequently in the past by just looking at the relationship of phenotypic similarity to genetic relatedness. That's exactly what we're doing in these mixed model approaches developed by Visscher and colleagues and extended by others, but here, rather than using relatives whose genetic relationships are established, we're estimating the genetic relatedness from all of the markers in the genome. So I wanted to draw your attention to some interesting kinds of second generation studies that these models allow us to do. These data for Tourette syndrome and obsessive compulsive disorder were generated by the Tourette syndrome consortium and the OCD consortium. Because Tourette syndrome and OCD were considered to potentially have some overlap in genetic architecture, the data were collected and put together at the same time, so they were intercalated on the same plates for the genotyping studies. All of the analysis and quality control studies were done at the same time, in exactly the same way, on all the same data. GWAS have been published for both of these studies. But when we apply the Visscher et al. approach, what you see is that the heritability estimated in Tourette syndrome for the rarest of the variants is both disproportionate and quite different from the heritability estimated for the rarest variants in OCD. This cannot be an artifact of data cleaning or data quality differences.
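The mixed-model idea described here, estimating genetic relatedness from genome-wide markers rather than from pedigrees and relating it to phenotypic similarity, can be sketched with a toy simulation. This is a minimal Haseman-Elston-style regression, not the REML machinery of GCTA that Visscher and colleagues actually use, and all names, sizes, and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def grm(genotypes):
    """Genetic relationship matrix from an (N x M) 0/1/2 genotype
    matrix: standardize each SNP by its allele frequency, then
    average the cross-products over markers."""
    p = genotypes.mean(axis=0) / 2.0                     # allele freqs
    z = (genotypes - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / genotypes.shape[1]

def he_h2(phenotype, A):
    """Haseman-Elston regression: the slope of y_i * y_j on the
    off-diagonal relatedness A_ij estimates chip-based h2."""
    y = (phenotype - phenotype.mean()) / phenotype.std()
    i, j = np.triu_indices_from(A, k=1)                  # distinct pairs
    prods, rel = y[i] * y[j], A[i, j]
    return float((rel * prods).sum() / (rel * rel).sum())

# Toy data: 2,000 individuals, 1,000 SNPs, true h2 = 0.5.
n, m, h2_true = 2_000, 1_000, 0.5
freqs = rng.uniform(0.1, 0.9, size=m)
G = rng.binomial(2, freqs, size=(n, m))
Z = (G - 2 * freqs) / np.sqrt(2 * freqs * (1 - freqs))
y = (Z @ rng.normal(0.0, np.sqrt(h2_true / m), size=m)
     + rng.normal(0.0, np.sqrt(1.0 - h2_true), size=n))

h2_hat = he_h2(y, grm(G))  # should land near the simulated 0.5
```

Restricting the markers passed to `grm` to a subset, for instance only SNPs in a given minor allele frequency class, or only eQTLs from a particular tissue, gives the kind of partitioned heritability comparison described below for Tourette syndrome, OCD, and bipolar disorder.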
The data were generated at the same time in the same way, all QC'd the same way. But you see a very different pattern between the two disorders in the proportion of all heritability contributed by rarer variants. Both disorders in a sense are outliers, because for many of the disorders to which these models are applied, you estimate heritability that's largely proportional to the number of SNPs in whatever minor allele frequency class you're looking at. Here, there's disproportionate heritability being estimated in Tourette syndrome for the rarest variants, and much less than you might expect, and than is seen for most disorders, for OCD. A really interesting observation that I think is going to be interesting to follow up. And this is true despite the fact that there's substantial genetic correlation that can be estimated for the two traits; that is, in a bivariate analysis of heritability, there is evidence for a significant component that's shared between the two disorders. In bipolar disorder, again second generation for some of these heritability sorts of studies, heritability estimated from previous twin and family studies for bipolar is in the range of .7, at least .7; some studies estimated it to be even higher. Chip-based heritability, estimated in this particular example from the GAIN and Bipolar Genome studies, was around .35, and that's similar to what's been estimated, I think, in the Wellcome Trust data. And that's with about 600,000 SNPs being included in the analysis for the estimation of heritability. But for just 27,000 SNPs that are identified as eQTLs from brain, you can estimate a heritability of .2, so almost 60% of all the heritability that we can estimate from the entire genome can be accounted for by just a relatively small number of cis-eQTLs from brain. And that doesn't have anything to do with there being some bias in just being an eQTL.
If you take eQTLs from lymphoblastoid cell lines or eQTLs from muscle, you get almost no heritability estimated for that set of cis-eQTLs, and it's about the same number of SNPs as the cis-eQTLs from brain. So again, these are the sorts of things that can give us more information for prediction, more information for biological understanding, more information potentially to take forward even into the analysis of rare variation. Causality also fell under my impressionist metaphor, and it remains a hugely challenging issue, for rare variant discoveries in Mendelian diseases and for common and rare variant discoveries in complex disorders. Think of the beautiful work that was discussed this morning in the stickleback fish: with the pace and number of discoveries being made, it's just not going to be possible to get knockouts and knock-ins in every situation to really validate discoveries. And models are not always adequate. Some mouse models don't work for human phenotypes, and we just don't have model systems for all human diseases. It can be particularly difficult to deal with causality when you have a single family or individual with a really rare phenotype. A group, in efforts initiated by NHGRI, has been working on a manuscript; it's been going back and forth, and a number of the authors are in the room. Really interesting issues remain under considerable and heated discussion, even among attendees, on how to think about causality in these contexts. And I think there are lessons in what we're seeing from insulin, a molecule that we perhaps know better than any other: it was one of the first human genes cloned, for obvious commercial reasons, and has great value.
We know with near certainty, for any given possible insulin mutation, whether it has functional consequences or not; we have very detailed assays for looking at the function of the insulin molecule, so we have a huge amount of information. As you can see, there are many characterized mutations that cause different kinds of rare Mendelian diseases that have diabetes as part of the name. Insulin is probably the central molecule for glucose metabolism. The black indicates mutations that can in dominant fashion lead to permanent neonatal diabetes, and the yellow, recessive mutations that can lead to permanent neonatal diabetes. But you'll also see, in purple, mutations that can lead to maturity onset diabetes of the young. So here we have a molecule where we know rare mutations can lead to Mendelian subtypes of disease that have diabetes as part of the name. And yet there are lots of variants, lots of mutations that have been seen, that we know are functional, that we know alter the functioning of insulin, some of these even altering other kinds of cellular function besides just the activity of insulin, and yet they don't necessarily appear to impart any risk for garden variety type 2 diabetes in the individuals in which they occur. So, you know, we sometimes think, well, okay, we see a mutation in a gene that is clearly involved in some biological process, and rare variants at this gene have led to Mendelian subtypes of disease, so these rare functional variants must increase risk of the common disease. No, they don't necessarily. And so we have to keep these lessons in mind as we try to move forward in interpreting and making the distinction between functionality and pathogenicity. One of the things we've learned is that there's tremendously more variation in the human genome than we have appreciated in the past.
And tremendously more functional variation in the human genome than we had appreciated in the past. But it's not necessarily the case that for any given disease or phenotype, most of the functional variation at a gene at which some of that variation can lead to disease will lead to disease. So my last metaphor was postmodernism, and getting to a synthesis, of course, is like the horizon: you never reach it; we're always trying to get there. I think one of the main lessons we have today is that we've got to consider what we have in its entirety. We need to do whole genome interrogation, and even when we've got exome sequencing, we almost always also have genome-wide association data in a lot of those same samples. We need to be moving beyond just genes into other functional units and using all of the information that we have. Among the things that we're looking at now is the ability to take, for example, the polygenic predicted quantitative trait value against the observed value and see if that doesn't give us better information for identifying the effects of rare variants and for identifying the effects of contributing environmental factors. So we need to use more, ideally all, of the available information simultaneously. We need to use all of the information on genome function, all of the information on common and rare and structural variation, if we're going to pull meaning out of the genome for the things that put people into hospital beds. And part of the postmodernism is the recognition that we have a huge issue with respect to computing over the genomes that we have now. So at the University of Chicago, we have moved very aggressively into cloud computing, and we believe that this sort of cloud computing needs to be made available to the public community at large. We've actually been able to become an honest broker, for example, for NCI and the TCGA project.
And this is an open resource for the scientific community even now. You can apply to use the cloud at the University of Chicago to analyze TCGA data. For those of you who've tried to do that, there's a lot of data to download; here you can basically just move up into the cloud, not have to download anything, and do your computes in that environment. And I think that is part of what we are going to need to move to, to really integrate genome information into further discovery work. So I'll take questions now. These are the people in my lab who work on all of these things, the Tourette syndrome nitty gritty analysis group. And while I take questions, if I can, I'll show you some pictures made from billions of bases. We're going to have this up on the website for my lab. You'll actually be able to take sequences, I hope sequences of your choosing, like any given gene or any piece of your own genome, and a photograph (it works much better with high resolution photographs) and see your genome in action. So happy to take questions. Putting together all these individual pieces of information, maybe pointillism is a good metaphor. And the other question was, do you think that the reason for Mona Lisa's enigmatic smile is that she understood causality and... I would like to think that. I mean, you know, she has two X chromosomes, so it would be a good thing. I think that the genome integration, so not just integrating everything we know about the genome, but going across the omics, is really the next frontier. We have to do that. And I think pulling together the information will hugely enhance prediction, which is something that we haven't spent as much time on, because for a while we thought we couldn't do it. For many disorders, prediction would be a huge thing. We talk about the diagnostic odyssey for these kids.
People with bipolar disorder, on average, spend seven years from the time they first start seeking medical care for this condition until the time they're diagnosed and treated properly. Seven years where they're going from specialist to specialist to specialist trying to get some relief. If we had the ability to do good prediction with genomics and/or other kinds of omics, you'd like to think we could cut short that diagnostic odyssey, offer relief much sooner, and save a lot of health care dollars. I think there's a lot of opportunity yet for prediction. And importantly, this will come back to the ELSI questions, because since we didn't think we could do it, we may not be so prepared to actually be able to do it now. This is an area where I think we need a lot more research. [Audience question:] Yeah, I'm a big fan of the mixed models and all the stuff that's come from animal breeding and plant genetics into this. I think they're great. But I do worry that in a case control scenario, it's not quite the same. It's not a closed system; there are some confounders and some ascertainment problems that could really bite us in this. It's not that I don't think it's a good thing to do, but I'm interested in your response: how do you deal with, especially for the rare variants, the difficulty of case control ascertainment in this? [Answer:] Yeah, so I think it's not perfect, and it's clearly less perfect in dichotomous phenotypes, particularly in a case control situation rather than a cohort situation where you've got the whole thing. On the other hand, in the particular comparison between Tourette and OCD, there you've got exactly the same genotypes having been generated in the samples at the same time, on the same plates, QC'd the same way, and clearly a very different inference on the contribution that variation in different minor allele frequency classes is making to the inter-individual variability in risk of those diseases.
And I would be much less confident about comparisons in other contexts, but in this context, these data were generated in order to be analyzed together in exactly this way. And you see such a difference not only between the two disorders but also between those two and many others that we've looked at, where the inference is very straightforward: you see about the same proportion of heritability in each minor allele frequency class that you expect based on the number of SNPs in that class. So I think we will see differences among disorders, and this is what I mean when I say we didn't learn a lot as we rushed through picking all the cherries; as we go back and focus more, and look at these data in more kinds of ways, I think we will learn more important things about genetic architecture, about driving biology.