Okay, good morning. Thank you to the organizers for inviting me. As I was telling a few people, it's been a long time since I've attended a pure genomics meeting. I go more to human genetics and genetics-oriented meetings, and I'd forgotten how much fun this used to be, or still is, I should say. So, given my current perspective, what I thought I would do is speak to you about a specific example that I think will exemplify many of the problems we already discussed yesterday. I'm not going to speak specifically about ENCODE, but rather about how ENCODE kinds of data are leading to solutions of problems that we simply didn't have in human genetics. To give you a perspective on disease, at least on what the disease community, or people who are interested in the pathophysiology of human disease, want, and why it intersects so strongly with genetics, I've used a quotation from a book that is very well known in medicine, by Barton Childs, who used to be a colleague of mine and unfortunately passed away a few years ago. I recommend this book to you; it's called Genetic Medicine: A Logic of Disease, and it's about what genetics can provide. He poses three very simple but cogent questions that patients want to know, and that of course we as scientists wish to answer: why me, why this disease, and why now? If you think about it, you will appreciate that the first one has to do with a question of genotype, the second one has to do with specificity, and the third with many aspects, but clearly epigenetics and environmental factors come into play. I think for a long period of time we've been preoccupied, really, with just mapping human disease genes, both simple and complex, for a very simple reason. This was a very, very slow task, and you can't do genetics without at least knowing what the unique identifiers are.
And ENCODE has now pushed us into this much larger land of trying to understand how cis-regulatory elements, or enhancers as just a shorthand, can provide us a view of disease that we never really had. I think we all know that even though we are in the very beginning stages of just doing a simple intersection, of finding sequence variants in various kinds of motifs and cis elements of some defined function, this is really very incomplete. What we would want is to answer the question in much more mechanistic and qualitative terms. These issues have already come up in the meeting yesterday. So I'm going to give you some reasonable speculations on complex disease genetics after a long history of Mendelian genetics and 10 years or more of many, many groups around the world doing GWAS kinds of mapping and probing the answers. The first one is that it's quite clear that there are many, many genes, as originally expected, anywhere from hundreds to thousands; height already has 800 or more identified SNPs in different locations. So each trait is going to be composed of many, many different components. Second, even though we sometimes focus on the statistical aspect of how much of the phenotypic variance we've explained, there's a whole variety of analyses, very well done, that emphasize the fact that common variants, those, say, with a frequency of about 5% or 10%, in many cases explain at least half of the trait and disease variation. We haven't found most of them. And this is where ENCODE, of course, in reverse, can be very, very important. So the question is, if we don't have the statistical power to map them individually, which is the third point, we might never be able to find all the individual SNPs that contribute individually to height or diabetes or hypertension, let alone the rare ones, so we have to find other ways of doing it.
I think at least much of our data has suggested, and clearly most of the mapping data, that each locus that's been mapped by GWAS may in fact have more than one culprit gene in many cases. And clearly there are many, many individual SNPs that contribute to the observed association. So one of the really big challenges is how we get to the true genetic structure for each given gene and locus that we come across. At the end, I'm going to come back to this issue. We've done genetics; we understand the importance of the recombination map for focusing on the bit of the genome where a particular phenotype maps. But I would ask you to think, and I think we should argue about this for a while, that what ENCODE has brought to us is the idea of physical interactions between distal elements and a gene. And so we have to consider the physical distance. We will have to step out of the genetic recombination haplotype. A haplotype, as I will show you in my example, is often really defined by the functional piece of DNA that is relevant, not only by the recombination distances that we've statistically focused on. So here's the example. We've studied this phenotype for many, many years as a model to understand what really happens in complex disease genetics. And many of the aspects of complex traits that we take as given today, we've been able to show using Hirschsprung disease as a model for many years. I'm not going to talk much about the phenotype, except to say that it is a defect of the enteric nervous system. Most of us don't think about the fact that we have an intact, almost independent nervous system in our gut, and this is a disorder in which children are born with absence of the enteric nervous system in the gut, which leads to a functional obstruction that needs to be repaired, and there are all kinds of other consequent problems, both those that come with efficient repair and those without repair.
As disorders go, it is relatively rare. It's about one in 5,000 live births, but this of course is 10 times or more as common as most Mendelian disorders. It is clearly multifactorial, from a variety of evidence. It's got a 4% recurrence risk and many, many other features that have allowed us to probe the genetics. At a molecular level, it has been studied for a long time. We know three essential bits. The first is that many of the genetic defects that we've known arise because of defects in two major signaling pathways. Is this the one? The one in the middle is the laser. So they occur through two major signaling pathways. This is a receptor tyrosine kinase pathway mediated through RET, and this is through a G-protein-coupled receptor, the endothelin type B receptor, and all of the other components, and I'm not going to go into details. There's very strong epistasis between these two pathways, and we have a crude understanding as to how this epistasis arises. The second feature that is quite strong is that the disorder really occurs because of an interaction, or I should say failure of an interaction, between the cells of the gut mesenchyme, which secrete many trophic factors (and now we know they secrete other factors that are important for the maturation of enteric neurons), and the enteric neurons, or the neuroblasts, themselves. We at least have a crude understanding of what the mechanism is. That is, if you have a defect here or you have a defect here, that will lead to a failure of these two kinds of cells to communicate early in development. You get failure of these cells to develop, proliferate and then differentiate, and you get a lack of these cells, which is the aganglionosis that you see in the gut. Third, there are many genes that we have very definitively mapped, or in which we find mutations in Hirschsprung disease, but they don't fit in these pathways.
Of course, there are many, many other portions of gut physiology that impact on this that we're trying to find out, but basically, RET is the major gene. I'm just going to tell you, and maybe in question and answer we can discuss it, that we now know that almost every Hirschsprung patient has a RET mutation. Most of them are non-coding mutations, but almost every one of them has one. The essential feature for today's discussion is that even though RET was found through severe syndromic forms of this disorder, we now know that there are many, many missense mutations. They are individually rare, none of them is fully penetrant, and of course we have found analogous mutations in many other genes of the same pathway: SOX10, which is its transcription factor; GDNF, its ligand; and GFRA1, a very important co-receptor. RET is sort of an unusual RTK in that it requires its co-receptor. The surprise came in trying to explain Hirschsprung disease several years ago. There's a study that we did originally with Eric Green in the old days, when you had to clone out pieces of DNA and sequence them. We did multi-species comparative genomics to identify regions, which we then tested as enhancers, and we found a very common variant. Its frequency is about 25% in the general European population, or those of European ancestry, and it lies in a gut-specific enhancer. The enhancer is bound by SOX10, and this polymorphism disrupts the binding and leads to a low-penetrance but still highly important mutation or variant for this disorder. Interestingly, it shows this pattern: coding mutations are involved in the long-segment, severe forms, whereas non-coding variants are involved in the short-segment, much more common forms. Because of this finding, we argued recently that, because we knew of other enhancers in this region, any regulatory variant that would reduce RET expression, which this one does and which clearly all these other coding mutations do, should have similar effects.
So the idea is that if we can find one loss-of-function mutation, any other loss-of-function mutation that would lead to loss of RET expression, even a non-coding one, should lead to the disorder. And this is where this stands from the screening, and you can imagine that ENCODE has played a very big role in this; this is just a summary. We now have three such different non-coding polymorphisms. Here are the rs numbers, which I never remember. RET+3 is the one that sits in the first large intron of RET. These two are further away; I think there's a distance here: this is roughly 10 kb from the transcription start site. RET-1 is a non-coding polymorphism that's much further away. It's about 125 kilobases upstream. This one is about 18 kilobases upstream, so closer to RET. They have very large allele frequencies, and here are the odds ratios. These odds ratios, by any count, four or close to two, are really phenomenal so far as complex disorders go, which is what makes this a model. If you did a genome-wide association study, as we did, this one would clearly be recovered. In fact, that is why we found it early. This one was recovered from a genome-wide association study. This one you would not take, because you would say, well, this is just another polymorphism that's in high LD of some sort, and of course it is not genome-wide significant. The point that I'd like to make is that once you get close to a locus, you have to try to interrogate the effect of many other polymorphisms that actually do interrupt the known elements. So here is what we did in order to find these three. Throughout the region, we did a much more forward kind of screen: we looked at 125 SNPs, all of high frequency, that were peppered through this region in ENCODE-defined elements of various sorts. Thirty-four of them were associated with Hirschsprung disease, that is, by a conventional GWAS p-value.
Eight of them lay within ChIP-seq peaks in a specific cell line; this is a human neuroblastoma cell line, really a relevant cell type in that sense. And all of these were in enhancers that we had identified the hard way in this collaboration with Eric Green about 10 years ago. What we find is that three of them also showed allelic differences in various kinds of reporter assays. I think the effect sizes here are all substantial, not only individually but combinatorially. They are in a region in which there is some residual linkage disequilibrium, but as I will show you, the effects are not all at the same time in development; they act at different times in development, in different combinations, and that's the basis for finding genetic interactions even between these non-coding elements. These SNPs show significant levels of LD in the sense that the r-squared is small, but they all lie in a pattern, and therefore, relative to equilibrium, they're all large. And therefore what we can do is identify a major source of the missing heritability at this locus. So I'll very quickly go through this. I think I've generally explained it. Here are the three polymorphisms. In this particular case, there are many DHS sites and other histone marks that came both from ENCODE and from the Roadmap project, and this is an intersection with all the SNPs that I told you about, which led to those three. If you look at the transcription factors, as well as at the sequence that was right under any of those SNPs: we'd known before that RET+3 interrupted a SOX10 binding site; this one suggested a retinoic acid receptor site, which we later proved was RAR beta; and this one was a GATA site, later shown to be either GATA2 or GATA3. In fact, we find effects on both.
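The screen described above (common SNPs in annotated elements, then disease association, then overlap with ChIP-seq peaks, then allelic differences in reporter assays) is essentially a chain of filters. A minimal sketch in Python follows. The record fields, the cutoffs, the SNP names and the toy data are all hypothetical, chosen only to illustrate the logic; none of them comes from the study.

```python
from dataclasses import dataclass

@dataclass
class Snp:
    name: str             # hypothetical identifier, not a real rs number
    pos: int              # position in base pairs
    p_value: float        # disease-association p-value
    allelic_ratio: float  # variant / reference reporter activity (1.0 = no difference)

def in_any_peak(pos, peaks):
    """True if pos falls inside any half-open [start, end) peak interval."""
    return any(start <= pos < end for start, end in peaks)

def screen(snps, peaks, p_cutoff=0.05, min_effect=0.2):
    """Apply the three filters in order and return each intermediate list."""
    associated = [s for s in snps if s.p_value < p_cutoff]
    in_peaks = [s for s in associated if in_any_peak(s.pos, peaks)]
    functional = [s for s in in_peaks if abs(1.0 - s.allelic_ratio) >= min_effect]
    return associated, in_peaks, functional

# Toy data (assumed values, for illustration only).
snps = [
    Snp("snpA", 1_000, 1e-6, 0.40),
    Snp("snpB", 2_500, 0.01, 0.95),
    Snp("snpC", 4_200, 0.30, 0.50),
    Snp("snpD", 9_000, 1e-4, 0.60),
]
peaks = [(900, 1_200), (2_400, 2_600), (4_000, 4_500)]

associated, in_peaks, functional = screen(snps, peaks)
print(len(associated), len(in_peaks), [s.name for s in functional])
```

Returning the intermediate lists makes it easy to report the size of each stage, in the same way the talk reports 125, 34, 8 and 3.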
So these are just luciferase assays to show that these three elements, at least in their control or wild-type form, do show an enhancer effect. I will keep calling them, in a genetic sense, wild-type versus mutant forms, or control versus variant forms. As you can see, the effect actually scales fairly well with the odds ratios, and we have now done many more of these studies. By the way, some of the studies involved only two of the SNPs and some three, because we're in the middle of completing all of this. But even when we take all three, we have synthetic constructs in which we can assess the reporter activity of combinations of these SNPs. And here are the haplotype-specific odds ratios that we can measure in a population of about 600 to 700 patients. What you find is a log relationship, as one would expect if you assume some aspect of mass action. The first two elements that we discovered showed reporter activity not only in vitro; that was done in a mouse neuroblastoma cell line called Neuro-2a (the new one, the human line, is much better). What you can see, hopefully, is that the enhancers are in fact active in vivo, both at 11 and a half and at 12 and a half. RET+3, which is the SOX10-bound enhancer, is by the way active not only in the gut but in other cell types that are of importance in development and where RET is expressed, that is, in portions of the forebrain and in the dorsal root ganglia. And what you find is that the first enhancer is active both at 11 and a half and at 12 and a half. The distal RET-1 enhancer is active at 11 and a half and not at 12 and a half. In fact, if you measure gene expression of the cognate transcription factor, RAR beta is expressed at 11 and a half and not at 12 and a half. So these obviously do act as enhancers in vivo as well. And this is to show that the effects are specific.
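As an illustration of the log relationship between reporter activity and haplotype-specific odds ratios mentioned above, here is a minimal sketch. It assumes one possible reading, namely that the log of the odds ratio is linear in relative reporter activity; the haplotype names, counts and activities are invented toy values, not data from the talk.

```python
import math

def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds ratio from a 2x2 table of haplotype counts in cases and controls."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical haplotypes: (relative reporter activity, 2x2 counts).
haplotypes = {
    "hapA": (1.00, (100, 100, 100, 100)),
    "hapB": (0.50, (200, 100, 100, 100)),
    "hapC": (0.00, (400, 100, 100, 100)),
}

activities = [activity for activity, _ in haplotypes.values()]
log_ors = [math.log(odds_ratio(*counts)) for _, counts in haplotypes.values()]

# Under the assumed model, log(OR) falls on a straight line in activity.
slope, intercept = fit_line(activities, log_ors)
print(f"log(OR) = {slope:.3f} * activity + {intercept:.3f}")
```

With real data the points would scatter around the line, and the quality of the fit, rather than the toy slope here, is what would support or contradict the assumed relationship.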
That is, I told you about the SOX10, the RAR beta and the GATA2/3 transcription factors. This is doing siRNA knockdowns, and this is all done in Neuro-2a. What we found is the expected result, that the positive and negative controls in this case work. If you knock down RAR beta, obviously gene expression of RAR beta goes down, as it does in all of these cases. But what's interesting is that when you knock down SOX10, RET goes down, as it should, because SOX10 is the transcription factor; but when you knock down RET, what you find is that SOX10 goes down. The same is true for GATA2 and GATA3. So this suggests that not only are these the transcription factors that regulate RET, but RET has a feedback effect, at least on the GATA2 and SOX10 transcription factors. These are not just artifacts of any particular cell line. We can show these results by looking at a Ret null and a Ret wild-type mouse embryo from the same early time in development, and I'm not gonna go through the details. This is just to show that the effect that we found in cells can be recapitulated in vivo as well. So I wanna make a final point: one of the kinds of tests that we do in order to show specificity is to show that the transcription factor effect, when you do the knockdown, is specific for the so-called wild-type or native form of the enhancer. And this again shows in a number of cases. Let me see, I think my eyesight is failing me, so I'll take the big one here. I think this is the control. When you do the SOX10 knockdown, what you find is a difference on, I'm sorry, on RET, but you don't find it on these ones, because the difference that you see on knockdown is equalized for the wild-type form.
So in this particular case, what you find, quite interestingly, together with other data that I haven't shown you, is that all of the variants that lead to Hirschsprung disease affect components of RET signaling. This is written in this form: these are the three transcription factors. Obviously they make the cognate proteins, and they regulate RET. This is the part of RET production that is important, and I gave you some evidence that RET in fact signals back and in some ways controls at least these two transcription factors. We also know, if you knock down RET or if you look at the Ret null mouse embryos, that RET is a negative regulator of its own ligand as well as its co-receptor. So this is another level of control on RET. Finally, RET is only active in its dimerized, phosphorylated form, and during early times of development you want tight control of this to control proliferation within the gut, and so this form of RET is rapidly cleared. The signal is terminated through the action of an E3 ligase called CBL, and what we find is that this is also positively regulated by RET. So RET is a particular case in which its production, its activation and its signal termination are all controlled through various kinds of enhancers, and one of the interesting things is that we find that genetic variation in all of these components, through their non-coding elements, is an important part of this disease. And the total effect therefore depends on the total degree of loss of function.
So I'm just gonna end with a couple of pleas, really. In order for us to understand even this relatively simple disorder, I think it required us to understand the rest of the network; that is, just finding a non-coding variant outside of RET was not sufficient for us to explain the kinds of odds ratios and the patterns of risk that we see, which are specific not only to haplotypes but also to the groups of patients that we have. So I think it goes without saying that since no gene acts alone, understanding the functional context of its expression is very crucial for us to eventually get quantitative answers, and that's what I mean by modeling and predicting what the consequences are going to be. For this there's an extensive literature that suggests that the gene regulatory network is in fact a fundamental unit to consider, for many, many reasons indicated here: modular parts of this network are, in this particular case, very highly conserved, and there are of course a number of classical studies that have compiled much of this information. At least Eric Davidson argues that these come in a defined number of sub-circuit classes, so it's possible that we might be able to classify them in some ways. They do provide some systems-level explanation of developmental and physiological functions, and I think there are many opportunities for mathematical modeling that can then provide the quantitative answers. And I think this finally gives us a way of understanding complex inheritance in a way that we haven't; we are still doing it on a gene-by-gene basis. Many of you in this audience will know the work of Eran Segal, when he originally tried to model and explain gene expression along the anterior-posterior axis during fly development.
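Models of gene expression of the kind just mentioned are often built on transcription factor occupancy at enhancers. The following is a deliberately simplified sketch of that idea, not the published fly model: each enhancer has a single binding site with occupancy [TF] / (Kd + [TF]), and expression is a weighted sum over enhancers. The factor names, concentrations, dissociation constants and weights are hypothetical.

```python
def occupancy(tf_conc, kd):
    """Equilibrium probability that a single site is bound: [TF] / (Kd + [TF])."""
    return tf_conc / (kd + tf_conc)

def expression(enhancers, tf_conc):
    """Weighted sum of site occupancies; each enhancer is (tf_name, kd, weight)."""
    return sum(weight * occupancy(tf_conc[tf], kd) for tf, kd, weight in enhancers)

# Hypothetical factor concentrations (arbitrary units) at two developmental stages.
early = {"TF1": 1.0, "TF2": 1.0}
late = {"TF1": 1.0, "TF2": 0.0}   # TF2 is no longer expressed at the later stage

# Reference haplotype versus a haplotype whose first enhancer binds TF1 more weakly
# (a larger Kd stands in for a binding-site variant).
reference = [("TF1", 1.0, 1.0), ("TF2", 1.0, 1.0)]
variant = [("TF1", 3.0, 1.0), ("TF2", 1.0, 1.0)]

for label, conc in (("early", early), ("late", late)):
    ref, var = expression(reference, conc), expression(variant, conc)
    print(f"{label}: reference={ref:.2f} variant={var:.2f} ratio={var / ref:.2f}")
```

Even in this toy form, the relative loss of expression caused by the same variant differs between stages, because the second enhancer only contributes while its factor is present; this mirrors the talk's point that the elements act at different times and in different combinations.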
But this is a model, a thermodynamic model; it's fairly straightforward, and I think this gives us ways not only to look at gene expression, as in this case: there are various ways in which what we are learning from ENCODE can be used to build mechanistic models for gene activity, for trait values in some sense (you'll hear Nancy Cox speak about that), and eventually, for categorical diseases, for what matters, which is penetrance. So I'm just gonna end with a series of acknowledgments of the many people who have helped us in these studies. I know that I was asked to answer some questions, which I'll just keep up and point out as we discuss. All right, thanks.
Just for the audience: frequency and those kinds of odds ratios. High frequency and those odds ratios, that for me suggests that this is under balancing selection, or positive selection, or something. Something must be pushing those alleles high or keeping them high. Have you looked at that? Is there an argument for that?
Yes. The original one, which has the biggest odds ratio: clearly that SNP, and we've published on this before, has almost all of the features of balancing selection. One thing that I should tell you, though, is that Hirschsprung disease was effectively lethal until about 70 years ago. So it's gone from seeing only new cases, or cases that were perhaps very, very mild in expression, to now people living and obviously reproducing and passing the phenotype along. So whatever selection we are seeing is not a recent phenomenon. It has to be a long-term phenomenon. And in fact, the allele frequency shows all of the features, with at least some of the surrounding variants, you know, within the recombination haplotype. So we've speculated on this. I mean, the gut is a very favorite place for various kinds of selection.
The second is that we know of receptors that are null or nearly null, like the chemokine receptor CCR5, that can prevent the entry of various kinds of pathogens. RET was considered to be only neuronal, but now we know that it is expressed in other cell types, including the epithelium. So it could be that it has nothing to do with Hirschsprung disease per se, but was selected simply because of things like that.
So, Mike, we'll be real short. Everyone, we've lost the desk mic volume, so please speak up when you're projecting.
ENCODE clearly helped you. What fraction of that?
So ENCODE clearly helped us in the last two; the first one has been known for a very long time. But ENCODE also suggested many others. I think what finally helped us was the fact that originally we had a set of, I forget, 10, like a dozen pieces of DNA that we had previously also shown were enhancers. So I think the combination of that just gave us much more confidence about which ones to follow up on. Thanks.