So, this is the seventh in the series. It's looking at replication and functional studies. It's basically answering, we hope, the question that Kang asked yesterday: okay, you find your genome-wide association, then what? So we'll talk about replication, and then about how you find the potentially causal variant in neighboring regions. There are various approaches: sequencing, which I won't go into in great detail, looking at associations with protein products, expression studies, and experimental studies. You've seen this by now many, many times. You can see that we're quite enamored with it, and it really did come to us as being probably the main way that we want to assure that genome-wide association studies are robust.
One of the reasons we put this working group together back in the fall of 2006 is that we recognized there really was a lot of disagreement as to what truly constituted replication. We knew that there was an avalanche of genome-wide studies on the horizon at that point. Certainly, candidate gene studies had had tremendous problems with this. Replication was held to be a sine qua non of a true association. And we knew that any single study would have difficulty establishing an association unless the sample size was reasonably large. Of course, Easton and other studies, when they went to 25,000 and 50,000 cases and controls, probably are up there in terms of sample sizes, but initial studies had trouble, and there was the question of how to interpret confusing and potentially spurious findings. One of the cases in point that was used as a good example of this problem was the DTNBP1 gene and its associations with schizophrenia. It had first been identified as a putative schizophrenia susceptibility gene in a group of Irish families a few years ago. And then there was confirmation reported in several replication studies in independent European samples.
But there were different risk alleles, different haplotypes, and sometimes different directions between studies. And the comparison was difficult because different studies had used different marker sets and variants to genotype. What was done in this study by the Broad group, Mutsuddi et al., was to take the HapMap data, with all of the polymorphisms in this area, and use it to identify and link together all of the variants that had been tested in these various studies, producing a high-density reference map of this particular region of the gene. They identified five different haplotypes of this gene that were tagged by these six SNPs. Shown here is the ancestral haplotype; there are ways of clustering these so that you can go back and determine which one was ancestral, and I won't go into those. And then they had this neighbor-joining tree that basically showed, okay, if you changed the A to the T in your 11 position, you ended up with this particular haplotype from this ancestral one. If you then went forward and changed the G to an A in the number three position, you ended up with this haplotype with the T there. And here, if you change the G to an A and a C to a T, you end up with these two haplotypes.
What was interesting was that the various studies that had looked at this locus each looked at a different thing. So in green is this Kirov study here. They looked at SNP two, which tagged these four haplotypes seen in green here. This study in red just looked at the T variant. This study here in brownish looked at these two, and the purple ones looked at different ones. So basically each haplotype was tagged by an association signal in at least one study, implying that there wasn't one common variant that contributed to schizophrenia risk.
And in many instances, some of these associations were in the opposite direction from what you would expect based on the linkage disequilibrium relationships between these SNPs. That suggested to many people that these were either spurious findings or that there was a lot more to this locus than you might expect at first glance, and we really needed to know a lot more about it. So with this example and examples like it, we agreed that there were certain ways not to do a replication study, which had been used in these various studies. Using a different phenotype is a good way to not get a replication. So is using different markers. Next is what we refer to as fine mapping versus replication. Fine mapping, at least by our definition, is when you add markers that weren't on your initial genotyping platform. Replication is when you take exactly those same markers and type them. But in fine mapping, you actually add markers in between because you want to learn more about the haplotype structure there. Then there is using different analytic methods: some studies use haplotype methods, some use single-marker methods, and you may or may not get replication with that approach. Or they use different genetic models, or different populations. So all of these are ways to not get a replication.
But in addition, there may truly be spurious findings that you want to get rid of. This is an example here, PDE4D, the phosphodiesterase, which was initially reported as being associated with stroke by the deCODE group. Then two subsequent studies did not find an association, and a meta-analysis of these studies did not find an association. And this was published in Nature Genetics, to their credit, which had published the original study. And the original authors then wrote back and said, you know, you're probably right, it probably was spurious. But it looked good when we first did it.
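To make the meta-analysis step concrete, here is a minimal sketch of an inverse-variance fixed-effect pooling of odds ratios. The talk does not say which method the PDE4D meta-analysis used, so the method choice is an assumption, and the three odds ratios and standard errors below are made-up illustration values, not data from those studies.

```python
import math

def fixed_effect_meta(odds_ratios, log_or_ses):
    """Inverse-variance fixed-effect pooling of per-study odds ratios.

    odds_ratios: per-study odds ratios.
    log_or_ses:  standard errors of the corresponding log odds ratios.
    Returns (pooled_or, ci_low, ci_high) with an approximate 95% interval.
    """
    weights = [1.0 / se ** 2 for se in log_or_ses]
    log_ors = [math.log(o) for o in odds_ratios]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * pooled_se),
            math.exp(pooled + 1.96 * pooled_se))

# Hypothetical inputs: one striking initial report and two larger null follow-ups.
pooled_or, ci_low, ci_high = fixed_effect_meta([1.5, 0.95, 1.02], [0.20, 0.10, 0.08])
print(f"pooled OR {pooled_or:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

With these invented inputs the larger follow-up studies dominate the weights, so the pooled interval spans 1 even though the first report looked strong.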
Another good example, I think, of a potentially spurious finding is this initial report from the Framingham group looking at a common genetic variant associated with adult and childhood obesity, with Alan Herbert being the first author. They reported an association between the minor allele of this SNP, 6605, near INSIG2, which is a gene that seemed to have some biologic plausibility for obesity, and increased BMI in Framingham Heart Study participants. It was reproduced in four additional cohorts, but in a fifth cohort reported in this paper, which was the Nurses' Health Study, it was not seen. Then there were four subsequent replication attempts that found no support. In the Rosskopf study, the gene did not exhibit a significantly increased risk for diabetes, and in fact, what it did increase was the risk for obesity in already overweight individuals, if you just look at that subgroup. And then the Loos study found no evidence of association here and actually an opposite tendency, all suggesting that perhaps this was not a terribly robust finding initially.
To their credit, many of the folks who had been involved in this study initially, and you can see their names highlighted here, then went back and tried to look at why this might be. And I think one of the things we need to recognize about lack of replication is that it may be telling us something important scientifically that we'd like to learn about. So what they did was to look at nine more large cohorts from eight populations and multiple ethnicities. They had multiple designs: family-based, population-based, and case-control. And they found an association in five cohorts, but there was no association in three cohorts. They also found variability in the strength of the association over time. That's over calendar time, and we know that there have been major cohort effects in obesity, so that might be some reason for that.
And they found replication, weakly though, in unrelated as well as family-based samples. And they felt, again, there's probably something here, but it seems to be heterogeneous, and we need to learn more about it. The same group then just very recently published this paper looking at the timing of these analyses. Timing can be everything. What they did was to look at age-varying associations and ask whether some of the associations they were seeing differed by the age of the participants; there might be variants that are associated in childhood but not in adulthood, and vice versa. And they note that it's difficult for cross-sectional study designs to detect these kinds of things. So they identified a variant intronic to the gene ROBO1 that did have an age-varying association with BMI: it was associated with BMI before age 45, really mostly before age 30, but then diminished after age 45. And they were able to replicate the fact that the association varied by age in the same direction in five of eight other cohorts, and they probably didn't have power for the others. One childhood cohort showed a very strong association overall, and here are the four other cohorts. This one was really driving the thing. They did end up with an overall p-value that was quite strong as well.
And they note that in all of their replication cohorts but one, the association would not have been detected if they were looking only for the main genetic effect and not for the age-by-SNP interaction for SNP 5832, which is interesting. When people have been looking for interactions, they've focused on variants that have main effects and then gone forward, recognizing that they're probably going to miss the false negatives that show no effect when you look without accounting for the heterogeneity, but do when you account for it. And again, this adds to the complexity of the work that one needs to do.
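As a rough illustration of why a main-effect-only scan can miss an age-varying association, here is a small simulation sketch. Everything in it is assumed for illustration: the allele frequency, the effect size, the sharp cutoff at age 45, and the simple stratified slopes (the published analysis used its own methods, which are not reproduced here).

```python
import random

def slope(xs, ys):
    """Least-squares slope of y on x (per-allele effect when x is 0, 1 or 2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def simulate(n, rng):
    """Simulated (age, genotype, bmi) records; the SNP acts only before age 45."""
    people = []
    for _ in range(n):
        age = rng.uniform(20, 70)
        genotype = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count 0..2
        effect = 1.0 if age < 45 else 0.0  # assumed age-dependent per-allele effect
        bmi = 25.0 + effect * genotype + rng.gauss(0, 4)
        people.append((age, genotype, bmi))
    return people

rng = random.Random(0)
people = simulate(20000, rng)
young = [(g, b) for a, g, b in people if a < 45]
old = [(g, b) for a, g, b in people if a >= 45]

slope_young = slope(*zip(*young))   # per-allele BMI effect before age 45
slope_old = slope(*zip(*old))       # per-allele BMI effect at 45 and older
slope_pooled = slope([g for _, g, _ in people], [b for _, _, b in people])
print(slope_young, slope_old, slope_pooled)
```

The pooled slope is diluted by the older stratum in which the simulated SNP does nothing. In a cohort made up mostly of older participants, that dilution could be enough to hide the signal entirely, which is the situation described for the replication cohorts.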
So one of the things that came out of our working group were definitions of a robust initial finding. What is it that would give you confidence? When these things come across your desk as genome-wide association studies, you'd like them to have sufficient statistical power to observe the reported effect. And that may obviously vary by the magnitude of the observed effect. So you'd want them to have sufficient power to pick up an odds ratio of 1.5, and if they picked up a four, maybe that was spurious. The analysis should be highly significant, and we declined to say how significant, but it should be using a stable, well-accepted method. It shouldn't be that the only people who can replicate this are the people who use their particular method of analysis. The finding should come from a simple, straightforward analytic approach, so that you don't get the feeling that they looked at it in a simple straightforward way and said, well, we don't find anything, so let's try a recessive model, which is not the first thing that people usually jump to, and then that didn't work. Well, let's try a stratified model. That didn't work, et cetera, et cetera. And there should be consistent findings in a study that is more epidemiologically sound than some of the ones that we heard about previously. Findings should be consistent overall and within key subgroups, and consistent across the same or very highly similar phenotypes.
We wondered about an initial study by itself. One of the challenges of requiring replication is that for some phenotypes, it may be very difficult to find additional samples to replicate. You also don't want to hold up progress. You don't want people to split their samples so that they say, okay, we had a 1,000-person study, we split it in half, and we have 500 and 500, neither of which has adequate power.
And we did recognize that if there were multiple studies that were showing the same thing, that's probably stronger than a single study split into multiple waves. And what do you do with particular phenotypes or studies that don't have an option for replication? In one of the examples, I believe it was glioblastoma or one of the childhood leukemias, they had all of the cases in the world that were known of in one genome-wide association study. Well, you're not going to be able to replicate that. So there may be other approaches. Certainly, we have other tools in the toolbox, as it were. And there was a strong feeling that we shouldn't change the standards for definitiveness, the various false discovery or false-positive control approaches that we've mentioned. Clinical trials were another one where we thought, gee, if you find an association with a bad outcome, you don't want to repeat the clinical trial to find the bad outcome; that's really pretty unethical. So I think the bottom line there was, don't just rely on genome-wide association; there are lots of other approaches for identifying and understanding associations. And we may need to have different standards for findings of major clinical significance, particularly if you're looking at an adverse effect that you really can't try to replicate.
We debated a lot whether there should be a specific number promulgated for a significance level, and everybody agreed no, but in general, the confidence is higher the smaller the p-value is. And there was general agreement that that range should be pretty broad, with perhaps a higher threshold for a phenotype that's difficult to measure, where you might have a lot of noise in the phenotype, so you'd like to be sure you're not also getting noise in your genome association. There was also a warning to beware of the very smallest p-values.
These may well be genotyping error, and in many cases in GAIN we found that that was the case. But once you correct for those, you should be all right. As I mentioned before, if the significance depends on a particularly funky analysis method or some strange multiple-comparison correction, beware. If it depends on the phenotype definition being very, very specific, BMI over 30 versus BMI as a continuous measure or some other such thing, beware of that finding. What's often done is what's called permutation: randomizing the phenotypes, so that you assign a random case-control designation to the same genotyping data and see what kinds of spurious, or known to be random, associations you get. That gives you an expected level of association, and that's a reasonable approach to take. And it's probably a good warning to use biologic information, our knowledge of pathways, a priori in coming up with hypotheses but not a posteriori after you already have the data, because the genome and the human organism are so complex that you can pretty much come up with any story to justify any finding. Somebody showed a picture of Rudyard Kipling's Just So Stories and said, you know, this is what you end up with.
And we've talked before about genotyping quality, and we emphasized this in the working group: reporting results of known study sample duplicates and of known standard duplicates, replicating a small number of SNPs on another platform, and strong caveats regarding the fallibility of genotyping. So our criteria are listed there, and I won't read them all, but the criteria for positive replication are in terms of sample size, the same or a similar trait, and the same population or a very similar one. And it is helpful to expand beyond your population, and we'll look at some examples of that, to increase your confidence in a finding, and also to show that related phenotypes have a similar finding.
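The permutation idea mentioned a moment ago, shuffling case-control labels over fixed genotype data, can be sketched for a single SNP as follows. This is a minimal illustration under assumptions: the test statistic (absolute allele-frequency difference), the simulated allele frequencies, and the 999 permutations are all choices made here, not anything specified in the talk.

```python
import random

def allele_freq_diff(genotypes, labels):
    """Absolute allele-frequency difference between cases (1) and controls (0)."""
    case_alleles = sum(g for g, lab in zip(genotypes, labels) if lab == 1)
    ctrl_alleles = sum(g for g, lab in zip(genotypes, labels) if lab == 0)
    n_cases = sum(labels)
    n_ctrls = len(labels) - n_cases
    return abs(case_alleles / (2 * n_cases) - ctrl_alleles / (2 * n_ctrls))

def permutation_p_value(genotypes, labels, n_perm, rng):
    """Empirical p-value: how often shuffled labels match or beat the observed statistic."""
    observed = allele_freq_diff(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random case/control assignment, genotypes untouched
        if allele_freq_diff(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

rng = random.Random(1)

def draw_genotype(freq):
    """Allele count (0, 1 or 2) for one simulated person."""
    return (rng.random() < freq) + (rng.random() < freq)

# Simulated data: 200 cases (allele frequency 0.45), 200 controls (0.25).
genotypes = [draw_genotype(0.45) for _ in range(200)] + \
            [draw_genotype(0.25) for _ in range(200)]
labels = [1] * 200 + [0] * 200
p_value = permutation_p_value(genotypes, labels, 999, rng)
print(p_value)
```

In a genome-wide setting one would typically record the largest statistic across all SNPs in each permutation, so that the null distribution reflects the whole scan rather than one marker.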
And you'd like the same model, the same gene, the same SNP, and a highly significant association. We also thought that we should perhaps define a meaningful negativity, as it were. So if you really want to say that you have not replicated this finding, that it is just not there, like those four papers I showed you for INSIG2, many of the characteristics should be similar, but it should be an identical trait and an identical population to really claim that you've got no replication. And you should be powered for the appropriate effect size, taking into account the potential for the winner's curse. It's unclear how best to do that, except to just assume that whatever the initial report is, the true magnitude of the odds ratio that you're looking for is going to be less, and for how much less, you can look at the confidence interval perhaps, or other approaches.
And then there is the value of replicating in studies and samples of different ancestral origin, which can actually tell you something about what SNPs might be operative, because shorter LD blocks in other samples, particularly those of recent African ancestry, may disjoin two SNPs that were otherwise traveling together in your initial association study. And allele frequency differences may also give you some hints about lack of replication. Something to be aware of is that most everyone now in genome-wide association follows this approach of joint analysis of the stage one and stage two studies. This was published by Skol et al. from Boehnke's group about two years ago now. It made a very strong case for not separating out your stage one and stage two analyses and analyzing them separately, but actually putting them back together, correcting for the multiple comparisons problem that it adds, which is nowhere near a million-SNP problem, and analyzing them together. And they give some nice estimates of the power that you get with this approach. So it's a useful thing to be aware of.
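To see how much the winner's curse can matter when sizing a replication study, here is a rough power sketch for a simple allelic test using a normal approximation. The control allele frequency, sample sizes, significance level, and the shrinkage from an odds ratio of 1.5 to 1.2 are all illustrative assumptions; as noted above, there is no agreed rule for how much to shrink.

```python
import math
from statistics import NormalDist

def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by the control frequency and allelic odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)

def allelic_power(control_freq, odds_ratio, n_cases, n_controls, alpha):
    """Approximate power of a two-sided two-proportion z-test on allele counts."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    # Each person contributes two alleles, hence 2 * n in the denominators.
    se = math.sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)

# Assumed scenario: reported OR 1.5, possibly inflated; true OR perhaps 1.2.
power_reported = allelic_power(0.30, 1.5, 1000, 1000, alpha=0.001)
power_shrunken = allelic_power(0.30, 1.2, 1000, 1000, alpha=0.001)
print(f"power at OR 1.5: {power_reported:.2f}, at OR 1.2: {power_shrunken:.2f}")
```

A replication sample that looks comfortably powered for the reported effect can be badly underpowered for a more modest true effect, which is why a failed replication in a small study is hard to interpret.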
So yeah, stage two data considered alone. So they recommend a joint analysis. So I think I'll go now into how one maps or narrows an association interval. This is entering the middle. Well, this is just going from bad to worse, middle of nowhere. But hopefully we're somewhere, because we're in the genome. So the flow of investigation from genome-wide association to actual clinical translation is a challenging area. We think of this as going from the initial genome-wide association and replication studies, to probably a little bit more replication and fine mapping, then moving on to sequencing and genotyping in larger cohorts, then functional studies to try to understand what the variant actually does. Translational studies, we haven't quite figured out how to do, and Tom is going to explain that all in lecture eight. But it is a challenge; as with all translation, it's probably the toughest thing.
So how did we used to go about this before we had genome-wide association studies? We tended to take an interval, usually an interval of linkage. This is an MI study from Helgadottir et al. in the Icelandic group. They found this linkage peak on chromosome 13. And what you often would do is look at your top or peak linkage signal, and then drop down one LOD, a one-LOD-score interval, and then see where that hits you on your chromosome and go across here. So you're dropping down to 1.5 on either side. Here's your chromosome all lined up, and you're basically taking this interval. And that's what they did; it was 13 centimorgans. Taking this 13 centimorgans, they then salted this area with a large number of SNPs that were not on their original genotyping platform. What they had done originally was a microsatellite study, but these are SNPs, and there are lots more of them than there were in the initial study. And shown here are the associations they saw.
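The "drop down one LOD" rule for choosing the interval to follow up can be sketched as a small function. The marker positions and LOD scores below are invented for illustration and are not the chromosome 13 data from the study.

```python
def lod_drop_interval(positions, lods, drop=1.0):
    """Return (left, right) marker positions of the contiguous region around the
    peak where the LOD score stays within `drop` units of the maximum."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1
    return positions[left], positions[right]

# Hypothetical linkage scan: positions in centimorgans, LOD score at each marker.
positions_cm = [0, 5, 10, 15, 20, 25, 30]
lod_scores = [0.5, 1.2, 2.1, 2.9, 2.3, 1.5, 0.4]
interval = lod_drop_interval(positions_cm, lod_scores, drop=1.0)
print(interval)
```

This version only reports marker positions; a finer-grained implementation might interpolate between markers to find where the curve crosses the threshold.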
These were actually testing haplotypes. And so here was the most significant finding. It was actually sex stratified. And there are various markers. The single markers are shown here in black, and then two-marker haplotypes, because this is a very densely genotyped area, are shown in black here, I believe. Three-marker haplotypes are in blue, et cetera. And I think what you can get from this is that everything really is lining up right in this area. And there just happens to be a really good candidate gene here, ALOX5AP. There may have been other areas of the genome that had equally good linkage signals that didn't have good candidates and that they didn't pursue, so recognize that false positives and publication bias may be a problem here. And then what they did was to take this particular gene, the exons of which are shown here, and sequence it to try to find other variants that perhaps they were tagging, that they didn't realize were hidden in that region, trying to find a causative signal. And then they went forward with some functional studies. But that's basically how one would do it in the linkage realm.
The sequencing is an important step because you are looking for something that may not have been identified on a platform previously and might be rare but might actually be accounting for your signal. What's tended to happen is that many investigators have done this in many different regions, and sometimes in the same region. So we're getting a lot of sequencing data out there that doesn't always end up in the common databases. And we'd very much like to see that sequencing made available to the wider scientific community. Because of that, about a year ago, this 1000 Genomes Project started. It's called a deep catalog of human genetic variation. What it was designed to do initially was, in a thousand people, sequence each of them in a way that essentially you had each part of the genome covered at least twice.
So, sequenced at least twice. The way sequencing works is that you take a chunk of DNA and you run through it, and then you assemble it, match it all up together. Those chunks were quite long in the early days and are very short in the more recent sequencing approaches. But basically, to finish a sequence, to be sure that you're confident in it, you want to go over it like 20 times or so. This is only going over it two times initially, and actually sequencing has gotten better, so they're going to do four times now, and I'm sure when we meet a year from now they'll be doing it eight times, but at any rate. If you want to learn more about this, they have a website, 1000genomes.org, that tells you a little bit more about that.
So in looking at populations of different ancestry, you can actually learn a fair amount, particularly if you use one that has lower LD than the population that you started with. Shown here are eight SNPs in the TCF7L2 region. That's the gene that's strongly associated with diabetes. Here are the SNPs in the order in which they occur on the chromosome, and then the correlations between them, with just their nicknames up on top. What you can see here are the LD values, the r-squared values, between each of them. So 2906 is associated with 20271 at an r-squared of 0.56. That's in the Icelandic population; what's shown below the diagonal is the African population. And you can see that there are some lower numbers here than there are up here, which you would expect, because we know that African populations have less LD, or shorter blocks of LD, than European populations. But you remember our trick of coloring the LD with various shades. And if you do that here, it really stands out that in European populations you have fairly dense LD, and in African populations you have much less so.
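For readers who want to see what the r-squared (and D-prime) values in such an LD table measure, here is a minimal sketch computing both from phased two-SNP haplotypes. The haplotype counts are invented and the function name is a placeholder; real analyses usually start from unphased genotypes and need an estimation step that is not shown here.

```python
def ld_stats(haplotypes):
    """Return (r_squared, d_prime) for two biallelic SNPs.

    haplotypes: list of (a, b) pairs, each allele coded 0 or 1, one pair per
    phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("LD is undefined when a SNP is monomorphic")
    d = p_ab - p_a * p_b                             # raw disequilibrium coefficient
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r_squared, abs(d) / d_max

# Invented sample of 100 chromosomes: allele 1 at SNP A only ever occurs with
# allele 1 at SNP B, but the allele frequencies differ.
sample = [(1, 1)] * 30 + [(0, 1)] * 20 + [(0, 0)] * 50
r_squared, d_prime = ld_stats(sample)
print(round(r_squared, 3), round(d_prime, 3))
```

In this made-up sample D-prime is at its maximum while r-squared is well below 1, which is one reason the D-prime and r-squared versions of the same LD plot can look quite different.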
And actually, if you were to look in this area, SNP 6992 is associated with these three SNPs. If you look at 6992 in the African population, you have much lower LD, essentially none, in these regions. So basically you'd be able to distinguish between 6992 and these three other SNPs in an African population if you were to type all of those. And it might give you a hint as to what is actually associated with the disease, recognizing that there's a lot that's different between West African and Icelandic populations besides their DNA, in terms of environmental factors and other things. But this is at least a way, with DNA sequence, that you can maybe get down to a narrower region.
And then there are other ways of trying to figure out whether it's just a single SNP within a region or whether some of these SNPs are interrelated. This was done by the California group, the Multiethnic Cohort, Haiman et al., who looked at the 8q24 region and said, we actually have a number of associations. Here are the association p-values shown here, and this is the chromosome lined up, and here you clearly have two different groups of SNPs. Here's our strongest SNP, and so what we're going to do is adjust all of our associations for this particular SNP. And when they did that, you notice all of these guys fell down to the bottom here, but these remained untouched because they were independent of those. So here was the next strongest SNP, and they adjusted for that. And you notice all of these fell down, and here's your next strongest SNP. They did this through about five different steps and suggested that there are at least five regions of 8q24 that may be independently associated with prostate cancer. So this is another approach for parsing out which SNPs are contributing.
I wanted to talk a little bit about the CDKN2A and 2B region.
So this is the association shown in coronary disease in the McPherson study that was published simultaneously with an Icelandic study, and you can see these two genes; here's your association signal, and these genes are quite nearby. In this diabetes study from the Wellcome Trust group, and you're now all pros at looking at these, this is a D-prime plot; you can see here, it's actually D. And this is an r-squared plot. And again, here's your linkage disequilibrium block and the associations that they saw, and there, by golly, are CDKN2A and 2B, now associated with diabetes. A similar kind of study, now from the Icelandic group again, is in aortic and intracranial aneurysm. And again, here are these two genes and the linkage blocks. These are areas of recombination, recombination hotspots, that they tend to show that way.
And here is a nice way of showing this, again from the Wellcome Trust group, where they showed their associations here. This is the association region, and the black points are the SNPs they actually genotyped. The gray ones are those that they imputed. Remember I mentioned before that sometimes you can use linkage disequilibrium information around a SNP to look at neighboring SNPs and guess what the one right next to it is. There are a variety of methods for doing this; we're actually testing them in GAIN and comparing them. But they did that and came up with imputed signals as well as genotyped signals. What they've shown here in red is the recombination rate across the human genome calculated from HapMap. And you can see that this region is pretty much bounded by these two very hot spots of recombination. And this is genetic distance plotted along in this purple line here. So genetic distance is very, very low for this region where there's very little recombination. It's much higher as recombination increases. And then they also showed, let's see. Yeah, sorry, these are the genes in the same region.
And the genes on the positive strand are shown here. Oops, sorry. And this is CDKN2A and 2B. And this is a conservation score. It's basically looking at areas of the genome that are very similar across different genomes, mammals and other things, and I'll give you a better example of that in a second. One of the nice things about DNA being a linear molecule is that you can put lots of tracks on it. And when you look at the browsers and things that Tom mentioned to you in the first lecture, you can actually turn these things on and turn them off and get different information about that particular stretch of DNA.
Okay, moving on a little bit into functional studies: correlating SNPs with logical intermediate phenotypes. And I'll look at three examples, or at least there are three examples I can give you: the CDKAL1 association with diabetes; this association with chromosome 8q24, where 8q24 is associated not only with lots of cancers but also with type 2 diabetes, but with different SNPs; and then a non-synonymous arginine-to-tryptophan change in solute carrier family 30, member 8, SLC30A8, which is a diabetes association. This one in particular is specific to the pancreas and expressed in beta cells. So they were very excited when they found that, of the various associations in type 2 diabetes, this particular one is expressed in beta cells. And so that's an expression study that's supporting the association. And that first example that I mentioned, again, was from the deCODE group. Here they show the CDKAL1 association with insulin secretion, because it would make sense that if insulin secretion is lower in people with the variants, maybe they're at higher risk for diabetes. And they did show that, in fact, you had associations with these variants in all participants, males and females, with lower insulin secretion in the GG variant.
Similarly, SLC30A8 was also associated with insulin secretion, now in more of a codominant association, where that one is looking a little bit more recessive. So one way that one can look at what the function of some of these SNPs might be is to relate them to intermediate phenotypes. This was also done by, again, the same group, looking now in a study of myocardial infarction. They had implicated leukotriene B4 in the ALOX5AP study that I showed, as part of the same mechanism. And here they show that there are actually higher levels of LTB4 in the MI cases than in the controls. They made a case for some biologic plausibility of that, and actually showed that their causative haplotype had much higher levels in the cases than in the controls, which, again, supported their finding. The problem with this is that you can find many, many different protein products that might be associated with your gene and have the same problem of false positives. But this is one way that people have looked at gene function to be able to implicate a given SNP in the cause of disease.
Another way is to co-localize the gene product in the histopathologic findings of the particular disease. I'll give you two examples, complement factor H and GAB2. So this is a nice example of complement deposition in the affected retina, again, from the Klein paper. This particular diagram, which I found much better than what ended up in the main paper, is in the supplementary information. What it shows is complement deposition, which is this black stuff, in Bruch's membrane, which is part of the retina that is affected by macular degeneration, and you can see that you have it in both these cases.
Deposition is also in a choroidal artery, so here's the artery that feeds the retina, and you can see this black deposition here, as well as in the choroidal vein, and here you see it again in another specimen. And then there is deposition in the drusen, which is here, and drusen are the pathognomonic feature in the retina of macular degeneration. Again, they used to be thought of as probably not terribly important. Now it really looks like they are what's related to this disease. And here you have your complement deposition. So when CFH came up as being this very strong signal in macular degeneration, everyone said, that can't be, we know that macular degeneration is ischemic, it's not inflammatory, so this must be spurious. And they said, hey, wait, we see that complement is deposited in the areas related to macular degeneration. Oh, okay, maybe it's possible.
And similarly with the GAB2 locus that I showed you in APOE ε4 carriers, what they then did was look at dystrophic neurons in brains of people with late-onset Alzheimer's disease. Here's your dystrophic neuron with the arrow, and the little arrowheads are the neurites, and you can see co-localization. This red stuff is the GAB2. And there is a tangle-containing neuron, which is here, and dystrophic neurons, all of which light up with this protein. And then a tangle-bearing neuron, the open arrow, which is this guy here, and immunoreactive structures resembling neurites, so they're also lighting up with this particular protein. And this last one is a neuron that's really highly dystrophic, with a tangle-like inclusion, and it also is lighting up. I don't show you the control samples, but they basically would be at this background level and not the highlighted area. So all of this is immunoreactive. Again, it's a suggestion that perhaps this is playing a role in the disease, but how it's playing, one doesn't know.
One can also look at gene conservation and expression. This was done in asthma, for ORMDL3, and I'm sorry that these didn't replicate terribly well in your handouts. What these authors, Moffatt et al., found in looking at childhood asthma is shown here as part of their association plot. They then expanded this particular area, and their association statistics are shown there. This is, again, something that you're quite familiar with: the strongest association is seen in this particular region here, with the LD plot of that area below it.
Then they did something that was quite novel and very clever. This is from the UCSC Genome Browser; you just expand on that particular area and you can see there are a whole bunch of genes here. The ones that I've listed are shown here. There actually are 19 genes in this region, and they had expression information for 14 of them from a variety of databases. They also looked at conservation information. These are conservation tracks. Wherever you see a green mark here, humans and chimps are homologous, and here humans and rhesus are homologous. These are areas that haven't really been typed. One can calculate a score for how conserved this region is, meaning how constant it is across species. You can see that this gene in particular looks pretty good in terms of being in the area that's highly conserved, so it may be evolutionarily very important.
Then they went and looked at expression and demonstrated that expression was much stronger in immune cells, which are thought to be related to asthma. So they show that it's indeed expressed in the tissues that you'd expect it to be expressed in.
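To make the idea of a conservation score concrete, here is a minimal sketch. It is not how the browser's conservation tracks are computed, which rely on more elaborate methods; it is a toy percent-identity score, and the function name, the gap handling, and the example sequences are all assumptions made up for illustration.

```python
def conservation_score(aligned_seqs):
    """Toy conservation score for a multi-species alignment.

    Returns the fraction of alignment columns in which every species
    carries the same base. Columns containing a gap ('-') or an
    untyped base ('N') in any species are skipped, not penalized.
    """
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")

    scored = 0      # columns with a called base in every species
    identical = 0   # scored columns where all species agree
    for column in zip(*aligned_seqs):
        if any(base in "-N" for base in column):
            continue
        scored += 1
        if len(set(column)) == 1:
            identical += 1
    return identical / scored if scored else float("nan")


# Hypothetical aligned fragments (invented, not real genomic sequence).
human = "ACGTACGT"
chimp = "ACGTACGA"
rhesus = "ACGTNCGT"
score = conservation_score([human, chimp, rhesus])
print(f"conservation score: {score:.3f}")
```

A score near 1 means the region is nearly constant across the species compared; lower scores mean more divergence.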
But then the sine qua non was to relate the various SNPs that they had found associated in the genome-wide study to expression levels of the genes in that area. They demonstrated, if you just look at the blue lines, a strong association between their SNPs and expression of this ORMDL3 gene, with the red lines for the controls. There is not a lot of difference between cases and controls, but there are certainly differences by genotype. And they were then able to say, we think on the basis of the expression and the conservation data that it probably is this gene that's doing it. Again, much more research is needed, but at least that's getting a little bit at function.
Then there are knock-down and knock-out studies. These are experimental studies where, say, you decide the ORMDL3 gene is the one that's really doing the deed. You'd like to see what happens when you remove it from an organism, and it's hard to do that. That's what a knock-out model is. More recently there is knock-down of these genes, which uses small interfering RNAs, for which the Nobel Prize was won a couple of years ago. What a small interfering RNA does is bind to a messenger RNA and prevent it from being translated into a protein. So you can reduce the expression of that particular gene without having to destroy it in embryogenesis, which often is lethal and can cause lots of other problems. I'll give you a couple of examples: knock-down of ATG16L1 and GAB2, and MLXIPL in a knock-out model.
Just to mention this study, this is Rioux et al. looking at Crohn's disease, and here you see a very strong association. This was with CARD15, the caspase recruitment domain 15 gene, which had actually been identified in family studies, and I'll go through a few slides on that study because it's very interesting. IL23R, the interleukin-23 receptor gene, was the second most strongly associated region.
Both of these were known when Rioux started their study, so they focused on the strongest new signal that they found, on chromosome two. These are the stats on this particular association in this gene. In this region it was in exon 8, actually, and caused a non-synonymous change. So this is one that you certainly want to look at. It may not be the causative variant, but it's certainly one that you want to look at.
The CARD15 association was really quite interesting. It was also called IBD1 initially because it was identified in family studies, and when you identified things in family studies they would tend to get named for the disease, and then a one if it was the first one that was found. So BRCA1 was breast cancer 1, and then there was BRCA2. IBD1 was inflammatory bowel disease 1, and NIDDM1 you've heard of, and that sort of thing. IBD1 then eventually was named CARD15.
What they did was basically this linkage study, and again here is your one-LOD drop, well, roughly a one-LOD drop. I asked Shilta Masa, who was involved in this, why they only went to here and didn't go all the way out to there, because you can see this goes out quite a ways, and he said, we didn't have the money, so we stopped here. At any rate, they added these particular markers, looking in this region for associations, and then found a highly significant association here. What they did at this point was to sequence this region and try to find as many variants as they could. So this is where sequencing comes in. They found all of these different SNPs, and here's the candidate gene right in here.
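The one-LOD drop mentioned above is a rule of thumb for bounding a linkage region: take the peak LOD score, subtract one, and keep the stretch around the peak where the curve stays above that value. Here is a minimal sketch of the rule. The function name, the marker positions, and the LOD scores are invented for illustration and are not the values from the IBD1 study.

```python
def one_lod_drop_interval(positions, lod_scores, drop=1.0):
    """Return (start, end) of the support interval around the LOD peak.

    The interval extends outward from the peak marker for as long as
    neighboring markers have a LOD score of at least (peak - drop).
    """
    if len(positions) != len(lod_scores) or not positions:
        raise ValueError("need equal-length, non-empty positions and scores")

    peak = max(range(len(lod_scores)), key=lod_scores.__getitem__)
    threshold = lod_scores[peak] - drop

    left = peak
    while left > 0 and lod_scores[left - 1] >= threshold:
        left -= 1

    right = peak
    while right < len(lod_scores) - 1 and lod_scores[right + 1] >= threshold:
        right += 1

    return positions[left], positions[right]


# Hypothetical marker positions (cM) and LOD scores along a chromosome.
positions = [0, 5, 10, 15, 20, 25, 30]
lods = [0.5, 1.8, 3.1, 3.6, 2.9, 2.0, 0.7]
print(one_lod_drop_interval(positions, lods))
```

Because the interval is read off at the genotyped markers only, it is as coarse as the marker map, which is one reason to add markers in the region before going on to sequencing.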
What was neat about this, and this is hard to see but it's really an interesting study, was that in this gene, with the various parts of the gene shown here and the various sequence variants that they found shown here, they were able to do a functional study, an in vitro study, so you didn't need to knock out or change each one of these SNPs in an organism. They had a functional assay for NF-kappa B activation, which is something that happens when bacteria invade. It's thought that inflammatory bowel disease is due to an inability to respond appropriately to gut bacteria: they evoke an inflammatory response that is usually suppressed in most of us, but people who have this disease are unable to suppress it. Part of that lack of suppression is not suppressing your NF-kappa B activation. You can see that all of these dark red bars are places where there's decreased activation of NF-kappa B, and these particular SNPs that they identified then had even stronger activation of it, particularly after stimulation. It's a complex functional study and I don't have a lot of time to go through it, but I think it gives you a feel for one approach: if you can identify a functional assay and then look at the relationships among all of the different SNPs you've found, you can ask which SNPs might be particularly important, and these were thought to be, on the basis of this study.
Okay. So getting back to Rioux, what they did was say, okay, we'll give you CARD15, we'll give you interleukin 23, but let's look at this chromosome two region. Here's the SNP that they found associated at 10 to the minus eighth. It was a non-synonymous amino acid change in exon 8 of the autophagy-related 16-like 1 gene, ATG16L1. These genes have very complex names at times. What's interesting about autophagy is that it's a process that's involved in degradation of dying cell organelles.
It's also kind of a part of apoptosis, when a cell kills itself, but instead of the whole cell dying, it's just trying to get rid of cellular organelles, like the ones that encapsulate invading bacteria, and then get rid of those. It's also involved in the inflammatory response. So it made a lot of sense in terms of Crohn's disease and the reaction to intestinal bacteria.
What they did then was look at whether the expression is higher in immune cells. You would expect it to be if this is an immune response, and indeed it was, so that made sense biologically. Then they tried knocking down the endogenous ATG16L1 with small interfering RNA. They proved that they could do it. That's all that this shows, that they were able to knock it down, particularly with this particular siRNA. Then they looked at whether the number of bacteria taken up per cell decreased after this knockdown had occurred, and it did, by quite a bit. So here you decrease your transcripts by 89%, and it also reduces encapsulation, by chance also by 89%. So this is a pretty good functional study that suggests that this may actually be playing a role in this particular disease.
This is a way of looking at functional measures, and all of the functional measures are different, because obviously cellular functions are different, diseases are different, and what you might be interested in would differ. So here is a similar example, now looking at a knockdown of the GAB2 gene that I mentioned earlier in relation to Alzheimer's disease. What they show here is your control, and here they've knocked down GAB2 with siRNA, and there's a 1.73-fold decrease in tau phosphorylation, which is important in the development of Alzheimer's disease, without changing total tau. So it's the phosphorylation that changes, and yet you're not changing the total amount of tau.
So this is just the control here. Again, it's another functional, histopathologic way of looking at this.
And then just one knock-out model. People are not doing as much in knock-outs now, even though there are good knock-out mouse models of many, many genes. This one in particular was looking at the SREBP gene that had been implicated in associations with triglyceride levels as a quantitative trait. Shown here is what happens to your triglycerides in the modified mouse compared to what's called the wild type, also in mice. Sorry, that was a knock-in. So the knock-in shows that triglycerides increase, and with a knock-out you show that they decrease.
So just to sum up, then: in life after genome-wide association, you're looking for a putative causal variant. You can narrow the region using fine mapping or sequencing. You can look at the structure of the association region, with nearby genes and conservation. You can look at associations with the protein product. You can ask whether the product co-localizes with histopathologic changes, whether the variant is associated with expression levels, and whether you can get a reasonable phenotype in knock-in or knock-out models. And I think that's it.
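One item in that summary, asking whether a SNP is associated with expression levels, can be sketched as a simple regression of expression on genotype. This is a minimal illustration under assumptions: genotypes are coded as the number of copies of one allele (0, 1, or 2), the trend is fit by ordinary least squares, and the data and function name are invented. It is not the analysis used in any of the studies discussed, which would also need covariates and a significance test.

```python
from math import sqrt


def expression_by_genotype(dosages, expression):
    """Least-squares slope and Pearson r of expression on allele dosage.

    dosages:    copies of the tested allele per sample (0, 1, or 2)
    expression: expression level of the gene in the same samples
    """
    if len(dosages) != len(expression) or len(dosages) < 3:
        raise ValueError("need at least three paired observations")

    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or in expression")

    slope = sxy / sxx            # change in expression per allele copy
    r = sxy / sqrt(sxx * syy)    # strength of the linear trend
    return slope, r


# Hypothetical data: six samples, two per genotype class.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
slope, r = expression_by_genotype(dosages, expression)
print(f"slope per allele: {slope:.2f}, r: {r:.2f}")
```

A clear per-allele slope that shows up in both cases and controls is the kind of pattern described for ORMDL3: differences by genotype rather than by disease status.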