Good afternoon. This is Greg Feero, Chief of the Genomic Healthcare Branch at the National Human Genome Research Institute. I'd like to welcome all of you to the third webinar in NHGRI's webinar series. Today we'll be talking about genome-wide association studies, describing the latest genome-wide association study results and what they can tell us about genomics and health. Today we'll be hearing from Teri Manolio, and after that we'll be taking questions from you. The questions will be taken over the phone, and to access the system, dial star one to speak to the operator, and you'll be put in the queue for questions. It's now our pleasure to welcome Dr. Teri Manolio, Director of the Office of Population Genomics here at the NHGRI. She is currently Senior Advisor to the Director at NHGRI for Population Genomics. She's been deeply involved in large-scale cohort studies, such as the Cardiovascular Health Study and the Framingham Heart Study. She joined the NHGRI in 2005 and leads efforts in applying genomic technologies to population research, including the Genetic Association Information Network (GAIN) and the Genes and Environment Initiative (GEI). Dr. Manolio, I will put your slides up shortly.
Super, thank you. And I'm glad that everyone was finally able to join again. Our apologies for the delay. Because we've lost a few minutes, I may skip over a couple of slides. I hope that doesn't disturb anyone. And there may be someone who's breathing a bit heavily on the phone. If you could just hit your mute, that would be grand. So moving on then, if you're seeing my first slide, I'd like to talk about these being interesting times for doing genome-wide association studies and really for looking at the genome in general. You're probably familiar with Robert Kennedy's quote, "May he live in interesting times. Like it or not, we live in interesting times," which is actually part of a speech he gave in Cape Town in 1966, well worth reading.
There are two other parts to that proverb that I'll just skip over here for the time being. And actually, if one were to look at the associations that were known through any kind of genetic studies, there were maybe six or seven of them prior to 2005. And there were some questions as to how strong those were. There were many that had been reported, but these six or seven were pretty solid. But just looking at what has been learned in genome-wide association since 2005, you should be seeing a slide that shows the entire genome. Now on chromosome 1, at the bottom of it, is complement factor H related to age-related macular degeneration, and that was reported in March, I believe, of 2005. And then really nothing more until late in 2006, when there were three more associations as shown here. In 2007 things really started to pick up, and as time went by we have filled out the genome dramatically, to the point where we're near asking people to stop working on chromosomes 1 and 6 because there isn't any more room on the graph to show them. But this work has really led 2007 to be called the Year of Genome-Wide Association Studies, because much of the work really took off in 2007. This is a paper in Science at the end of that year. And shown here are all of the diseases and traits that have had genome-wide association studies published. We keep a sort of running catalog of these, and there are over 75 of them as of a couple of days ago. So it's really going very, very rapidly. This has been referred to by Hunter and Kraft from Harvard as drinking from the fire hose, in trying to talk about the massive amounts of data that are coming out of these studies. They point out that there have been few, if any, similar bursts of discovery in the history of medical research, and I think most would agree with that in terms of the number of findings and the rapidity with which they have been reported.
So what is a genome-wide association study? It's basically a way of interrogating all of the 10 million variable sites across the genome. So we have three billion spots in our genome, letters in the spelling of our DNA. And about 10 million of those differ between any two individuals. This variation is inherited in groups or blocks. So you don't have to test all 10 million points. You can test a subset of those and then infer what the other 9 million or whatever are. The blocks are shorter, so you have to test more points, the less closely related the people are. So when we started doing family studies, the participants had very close relationships, and so you might only have needed 400 or 500 markers. But technology now allows us to study unrelated people, who have much shorter stretches of base pairs in common, so you need many more markers. This is just a stretch of DNA on chromosome 7, and as you can see at the top, we're all really pretty similar in 99.9% of the genome. But every now and then there will be one site that pops up, like this C over A here, where some people have a C and some have an A, up in that upper left-hand corner. And then you go on, everybody's the same for a while, and then there's a C or a T, etc. And you have these single nucleotide polymorphisms about one every 300 bases or so. This is a nice figure from a review by Christensen and Murray last year that basically took an example chromosome, just this cartoon up at the top. And then from there it took an example gene, in that second, middle bar, that shows various SNPs. Some of them are in exons, which are the red sections of that gene. Some of them are in introns, which are the white sections of that gene. There tend to be a few more in the introns than in the exons, perhaps because they're better tolerated in introns than exons. And then you see this triangle-shaped diagram toward the bottom.
These tend to throw people, but really what these show is just the relationship of each of these SNPs to each other. And this is essentially a matrix, and we've all been looking at matrices like these for years and years, maybe without realizing it. Sorry, here's another example of one on chromosome 9, a little bit more extended, and we'll come back to this one in a second. So you ask the AAA for a map of the East Coast, and they'll tell you that driving from Boston to Providence is 59 miles, and from Boston to New York is 210 miles, and from Providence to New York is 152 miles, et cetera. That's the same sort of matrix as we're looking at with these SNPs. And if you wanted to color code these and say that the distances that were really close, less than 100 miles, were dark red, and those that were much further, say, more than 400 miles, were white, you could do that, and you could overlay those colors on this matrix here. And if you turned it on its side and made it into squares, you basically have the same thing that you're looking at with a linkage diagram. And that's all that we're looking at when you see this dark red between two SNPs. If you look at SNPs three and four in that diagram, that's very much like Boston to Providence, essentially. So because of this, one tag SNP, or a SNP that stands in for several that it's strongly related to, can really serve as a proxy for many of them. And shown here is a stretch of DNA on two chromosomes from, say, one individual. And then the same stretch from another individual's two chromosomes, and then another individual's two chromosomes. And as you can see, this first SNP here in blue, SNP number three, can either be a G or a C, depending on which chromosome you're looking at. And SNP four, in gold right next to it, actually moves pretty much in concert with it.
So when SNP three is a G, SNP four is an A. And every time there's a G at SNP three, there's an A at SNP four. And likewise, when SNP three is a C, there's a G at SNP four. SNP five, on the other hand, in bright green, does not always move together with SNPs three and four. So sometimes when SNP five is a G, there's an A at SNP four. Sometimes when SNP five is a G, there's a G at SNP four, and so on. For SNP two, just take my word for it if you don't want to check them all, but it's also exactly correlated with SNPs three and four, and so is SNP one, again, just in this cartoon. And these four SNPs could be said to move as a block. So this is what is often known as a haplotype block, a haplotype just being a string of SNPs of the same flavor along one stretch of the genome. SNP five has a SNP next to it, SNP six, with which it is in perfect correlation, also called linkage disequilibrium, which is kind of an awful name, but so be it. That's what it's called. And then there's SNP seven in light blue here, and those three form another block. And then there's this SNP in brown on the side that moves by itself. So if we were to take out the bases in between here and just focus on the places where people differ between chromosomes, you could see that for block one, you could measure any of these four SNPs and still get all of the information you would have had if you had measured all of them. So you might just pick one of them. You could pick the one with the prettiest color, as I've done, or you could just pick the one that's cheapest or easiest to type. And you could also pick any one of block two, and then the singleton in block three, and you measure three SNPs instead of probably 1,000 or 10,000 or so to get all the information that you would from all those different SNPs.
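The tag SNP idea described above can be sketched in a few lines of Python. This is only a toy illustration, not the method used by the HapMap project: the six example chromosomes and most of their alleles are made up for the sketch (only the G/C and A/G pairing at SNPs three and four follows the cartoon), the function names are hypothetical, and SNPs are grouped only when they are perfectly correlated, without requiring them to be adjacent as a real block-finding method would.

```python
# Toy haplotypes: one string per chromosome, one letter per SNP (SNPs 1..8).
# The alleles are hypothetical, chosen so that SNPs 1-4 move together,
# SNPs 5-7 move together, and SNP 8 moves by itself.
HAPLOTYPES = [
    "ACGAGTCA",
    "TGCGGTCA",
    "ACGATAGT",
    "TGCGTAGA",
    "ACGAGTCT",
    "TGCGGTCT",
]


def allele_pattern(column):
    """Recode one SNP column as 0/1 relative to the first chromosome's allele.

    Two biallelic SNPs in perfect correlation split the chromosomes into the
    same two groups, so they get identical 0/1 patterns.
    """
    first = column[0]
    return tuple(0 if allele == first else 1 for allele in column)


def find_blocks(haplotypes):
    """Group SNP numbers (1-based) whose patterns are identical."""
    blocks = {}
    for i in range(len(haplotypes[0])):
        column = [hap[i] for hap in haplotypes]
        blocks.setdefault(allele_pattern(column), []).append(i + 1)
    return list(blocks.values())


def pick_tag_snps(blocks):
    """Pick one SNP per block; here simply the first one listed."""
    return [block[0] for block in blocks]


blocks = find_blocks(HAPLOTYPES)
tags = pick_tag_snps(blocks)
print("blocks:", blocks)
print("tag SNPs:", tags)
```

In this toy data, typing three SNPs recovers the information in all eight, which is the same economy the talk describes at genome scale. Real data rarely show perfect correlation, so in practice a correlation threshold would be needed rather than an exact match.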
And this just shows how these break up into haplotypes, and very often there are just a few haplotypes that are very common, as these top three are, and sometimes there are others that are much rarer. So coming up with these blocks and the way that the SNPs travel together in the genome was the whole purpose of the HapMap, and the HapMap project published its first paper in 2005, which summarized over a million SNPs, I believe. And then in 2007, there was a follow-up paper that reported over three million SNPs, and there will be multiple follow-up papers after that as well. The goals of the HapMap were to use just the density of SNPs that you needed to find associations between the SNPs and the diseases, and we'll talk about how one does that, while trying not to miss regions that had disease associations, and to produce a tool that would help in finding genes that affect health and disease. It also recognized that you need more SNPs for complete genome coverage of some populations, particularly populations of recent African ancestry (since we're all of African ancestry), because those populations are older and there's been more time for the relationships between the SNPs to break up, so you need to measure more of them. Along with the HapMap, and probably stimulated by it, genotyping technology has improved dramatically and the costs have gone way down.
So in 2001, as a slide from my colleague Stephen Chanock shows, we thought we were getting a really good deal if we got a genotype done by ABI's TaqMan method for a cost of about a dollar. You can see the cost along the Y axis there in cents per genotype, and those costs came down almost linearly into 2005, as shown here, with various platforms also typing more and more SNPs. This slide is now two years old, but the same trends continue, where the costs have just fallen, fallen, fallen, and the numbers of SNPs on the platforms have increased as well, and this has allowed us to do these kinds of studies. So what is it exactly that you test when you're doing this? Well, say you have a bunch of people who have had a myocardial infarction, or heart attack, and a bunch of people who haven't, and you'd like to know how they differ. In traditional epidemiology, you would look at things like their weight or their smoking history or, as time went by, their cholesterol levels or their blood pressure, et cetera. Well, one can do the same thing with genetic factors and just ask whether the different forms of a particular gene or SNP, in this case rs1333049, as shown at the top here, are associated with being a case with myocardial infarction or a control without myocardial infarction. And as you can see, the C allele of this particular SNP is more common in the cases: its frequency is 55% in the cases, compared to only 47% in the controls. One can do a statistical test on that, called the chi-square test, and estimate how likely it is that you would see that extreme a value of chi-square if there were actually absolutely no association and you just saw that by chance. And if this were due to chance alone, it would be a very unlikely thing to have happen. The probability would be only about 10 to the minus 13th.
So far less often than once in a billion times would you ever see a result this extreme. And the odds ratio is a measure of the risk associated with that. So people who happen to carry this allele are about 1.38 times as likely to have a heart attack as the people who don't carry this allele, or 38% more likely to have a heart attack. One could also look at this by genotype, because each of us carries two copies of almost every variant in the body, except for men, who are missing some of those on the X chromosome because they only carry one X chromosome. But in looking at the genotypes for this particular SNP, you can also see that 31% of the cases have the CC genotype at this SNP, compared to only 23% of the controls. And then the heterozygotes are about the same, but the GG genotype is much more common in the controls than in the cases. And again, one can calculate a chi-square value and a probability associated with that. And then the heterozygote odds ratio would be basically the odds of having disease if you carry one copy of the variant compared to carrying no copies. And that's 1.47. And then for the homozygotes, it's 1.90, which means you're nearly twice as likely. So the challenge with these studies is that you basically are doing this same test 100,000 or 500,000 or a million times. And the challenges in interpreting that massive amount of data are what make genome-wide association so interesting. So shown here is the very first truly genome-wide study, this Klein study that I had mentioned, looking at macular degeneration, which was published in 2005. And they tested 100,000 SNPs and they set a level. Because they were looking at so many SNPs, they said we have to control for the fact that if we treated anything that happened one in 20 times as an unusual occurrence, you're going to see an awful lot of those things, and those would be false positives.
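As a rough check on the allele comparison described above, the following Python sketch recomputes the allelic odds ratio from the two allele frequencies quoted in the talk (55% in cases, 47% in controls) and runs a chi-square test on a 2x2 table of allele counts. The sample sizes of 2,000 cases and 3,000 controls are an assumption made purely for illustration, since the talk does not state them, so the chi-square value and p-value here are not the study's actual figures. The function names are hypothetical.

```python
import math


def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds of carrying the allele in cases divided by the odds in controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value. For 1 degree of freedom the upper
    tail probability equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Allele frequencies quoted in the talk for the C allele of rs1333049.
odds_ratio = allelic_odds_ratio(0.55, 0.47)
print(round(odds_ratio, 2))  # about 1.38, matching the figure in the talk

# ASSUMED sample sizes (not from the talk): 2,000 cases and 3,000 controls,
# each person contributing two alleles.
n_cases, n_controls = 2000, 3000
case_c = round(2 * n_cases * 0.55)
case_g = 2 * n_cases - case_c
control_c = round(2 * n_controls * 0.47)
control_g = 2 * n_controls - control_c

chi2, p_value = chi_square_2x2(case_c, case_g, control_c, control_g)
print(chi2, p_value)
```

With these assumed sample sizes the p-value comes out far below one in a billion, in line with the scale described in the talk, but the exact value depends entirely on how many people were studied.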
So one would want to set a very stringent level. We'd only want to see something that might happen by chance one in a million times, or in this case 4.8 in 10 million times, in order to be convinced that it might actually be an unusual occurrence. And where that arrow is on the slide here is chromosome 1, because these are just lined up from the beginning to the end of the genome, essentially chromosome 1 to the X chromosome. And there was a very strong association. The strength of the association is shown by, basically, the height of this line here. And you can see around the middle of the plot, there's another association that's almost as strong as that one. It turned out that that one was a genotyping error. And when they went back and looked at it very carefully, it was decided not to be a true association. And this can be a problem with these studies. You can show these in all kinds of different fancy colors. Here's a red one looking at nicotine dependence. And again, the height of the points here just shows how strong the association is, how unlikely it is to be due to chance, essentially. This is a nice multicolored one of diabetes. There's one in gray here that shows each of the chromosomes separated out for you and, in red, things that really popped out and were strongly related. And here, a blue one, this one has multiple diseases. So this was a very extensive study of seven different common diseases and they showed all of their associations in one plot. They like to call it the 10 million pound plot. But anyway, in a way they're sort of falling from the sky. This one was done over Christmas time and that was what they had on their mind.
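The stringent level mentioned above is consistent with a Bonferroni-style correction, in which the usual one-in-20 threshold is divided by the number of SNPs tested. The talk does not name the correction, so that reading is an assumption; the short Python sketch below only shows the arithmetic, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the overall error rate near alpha."""
    return alpha / n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of chance 'hits' if every test is judged at level alpha."""
    return alpha * n_tests


def implied_number_of_tests(alpha, threshold):
    """Number of tests for which alpha / n_tests equals the given threshold."""
    return alpha / threshold


# With about 100,000 SNPs judged at the usual one-in-20 level,
# thousands of SNPs would look "significant" by chance alone.
print(expected_false_positives(0.05, 100_000))

# Dividing 0.05 by 100,000 tests gives a threshold of 5 in 10 million.
print(bonferroni_threshold(0.05, 100_000))

# The threshold quoted in the talk, 4.8 in 10 million, corresponds to
# slightly more than 100,000 tests under this reading.
print(implied_number_of_tests(0.05, 4.8e-7))
```

The same arithmetic explains why studies with 500,000 or a million SNPs need even smaller p-values before an association is taken seriously.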
But if one looks a little more closely at one of these associations, and this is one again that I mentioned previously for myocardial infarction, you can see that in blue here, there's an area that shows really very strong association, all the way up to 10 to the minus 14th. So there's a one in 10 to the 14th chance that this could have happened by chance alone. And that was the SNP that I showed you before. One can take this area on chromosome 9 and stretch it out, and that's this area here that I'm just highlighting. And if you stretch it out, this is the same region. And it's now just looking at chromosome 9 and just focusing on the blue dots. The red dots were a replication sample. But this was the finding that was reported by these authors and it's on chromosome 9. And then one can look again at our old friends, the red triangles, looking at how the SNPs that have been tested in this particular study are related to each other. Do they travel together or don't they? And look at that middle panel, where you remember the really dark things were the Boston to Providence ones. So those are ones that travel very closely together. And there are a number in, say, the left-hand side of this ellipse, maybe 10 of them or so, that are clumped in that region. And they seem to be in this group of triangles that's labeled one, which is one linkage block, a block that moves together. So among those you might not need to test all of them, although these authors did, but there are other places within this ellipse that are not in that linkage block. And so you'd want to test those other areas as well. And sometimes these linkage plots can tell you a lot about what might be the causative gene.
So in this plot looking at inflammatory bowel disease, in the middle you can see again these association statistics, and you see there's a mountain of them around the 10 to 12 level of minus log 10 p-value, right over the x-axis position that says 67,400,000. And in this region there are actually three genes. You can see that there's IL12RB2, IL23R, and a hypothetical protein. And all three of these might be possibilities as being related to this disease. But if one looks at the linkage patterns, you can see from these darker triangles, now just shown in black and gray here, that there are really only about two blocks that are strongly associated with the disease. And those pretty much narrow you in to looking at this interleukin-23 receptor. So that's how those can help point the way to a particular gene that might be causing the disease. As for unique aspects of these studies, they really allow examination of inherited variability at an unprecedented level of resolution. And they allow you to look at the genome without having prior hypotheses. Because we know so little about how the genome functions, in some ways it may be better just to say, let's set aside all our previous notions and just look and see what we find. And it's amazing what we have found. As another positive to this, once you measure the genome in this way, you can relate it to any trait that is consistent with the informed consent that's been provided by participants. So interestingly, most of the really strong associations that have been replicated a lot in these kinds of studies have not been with genes that anyone would have suspected of being associated with the disease in question. So they weren't really on anybody's list of things that probably would be associated. And so they would have been missed in prior studies where you had to rely on a hypothesis.
And some associations have been in regions that weren't even known to harbor genes, and no one's quite sure what that means. And that's an area of very active research right now. But as Hunter and Kraft point out, the chief strength of this approach is also its chief problem, because when you make more than 500,000 comparisons per study, the potential for false positives is really unprecedented. I'm a big Gary Larson fan. This is "God, Collings, I hate to start a Monday with a case like this," at an annual Butlers of the World banquet with a knife sticking out of one of the butlers. And God knows which of all these are false positives, there along with the possible true positive. And something that's been recognized for a long time in genetic studies is that false positives are really quite common. And this now classic review by Joel Hirschhorn pointed out the large number of genetic associations that had been reported with diseases. And you can see that climbing really dramatically after about 1994. But in looking at the 600 or so studies that he reviewed, only six of the associations were significant in a consistent way in more than three quarters of the studies he looked at. And these are the six on here. So this was not a very good record. It was really something that was quite concerning to people. We did much the same thing in atherosclerosis, but I won't go over this in the interest of time. And this led to calls among editors and journals and publishers for replication: probably the most important way to be sure that an association was real was to demonstrate that it had been replicated elsewhere. There weren't really good criteria for what constituted replication. So there was a lot of discussion about that. And we ended up having a workshop here with our colleagues at the Cancer Institute to come up with a series of criteria for what truly is replication.
And we all, I think, agree that replication, replication, and replication are probably the three most important things in confirming a genome-wide association. But it was important that the initial study be described in sufficient detail that you could even try to replicate it, because you needed to know where the cases and controls came from so you could have similar kinds of cases and controls. You needed to know things about participation rates and how participants were selected into the study and how affected status or case status was defined, and a number of other things shown here. And then in the replication study, you wanted to be sure that a similar population, if not exactly the same population, had been used, and that the phenotype was very similar. So they weren't studying height in one study and weight in another, but really using much the same phenotype, and they used the same inheritance model, the same SNP, the same direction of effect, and they were adequately powered to detect the possible effects. The sample size had to be large enough really to be able to detect an effect if it truly was there. A strategy for doing this was described by Bob Hoover, again at the Cancer Institute, suggesting that one approach, and this has been taken by many of these studies, is to begin with, say, a reasonably large sample, 1,150 cases and 1,150 controls, with a large number of tag SNPs, 500,000 or more, and then a replication study that might be even larger than that, but that would only test a subset of those, maybe 5% of those. And then a second replication study, again of large size, that tested an even smaller number, those that replicated in multiple studies. And then getting down at the bottom of this funnel to an even smaller number and hopefully coming out at the end with maybe 25 to 50 loci, in this case for prostate cancer.
And this is very much what was done in prostate cancer. I think it's only led to about five or six loci for prostate cancer, but there have been other diseases in which more loci have been found. The approach that was used in breast cancer, which Easton et al. published in 2007, used a much smaller initial set of cases, but then a 10-fold greater size for the replication sample, which tested 24,000 cases, and ended up with six. And this involved over 50,000 women with and without breast cancer, and these are all of the cohorts that were studied and that enabled this finding. So these are really big, big collaborations. They're real challenges to put together. You can also have problems with false negatives. So here: "Now Edgar's gone. Something's going on around here." Even some of the false negatives might be really pretty obvious. And this was the prostate cancer study I referred to previously with 1,100 cases and 1,100 controls, then increasing to 4,000 cases and 4,000 controls with their top 27,000 SNPs selected at this particular P-value. And what was interesting about this is that when they tested the two stages together, there were four SNPs that were really very strongly associated, judging from the P-values here, such as the SNP in MSMB associated at seven times 10. But when that was looked at just in stage one, the initial rank was actually number 24,223. So its P-value was not very impressive at all. It was really way down in the ranking. And similarly, even the second SNP, which ended up at two times 10 to the minus ninth, was only the 2,400th SNP or so, with P-values that would not have knocked anybody's socks off. So this is a challenge in being sure that your replication sample is large enough not only to weed out the false positives, but also not to miss any true associations as false negatives. It's been a real challenge trying to keep up with this literature. The number of published reports has increased nearly exponentially.
We're 191 as of September, the end of September of 2008. And at the Genome Institute, we're trying to keep track of these through what we call the Catalog of Genome-Wide Association Studies, which is available on our website. If you can't remember the URL, if you just Google GWAS Catalog, it should come up as the first hit. And what we have tried to do here is to give a comprehensive listing of all of the published Genome-Wide Association Studies, including information on the author, the date, the journal, the trait that's being studied, the sample sizes, both initial and replication. The region of the genome, whether it's on chromosome 22 or chromosome three, the gene that has been implicated, the strongest SNP and the risk allele that have been suggested to be associated and the frequency is a fair amount of effort to pull out all of this. And really the objectives were to identify and track all of these publications, extract key information about the associations and make this widely available as a scientific resource for the community. And it includes a downloadable data file. So if people want to get on and download this into an Excel file and use it for other research, they are welcome to do that. We seek commonalities across associations, Genome-Wide, rather than disease by disease. And I'll show you some of the things that we can draw, conclusions we can draw about these SNPs. And we want to describe the approach clearly so that others can replicate or expand on it and we can maintain consistency in the approach. And we pull these out basically from published databases and various electronic clipping services that we have of news. And as I described, what kinds of information we pull off previously. We're looking here at about 180 published papers, excluding a few of them that didn't report the specific SNP. There were 145 reports involving nearly 800 unique SNPs. And then there were about 3,800 that were perfectly linked to them. 
So they also would carry some important information. So that's about 4,600 SNPs total. 83 of the SNPs in these reports had been reported two to seven times, some of them in association with traits that we wouldn't really have thought were necessarily related to each other. Before giving some examples of those, here are the functional classifications of these index SNPs, whether they were in regions of a gene or of the genome that might be coding for proteins. And if they code for proteins, do they lead to a missense change, that is, a change in the structure of that protein? There were only 37 of these 782, or only about 4%, that were in those particular regions, even though those were the things that everybody thought, for sure, were going to be causative of disease. There were 11, or about 2% of them, that were in the coding region and made a change, but they really didn't change the protein that was coded for, 340 that were intronic, and then a smaller number in various other parts that might be related to regulation of gene expression. And then a good 350 of them, more than 45%, were intergenic and really weren't in any genes at all. And again, these are stimulating a lot of research as to why that is. The odds ratios, or essentially the risk of having disease in people who carry one of these variants compared to those who don't carry the variants, are typically fairly small. As you can see, most of them tend to cluster around the 1.2 to 1.4 range. The median is 1.28, so half of them are actually less than an odds ratio of 1.28, and half are more, obviously. And this is very similar to what's been seen in Crohn's disease, with the same kinds of distributions of variance explained or odds associated with disease, roughly the same idea. And what's shown in this dotted line is the power to detect these risk loci.
So probably there are many more that have even smaller odds ratios, but they're very difficult to detect unless you have massive sample sizes. So that may be why they're not being seen. And there are some that have very large odds ratios. Those may be of some interest. It's something that would be worth looking into in more detail. I'm gonna skip through these because it's kind of a pretty picture, but I'm just showing you here what some of the very high odds ratios, strong odds ratios have been associated with and the allele frequency of those associations. And these are shown in a little more detail here with these various diseases that all have odds ratios greater than about four and a half fold. And those might be genes that would be of great importance on a public health basis, but again need to be looked at in much more detail. We've also looked at differences across populations as to how different the frequencies are in people that say European ancestry or Asian ancestry or recent African ancestry. And for the most part, they're really pretty similar. And again, just focus on the light blue here, but the pink is pretty similar as well. And for the most part, the more than half of these are under a genetic distance, which is a calculation of how different they are in populations of less than 0.7. But there are a few that have much greater variability across populations than that. And those might be of some interest as well. And in fact, in looking at them, many of them are traits for both immune, traits related to immunity and traits related to pigmentation, which we know are highly differentiated across populations. So just looking here at the top 5% of FST values of those that are 0.49 or greater, which is a pretty extreme difference among populations. In the blue, those tend to cluster among immune related traits, pigment traits, obesity traits, and then some neurological and height findings in that. 
And in the top 1%, so the really extreme ones, they are pretty much focused in immunity and pigmentation, which are again probably things that are quite distinct by geographic origin and allow you to survive in the particular environment in which you find yourself. Some interesting findings have been in genes that were not previously expected to be related to disease. I already mentioned the macular degeneration finding and complement factor H. Macular degeneration was thought to be a degenerative disease, or maybe an ischemic disease related to blood flow, but no one really thought it was related to inflammation, and yet this gene shows up very, very strongly. Some others, in coronary disease, asthma, and type 2 diabetes, really weren't on anybody's candidate gene list. Then there are gene deserts, areas where there really aren't any genes at all: there are very strong associations of prostate cancer with the tip of chromosome 8, and there don't seem to be any genes for 500,000 bases or more. So what does that mean in terms of causation of disease? Crohn's disease, similarly, has associations in various areas without a lot of genes. And interestingly, some of these associations have been in common among diseases that really weren't thought to be related to each other. So even though diabetes and coronary disease can be risk factors for each other, diabetes particularly being a risk factor for coronary disease, even when you control for that there seems to be this association with two otherwise quite different diseases. And melanoma: I don't think anybody would have expected that to share pathogenesis with coronary disease or diabetes. Crohn's disease wasn't thought to be all that related to childhood asthma, and yet they share this association. Is this real? Is it replicable? It seems to be. What does it mean for disease?
And there are multiple cancers related to this prostate cancer signal, and other signals in common in multiple sclerosis and type 1 diabetes, again perhaps pointing the way to a common etiology of these diseases. You may have noticed that Crohn's disease shows up a lot here. And in fact, one of the lessons I've learned from this is that if you want to find genes for a common disease, study Crohn's disease: some 30 associations have been reported for it, more than for any other disease. So I think I'll wind up here and just note that nearly half of the SNPs that have been identified in genome-wide association studies as being related to common diseases are intergenic, so we don't know what genes they're related to and we need to find out. Only about 8% of index SNPs, the SNPs that are identified in these studies, are in coding regions or regulatory regions of the genome. So again, we need to look at intergenic and intronic SNPs. We recognize there is some bias in the genotyped SNPs toward an excess of missense variants. That's one of the slides I skipped over, but it's essentially some bias on the platforms in what kinds of SNPs they're looking for. Most of the odds ratios are really pretty small, well less than 1.5. And risk-allele frequencies don't appear skewed either toward rare alleles or toward variants that vary a great deal between populations, as indicated by large FST values. But the small number of SNPs that do seem to be highly differentiated across populations seem to be enriched for traits such as these. And looking at loci at the extremes of these characteristics might really teach us a lot about things we don't know about the genome. So I think I'll end with a quote from Sir Tim Rice in Aida: "The more we find, the more we see, the more we come to learn, the more we explore, the more we shall return." And we certainly have a lot to return to in the genome. And Greg, I think I'll stop there and be happy to take some questions. Excellent presentation, amazingly fascinating results.
I would like to now open the line for questions from the audience. Diane, I think we're ready. You need to dial star one. Thank you. We will now begin the question and answer session. If you would like to ask a question, please press star one. Please unmute your phone and record your name clearly when prompted. Your name is required to introduce your question. To withdraw your request, press star two. One moment, please, while we wait for the first question. Lanny Ross, your line is open. [Unclear] A question for you. Given the large number of associations with Crohn's, it's a little curious to me: how frequently does ulcerative colitis show up on that? I think clinicians think of those as sort of related disorders. Yeah, about half of the loci that are seen in Crohn's disease are also seen in ulcerative colitis or inflammatory bowel disease in general. And the reasons for that are not entirely clear, because they can be difficult to distinguish both clinically and histopathologically, but there are clearly some syndromic differences between them. So it looks like about half of them are shared; whether the rest is a power issue, in that we just don't have enough cases to be able to detect them, is an open question. Diane, other questions from the audience? Hi, I'm Siobhan Jones with humangeneticsdisorders.com. I wanted to know in what ways I can incorporate this into genetics education and awareness for the general public. Yeah, I think it's reasonable at this point to say that this research is ongoing, that it really has exploded in the past couple of years, and that this is what many geneticists are very, very excited about. We've been looking and looking and looking with various tools and really hadn't found a lot that held up in lots of other studies, but this really has. Unfortunately, at this point, there's much more to be learned about this than there is to be taught about it.
And every answer we get raises 20 questions that we don't have good answers for yet. So the fact that these associations are generally pretty darn small suggests that these aren't going to be useful very soon for predicting disease. They may be very useful in identifying treatments or pathways that might, but I think for the moment, if we can convey the excitement of being able to find parts of the genome that everybody thought were silent and really didn't do anything, that we sort of arrogantly used to refer to as junk DNA. Well, these junk DNA areas are associated with disease, in a very replicable way and in ways that we don't understand. And it's a real challenge, I think, to all of us, and a reason to get young people into science, to try to figure out these associations. Okay, go ahead. Any ideas of why your associations are more frequently found in immune, pigment, and obesity-related diseases? Well, actually, the ones that I was showing you there were the ones that differed dramatically between populations, so between populations of recent African ancestry versus European ancestry or Asian ancestry populations. And we know that pigmentation varies dramatically by geography, and there seem to be plausible reasons for why that would be. And so that, in a way, reinforces the fact that yes, this makes sense. The immune-related ones may be a little bit more obscure, but as a matter of fact, we do know that there are some pathogens, bacteria and the like, that only live in certain climates, or other factors related to environment or soil or plants or allergies or whatever that are only present in certain climates. And so when those climates or geographic areas are acting on a subpopulation over tens of thousands of years, we evolve to respond to those environmental stimuli. So that would probably be why those are differentiated as well.
The obesity ones I can't really explain, and the neurology ones, again, are question marks we have to pursue. Have any more questions or comments? Again, please press star one. While we're waiting for other questions to come in, I'd like to draw your attention to the slide that I failed to put up at the beginning of the webinar. This is an additional email that you can use to reach Laura Rodriguez regarding data-sharing policies for genome-wide association studies. Any further questions coming in? I show no questions at this time. Fair enough. Well, I would like to thank all of you for participating in this webinar. We've enjoyed hearing your questions. Our next webinar will be held in two months, on Thursday, January the 8th, at one o'clock, I think Eastern time. I think it'll be a very interesting topic: "The Long and Short of It: Finding Genes for Complex Traits in the Domestic Dog." I have heard this talk before and it's quite interesting. So I will leave you with the fact that you'll be receiving more information about this upcoming webinar as the time draws closer. Again, thank you all for attending.