And so what I have to say is really not new. In fact, I'll be mentioning things that others have discussed in a lot more detail during earlier talks. But I hope that by offering what might be considered something of a consumer perspective, I can help synthesize and integrate what we've been hearing about today into the larger picture of our knowledge of how genetics and environmental factors contribute to human health and disease.

Now this is a slide from the Department of Energy's public slide gallery, which was made available early on in the Human Genome Project. And the title says, Gene Chips Reveal Susceptibilities. If only it were that easy, because what they really reveal is data. And by dint of effort and cleverness and high technology, we can transform those data into information. This is the very same image that Teri showed earlier of results of a genome-wide association study in type 2 diabetes. But we don't really want information either. What we really want is knowledge, and this is what's being touted as the key to personalized medicine. So where's the knowledge? That's what we really want. Right now, what we mostly have is a lot of data. We have more data than information, and certainly more than we have knowledge.

So in trying to offer some comments on synthesizing and integrating the results of genome-wide association studies, I'm going to first review a little bit of the experience in replicating genetic associations in general. And in doing so, I'll make a contrast in the aims of such studies between identifying novel associations and measuring their effects in populations, and mention a few methodologic issues that pertain to genome-wide associations. Then I'll describe some network approaches, many of which have already been discussed in great detail by other speakers.
I'll focus on the Human Genome Epidemiology Network, which is a network of epidemiologists actually, and is the one that I work on, and I'll give a few other examples. And finally, I'm going to discuss two important results that are well known as success stories in genetic association studies to try to show how the results of candidate gene and genome-wide association studies can fit together.

So thanks to that wonderful scientific resource PubMed, we can actually monitor the growth of science in this area. These data are from a database that we have produced from PubMed on an ongoing basis since 2001 by conducting a weekly sweep of the new scientific publications added to PubMed and identifying the ones that are genetic association studies, mostly of unrelated persons. And you can see that the number of published gene-disease association studies has grown tremendously just over the last five or six years, to the point that we now have over 5,000 such publications entered into PubMed annually. Studies of genetic association that also examine some other factor from the environment have grown at a much slower pace. Those are the green subset. And then there's also a small but growing number of meta-analyses to synthesize the results of these candidate gene association studies.

Now, as early as 2001, it was clear that there were problems in replicating the results of genetic association studies of candidate genes. This is a rather famous graphic from a paper by John Ioannidis that was published in Nature Genetics in 2001, showing that often the first publication of a particular gene-disease association had the most extreme outcome or odds ratio, and that over time, as the same association was studied by other investigators, the effect tended to converge either to a very small or to a null result. And John Ioannidis called this the Proteus phenomenon, after the Greek god who could metamorphose himself into many different shapes.
And so basically, you know, there's been a lot of jousting with the results of these scattered, non-replicating genetic association studies. And in this article John, who has written extensively on the topic, recommended a systematic approach using meta-analysis.

Now, around the same time, we established what is known as the Human Genome Epidemiology Network, or HuGENet, collaboration, which now has four coordinating centers. In addition to the one at CDC, there's a HuGENet Canada headquarters at the University of Ottawa, and also centers in Cambridge, in the UK, and at the University of Ioannina, for obvious reasons. And the main functions of this particular network are the published literature scan and review that I mentioned, the production of systematic reviews, methodologic work to strengthen reporting of associations, and also the promotion of network collaboration.

And just a schematic to show how we have pursued this. I have here a figure that was published in January 2006 in a commentary that described our approach. The first workshop that we had to discuss this model focused on a network of networks, which I'll describe in a minute. That was in November of 2005. Since then, we've had others that, first of all, devised standardized procedures for reviewing and conducting meta-analysis of such associations. There's an online handbook that was published in 2006, and an addendum is being developed currently for the results of genome-wide association studies. A workshop last summer in Canada focused on strengthening the reporting of genetic associations with some guidance. Last fall, there was another group that met to discuss what this slide calls grading, but is basically the evaluation of evidence for an association. And in Atlanta next year, we hope to gather people back together to discuss this model and the body of evidence to date.
Now, the reasons why replication of genetic associations has been challenging can be divided roughly into three categories. First of all, there's heterogeneity, which we've discussed quite a bit already today. There are many different reasons why heterogeneity may occur among different studies, including differences in phenotypic measures and perhaps true differences in underlying genetic factors. But there are also many unmeasured factors, including exposures, that might play an important role.

The second major category has to do with statistical uncertainty, and basically the usual problems, including type I error, which can occur just based on sampling variability when many, many comparisons are made, and also the problem of low power, since many of the early studies were quite small. And even genome-wide association studies may be too small to detect small effects. This is another reason that's already been presented for pooling data and collaborating in analysis.

And finally, there are biases that can affect the results, including all the usual epidemiologic biases and, perhaps particularly important in this field, publication bias. Another very likely explanation for this Proteus phenomenon is that positive results, especially initially, are much more likely to be published than those that are negative.

So how do these concerns differ when we're talking about genome-wide association studies? The same problems are still there. There are perhaps a few advantages here and there, and additional kinds of information one can use to get at some of them. For example, as has already been discussed quite extensively here, we can address at least one of the unmeasured factors, which is the different genetic background, especially among different ethnic groups, that could result in population stratification.
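To put rough numbers on the type I error problem when many comparisons are made, here is a minimal sketch in Python. The figure of 500,000 markers is a hypothetical round number chosen for illustration, not one taken from any study mentioned in this talk, and the Bonferroni correction shown is only one simple and conservative way to control the family-wise error rate.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of false positives if every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test significance threshold that keeps the family-wise
    error rate at or below family_alpha (Bonferroni correction)."""
    return family_alpha / n_tests


if __name__ == "__main__":
    n_markers = 500_000  # hypothetical number of markers tested
    print("Expected false positives at p < 0.05:",
          expected_false_positives(n_markers, 0.05))
    print("Bonferroni per-test threshold:",
          bonferroni_threshold(n_markers))
```

The point of the sketch is only that an uncorrected threshold yields thousands of chance findings at this scale, which is one reason replication and pooling matter so much.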
With respect to sampling variability, a number of statistical techniques are being explored for addressing this. And in terms of low power, there's the use of meta-analysis, of which David Hunter already showed several large examples, and the use of prior information from candidate genes, which can be used to inform the analysis. Now, we still have all the usual epidemiologic biases, but to the extent that the data collection methods and protocols can be made available for other investigators to peruse, the greater transparency can at least provide insight into what those biases might be. So having that kind of information available, for example, in the dbGaP resource, along with the study data, really has great potential to at least provide some information to address the problem. Publication bias, of course, still remains a problem, although by enhancing access to data, as has just been discussed, through either dbGaP or other data sharing mechanisms, people will perhaps be able to interrogate these other sources for the same association and demonstrate variation in the results.

So as I mentioned, our particular network has done some work in the area of systematic reviews and meta-analysis and has made this handbook for systematic reviews available online. I'm providing the CDC link, although it actually resides on the University of Ottawa website. We also maintain a database of systematic reviews and meta-analyses. We have currently sponsored about 50 such reviews, published in collaboration with about 10 journals that allow us to publish those reviews simultaneously online. And we also have a citation database of about 550 meta-analyses that have been conducted so far. Also in progress are some guidance for reporting association data in publications and, as I mentioned, criteria for evaluating the evidence. More information about all of these things can be found on the HuGENet website.
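For readers who want to see what the simplest form of such a meta-analysis looks like, here is a minimal sketch in Python of a fixed-effect, inverse-variance pooled odds ratio, along with Cochran's Q and the I² statistic as rough measures of heterogeneity. This is a generic textbook calculation, not the specific procedure in the handbook, and the three studies in the example are invented numbers, with a large first estimate followed by smaller ones only to echo the pattern described earlier.

```python
import math


def fixed_effect_meta(log_ors, std_errs):
    """Fixed-effect inverse-variance meta-analysis on the log odds ratio scale.

    Returns (pooled log OR, its standard error, Cochran's Q, I-squared).
    """
    weights = [1.0 / se ** 2 for se in std_errs]  # inverse-variance weights
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I-squared: share of variation beyond what chance alone would predict
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


if __name__ == "__main__":
    # Hypothetical studies: (odds ratio, standard error of the log odds ratio)
    studies = [(1.8, 0.30), (1.2, 0.15), (1.1, 0.10)]
    log_ors = [math.log(odds_ratio) for odds_ratio, _ in studies]
    std_errs = [se for _, se in studies]

    pooled, se, q, i2 = fixed_effect_meta(log_ors, std_errs)
    low, high = pooled - 1.96 * se, pooled + 1.96 * se
    print(f"Pooled OR: {math.exp(pooled):.2f} "
          f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
    print(f"Cochran's Q: {q:.2f}, I-squared: {100 * i2:.0f}%")
```

A fixed-effect model assumes one common underlying effect; when heterogeneity is substantial, a random-effects model is usually considered instead.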
So is synthesizing information from genome-wide association studies any different from synthesizing information collected in candidate gene studies? Well, one thing to mention, I think, is that the priorities of such studies may differ. An important goal of the genome-wide association studies is to identify novel associations, whereas, at least now, a predominant goal of candidate gene studies is to measure the size of the effect. Now, in principle, you know, both approaches can be used for both things, but currently a lot of the excitement about the genome-wide association study results stems from the discovery of novel associations that remain to be tested.

Most differences between these types of studies are really a matter of degree. We still have to consider type I error. We have to consider type II error. We still have the issue of harmonization among studies, especially of phenotypic information, and also among different genotyping platforms. This has already been discussed quite a bit, and there are methods to deal with all of these things. Likewise, population stratification is still an issue. So the more information is available about each of the studies, and the more transparent they are, the better the information obtained from synthesis.

So what's the purpose of conducting meta-analysis of data from genome-wide association studies? We've seen some examples. This approach can improve the power to measure small effects, and it can be used to assess heterogeneity among genome-wide association studies. There are methodological challenges, also discussed earlier, such as the use of different genotyping platforms, the harmonization of data, especially when different criteria are used to define the phenotype of interest, and also the treatment of replication samples that are within the same genome-wide association study, a design that is quite typical. But I think, you know, to me anyway, that meta-analysis has its limits.
I mean, it's definitely a good way to start, but it really is not the be-all and end-all of data integration, because it's really only good for synthesizing data in one dimension. So this is just a draft of some proposed evaluation criteria for considering individual gene-disease associations, and I guess it's a proposal rather than guidance. Basically there are five main categories that tend to span not only validity, but I guess to a certain extent the utility of the discoveries. They are effect size, the amount of evidence and replication, protection from bias, biological plausibility, and relevance to health conditions. And really only the first two can be addressed by meta-analysis. The other things are somewhat subtle in many ways and can't be assessed in any automatic way.

So I may have failed to point it out, but at the center of my big wagon wheel image was the expression network of networks. Why network of networks? What's the utility of this approach? Well, we think of this as a way to bridge cottage industry with big science, to quote Bob Hoover, who talked about this at SCR last year. And prior to trying to combine everything in one final repository like dbGaP, there's really a great deal that can be done by investigators working together within a particular domain. We've already heard numerous examples of that, because people who are working on the same problem tend to share a number of things. They share specific knowledge; for example, within fields there are groups that devise phenotypic criteria that can be used to standardize the collection of clinical data and phenotypic data in epidemiologic studies. So there's specific knowledge, there's awareness of current research problems, so that the publication of the results provides a feedback mechanism to the research agenda, and they tend to share funding sources.
So you see, for example, in the National Cancer Institute, which has had a consortium model in place for many years, that this network of networks idea is already in place. And in other places, as Andy Singleton mentioned in his talk, you know, there are various kinds of consortia and collaborations that can come together for a single purpose in an ad hoc way or for a prolonged collaboration in a research area. Many networks already exist. Some of these were mentioned earlier; the first two are NIH sponsored. There are also international collaborations that tend to overlap with some of the NIH-funded projects, and some are independent. There are big ones, like this one on genetic susceptibility to environmental carcinogens, but then there are also very small ones, nascent ones that have been formed to address smaller topics, such as the PREBIC collaborative to study preterm birth.

Okay. Now, here's a crazy network image, but I do love it because it shows just what can be done when data are made available. This is actually based on OMIM. It's a network model that connects genes that have been studied in association with diseases and where associations have been found. The top one is disease centered and the bottom one is gene centered, and you can see these are not random. Of course, to a certain extent it's the looking-under-the-lamppost phenomenon, but there probably are true relations in there. This is based entirely on data in OMIM and was done by physicists, by the way.

So here's another model of the network that I think is worth showing. It's the AlzGene database, which is embedded in the Alzheimer Research Forum, a collaborative group to promote research on Alzheimer disease. Again, the data are obtained by sweeping PubMed for publications and are curated in this database, which can also perform online meta-analyses. Lars Bertram at Harvard is the founder and curator of that.
Here's the P3G Observatory from Montreal, where they are also trying to create a repository of questionnaires and comparison tools, and they have compiled a number of them from 11 studies in the U.S. and other countries. I think they should connect up with dbGaP.

So in two minutes I may not have time to tell my tale of two associations. You don't believe me, it's 3:40. I don't believe you, no. But anyway, okay, I'll hit the buttons fast and you will get an impressionistic image.

Okay, so this association between CARD15 and Crohn's disease, in 2001, is a huge success of the candidate gene era. And as we've already heard, complement factor H and age-related macular degeneration is a huge success of the genome-wide association study era. Here's the natural history of the big discovery. The pink is CARD15. Lots and lots of replications. It's an early success. It has offered key insights into pathogenesis and phenotype, but six years later we're not entirely sure how to use this. It hasn't replicated in all populations, and it was hoped at the time that it would be useful in identifying patients who could benefit from infliximab, which at the time was a big new treatment intervention, but it didn't work. However, genome-wide association has been helpful, and since I don't have time to discuss it, I would suggest that everyone who hasn't looked at this do so. It is a commentary by Lon Cardon following the publication of the IL-23R association with Crohn's disease, which shows just how a genome-wide association in combination with candidate gene data can be used to expand the knowledge horizon.

This is macular degeneration. You see CFH dropped on the scene in 2005, has been replicated many, many times, and there have already been three meta-analyses. It's another early success that provided great insight into pathogenesis and progression. There's a recent study examining interaction with smoking and BMI. The direction for translation isn't clear.
It doesn't currently have any utility for screening, and in fact there was no interaction in that same environmental factors study with the treatment assignment in the AREDS trial. Although the authors said there was no interaction, I was very disappointed the data weren't presented. Isn't that the thing you would most want to know?

You need to finish up, Marta.

Okay, so I won't repeat this. Instead I'm going to use Teri's slide and this, you know, she called it the wave, that's good. Waves can be good or bad. I've heard it called a tsunami. Let's not call it that. It's a rising tide that lifts all boats. That's what we want, right?