I think the whole genome-wide association business needs a whole lot more input from epidemiology and population genomics, so I'd like to illustrate some of those opportunities for you. I want to talk about the causes of heterogeneity in the results of gene association studies, an issue we've raised a couple of times. I'll review the types and sources of biases, which will be a review for you, but particularly relate them to genomic research; look at examples from genome-wide association studies to illustrate some of these biases, or at least the potential for bias, wherever the defects in the literature are; and identify strategies in study design, data collection, statistical analysis, and interpretation that could prevent or minimize bias. Now, rather than showing the bummer of a birthmark slide again: the target audience here includes at least one editor in the room, and I would say that your particular journal is not at fault here, but we've got everybody pointing fingers. It's a good sign, Terry. The point is that some of this has to do with peer review of manuscripts, and the editorial review on top of that is also part of the business of accepting papers. Now, if you come from outside into this genome association field, you're struck by a kind of land rush. Everyone wants to be the first fly on the beached whale, claiming dibs; there's a lot of interest in getting there, making the first discovery, and then moving on rather than cleaning up the rest of the field and understanding the biology and the etiology involved. Well, this has some shortcomings if things aren't always reproducible. This is John Ioannidis; he's actually published several of these papers, but this is perhaps one of the more inflammatory ones: why most published research findings are false. This obviously hits small interest groups like the U.S.
Congress and other funding agencies, including the U.S. public, and it obviously creates a lot of issues for those of us in epidemiology and for people doing observational studies. And our media colleagues don't help much. Here's the science writer for the Wall Street Journal: most science studies appear to be tainted by sloppy analysis. So this really is our reputation, and those of us in epidemiology in particular are probably the keepers of the kingdom of proper analytic techniques for population-wide data. It really is our business to understand the biases and when they occur, to fix them when we find them, and in the planning stage, perhaps, to prevent them. I believe both of those papers were based to some extent on this Hirschhorn paper, the study that Terry has shown before: out of some 600 gene-disease associations reviewed, only six were replicated in more than 75% of the studies that examined them. So most of these associations come with a lot of issues and inconsistencies, and some are just downright non-reproducible. This is obviously a challenge to us, and so we should try to understand, within genome-wide association studies, what some of the explanations could be. Well, in epidemiology, one of the first things you'd say is that the heterogeneity could be the truth; it could really be the biologic situation. Biologic heterogeneity: Terry has just told us about some gene-gene interactions. For example, with that ApoE4 gene, if one study population had a high prevalence of the interacting gene and another a low prevalence, you would be likely to find heterogeneity, and of course there is gene-environment interaction as well. So some of this could be true, and we shouldn't just toss the whole thing out, but recognize that it is also our job to sort these out. But there could be spurious mechanisms as well.
Terry has just given you a lecture on genomic analysis quality, and just as we don't accept anything else at face value, we shouldn't accept the genome analysis at face value either; we should look at its quality. Type 1 error is obviously an enormous issue when we run essentially a million chi-squares on some of these new chips; there is a huge opportunity for type 1 error. Limited sample size and power are obviously issues in all of our studies. There are a variety of cohort and age-period effects, and then I'm going to talk about bias. Well, you're all familiar with the definitions of bias; there's one from David Sackett and one from Leon Gordis, and the two key parts are that biases can occur at any stage of inference (design, conduct, and analysis) and that they produce results or conclusions which differ systematically from the truth, giving us mistaken estimates of an exposure's effect on disease and on risk. Just to point out, the effects of those biases in the genome-wide association context could be false negatives, where a gene that could be important is not identified; false positives, which of course is the type 1 error effect, and we've already seen some examples, and I think this is where a lot of the discussion has been; and, discussed much less, inaccurate effect sizes. In other words, rather than an odds ratio of a given size, or across many genes a set of odds ratios that would account for a substantial amount of the heritability, you could have some of the biases diluting the odds ratios toward the null and underestimating them, although overestimates are also possible.
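The scale of the type 1 error problem mentioned above can be sketched with simple arithmetic. This is a minimal illustration, not any study's actual analysis; the SNP count is the round million used in the talk.

```python
# Sketch of the multiple-testing problem in a GWAS: with roughly a
# million independent chi-square tests and no true associations, a
# nominal alpha of 0.05 still yields on the order of 50,000
# "significant" hits by chance alone.

def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected count of false positives among purely null tests."""
    return n_tests * alpha

def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test alpha that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests

n_snps = 1_000_000
print(expected_false_positives(n_snps, 0.05))   # about 50,000 chance findings
print(bonferroni_threshold(n_snps))             # the familiar 5e-8 genome-wide line
```

This is why the genome-wide significance thresholds quoted later in the talk (10 to the minus 6, 10 to the minus 7) are so much stricter than the usual 0.05.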
So these are the usual effects of bias, and just to point out that there is a problem with false positives, here's a little slide on false positives in human form: a ship loaded with mannequins spilling into the water, and the sharks saying, what is this, some kind of cruel hoax? The point is that a false positive doesn't just go into the literature and quietly disappear; it stimulates a number of confirmatory studies whose resources could otherwise have gone to test other, perhaps more important, hypotheses. A false positive sets up a whole chain of inquiry that uses resources, and this is something we and our students need to be very aware of and concerned about. Just like everywhere else, the types of biases in genome association studies include biases in the selection of cases and controls; these are epidemiology studies, after all. There is information bias on genotype or phenotype; as Terry has already alluded to, there are often more issues on the phenotype side than with current genotyping. There are issues in the analysis and presentation of results, and finally in the interpretation of results. If you look at all the kinds of biases (and as I mentioned, I'm doing a systematic review of this), I think there are at least 20 kinds, probably many more, potentially encountered in genome-wide association studies. I haven't listed here the ones common to all human observational studies, response bias, et cetera, although we will discuss them later. But there really are a number that are unique to, or especially common in, genome-wide association studies that I want to familiarize you with. There are what we call the super-case and super-control biases that we'll get to. There's latent-case bias. Population stratification has already been discussed, and I'm going to talk a little more about it.
There's Hardy-Weinberg disequilibrium, which often speaks to genotyping quality; genotyping quality bias in general; bias affecting the transmission disequilibrium test; and I'm going to talk a little bit about the winner's curse. In terms of my getting into this field and understanding it, Terry and I decided that a systematic review of the genome-wide association literature now listed in the catalog of GWAS would be a good thing to do, looking at what you might call the epidemiology of bias in the first 109 studies. These go from March 2005 to March 2008. I had to stop somewhere; it's like trying to stop the tide from coming in, because these papers are appearing literally every day. So we made that cut, and all of the studies that appear in the catalog between those dates are included; we believe we have a complete sample of the literature. The catalog only included studies with genotyping platforms of density greater than 100,000 SNPs, so there may be some smaller studies you're aware of that aren't in there, but this was an entry criterion. We looked at each study for study design; the description of case and comparison groups; the collection of genotype and other risk factor data; and, of particular interest, how results were presented and how they were interpreted. I'm going to share some preliminary results of this systematic review with you here. When you're looking at the whole possibility of bias within the genome-wide association literature, there's quite a heterogeneity of variables, diseases, and study designs. In terms of phenotypes, there were about 91 discrete outcomes or traits in 83 studies; the numbers differ because several studies measured multiple phenotypes and traits, the Framingham study and the Wellcome Trust study, for example. And about 26 studies looked at 40 quantitative traits. So again, we're talking about quite a number of different end points.
But one of the key points here is that the study design most prone to biases of case and control selection and of information collection is the case-control study, and over 70% of the current genome-wide association literature is of that design. About 4% are trios, and interestingly, only about 4% are nested case-control studies; we would expect that to change as genetic data are collected and the cohorts are followed forward, so that design is going to be used a great deal more. Then there are a number of cross-sectional studies; I call them cohort studies, but they're basically baseline, cross-sectional analyses of cohorts that are being formed, done particularly for quantitative traits, looking at genes against lipid values, height, et cetera, as the cohorts are assembled. The point is that the fact that over 70% of these are case-control studies should put your antennae up: this literature could in fact have some issues with bias. As you know, the requirements for a bias-free case-control study include minimization of selection bias and information bias. Cases should be representative of all those who develop the disease being studied. Controls should be representative of all those at risk of developing the disease (I'll come back to that in a minute) who, were they to become cases, would be included in the study. And then there's an additional requirement for GWAS: the ancestral geographic origins and predominant environmental exposures of cases should not differ dramatically from those of controls, which is of course the whole population stratification issue. And for information bias, the collection of risk factor and exposure information should be the same for cases and controls.
So this is your basic epidemiology, but I remind you of it because these are issues that may or may not have been handled carefully in some of the genome-wide association studies. What we did was set down criteria for classifying whether these biases occurred or could occur. One of the problems was that the data simply weren't presented, so a reader couldn't judge whether a bias had occurred or not. In our typical epidemiology papers we're used to seeing the tables and the analyses laid out as we go along, and I will tell you that this body of literature departs from those standard forms substantially. So we tried to capture this: for example, misclassification bias would be the absence of a description, or of the use of adequate means, to define cases and controls. As Terry said, describing the controls as being Belgian probably isn't enough for me as an epidemiologist; I'd like to know a little more. Were they normal Belgians, northern Belgians, southern Belgians, or what? Non-response bias would be the absence of a description of rates of recruitment and participation among cases and controls. For prevalence-incidence bias: many of these series come from clinical case sets, so they use prevalent cases, and if the diseases studied have sizable short-term case fatality or remission rates, that's a setup for the prevalence-incidence bias we see in our case-control studies. And obviously there's a problem with misclassification. Now calm down there, ma'am, your cat's going to be fine, just fine. Obviously the misclassification of this dog has its ramifications, and the misclassification of cases and controls has its ramifications too. Finally, the methods of selection and recruitment are frequently described only in a supplement or another publication.
So you have your paper, you pull it up on PubMed, you're reading along, and there are no descriptions of the cases and controls; frequently those are somewhere else, in a supplement not published with the paper, and you have to go look them up. I find this a real pain, because I would expect to be able to judge, as I read the rest of the paper, whether there are real issues I need to look for. About three out of ten of these papers didn't report the methods of recruitment and selection of the case and control populations at all. There were also few baseline descriptions of cases and controls: only about a third had tables comparing them, and I will tell you I was incredibly liberal; any old kind of table, even just sex and age, got a yes. I also distinguished partial and full tables, and the number of tables I would call adequate for, say, a study of coronary disease (describing blood pressure, smoking, diabetes, et cetera) was probably only around 10%. I don't put those in the slide. And when it comes to statistical comparisons of whether cases and controls differed, only about one in 33 papers had that kind of analysis, the kind that would tell me whether there should be statistical adjustment for those variables at the end of the study. So this is really quite a different literature from what we're used to in epidemiology, and from what we would, as reviewers or editors, let into an epidemiology journal. But even that was common compared with the reporting of participation rates. This was even more parsimonious: about 9% had some comment about participation rates, about how selective these populations were, and there were literally only two papers that actually compared participants with non-participants. So a lot of the things we look for in the selection of cases and controls really weren't there.
As I mentioned, in about two-thirds of the papers the cases were prevalent cases derived from clinical sources; about one-third were population-based or incident cases. So there are certainly loads of opportunity for prevalence-incidence bias, which really gets into the question of what genes you're actually studying. A number of diseases appear to have genes associated with prognosis, such as the aggressiveness of prostate cancer or the case fatality rate of myocardial infarction, and when you study prevalent cases, with those who have died missing, you may have a phenotype quite heterogeneous relative to a case series that included those individuals. So there are issues there to talk about. Let me give you an example of a case-control study. This was a study of type 2 diabetes in Mexican Americans. Most of the data so far have been primarily from European or European American populations, although many of the subsequent replications and other studies have been in a wonderful array of populations, and it was certainly good to see a GWAS of type 2 diabetes in a population that we know is very much affected by it. There were 281 cases with diabetes defined by the usual technique, and then 280 controls drawn from a random population sample without their diabetes status ever being checked. There were no clinical data; the controls were simply taken from a bank rather than even being questioned about diabetes. Over 100,000 SNPs were assayed and four genes were identified. But the real question here, in terms of misclassification, is that we would expect a substantial prevalence of type 2 diabetes among those controls, perhaps 7 to 14%, just from what we know of the population. As a reviewer, I would have been very worried that there were no clinical data about diabetes in the controls, to say nothing of the same kind of screening the cases had.
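The dilution from unscreened controls can be sketched numerically. The allele frequencies below are hypothetical, not from the Mexican American study; only the 7 to 14% contamination range comes from the talk.

```python
# Sketch of how unscreened controls dilute a gene-disease odds ratio.
# If some "controls" are actually undiagnosed cases, their risk-allele
# frequency drifts toward that of the cases, pulling the observed OR
# toward the null. Frequencies here are invented for illustration.

def allele_or(p_case: float, p_control: float) -> float:
    """Allelic odds ratio from risk-allele frequencies in cases and controls."""
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))

def contaminated_or(p_case: float, p_control: float, contamination: float) -> float:
    """OR observed when a fraction of the controls are undiagnosed cases."""
    p_mixed = (1 - contamination) * p_control + contamination * p_case
    return allele_or(p_case, p_mixed)

p_case, p_ctrl = 0.40, 0.30
print(round(allele_or(p_case, p_ctrl), 3))              # true OR, about 1.556
for c in (0.07, 0.14):                                  # the 7-14% cited in the talk
    print(c, round(contaminated_or(p_case, p_ctrl, c), 3))  # shrinks toward 1
```

Each added percent of contamination buys a little more bias toward the null, which is exactly the worry with bank-drawn, never-screened controls.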
That's just one of the issues we've identified. Now, there are several other interesting biases here in case selection. Again, the goal is to find the signal for the gene, to identify the gene associated with disease; the magnitude or strength of that association may sometimes be a secondary consideration. So oftentimes what you'll see is the use of super cases: persons selected not only for having the disease but also by additional criteria that increase the chance of a genetic etiology. A case with a positive family history, or multiple affected relatives, or an early onset: these are the classic super cases. Super controls are the flip side, the use of additional criteria that decrease the chance of a genetic etiology: older age, a negative family history. My favorite is the older person with multiple behavioral risk factors for the condition who still doesn't have the disease. Again, not exactly a sample of the general population who could get the disease, as your case-control study requires. The final selection bias to talk about is latent-case bias: the inclusion among the controls of persons who could never develop the disease even if they carried the gene. With some of these, at some point, deep down inside, your epidemiology glands start pumping out rage hormones, even though it may not make much difference. I don't know; I'll show you. So here's a genome-wide association study of prostate cancer, and this was the discovery study (the first study out is called the discovery study), with 1,854 cases. These were symptomatic cases: people who came in with obviously aggressive tumors that had spread to bone or caused urologic problems, not cases found at screening. And they were all either diagnosed before age 60 or had a positive family history of prostate cancer.
So, super cases, not a sample of all the cases that could come in. Similarly, they used super controls: 1,890 controls who had to be older than 50 and have a normal PSA, actually a very low PSA. I don't have any problem with that per se, but what you ended up with was this group of cases being substantially younger than this group of controls, on top of the other super-case criteria. They assayed about half a million SNPs, found 11 new SNPs associated at 10 to the minus 6, and then went into a replication study with 3,300 cases and controls that were much more generally defined: your typical case of prostate cancer, your typical control. They genotyped those 11 SNPs and found 7 to be independently associated at 10 to the minus 7. Well, here's what they found. If you look down the odds ratios, the ones positively associated with prostate cancer in the discovery study in every instance moved toward the null in replication, and those negatively associated likewise migrated toward the null. Now, I will give the authors credit here: they acknowledged the use of super cases and said that in future studies the real estimates of risk should be the replication values. That was in the text, as is appropriate, and you don't always see it. But you can see the issue of casting the net, finding some genes, and then following up with replication studies. The danger, of course, comes when we treat the discovery estimates as having population relevance rather than reflecting the unique characteristics of those samples, whereas the replication estimates are probably closer to the population values. And as I'll mention later, the winner's curse is when you do your discovery study, find these associations, and have either regression to the mean or, even worse, an exaggerated estimate of risk.
And then that exaggerated estimate gets used for sample size calculations in your subsequent studies, or other investigators use those odds ratios for their sample size calculations, and then they can't replicate the findings. So much for the super-case and super-control biases. Then there is this issue of latent cases. Here is a discovery study in Iceland: 1,890 cases of prostate cancer and over 20,000 controls. The problem is that 60 percent of the controls didn't have a prostate. Or I don't think so; I don't know everything about Iceland, but I would assume the women don't have prostates. So even if they had the gene, they couldn't express it. In terms of the minor allele frequencies of your cases and controls, if 60 percent of your controls really weren't at risk, that's going to minimize the differences in minor allele frequency; it's going to bring them toward the null. These findings were then replicated in huge numbers (this is Rochester, Minnesota), and in those huge numbers of cases and controls, certainly some of the controls were women. A couple of points here. In terms of these latent cases, it turns out the associations have been amazingly robust, and the bias doesn't seem to have made a big impact. That probably has to do with the fact that the prevalence of the susceptibility gene is relatively small and the difference in minor allele frequency is relatively small. I'm not explaining this particularly well, but given the low prevalence of the disease among the men, the frequency of the susceptibility allele among disease-free men wasn't that much different from its frequency among the women.
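That last point, that latent controls dilute the association only modestly when disease risk is low, can be sketched with a deliberately simple allele-level model. All the numbers below (allele frequency, relative risk, baseline risk, the 60% latent fraction) are hypothetical stand-ins, not the Icelandic study's values.

```python
# Sketch of latent-case bias: controls who could never develop the
# disease (women in a prostate cancer GWAS) contribute the plain
# population allele frequency instead of a carrier-depleted one.
# Simple allele-level model with invented parameters, used only to
# show why the dilution is small when disease risk is low.

def observed_or(p, rel_risk, baseline_risk, frac_latent):
    """Allelic OR comparing cases to a control group in which
    frac_latent of the members were never at risk of the disease."""
    k = baseline_risk
    overall = k * (p * rel_risk + (1 - p))                # population disease risk
    p_case = p * rel_risk * k / overall                   # risk-allele freq in cases
    p_at_risk_ctrl = p * (1 - k * rel_risk) / (1 - overall)  # disease-free, at risk
    p_ctrl = (1 - frac_latent) * p_at_risk_ctrl + frac_latent * p  # mixed controls
    return (p_case / (1 - p_case)) / (p_ctrl / (1 - p_ctrl))

p, rr, k = 0.20, 1.5, 0.10   # hypothetical allele freq, relative risk, disease risk
print(round(observed_or(p, rr, k, 0.0), 3))   # all controls genuinely at risk
print(round(observed_or(p, rr, k, 0.6), 3))   # 60% latent: only a modest shrink
```

With a low baseline risk, the disease-free at-risk controls are barely depleted of the risk allele, so swapping in never-at-risk people moves the OR only slightly toward the null, which is consistent with the robustness seen in the replications.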
But still, I think these are going to be issues we'll have to deal with later on, particularly when we start looking at gene-environment interactions, et cetera. And I honestly don't understand why the investigators did it; it's not as if they didn't have male controls available. Maybe if there were only women and all the men had prostate cancer, I could see it. They never did analyze with just the male controls, although some of the papers do describe analyses without the women and say it didn't make much difference. You'll also find similar analyses of breast cancer cases with a substantial number of the controls being men. That's a slightly harder issue, since men can get breast cancer; not frequently, but they can. So this is the situation of latent-case bias, and something whose use we need to ponder. A few other selection biases. There is, of course, membership bias: membership in a group may imply a degree of health that differs systematically from that of the general population. A kind of membership bias, and certainly a confounding, is the population stratification that Terry has introduced; I'm going to say a little more about it. And then there's phenotypic variation bias, in which multiple replication studies each use potentially different definitions of the disease, obviously setting you up for differences in the assessment of risk. So here's the Wellcome Trust Case Control Consortium, which has been alluded to several times. This used half a million SNPs, and the design of the study used 2,000 persons with each of seven targeted diseases. The controls were 3,000 persons without any of the diseases: 1,500 from the 1958 British birth cohort, a population cohort study, all obviously of the same age, and 1,500 blood donors from the UK Blood Service.
So that obviously sets one up for membership bias: an individual who goes in to donate blood, at least in our hospital, is really quite different from a random sample of the population, and there has been, as Terry mentioned, a lot of concern about simply detecting genes associated with blood donation. Having said that, in the analyses, sometimes the 3,000 controls were used alone, and sometimes the 2,000 cases with, say, one of the disorders were compared against the other 12,000 cases pooled into the control group along with the 3,000 true controls. So those weren't random samples of the population either. A number of things here cause the classically trained epidemiologist, at least this classically trained epidemiologist, to wrinkle one's brow. I have to say the Wellcome Trust has been very successful in identifying a number of gene associations that appear amazingly robust, but these are still issues of concern. Now I just want to give another example or so of population stratification, and to say that these different allele frequencies, due to the diversity of populations of origin, really amount to confounding. There have to be differences in disease prevalence between populations, and there have to be differences in allele frequencies; with different admixtures, you then have your classic confounding triad. What you have here on the slide is mankind coming out of its womb in Africa and spreading across the world: most of the genes are shared, but Europeans have some separate ones, East Asians others, and Africans others. The degree of admixture could lead to a situation in which a disease that is, say, more frequent in Africans could, in an admixed population, be associated with any allele that is more frequent in Africans, not on a causal basis, but on this population stratification basis.
And the classic example of this is from Bill Knowler, reported in this Lancet paper looking at Native Americans. There is this haplotype with a prevalence of about 1% in Native Americans, a population in which diabetes prevalence is obviously very high, and a much higher frequency in Caucasian Americans, who have a lower prevalence of diabetes. When you put them all together into a study of, say, a Native American population, you end up with this kind of 2x2 table suggesting protection by the genotype: those positive for it have relatively little diabetes, and vice versa. Then what you can do is stratify by an index of Indian heritage, similar to what we'll see on the next slide using large numbers of SNPs to characterize population stratification; here it's just one index of Indian heritage, stratified from low to high. Looking at the association across the strata, you can see very little association once you stratify. So clearly this association was confounded by degree of American Indian heritage, which, when taken into account, showed no association with the disease. Probably the classic description of population stratification. Now, what one can do is use unlinked genetic markers for this. Population stratification causes many marker allele frequencies to vary between population segments, so any disease more prevalent in one subpopulation will be associated with any allele at high frequency in that subpopulation when you analyze the whole. However, you can also turn that around: analyze these multiple unlinked markers and essentially adjust for them. That's what Sladek did in this study in France, a study entirely within the country of France.
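The heritage-stratified analysis just described can be sketched with a Mantel-Haenszel summary odds ratio. The counts below are invented to reproduce the Knowler-style pattern (a crude "protective" OR that vanishes on stratification); they are not Knowler's actual data.

```python
# Sketch of stratified analysis: within each heritage stratum the
# haplotype has no effect, but pooling strata with different haplotype
# frequencies and different diabetes rates manufactures a spurious
# "protective" crude odds ratio. Counts are hypothetical.

def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a,b = exposed cases/non-cases; c,d = unexposed."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary OR across (a, b, c, d) strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# (haplotype+ cases, haplotype+ non-cases, haplotype- cases, haplotype- non-cases)
low_heritage  = (40, 760, 10, 190)   # 5% diabetes in both genotype groups
high_heritage = (30, 70, 270, 630)   # 30% diabetes in both genotype groups

crude = tuple(x + y for x, y in zip(low_heritage, high_heritage))
print(round(odds_ratio(*crude), 2))                       # looks protective, about 0.25
print(mantel_haenszel_or([low_heritage, high_heritage]))  # 1.0: no real association
```

The haplotype is concentrated in the low-heritage, low-diabetes stratum, so the pooled table confounds ancestry with genotype, exactly the triad described above.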
Sladek's study looked at type 2 diabetes: 660 cases, 614 controls, upwards of 400,000 SNPs. A SNP 200 kilobases from the lactase gene on chromosome 2 was one of those associated; it was strongly associated with type 2 diabetes, but it was also known to have a strong north-south prevalence gradient within France. Note that we're not talking about Asia versus North America or other big differences; we're talking about within the single country of France. What they did was adjust for more than 20,000 SNPs not related to type 2 diabetes, as a measure of the population stratification, the genetic heterogeneity from north to south across France. After adjustment for the stratification, most of the association was removed. So again, another illustration of how this can matter. Lactase, you might think, should vary north to south: the milk drinkers in the north and the wine drinkers in the south of France. You can see how that could work on a genetic basis. Phenotypic variation bias is obviously also an issue. Here's the description of cases in a study of atrial fibrillation. The first sample was hospitalized patients with atrial fibrillation. The second was patients hospitalized with ischemic stroke or TIA who had atrial fibrillation. The third was hospitalized patients, again, with acute stroke. And the fourth sample was known atrial fibrillation in patients with hypertension. So you have all sorts of other things mixed in, and I should think if you compared one of these samples with another, you'd come up with some genes associated with stroke here, or with hypertension there. Obviously this heterogeneity in the definition of the phenotype opens the possibility of heterogeneity in the results. We talked a little bit about genotyping quality bias.
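The Sladek-style correction with thousands of unlinked null SNPs can be sketched via genomic control, one standard way of doing this kind of adjustment (not necessarily the exact method that paper used): estimate an inflation factor lambda from the median test statistic at null SNPs and deflate every statistic by it. The inflation of 1.3 below is simulated, not from any study.

```python
# Sketch of genomic control: unlinked "null" SNPs should follow a 1-df
# chi-square distribution; stratification inflates them uniformly, so
# the median null statistic estimates the inflation factor lambda.

import random

CHI2_1_MEDIAN = 0.4549  # median of a 1-df chi-square distribution

def inflation_factor(null_chisq):
    """Genomic-control lambda from chi-squares at unlinked null SNPs."""
    s = sorted(null_chisq)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return median / CHI2_1_MEDIAN

def deflate(chisq, lam):
    """Stratification-corrected test statistic."""
    return chisq / lam

random.seed(1)
# Simulated stratified data: null 1-df chi-squares inflated by 1.3.
null_stats = [1.3 * random.gauss(0, 1) ** 2 for _ in range(20_000)]
lam = inflation_factor(null_stats)
print(round(lam, 2))                 # recovers roughly the simulated 1.3
print(round(deflate(13.0, lam), 1))  # an inflated hit shrinks accordingly
```

After deflation, a hit that owed part of its chi-square to north-south structure rather than to diabetes loses that part, which is the spirit of the adjustment that removed most of the lactase-region signal.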
Just to point out, a number of papers really lacked the genotyping quality control criteria that were used, or what we call the call rate, the success rate. A number of papers also did not test for Hardy-Weinberg equilibrium as a measure of quality control. And there were a number of examples of transmission disequilibrium testing bias, some caught, I think, and some not, with some distortion of the transmitted frequencies. One of the other questions is whether DNA is collected and handled identically in cases and controls. This was from Clayton, in the type 1 diabetes GRID study, using the 1958 British birth cohort as controls: a small number of SNPs examined in lymphoblastoid cell lines using the same protocol but run at a couple of different laboratories, which is the point of this slide. Despite randomly ordering the samples and masking their case-control status, there were a number of extreme associations that couldn't be repeated on a second genotyping, probably having to do with differing laboratory techniques. In terms of environmental exposure bias, an information bias, what we frequently saw was a lack of collection, or of presentation, of known environmental causes of the disease, or of a comparison between cases and controls. This occurred very frequently. Again, this goes back to the table one that was so often missing and that would have allowed the reader to judge whether further adjustment was needed. Related to this, confounding control bias was the lack of statistical adjustment or stratified analysis in the presence of potential confounding: even where large differences were identified, frequently there was no statistical test to follow through and adjust for them. So there were few comparisons of these exposures between cases and controls.
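The Hardy-Weinberg quality-control check mentioned above can be sketched in a few lines: compare observed genotype counts in controls with the counts expected from the allele frequency; a large 1-df chi-square flags possible genotyping error. The genotype counts are hypothetical.

```python
# Sketch of the Hardy-Weinberg QC test: under random mating, genotype
# frequencies should be p^2, 2p(1-p), (1-p)^2. A deficit or excess of
# heterozygotes in controls often signals genotyping problems.

def hardy_weinberg_chisq(n_aa, n_ab, n_bb):
    """1-df chi-square for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)                       # frequency of allele A
    expected = (n * p * p, n * 2 * p * (1 - p), n * (1 - p) ** 2)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(hardy_weinberg_chisq(360, 480, 160), 2))  # in equilibrium: essentially 0
print(round(hardy_weinberg_chisq(450, 300, 250), 2))  # heterozygote deficit: large
```

Both examples have the same allele frequency (0.6); only the second departs from the expected genotype proportions, which is exactly the signal this check is after.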
In only about 36%, again, were there tables comparing the cases and controls in terms of the important known risk factors. A statistical comparison of those factors appeared in only about 3.5%, statistical adjustment in only about one out of six, and only about one out of six stratified the analysis by a potential confounder. So in terms of the things we'd like to see on the analysis side, collecting the information, displaying it so we can make a judgment, and then mathematically taking some of these differences into account, this is obviously an area where we think the analysis could have been better. This is a study on macular degeneration, and this study was from Hong Kong. Just to point out that this one did have a table one looking at cases and controls; I believe it was in a supplement. And these are all risk factors for macular degeneration: male sex, age, and being a smoker. So before the gene analysis was done, this is the prevalence of men versus women in the cases and controls here. Obviously you're beginning with great differences, and the prevalence of smokers was also tremendously different. I guess what I was trying to figure out from this paper is how they didn't just identify the genes related to smoking, or perhaps to male sex. It would have been much better to stratify, and there was one phrase in this paper saying they adjusted for things and still found these genes. It drove me a little crazy. Just to say, I think the search for the genes could have been much better done in a study in which these known risk factors were balanced in the two groups, so you could really look at the gene effects. Confounding, I think you're all very familiar with, and we won't talk about that. Obviously, you need something associated with exposure and associated with disease, and not an intermediate step.
So here you have your illustration of confounding: the confounder related to exposure and disease and not simply part of the causal pathway, which we see very frequently in epidemiology. And I'll show you an example of this kind of confounding. Here is a study looking at this particular gene, the FTO gene, and type 2 diabetes. Here's your Wellcome Trust, and you had two studies, each of which showed a statistically highly significant odds ratio between FTO gene variants and diabetes. A third study did not; it was just downright null. And we'll tell you that this study selected patients in a very different way, particularly related to body weight. As it goes along further, we can look at the TT, TA, and AA genotypes of FTO in the cases and controls, and it showed that the case and control groups for diabetes were obviously very different in their body weights. These are the BMIs. There was also this difference in gene frequency. And a subsequent diabetes study, I believe it was by Zeggini, in which this adjustment was done, showed that the FTO gene is a better predictor of BMI than it is of diabetes, which is shown here. So here is the first part of the slide, showing that the FTO gene was related to BMI. And then, when adjusting for BMI and looking at the association between the FTO gene and diabetes, those associations disappeared. So clearly what you had was the FTO-diabetes association being confounded by its relationship to obesity and BMI. I guess the point there is that while we usually think of confounding in terms of behaviors, it can also occur with genes; the rule of confounding, that the confounder is related to both exposure and disease, still holds. And we need to look out for those as we look at our gene-disease relationships. I think you're all familiar with dealing with confounders.
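The FTO story is a classic stratified-analysis exercise. As a sketch (with hypothetical counts, not the actual study data), a Mantel-Haenszel odds ratio computed across BMI strata shows how a crude association can vanish once the confounder is accounted for:

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio across 2x2 tables, each given as
    (exposed cases, exposed controls, unexposed cases, unexposed controls)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def crude_or(strata):
    """Odds ratio from the tables collapsed over all strata."""
    a, b, c, d = (sum(t[i] for t in strata) for i in range(4))
    return (a * d) / (b * c)

# Hypothetical counts: within each BMI stratum the variant-diabetes OR is
# exactly 1.0, but the variant is commoner in the high-BMI stratum, which
# also has far more cases, so the collapsed table shows an association.
low_bmi  = (10, 90, 40, 360)
high_bmi = (90, 60, 60, 40)
print(round(crude_or([low_bmi, high_bmi]), 2))           # 2.67, pure confounding
print(round(mantel_haenszel_or([low_bmi, high_bmi]), 2)) # 1.0 after adjustment
```

The same logic, whether done by stratification or by regression adjustment for BMI, is what made the FTO-diabetes odds ratios disappear.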
Just to say that we don't have many genome-wide association studies with randomization. There have been a number with restriction; I think Terry had mentioned the one with APOE, in which only APOE-positive individuals were included, so that was restricted to a group. There's not a lot of matching going on in selecting cases and controls, although that one study, a relatively small study, would have been a very good one in which to match cases and controls. Again, the analyses were frequently missing; certainly standardization for age and gender, though that's perhaps not as big an issue in genome-wide studies. In terms of stratification, sampling into sub-samples according to criteria such as ancestry: one of the very good studies on Crohn's disease, for example, stratified by Jewish and non-Jewish ancestry, which could very much have presented a population stratification kind of issue given the higher prevalence in Ashkenazi Jewish populations, and the authors there stratified by population and did separate analyses. I thought that was very well indicated and obviously added a lot to the study. And then multivariate analyses, again, frequently are not done, or certainly the analyses are not shown; frequently a paper says multivariate adjustment showed no difference, but we're really not shown this handling of the confounding. Finally, just a little bit about the analysis and presentation of data. Obviously, the alpha error control bias: Terry talked about the Bonferroni corrections, et cetera. There still are a few papers, particularly smaller studies, even though using a large number of SNPs, which are not controlling at this level. The data dredging bias is an interesting one from Sackett. Our definition of it is the lack of replication studies testing hypotheses identified in the discovery study. In other words, you would go and do your big study, find something, and leave it alone.
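For readers new to the alpha error issue: the Bonferroni correction mentioned here is just a division, but the scale is what matters in GWAS. The numbers below are illustrative:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold holding the family-wise error rate at alpha."""
    return alpha / n_tests

# With 500,000 SNPs at a family-wise alpha of 0.05, the per-SNP threshold
# is 1e-7, in the neighborhood of the conventional genome-wide line of 5e-8.
print(bonferroni_threshold(0.05, 500_000))

# Without any correction, alpha * n_tests false positives are expected
# under the null: here about 25,000 "significant" SNPs by chance alone.
print(0.05 * 500_000)
```

That expected 25,000 chance hits is why papers testing hundreds of thousands of SNPs at a nominal 0.05 simply cannot be taken at face value.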
And obviously, in Terry's talk next after the break, we're going to hear about the requirement for replication studies to get away from this bias. And then finally, the winner's curse, which I've already alluded to: that's the overestimation of the effect size in the discovery GWAS, and then the inability to replicate the odds ratios because of lack of power. This is both a censoring, regression-to-the-mean kind of issue, as well as possibly the super-case and super-control issue. But it does mean that the first one out of the box, that first fly landing on the whale, oftentimes hasn't had replication success because of this effect. And I showed you that earlier study again; if they tried to replicate it, it would be tough sledding. Finally, let's talk about the biases of interpretation. In reading this literature, you really get the idea that you're flying along and saying, what's a mountain goat doing way up here in the cloud bank? You have an idea that something should be happening, and when something doesn't happen, you kind of discount it. The fact that he's about to run into a mountain is probably not part of his hypothesis list, even though obviously it should be. These interpretation biases were covered in a general review, and I think some of them are worth thinking about. Confirmation bias: evaluating evidence that supports one's preconceptions differently from evidence that challenges those convictions. For example, this whole idea of gene deserts: in the initial GWAS era, oh, it's in a gene desert, it can't be real. That's really confirmation bias at work. You had to have a gene, an exon, a changed protein for this to work. Only later, when a number of these kept coming up, and within the same study you had some hits in introns and some in exons and some in regulatory regions and gene deserts and all, did we get away from the preconception of this gene-protein requirement.
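The winner's curse is easy to demonstrate by simulation. A sketch with invented numbers (true log odds ratio, standard error, and thresholds are all illustrative): when only the "significant" discovery estimates get reported, their average overshoots the true effect, so a replication powered for the reported size comes up short.

```python
import random

def winners_curse_demo(true_log_or=0.1, se=0.06, n_sim=20_000,
                       z_crit=1.96, seed=1):
    """Simulate many discovery studies of the same true effect and return
    the average estimated log odds ratio among those reaching
    'significance' (|estimate / se| > z_crit)."""
    random.seed(seed)
    estimates = [random.gauss(true_log_or, se) for _ in range(n_sim)]
    significant = [e for e in estimates if abs(e / se) > z_crit]
    return sum(significant) / len(significant)

# The conditional average is noticeably larger than the true 0.1:
# selecting on significance censors the low draws, a regression-to-the-mean
# effect exactly as described above.
print(round(winners_curse_demo(), 3))
```

The first study out of the box is, in effect, drawn from the upper tail, which is why replication odds ratios so often shrink.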
There's the rescue bias, which you still see in discussion sections: discounting data by finding selected faults in the experiments. Or mechanism bias, an interesting one; Terry's going to get into this in her next talk on functional studies. Investigators look further and further away from where the SNP actually was, looking for any kind of mechanism, and despite the lack of hypothesis generation in a GWAS, you still end up saying, well, if we find something, we'd better find some underlying mechanism, even though the whole preconception of the GWAS design is that it's agnostic. So, some little things to think about. Terry's going to talk about replication, and in that paper that I highly recommend, which I put in your handout, are all the things we would recommend for minimizing bias. I'm not going to go through them because it's Epi 101, but just to say that with many of these issues, if that Chanock and Manolio paper were followed, we would probably have less heterogeneity in the studies. Okay, I think those are all in there. So what we want, when we find something, is to be able to say: yep, it's a mammoth. What we'd like to be able to say is that we have an association and it was not a spurious one, not due to something we can't account for, but rather something on which we can do confirmatory studies, mechanistic studies, drug target studies, obviously; and that depends on the information going into that decision being credible. Thank you. Any questions? Yes. When you mentioned the super-case bias: how do you balance between a super-case bias and true case recruitment? Because sometimes, if we try to recruit true or extreme cases, we just want to ensure they are true cases. So how do you balance these two issues? I think there is a way to get balance. In other words, stringently enforced preset criteria to select cases representative of all the cases in the population is doable.
Where the super-case bias comes in is when we put in additional criteria that maximize the chance that this is a familial form of the disease, and then the bias arises when we take the odds ratio associated with that kind of case and say it applies to everybody. But I don't think that should stop us: if you really want to identify representative cases, that has to do with setting down phenotypic definitions that cover all cases in the population. If we do want to use super cases, putting out a particular, specific net, if you will, to maximize the odds ratio so we can detect associations within the sample size we have, I actually don't think that's a problem. And I think Ely's is quite an attractive paper, because in that paper, in the replication study, they then went to a more representative sample for the smaller number of SNPs they had identified the first time, and said these are the odds ratios we think are representative, the real strength of the associations we found. But sometimes there are super cases and super controls, and sometimes they're carried through into the replication studies as well; sometimes they're blended in and all the data are analyzed together, and so I think you will sometimes have that inflation of odds ratios because of the use of super cases and super controls. I hope that answers your question. So I just wonder whether you could give us some guidance, for the epidemiologists here looking forward to designing a GWA study in the future, obviously to avoid or minimize all the potential confounding. It sounds like both you and Terry are saying replication is extremely important. I just wonder whether you have any further suggestions regarding the initial GWA study. What is the minimal sample size, in terms of cases and controls, we should be looking for?
Because obviously the more the better, but what's the minimum considered acceptable? And then what is the minimal sample size that would be at least considered adequate for the replication stage, and how do we balance these two stages? Well, I will bet the mortgage that Dr. Manolio has some comments on this. But this is really the interesting issue when you come up with an agnostic study design. There are no preconceptions. You have half a million little markers and you have no prior hypothesis as to how they relate to the disease, up or down. Therefore your effect size, and the whole sample size calculation, is obviously problematic. So I think that's one of the issues: the usual prevalence estimates we go and look at don't apply; when you really don't know what the frequencies of the major or minor alleles are going to be, or the difference between them, it's obviously quite different. You could say we want to identify an odds ratio of a certain level, I guess, and do it that way. But I think the rule of thumb has been about a thousand cases and a thousand controls, and I think that is just kind of right out of the ether. And clearly, one of the other issues with many of these diseases, like for example macular degeneration, is that for some of them you can't marshal thousands and thousands of cases. Some of them are quite unusual diseases, so you're going to be left with 200 cases and a thousand controls, et cetera. So I think there are some issues there as well, but clearly the literature has quite a number of studies with 200 cases, and for some of them you get very nervous when they're down to 50 or so. There you go. So this was a very nice question that was addressed by the Wellcome Trust Case Control Consortium. If you only read one paper in genome-wide association, don't read Tom's, don't read Chanock's; read the Wellcome Trust Case Control Consortium's. It's a little long; yeah, it is big, but it's absolutely brilliant.
Very, very clearly explained, especially the supplementary methods. One of the nifty things they did was to ask: okay, with the sample sizes we had in our study, what kind of power would we have had to detect the associations we found if we had actually gone with smaller sample sizes? So along the x-axis you can see what the power would be. These are the various associations they found in their studies, and they basically showed that if they had had 500 cases and 500 controls, they would only have found two of the twenty-some they actually found; these are actually two points superimposed. With a thousand cases and controls, I think it was about nine or so. A certain two of them here at a thousand; I think for this one they only had 40% power to pick it up. But here they would be certain to pick up two, and the expected number might be about six if you go with 80% power, even in this range here. And then if they went with 1,500 controls, about nine expected. So they came up with some very nice estimates, these being relatively modest odds ratios, in the 1.3 range. The vast consensus, as in any study, is the more the better, and one big debate was whether you'd be better off with more people or more SNPs; the Wellcome Trust group also showed that really you're better off with more people. You get more information from that than from more SNPs. So sorry to leap in with that, but I thought it seemed timely. One more question, about studies in Korea: in Korea there are two kinds of genomic studies from government funds. One is the population study I mentioned yesterday.
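On the sample size question, power curves like these can be roughed out with a normal approximation to the allele-count test. This is a back-of-the-envelope sketch (our own function, not the consortium's method; all parameter values are illustrative), but it reproduces the qualitative message that power climbs steeply with sample size for odds ratios around 1.3:

```python
import math

def gwas_power(odds_ratio, maf, n_cases, n_controls, z_alpha=5.45):
    """Approximate power of an allele-based case-control test, assuming the
    control allele frequency equals maf and a per-allele odds ratio.
    Uses the normal approximation to the log odds ratio; z_alpha = 5.45 is
    roughly the two-sided critical value for alpha = 5e-8."""
    p0 = maf
    # case allele frequency implied by the odds ratio
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    # variance of the log OR, with 2N alleles per group
    var = (1 / (2 * n_cases * p1 * (1 - p1))
           + 1 / (2 * n_controls * p0 * (1 - p0)))
    z = abs(math.log(odds_ratio)) / math.sqrt(var)
    # power = Phi(z - z_alpha), via the error function
    return 0.5 * (1 + math.erf((z - z_alpha) / math.sqrt(2)))

# For an odds ratio of 1.3 at a 30% minor allele frequency, going from
# 500/500 to 2000/2000 moves power from almost nothing to a real chance:
print(round(gwas_power(1.3, 0.3, 500, 500), 3))
print(round(gwas_power(1.3, 0.3, 2000, 2000), 3))
```

This is why the thousand-and-a-thousand rule of thumb is marginal for the modest odds ratios GWAS actually finds.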
The other one is the hospital-based genomic study, and the key idea of the two different types is to provide cases from the hospital-based study and collect controls from the population-based study. We have a protocol to standardize the exposure measurements within the population study, but there's no standardization between the population study and the hospital-based study. So is there any way to overcome our limitation of different measurements for doing such a case-control study? I think there are a number of issues there. Number one: with your population controls, depending on what disease you're studying, you obviously want to make sure there's no contamination with cases. So for example, if you're doing a case-control study of hospitalized patients with coronary disease, you would want to make sure that the 10% or so of your population controls who might really carry that diagnosis are screened out; there would obviously be some silent cases, as we all know. The second issue, obviously, is the genotyping. You'll still see situations where one population used one platform or one method of handling and the other used another. The platforms are getting better, and oftentimes those studies will type some samples by both methods and show the concordance; I think you showed that on a slide, so that's getting better. Perhaps more of an issue is the processing of those samples: if you see a very low call rate, for example, in your population controls and a very high call rate in your cases, it's obviously an opportunity for genotyping errors in which you're going to show case-control differences not because of a true gene association, but because of which genotypes were missing or under-assessed because of genotyping quality. The third, which is oftentimes a problem, has to do with the other behavioral and environmental factors that you'd like to use for adjustment, if not interaction.
Tobacco smoking, for example, would be a good one, and it would be very nice if you could standardize the collection of behavioral data between your population controls and your hospitalized cases; frequently that's not done. For example, I think in the GWAS of Mexican Americans with diabetes, the cases were a clinical group from a diabetes clinic, and the population sample was kind of just a general roster that didn't have testing for diabetes, so you had a variety of different information and the possibility of information bias sneaking in there. So the recommendation for the Korean studies would be to see if one could standardize the information used in the population study with the institutional data. No, I think we're about to take a break, aren't we? I guess the other side of the sample size question is effect size. In the early studies, I'm seeing in the replications these really small effect sizes. How am I supposed to think about those, all these additive genes, when the effect sizes are so small? I'm just not impressed, so can you talk about that a little bit?
I'm going to comment on that in my last talk, which has to do with applications. But, I mean, what did you learn in the first 15 minutes of your case-control lecture? Obviously that odds ratios matter in their size, and that we really have to be concerned about odds ratios less than two as possibly being due to confounding and a variety of other things. I'm sure you learned that the same way I learned it, and so this is a bit of a paradigm shift. What I'm going to comment on in my talk is the consideration that what we might have done is chop up genetic risk into all these little SNPs, and with a variety of them we're still not really capturing all the genetic risk, because when we try to account for the heritability, which we learned about yesterday, or the familial risk, like the risk associated with a positive family history, we come up with quite small proportions accounted for mathematically. And you're thinking now, well, then you get what you deserve, because an odds ratio of 1.3 probably isn't going to account for very much, even if the variant is very prevalent. The point is, how many of those odds-ratio-1.3 variants are out there? Because we're only measuring kind of clusters. How do you sum those up? Right, exactly. Papers have tried, and I don't know, Terry, if you want to comment on the extent to which, but it raises this question that I was beginning to ask and will talk a little more about in my last talk: when are you done? When have you found all of the culprits? I don't think we have a gold standard for that, because we're not into copy number variants, we're not into these regulatory regions; even just on a gene-structure basis there's so much more out there. So until we can show that we've accounted for all the heritability, I think we're going to be left with odds ratios of 1.3. And I think it's important to recognize, too, that small odds ratios used to make us very nervous in epidemiology because of uncontrolled confounding and measurement bias, but the measurements here, as long as you get rid of the genotyping error, which you can pretty much control, are really pretty good. So the 1.2s and the 1.15s are probably believable; the meaning of them is a big issue, and Tom will go into that some, recognizing that some of these may be pointing the way toward drug targets that might work for everybody. You know, familial hypercholesterolemia involved a very, very rare gene; you probably couldn't even pick it up in a genome-wide association study of cholesterol levels, and yet the drug works for everyone, except, ironically, the people who are homozygous for the variant, who don't have the receptors for the drug to work on. Another thing to think about, as we do more and more of these studies: the Wellcome Trust group again, Peter Donnelly, has come up with some estimates of how many variants might be out there that actually contribute to variation. And because of the distance between the SNP on your platform and the actual causative variant, you're actually underestimating the risk; because you're a little way away from it, there might be a 1.5 or a 2.0 or even a 4.0 hiding there. And if you do this for 100 diseases, probably everybody is at risk for something; you know, we're all going to die of something, except me. And for the diseases we might be at risk for, you can actually estimate that, gee, 90% of the population will turn out to be at risk for at least one of these based on their genetic variants, and if you could find that, it would be very useful; perhaps, you know, you could do some interventions on it. So when you take them one at a time, you really have to think; one at a time, maybe not, but together, probably. And we haven't even touched the subject of gene-gene interaction and gene-environment interaction, which is another part of the whole.
Maybe, just before we break, let me ask the editors in the group for some editorial comment about this issue: new genre, new technologies, et cetera, a lot of excitement; it deserves to be the scientific breakthrough of the year, as it has been for a while. On the other hand, it's kind of like getting a new sports car and then forgetting about the rules of driving safely. So, Linda. So I was going to ask you how to handle this, because being an editor of a general medicine journal, the ones we get are most likely the ones that have been rejected by, you know, the genetics journals, and we're aware of that. I'd like to have a checklist that, you know, puts it back on the authors, because finding reviewers who actually understand genetic epidemiology well enough to interpret these papers is difficult. And I think this is an area where you can probably think of the ten people off the top of your head who could review these papers, and they're all reviewing all of them, and they're busy, and the most likely thing is they're going to say, I already reviewed this for another journal and didn't like it. And maybe the issue for me is that a lot of the time the authors want to be the first fly on the whale anyway; if they're sending it to a general medicine journal, and that's where they think it belongs, it's probably bad. So I don't know; it's an area I'm definitely struggling with. You also wonder if it's a good idea to be down the food chain and be the one publishing the negative studies all the time; you know, how many of those are going to get cited, et cetera. So I find it's just a complicated area, and as you were giving your talk I was wondering if you could imagine having a checklist for the authors: you know, does your paper qualify this way, this way, this way, and if it doesn't, you can just reject it without even reading the paper.
We had a talk last week that was kind of interesting, showing that same paper you did, what's it called, "Why Most Published Research Findings Are False." And the speaker there had an interesting suggestion: maybe review should be two-phase. The first phase would be to look just at the introduction and the methods, and if you're not satisfied with the introduction and the methods, reject it and don't even look at the data. That might actually be good, because then you would be even more agnostic to the findings. I'm thinking about whether we could actually do that as an experiment. Linda. Well, I was actually hesitating to talk about the food chain; that is the nature of what we deal with every day. And just the enormity of what you're telling us, in terms of something like sensitivity to diet, and the slides you showed, Terry, in regard to just dietary fat and its relationship to HDL: the implications of that are just enormous, in this mindset of the designer diet. And I must admit I sort of skipped to the end to see if you're going to come back and deal with some of this, and it looks like you are, so that's why I didn't ask the question. But I would agree with you, Phil, that first and foremost, in terms of methodologic issues, to ding a paper, so to speak, up front before you even get into it really is the only way to do this in an efficient manner, and I sort of try to do that already in the open gatekeeping we do on a paper. But, as I say, in terms of our authorship and what people are writing about: if we don't even get at some of these issues related to genetic variation in the ability to respond to a dietary intervention, I don't even know how to begin to address those kinds of questions. I don't know if we're there yet. I mean, on the basis of what you're describing, do we have a public health model? From what I know about nutrition, I know there are certain things we can say are good for everyone, but in terms of
things that would be better for people who have genetically determined responses to certain dietary interventions, I mean, that's where we want to go, but I don't see how we're going to get there anytime soon. Yeah, I would just have a couple of comments. Number one is that that was, to some extent, the backdrop to the paper on how to interpret genome-wide association studies; it was essentially this issue, because things weren't clear to the readers, to say nothing of the reviewers, to say nothing of the editors, and that was kind of the thrust, and we were pleased that the JAMA leadership appreciated that issue as well. The second is that that's what this course is about, because the only way we're going to get enough reviewers, particularly reviewers who are content experts in diabetes or in Crohn's disease or whatever, is getting them up to speed on the whole genomic side; that obviously starts to marry the two sides of the things we want, so they can be competent reviewers, because there's all that disease content that I would have to say is oftentimes given kind of short shrift, the any-old-case-will-do kind of idea. So I think that's a very important part of this. And the third point I would make is to encourage epidemiologists to get even more involved in the design and conduct of some of these studies as co-investigators, so they can start saying, well, if you don't put that disease criterion in as an inclusion or exclusion criterion, you're going to have all kinds of problems later; this kind of input oftentimes will not be added unless it comes in early. So I think those are some of the messages from this. Terry. Big opportunities here for epidemiology: actually take the SNPs in the genome, we have them listed in our catalog, and test them in your cohorts where you have really good dietary measures, and see what the interactions are and what the differences are in the way people respond. I mean, that's work that's just begging to be done, and we can do it.