So, again, while people are getting their coffee metabolized, David, it would be interesting to hear a little bit more, if you can expand a bit, on this issue of choice of controls. I know there was some discussion at the break that we really ought to be thinking seriously about convenience controls, so there may be others who want to speak on that. But in terms of the convenience controls that have been used so far, can you talk a little bit more about what their strengths and weaknesses might be? So I think the strengths are that these are large-scale samples, and often we don't have, for a particular phenotype, studies that are as large as the 3,000 controls in the Wellcome Trust example. We might have a best case study for a rare disease of just a few hundred. So in that context, probably the best advice would be to genotype the controls you have that meet best-practice epidemiologic methods, but be aware that there may be other populations you can import to essentially increase your number of controls, and that will give you some limited capacity to compare allele prevalence in the best-practice design with the in-silico controls. And the other issue that Elizabeth brought up is that as the platforms change, the meaning of these in-silico controls is going to change. So as you move from 100K to 317K to 550K, and as you import these control groups, you're going to have a subset of SNPs that aren't represented in those controls. So again, I think if it's possible, you'd always like to have some of your own controls just so you can compare allele prevalence, recognizing that you're not going to have maximum statistical power by any means. Yeah, I think it's really important too to think about comparing your allele frequencies to HapMap and to the other controls that may exist, because they do sometimes point out quality metrics that maybe you're not aware of.
The other thing I would suggest, which I've found interesting, is that most of the studies have not even genotyped sex. One of the discrepancies that we found in working with individuals is sample mix-ups over time, and you don't really know that they exist; actually just getting a genetic test for sex among your samples will show you where there may have been a small batch that went in where, years ago, somebody reversed tubes, and this has just been carried on in your study over time. So actually getting some assays done on your study and some indication of the sex of your samples will help you to go forward with some of these large-scale studies, because you have an idea of the quality and integrity of your samples. Those are excellent points, Debbie, and I think David can give examples of breast cancer studies where unexpected men were included, prostate cancer studies where unexpected women were included, and outbreaks of twins in duplicate samples. It actually does make a very good point that the platforms are now so good, not for every SNP, but across the board, with completion rates of 99 percent and concordance rates of 99.5 percent, that the biggest source of error may be these repository and other errors. And so there is value in having a fingerprint very early in the life of the sample. People used to use microsatellites; now people seem very happy with a Sequenom panel of about 30 SNPs, something like that. Having a fingerprint very early in the study allows you to track these errors if somebody turns a plate upside down. There are other ways of catching that, but you can catch it unambiguously if a couple of samples are flipped, if you have an expected fingerprint from very early on in the life of the sample. Microphone, please. Thanks very much for a really interesting day so far. I have a question related to a point that Elizabeth made.
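The genetic sex check described here rests on a simple observation: males are hemizygous on the X chromosome, so their X-chromosome genotype calls should appear almost entirely homozygous. A minimal sketch of that idea follows; the genotype encoding (tuples of allele letters) and the 10 percent heterozygosity threshold are illustrative assumptions, and real pipelines calibrate the cutoff per platform.

```python
def x_het_rate(x_genotypes):
    """Fraction of heterozygous calls among non-missing X-chromosome
    genotypes, each given as a tuple like ('A', 'B')."""
    calls = [g for g in x_genotypes if g is not None]
    if not calls:
        return None
    het = sum(1 for a, b in calls if a != b)
    return het / len(calls)

def inferred_sex(x_genotypes, het_threshold=0.10):
    """Genetic males are hemizygous on X, so their calls look homozygous;
    a high X heterozygosity rate implies a genetic female. The threshold
    is a placeholder, not a platform-validated value."""
    rate = x_het_rate(x_genotypes)
    if rate is None:
        return "unknown"
    return "female" if rate > het_threshold else "male"
```

Comparing `inferred_sex` against the recorded sex for every sample flags the reversed-tube batches Debbie describes: a cluster of mismatches confined to one plate is the classic signature of a historical swap.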
I think it's really important that the genotypes are actually phenotypes themselves, and they're actually not categorical; they're these two quantitative allele intensities. And I wonder, as a technical and a statistical question, whether people are starting to look at using that as the exposure measure, and whether anybody has experience in studies with that. That's a really interesting question. This is one of those questions that could be pursued in data sets. I don't know that there's a lot of that, and maybe some of the genomicists in the group could comment. There have been some interesting explorations of samples that fail whole genome amplification, for example: there are certain characteristics of people that will fail that, and that may cluster in families, so there might be genes related to that. Debbie, did you want to comment on whether these phenotypes, the variant genotypes that one sees from these, are somehow possible to use as outcomes? You can use them, and actually copy number variation is a great example; that was found by weird clustering. So looking at these outlier genotypes that you get, where you see multiple clusters, people are actually finding these differences. For example, on some of these platforms, they're typing over unknown SNPs, and that causes four or five clusters to occur, and you're getting genotype information on that unknown SNP, and you can go back and mine that information. Those calls may be thrown out, so I think that actually looking at some of these things is really important, but they're not the first thing that you look at, right? Good point. Mike? Thank you for two great talks. So this is a question about meta-analysis projects, and the effect of using different platforms in the studies in a meta-analysis.
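The two quantitative allele intensities mentioned here are often summarized as a B-allele frequency, the share of total signal attributable to one allele; multi-cluster patterns in that quantity are exactly how the odd genotype clusters described above show up. The sketch below is a toy version of that idea, assuming raw per-allele intensities; production pipelines normalize against canonical cluster positions first, and the 0.15 tolerance is an illustrative assumption.

```python
def b_allele_freq(x_intensity, y_intensity):
    """Crude B-allele frequency: the share of total signal coming from
    the B allele. Real pipelines normalize per SNP before computing this;
    here it is just the unnormalized ratio."""
    total = x_intensity + y_intensity
    if total <= 0:
        return None
    return y_intensity / total

def flag_outlier_calls(baf_values, tol=0.15):
    """Flag samples whose B-allele frequency sits away from the three
    expected genotype clusters (~0, ~0.5, ~1). Persistent in-between
    clusters can signal copy number variation or a second, untyped
    variant under the probe."""
    clusters = (0.0, 0.5, 1.0)
    return [i for i, b in enumerate(baf_values)
            if b is not None and min(abs(b - c) for c in clusters) > tol]
```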
I'd like to hear more about that, and also about the different choice of control groups in the studies in a meta-analysis, and how that might actually give you pause in some of these meta-analysis projects. So maybe, Elizabeth, could you comment on the different platforms? I can comment a little, and I think Laura Scott will probably talk about it a little bit more later. It really depends on how you're going to do your analysis. There are different SNPs on each of these platforms, so if you're combining data where you have different SNPs in the two studies, then you're going to have to either infer the genotypes from the other platform, or calculate haplotypes using the SNPs from both and use the haplotypes rather than the individual SNPs, or look at association to a particular region or haplotype. So it makes your analysis a little bit more complex, but it's certainly possible, and there are a number of groups out there who have done this very successfully. So it's not nearly as critical as people thought it might be a few years ago to have exactly the same product and exactly the same SNPs, because a number of groups have shown that it's very possible to do the analysis; it's just another technical challenge along the way. And Laura will probably comment as well, but certainly in the diabetes studies, with those three diabetes studies that I showed, when they went to combine their data, they were combining across the Affymetrix and Illumina platforms, and they used imputation algorithms to do this. I think one of the take-home lessons from that is that imputation algorithms are really very good, but they're not perfect, like nothing is. And once one looks at an imputation and sees that it looks pretty good, you do have to go back and actually genotype the SNP that you're inferring.
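Once each study, whether genotyped directly or imputed, reports an effect size and standard error for a shared allele, combining across platforms is standard inverse-variance meta-analysis; imputed SNPs simply contribute larger standard errors. A minimal fixed-effect sketch, not any specific group's pipeline:

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect meta-analysis of per-study effect sizes (e.g. log
    odds ratios). Each study is weighted by the inverse of its variance,
    so less precise (e.g. imputed) results count for less."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se
```

This is why harmonizing the effect allele across Affymetrix and Illumina studies matters more than sharing the exact SNP set: a strand or allele flip in one study silently reverses the sign of its contribution.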
Just be aware also that there are gaps in the platforms. I guess cognitive function is one of the Wellcome Trust disease groups, but, oops, there's no good surrogate for ApoE4 on that generation of the Affymetrix chip. And you can get these printouts where there's an informatic attachment of the gene name to the rs numbers, but if you're really after a very specific SNP, you really have to confirm that there's something in good LD with it. And I think your second question, Mike, was one for David; you might want to repeat it, but it was about variation in the control groups across the studies in a meta-analysis. So again, I don't think there's much evidence so far that reasonably well-designed case control studies, compared with reasonably well-designed nested case control studies, or hospital-based versus population controls, lead to radically different answers, or even perceptibly different answers, for these stronger SNPs. In a meta-analysis, nonetheless, I think you'd always want to run within strata defined by design or some measure of quality, just to confirm that. And as we get to the lower relative risks, you'd expect that if we're introducing any methodologic noise, that's going to make it harder to discern those. But I think we do have to be aware that the traditional reasons for concern about certain designs, related to information bias and selection bias, may not be so pressing in this area if there's no relationship between participation rate and genotype, et cetera. So we have to be careful, and probably in a meta-analysis you'd want to stratify. But so far, I think, the concerns we have with self-reported data and retrospective versus prospective data may not apply in such strength to these studies. Thank you. Certainly, when I was a baby epidemiologist, they told us first to be terrified of case control studies, which I don't think was great advice, and also, if you're going to do them, to consider using multiple control groups.
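"Confirming there's something in good LD" with a target SNP means checking the r-squared between the target and its would-be surrogate. Given the frequency of the haplotype carrying both reference alleles and the two allele frequencies, the standard formula is D = p_AB minus p_A times p_B, and r-squared = D squared over the product of the four allele frequencies. A small sketch of that calculation, assuming phased haplotype frequencies are available (e.g. from HapMap):

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two SNPs from the frequency of the A-B haplotype
    (p_ab) and the marginal allele frequencies (p_a, p_b).
    D = p_ab - p_a * p_b; r^2 = D^2 / (p_a (1-p_a) p_b (1-p_b)).
    A surrogate SNP is only a useful proxy when r^2 is high."""
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # one SNP is monomorphic; LD is undefined
    return d * d / denom
```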
And one of the big advantages now of data sharing, and of databases like CGEMS and others that have their control allele frequencies widely available, is that you can look at multiple groups and see: is my control group somewhat comparable to these others? And if not, why not? And do your findings hold up? Several of the platform developers are actually developing large control databases that they're making widely available; Illumina just announced theirs. And that one will be going into dbGaP as soon as some of us, myself included, do the work needed to get it in there. But at any rate, those will be available. There have also been, for many years, discussions about trying to get the NHANES genotype frequencies, because that's a nationally representative sample, into some kind of an accessible database, and the work is going forward on that as well. Yes? This is a basic question: how do you define a population? And then second: from some of the papers that have come out in the last two years, we know very well that in different populations, different regions of the genome undergo different selection pressures. So the population history differs even between very neighboring populations, which we know very well. So when we put these things together, that is, mixing populations and then seeing the effect in these mixed populations, the question is, what are the chances of finding spurious results in the pooled population? Third... well, I will stop there. Why don't you stop there; we'll come back if we can. So who would like to tackle that one? So very quickly, clearly this is a major concern. I think genomic control and other methods give us some confidence that we can approach this statistically. And anyone would advise that you collect self-reported information about ancestry, however that's defined, country of origin or whatever, to allow yourself to restrict or stratify on the basis of that information.
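Checking whether "my control group is comparable to these others" usually comes down to a per-SNP comparison of allele counts against the shared reference. A basic sketch using a Pearson chi-square on the 2x2 allele-count table; the 3.84 cutoff is the nominal 1-df critical value at alpha 0.05, and with hundreds of thousands of SNPs you would of course correct for multiple testing:

```python
def allele_count_chi2(ref_a, ref_b, obs_a, obs_b):
    """Pearson chi-square statistic (1 df) comparing allele counts in
    your controls (obs) against a reference panel such as HapMap or a
    shared control database (ref). Large values flag SNPs where your
    controls diverge -- often a genotyping-quality or ancestry signal."""
    table = [[ref_a, ref_b], [obs_a, obs_b]]
    row_totals = [sum(r) for r in table]
    col_totals = [ref_a + obs_a, ref_b + obs_b]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2
```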
And then the other key thing, obviously, is that a lot of these studies are happening in populations of European ancestry. There's no guarantee that a SNP found there will also be associated with the same phenotype in other ancestries, and that has to be looked at carefully. And then finally, as the costs come down and the sample sets increase, there are going to be associations that are private to particular populations, which will only ever be discovered if we look in those populations. Just a comment along those lines: I'd prefer that some of the people doing GWAS studies give importance to a concept called the Wahlund effect in population genetics. It's an important concept. And maybe you'd take a second to explain what you mean by that. The Wahlund effect: a long time ago, a population geneticist named Wahlund suggested that mixing these populations affects the genotypic frequencies spuriously, and when you use those distorted genotypic frequencies, it can affect the association studies. Well, I think that is widely recognized, and people are considering it; perhaps they're considering it a little bit too much, and perhaps Laura will comment. But what we've tended to do is focus on very homogeneous populations, somewhat to the detriment of being able to describe risk in the general US population. Laura? So there are a couple of interesting things. One thing that I've already started to see happen is that if different groups are collecting case samples and comparing them to existing control samples, once you go to try to do a meta-analysis, you actually don't have independent samples. And so you're going to have to go back and redo the analysis, either in some way allocating controls or taking into account that you've actually used the same samples twice.
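The Wahlund effect is easy to see numerically: pool two subpopulations that are each in Hardy-Weinberg equilibrium but have different allele frequencies, and the mixture shows an excess of homozygotes (and a deficit of heterozygotes) relative to HWE at the pooled allele frequency. If cases and controls draw on the subpopulations in different proportions, that distortion masquerades as association. A small demonstration, with a deliberately extreme frequency difference:

```python
def genotype_freqs_hwe(p):
    """Hardy-Weinberg genotype frequencies (AA, Aa, aa) for allele
    frequency p."""
    return (p * p, 2 * p * (1 - p), (1 - p) * (1 - p))

def wahlund_mixture(p1, p2, w1=0.5):
    """Genotype frequencies in a pooled sample of two subpopulations,
    each internally in HWE, mixed with weights w1 and 1 - w1. Returns
    (observed mixture frequencies, HWE expectation at the pooled allele
    frequency); the homozygote excess in the first relative to the
    second is the Wahlund effect."""
    w2 = 1.0 - w1
    g1, g2 = genotype_freqs_hwe(p1), genotype_freqs_hwe(p2)
    mixed = tuple(w1 * a + w2 * b for a, b in zip(g1, g2))
    p_pooled = w1 * p1 + w2 * p2
    return mixed, genotype_freqs_hwe(p_pooled)
```

With p1 = 0.1 and p2 = 0.9, the 50/50 mixture has 41 percent AA homozygotes where HWE at the pooled frequency of 0.5 predicts 25 percent, which is the kind of departure that HWE filters and genomic control are designed to catch.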
The other thing that's interesting to think about, sorry, this is having a problem, is that when you have epidemiology studies that have matched on particular variables, such as a study of type 2 diabetes that matched on body mass index, what we're finding is that sometimes genes that work to cause diabetes through body mass index actually can't be found in those epidemiological studies that were very tightly matched. And so in those cases, you might actually have an advantage in having population-based controls. Very, very good points. And just before Elizabeth, maybe I might ask the AV person if he would kindly come out and put up Laura's slides, because we're just going to finish up here. And then Elizabeth's comment. David asked me one question, and I thought I'd just mention it: the question was about pooled samples. So one possibility, especially before genotyping costs started coming down, was to pool: take a pool of cases and a pool of controls and compare estimated allele frequencies between them. That's still being done. It's a little bit challenging technically, both in making sure that you take very good care when you're building the DNA pool so that you have equal amounts of DNA from each individual, and then it's a little bit tricky as far as the interpretation. You can't just run a genotype clustering algorithm and get genotypes; you have to try to estimate the proportion of each of the genotype frequencies. So it's still possible, it's certainly a consideration, and it's obviously a much lower budgetary cost, but you need to work with somebody who really knows what they're doing, both on the lab end and on the analysis end. Yeah, that's an excellent point, and this was one study design that was used initially when genotyping costs were so very high. And of course another disadvantage of it is that you don't then have individual genotype data to analyze, to look at your very interesting loci.
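In a pooling design there are no per-individual genotype calls to cluster; instead the allele frequency in each pool is estimated directly from the summed probe signals, and cases and controls are compared on those estimates. A minimal sketch of that estimation step; the correction factor k for unequal allelic amplification is an assumption of this illustration (in practice it is estimated from heterozygous individual samples), and real pooling analyses also model pool-construction error:

```python
def pooled_allele_freq(a_signal, b_signal, k=1.0):
    """Estimate the A allele frequency in a DNA pool from the summed
    per-allele probe signals. k corrects for unequal allelic
    amplification; k = 1.0 means no correction."""
    return a_signal / (a_signal + k * b_signal)

def pool_freq_difference(case_pool, control_pool, k=1.0):
    """Estimated case-minus-control allele frequency difference, the
    quantity a pooling study tests for departure from zero. Each pool
    is an (a_signal, b_signal) pair."""
    return (pooled_allele_freq(*case_pool, k=k)
            - pooled_allele_freq(*control_pool, k=k))
```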