Well, I want to continue some of the comments that Terry had made on family-based studies and really talk about the family-based studies as the preamble, where the ideas were that diseases were aggregating in families, which then, of course, has led to the rationale for perhaps less hypothesis-oriented studies like the genome-wide association studies. So I want to just talk a little bit about study designs to generate or test genomic hypotheses, kind of in a broad, more philosophical sense as we start talking about genetic associations. We are epidemiologists, after all, and we can associate almost anything. We want to describe the major study designs which involve genetically related individuals, and these are a couple more. I want to talk a bit more about twin studies and then get into something called trios, which I introduced with the Crohn's study that I started with, give you some literature examples, and then talk about the advantages and disadvantages of these designs in disease-gene associations. But just to say that for many diseases, the associations in families obviously came from the bedside. William Osler, who is kind of the father of internal medicine, talked about the familial aggregation of coronary disease around the turn of the century, and Hippocrates talked about familial aggregation as well. So this is just something observed, and here's an observation. Some of you know David Herrington. David is Associate Dean for Research at Wake Forest and was a fellow, like Terry, with me a number of years ago, and he's now doing quite a bit of genetic epidemiology, and one wonders if this is where he got interested. This is a set of my patients. This paper is a series of several twin pairs, and these two 51-year-olds were Hungarian refugees. They crawled out from under the barbed wire in Hungary in 1956 together. They went to the same universities.
They were electrical engineers at the same defense firm in Baltimore. They smoked the same brand of unusual Central European cigarettes. Their LDL cholesterol was the same to the milligram per deciliter. Their blood pressures were normal and neither had diabetes. EB here showed up at Johns Hopkins Hospital with an inferolateral myocardial infarction and a little bit of ventricular tachycardia, underwent cardiac catheterization, and was referred to me in our preventive cardiology clinic. Six months later, AB, his identical twin brother, on a business trip to Detroit, came down with an episode of severe chest pain and an inferolateral myocardial infarction and underwent cardiac catheterization at the Henry Ford Hospital. I had the films sent to me. One of the striking things about the films is that we had to put some extra tape on them because they were identical. In other words, if you mixed them up, you couldn't tell who was who. Those of you who aren't cardiologists might not know right versus left dominance of the coronaries, but left dominance is found in the minority of individuals, and both of them had left dominance. Their right and left anterior descending coronaries were normal, and they both had a single lesion, a 90% stenosis, at the same place on an obtuse marginal branch of the left circumflex coronary. So what is this? What is this an example of? Well, this is an example of publication bias, because obviously this is very interesting. It's an anecdotal case. It really doesn't prove anything other than the possibility that if you have identical genes, identical behaviors, and possibly genetically identical organ structures, you could have identical structural, physiological, and clinical disease.
This has been the basis, obviously, for interest in families and interest in family studies, but I still think that the reason this was published was somewhat of a bias toward confirming what we already suspected. It certainly was interesting, and it certainly got across the idea that this did have something to say about the genetics of coronary artery disease. So epidemiologists have obviously been studying disease for a long time, and particularly relating it to altered physiology like high blood pressure or gene products like LDL cholesterol, et cetera. And what the whole opportunity for gene association studies is, is to go upstream, to look not only at some of the physiologic or protein products of genes, but actually at the expression of those genes and the polymorphisms that lead to either structural differences or differences in the levels of product. And so the point is that a lot of our gene association studies obviously start with phenotype, but then start to explore things along this pathway, ending up with gene variants with the new methodologies that Terry's talked about. But the point is that it is a logical progression of our epidemiologic activities, and so this is, again, another reason why I think this course on population genomics is key to really the next generation of epidemiology. So there's a variety of questions regarding the genetic etiology of disease covered here: does it aggregate in families, is it inherited from parent to offspring, which chromosomes carry the disease gene, which specific genes are associated with the disease, which gene variants among those genes are associated with it, and which gene products are altered as a cause of it. So there are a variety of questions to ask in order to answer the real question: is there a genetic etiology of the disease?
But what we're going to talk about here are twin studies, linkage analysis, and just some other comments on family-based designs. These have all led to the identification, possibly, of candidate genes, which, as Terry has already described, have a relatively sobering track record in terms of being reproducible, although there are certainly some major success stories. And then these genes were tested for disease versus no disease, and then one would replicate. What we have philosophically with genome-wide association studies is different. These are given the name agnostic because there really is no prior hypothesis, no religious dictum on which to essentially say that A causes B, but rather the entire genome is tested for disease or no disease, and then the exhortations of experts like Terry Manolio have really required replication so that we're not totally confused by a lot of alpha error. But just in terms of genome association studies on a philosophical basis, we tell our students, obviously, that hypothesis-driven research is the way to do things: you come up with your hypothesis and you test it, etc. And the genome-wide association studies, with really a different philosophical approach, I think, jarred us a bit, and there's a variety of subtle and non-subtle implications of this agnostic approach that I think we're going to continue to talk about. But given their sway in terms of the state of the art in the whole gene association area, I think it's appropriate to just ponder this lack of saying that we know what's up here and we're going to test a hypothesis. This says we're going to look at a million polymorphisms and we're going to find out how it sorts out. And we're going to talk a lot more about that. Family history, obviously, is an independent risk factor in many diseases, and we teach our students in epidemiology the importance of defining a positive family history, and obviously some of these are self-reported versus verified.
It's important to specify additional elements: the age of onset, the degree of relatedness of the affected relatives, the number of relatives. Our departed colleague, Roger Williams, I think, has probably done the most elegant work looking at the number of relatives, age of onset, et cetera, relative to coronary disease, and it does show that the definition of a positive family history is perhaps a little more subtle and complicated than we sometimes give it credit for. We also have to remember family information bias. All of us clinicians have had a patient where, say, in the coronary care unit, Dad comes in with his myocardial infarction and the question is, have any other relatives had this, and you get an interrogation of the entire pedigree to an extent that any of us would have loved to do in a field study, whereas someone without that disease wouldn't have that. So there is this family information bias: the flow of family information about exposures and illness is stimulated by, or directed to, a new case in their midst, so there is perhaps a deflation of the number of affected relatives in controls compared to that of cases. Terry talked a little bit about the relative risk ratio. Again, it is a measure of the strength of familial aggregation: the prevalence of disease in relatives of affected persons over that of the general population. And here is a list of the risk ratios for a variety of diseases, and one of the recurring themes here is that if you look at those with pretty sizable risk ratios, look at autism here, these are the diseases now that are showing up as the focus of genome-wide association studies. These large ratios have been the rationale for targeting diseases, particularly some of the psychiatric diseases, for example, and are certainly the preamble to doing some more sophisticated studies to find the candidate genes that are causing this.
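As a small illustration of the relative risk ratio just described, the sketch below divides the prevalence of disease in relatives of affected persons by the prevalence in the general population. The function name and all of the numbers are hypothetical and chosen only to make the arithmetic visible; they are not taken from the slide.

```python
def recurrence_risk_ratio(affected_relatives, total_relatives, population_prevalence):
    """Prevalence in relatives of affected persons divided by population prevalence."""
    if total_relatives <= 0:
        raise ValueError("total_relatives must be positive")
    if not 0 < population_prevalence <= 1:
        raise ValueError("population_prevalence must be in (0, 1]")
    prevalence_in_relatives = affected_relatives / total_relatives
    return prevalence_in_relatives / population_prevalence


# Hypothetical example: 30 of 1,000 siblings of cases are affected,
# and the disease affects 0.5% of the general population.
ratio = recurrence_risk_ratio(30, 1000, 0.005)
print(f"risk ratio = {ratio:.1f}")
```

A ratio near 1 would suggest no familial aggregation; the larger the ratio, the stronger the aggregation, which is the argument made above for the diseases chosen for genome-wide studies.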
Now, siblings and first-degree relatives. Obviously, if each parent has, say, two alleles, what you have is about four chances out of 16 that two siblings will share both alleles. You have four in which they share neither of the alleles; they'll be totally different. And then in the other eight they share one or the other of the alleles. Obviously this is the Mendelian inheritance pattern that Terry was talking about, and it allows us in family studies to make all sorts of hypotheses. There have been a variety of studies in epidemiology, as you well know, of nature versus nurture. Migrant studies, for example, are another group of studies which would be in this. And twin studies, obviously, have had their own place in the development of genetic hypotheses, and if you look at the genome-wide association studies, there will be frequent citations of comparisons of monozygous and dizygous twins as the rationale for such studies. So about 0.3% of births are monozygous twins, and 0.2 to 1% of births are dizygous. Apparently this is quite heterogeneous geographically, with Africa having the highest rate of dizygous twins and Northern Europe being low. Studies of twins reared apart, obviously, test nature versus nurture, and adopted twin studies have also been useful. There are also additional studies of siblings, and we'll maybe comment on that a little bit later. So there is a variety of studies in these groups, and there are measures of familial aggregation: for qualitative traits, what is termed concordance, and for quantitative traits, correlation and heritability. I want to make a couple of comments on those before we get on to some of the study designs. The concordance is calculated as the number of twin pairs with both twins affected among those twin pairs with at least one affected twin. One would think this should be a 2 by 2 table, but obviously then everything would be almost 100% concordant in an infrequent disease.
So you take the number of twin pairs with both affected divided by the number of pairs with both affected plus the number with only one affected. If it's less than 100% in monozygous twins, it suggests you have non-genetic factors, and if monozygous is greater than dizygous, obviously it's evidence for genetic factors, and the simplicity of this whole thing, I think, adds to its being convincing. This is kind of one of your classic studies, where monozygous twins and dizygous twins were looked at and the numbers of concordant pairs really weren't that different, but when it was then stratified by onset at less than 50 or greater than 50, you had 100% concordance in early-onset Parkinson's disease in monozygous twins compared to those with onset after age 50. Obviously now we recognize early-onset Parkinson's disease as a distinct disease entity and one on which many of the genetic studies have focused. So that is just the use of concordance, and again, with this concordance in monozygous versus dizygous twins, you see these large differences in things like non-traumatic epilepsy, schizophrenia, bipolar disorders, rheumatoid arthritis, psoriasis, inflammatory diseases, structural diseases, lupus. Again, if you go down the list among the first 109 genome-wide association studies, virtually all of these have been targeted, and this has been the rationale for studying those first, as having a genetic origin. Another opportunity is to look at quantitative traits, and Terry's going to talk some more about quantitative traits as well. Obviously you have your nice bell-shaped curve, or possibly a skewed or maybe even a bimodal curve, in which you can study variance, but frequently a number of genome-wide association studies have looked at, say, the upper 7.5% versus the lower 7.5%, looking for differences in gene associations with these quantitative traits, and correlation and heritability would be opportunities there.
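The concordance calculation described above can be sketched in a few lines: pairs with both twins affected, divided by pairs with at least one twin affected. The pair counts below are hypothetical and are not the Parkinson's data from the slide.

```python
def pairwise_concordance(both_affected, one_affected):
    """Concordant pairs divided by all pairs with at least one affected twin."""
    total = both_affected + one_affected
    if total == 0:
        raise ValueError("need at least one pair with an affected twin")
    return both_affected / total


# Hypothetical counts of twin pairs, for illustration only.
mz = pairwise_concordance(both_affected=12, one_affected=28)
dz = pairwise_concordance(both_affected=4, one_affected=36)

print(f"monozygous concordance = {mz:.0%}")
print(f"dizygous concordance   = {dz:.0%}")
if mz < 1.0:
    print("MZ below 100%: suggests non-genetic factors contribute")
if mz > dz:
    print("MZ greater than DZ: evidence for genetic factors")
```

Pairs in which neither twin is affected are deliberately left out of the denominator, which is the point made above about why a full 2 by 2 table would make a rare disease look almost 100% concordant.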
This is Manning Feinleib's study of blood pressure. Again, despite systolic blood pressure having had some tough sledding in the genome association world in terms of identifying polymorphisms related to it, there remains this correlation of blood pressure, suggesting that monozygous twins are much more strongly correlated in terms of their systolic blood pressure than either dizygous twins or siblings, or parent-offspring pairs, and certainly more than spouses, which would be a suggestion of environment. So it is certainly a way to look at twin pairs on a quantitative trait basis. Heritability has been mentioned. Again, in twin studies that's the variance in dizygous pairs minus the variance in monozygous pairs, divided by the variance in dizygous pairs, and it's the fraction of the total phenotypic variation of this quantitative trait that's caused by genes. It varies from 0 to 1, and if it's greater than 0.7 or 0.8 it suggests that there's a strong influence of heritability. So as you read the literature you have an idea of what this means. The limitations of twin studies, obviously, are that environmental exposures may not be identical even in monozygous twins, or there can be very highly similar exposures, and that can be almost as confusing as to the reason for the association. There can be some differences in gene expression. There may be some heterogeneity of the genotype within twin pairs, which has been suggested as a reason why some twins have the differences they have. And then there's this concern about an ascertainment bias, in which a co-twin with a disease is more likely to participate in a twin study than the co-twin who is unaffected, so you'd want to have good participation rates from all the twins asked to participate in a twin study.
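The heritability formula stated above, dizygous variance minus monozygous variance, divided by dizygous variance, can be written out as a short sketch. It assumes that "variance in pairs" means the within-pair variance of the trait, which is one common reading and only one of several twin-based estimators. The blood pressure values are made up for illustration.

```python
def within_pair_variance(pairs):
    """Within-pair variance: mean of (x1 - x2)^2 / 2 over twin pairs."""
    if not pairs:
        raise ValueError("need at least one twin pair")
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


def twin_heritability(mz_pairs, dz_pairs):
    """(V_DZ - V_MZ) / V_DZ, using within-pair variances (an assumption)."""
    v_mz = within_pair_variance(mz_pairs)
    v_dz = within_pair_variance(dz_pairs)
    if v_dz == 0:
        raise ValueError("dizygous within-pair variance is zero")
    return (v_dz - v_mz) / v_dz


# Hypothetical systolic blood pressures (mm Hg) for twin pairs.
mz_pairs = [(120, 122), (130, 128), (110, 112), (140, 138)]
dz_pairs = [(120, 124), (130, 126), (110, 114), (140, 136)]

h2 = twin_heritability(mz_pairs, dz_pairs)
print(f"estimated heritability = {h2:.2f}")
```

With real data the estimate can fall outside 0 to 1 through sampling error, so a value like this is best read as a rough guide rather than a precise fraction.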
We now want to talk a little bit about linkage analysis, and this is the family-based approach to the identification of susceptibility genes, or at least to starting to locate them on the chromosomes and where they might be in the genome. Linkage is the tendency for alleles at loci that are close together to be transmitted together as an intact unit, or haplotype, and this has to do with recombination. As Terry mentioned, the further apart the genes are, the more likely it is over time that there will be a recombination event in which they will end up on different chromosomes. So we try to measure this frequency of recombination with a recombination fraction. This varies from 0 to 0.5, with 0 being tightly linked, no recombination, these two genes are always found together, and 0.5 being unlinked and totally independently assorted, and the distance between them is then given in centimorgans. This is a map distance rather than a geographic or physical distance. It's the genetic length over which one recombinant crossover will occur in 1% of meioses. So a given physical distance corresponds to a certain genetic length, and again, as Terry's mentioned, there are recombination hot spots, etc., that would obviously distort that and make two loci look like they were even further apart. So as you go down generations, if each of these is a generation, obviously you have recombination events, so that the genes with very little recombination end up being very closely clustered together, in which the linkage disequilibrium suggests that they're being passed on together and are physically associated as one goes down through these generations. So this whole idea of linkage disequilibrium is what we take advantage of with the recombination fraction.
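One way to make the effect of recombination over generations concrete is the standard population-genetics result that linkage disequilibrium between two loci shrinks by a factor of (1 - theta) each generation, where theta is the recombination fraction. The sketch below applies that result under simplifying assumptions (random mating, no selection, a large population); the starting value and the two values of theta are illustrative only.

```python
def ld_after_generations(d0, theta, generations):
    """Expected disequilibrium after t generations: D_t = D_0 * (1 - theta)^t."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1 - theta) ** generations


# Unlinked loci (theta = 0.5): association is essentially gone in 10 generations.
unlinked = ld_after_generations(1.0, 0.5, 10)

# Tightly linked loci (theta = 0.001): much of it survives 1,000 generations.
linked = ld_after_generations(1.0, 0.001, 1000)

print(f"theta = 0.5,   10 generations:   {unlinked:.4f} of initial LD remains")
print(f"theta = 0.001, 1000 generations: {linked:.4f} of initial LD remains")
```

The contrast between the two printed values is the whole point: loci that are far apart dissociate in very few generations, while loci with a recombination fraction near zero stay associated for a very long time.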
Obviously the extent of recombination is a function of the number of generations and the recombination fraction, so that loci that are far apart and already not associated will become completely dissociated, in complete equilibrium, in very few generations, whereas those with a very low recombination fraction, with theta being almost zero, may over many thousands of generations never really come even close to equilibrium, to basically dissociating. So this is the background of linkage analysis. In looking at linkage in family studies, you would assume a mode of Mendelian inheritance, autosomal dominant, etc., you would identify markers with known positions to serve as the references, and then you would determine the number of first-degree relatives who show recombination, assuming different values of theta. And then there is this LOD ratio that Terry had introduced. It's the ratio of the likelihood of observing the family data that you observed up here, with the various values of theta, to the likelihood of observing the family data if the loci were totally unlinked. So you take the family data from up here, make certain assumptions about what the recombination fraction would be, and then compare it to a theta of 0.5, that is, complete dissociation. So this LOD score, the logarithm of the odds, or Z, is the likelihood of the data if the loci are linked at a particular level of theta versus the likelihood of the data if they are unlinked. And so the best estimate of theta is the recombination frequency between the marker locus and the disease locus, and the magnitude of Z really identifies which of those likelihoods is the greatest. A LOD score of greater than three is essentially 1,000-to-1 odds that the loci are linked at that level of theta, and LOD scores can be added across families.
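The LOD score described above can be sketched for the simplest textbook case, where every meiosis is fully informative and can be scored directly as recombinant or non-recombinant. That is an assumption made here for illustration; real pedigrees usually need dedicated linkage software. The two families and their counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of L(data | theta) / L(data | theta = 0.5) for scored meioses."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    if theta == 0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out complete linkage
    nonrecombinants = meioses - recombinants
    log_linked = nonrecombinants * math.log10(1 - theta)
    if recombinants > 0:
        log_linked += recombinants * math.log10(theta)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical families: (recombinants, informative meioses).
families = [(1, 10), (0, 8)]
thetas = [0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]

# LOD scores are added across families at each candidate theta.
totals = {t: sum(lod_score(r, n, t) for r, n in families) for t in thetas}
best_theta = max(totals, key=totals.get)

for t in thetas:
    print(f"theta = {t:<4}  total LOD = {totals[t]:6.2f}")
print(f"best theta on this grid: {best_theta}")
```

At theta of 0.5 the total is zero by construction, since the data are then being compared with themselves, and the value of theta with the largest total is the estimate of the recombination fraction. A total above three at that theta corresponds to the 1,000-to-1 odds mentioned above.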
So what you're trying to do is essentially, within these linkage blocks, these blocks of linkage disequilibrium, identify the likelihood that your gene and one of the markers are linked together, versus the likelihood of the gene perhaps being in another block of markers. That ratio of the odds is the LOD score, and it gives you an idea of which ones were or were not linked. So I think the thing to remember in terms of reading about these is that a LOD score of greater than three would be what you would be interested in, as identifying things that are physically linked on the genome. Now, the first example I gave was talking about trios, and this is a study design which is a little different from what we frequently use in epidemiology. That's the affected offspring and both of their parents, and basically that's all. That's what the trio is. There are no unaffected offspring or other individuals. The phenotypic assessment is only in the affected offspring. The genotyping is in both parents and the affected offspring. So you spend your money on phenotyping only the affected children and you spend your money on genotyping all three, so it's relatively efficient in terms of the expenditure of phenotyping resources. These are used in both discovery and replication in GWAS, and you can come up with examples of both, in which a trio design was used, say, to identify from 500,000 SNPs a smaller number, 20,000 or so, that then could be put into a case-control design. Probably more frequent would be your typical sequential design of a GWAS, with, say, a half million SNPs, then replicated with one or two case-control studies, with a trio being involved at one of those phases because of some of the advantages that trios have. It's kind of a different study design, but it's also not susceptible to population stratification, which we're going to talk about tomorrow.
This kind of confounding is due to the sampling of cases and controls from populations of different ancestries, whereas clearly in a trio you know who the ancestors are. You've got the two parents and you've got the affected offspring, and so this is not a problem in trios. And so as one component of a multi-stage genome-wide association study this could have some advantages. The test that is done, then, is to test whether any given allele at a given locus is transmitted to the affected offspring by parents more frequently than expected by chance. The chance would be 50%, so heterozygous parents would transmit the alleles at a given locus with equal frequency. So there is a 50% frequency of any given allele being one of the two alleles of the child, and affected offspring should receive the disease-associated allele more frequently, and therefore there's no need for a control group. This is called the transmission disequilibrium test, TDT. So here's a study of type 1 diabetes with this particular allele, and what you have are probands who are not affected with diabetes, and those families in which we have a proband, the child, affected with diabetes. And in these children there's this many transmitted and this many not. This should be 50-50, and you can see it's almost 56% here. In this non-affected group you can see it's basically one-to-one, 50-50. These are not significantly different from 50%; these are highly significantly different from 50%, suggesting that this particular allele is transmitted more frequently in affected families than by chance. So this TDT is again a little different study design than we have in other parts of epidemiology, but I think still quite an efficient and useful one. It gives you very similar data. This is another type 1 diabetes study, from Hakonarson, and here are actually three SNPs, again with a case-control study done as part of this. Here's the allele that was looked at and the minor allele frequency. So in this instance what you have is your case-control study.
Your minor allele frequency here in cases is greater than in your controls. It gives you an odds ratio of .8 and a p-value. In this instance the controls have a higher minor allele frequency than the cases, and so this allele is a protective allele. It gives you an odds ratio of less than one, etc. And if you then look within a subgroup of this study, a second phase of this study, again you still had the comparison of the same allele with, say, the wild-type allele, the major allele, and you can see that the transmission here, which should be 50-50, was much different than that. Here it's the other way because it's protective, and so you can see in this particular study, where they did both of these in different subgroups, a very similar kind of information from the initial study to the trio study. So I think if I were to design perhaps the ideal genome-wide association study, it would be nice to have one of the replications as a trio, because you'd obviate this risk of having population stratification. Now, there are some limitations. Obviously one of them is that it's difficult to assemble trios if there's a late onset of disease in the affected child. You need the parents, and so if you have a late onset of disease you're going to have some difficulty assembling the trios. Secondly, and more subtly, they're sensitive to small degrees of genotyping error, in which the transmission proportions between parents and offspring get distorted, and there's actually one of the 109 GWAS that I reviewed, this study by Kirov in schizophrenia, that is an example of that, where it appeared that they actually handled the genotyping differently in the parents than in the proband and came up with all sorts of distortions, which are described in this paper. So there are some disadvantages of this. There are some other issues to talk about with family-based designs.
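The transmission disequilibrium test discussed above compares how often heterozygous parents transmit an allele to an affected child with how often they do not, against the 50-50 expected by chance. A minimal sketch uses the usual McNemar-type statistic, (b - c)^2 / (b + c), referred to a chi-square distribution with one degree of freedom. The counts below are hypothetical and are not the numbers from either diabetes study.

```python
import math


def tdt(transmitted, untransmitted):
    """TDT chi-square (1 df) and p-value from heterozygous-parent transmissions."""
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("need at least one informative transmission")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Survival function of a 1-df chi-square, written with the error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele transmitted 560 times and not transmitted 440 times.
chi2, p = tdt(560, 440)
print(f"transmitted 56%: chi-square = {chi2:.1f}, p = {p:.5f}")

# Hypothetical comparison group with an even split.
chi2_even, p_even = tdt(500, 500)
print(f"transmitted 50%: chi-square = {chi2_even:.1f}, p = {p_even:.2f}")
```

Only heterozygous parents contribute to these counts, since a homozygous parent always transmits the same allele. The test also assumes accurate genotypes in both parents and child, which is why the genotyping errors mentioned above can distort the transmission proportions.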
There also have been genome-wide association studies in affected and unaffected siblings, and a kind of TDT has been used to analyze those. An area that I find very interesting is this trying to account for the heritability or genetic risk. In other words, if you have a positive family history and you add the genes and the risk factors to it, can you account for it? This kind of gets to the question of when are we done. In other words, how many gene variants do you have to study before you say, I have accounted for the genetic aggregation of this disease? So for example, this would look like, say, your multiple logistic equation had a term for positive family history, and say this gave an odds ratio or relative risk of 2 or 3 or something. What would happen if you added the various polymorphisms to that? There have been some studies that have done this and talk about the percent of familial risk which is accounted for by these gene variants. There also can be, obviously, the multiple adjustment for intermediary risk factors to identify risk in first-degree relatives, and there has been a lot of discussion of this in the Framingham Heart Study, in which their initial work showed relatively little predictive value of family history after adjustment for cholesterol and blood pressure, etc. This has re-emerged with perhaps more precise risk factor data from the multiple generations now of the Framingham study, and so I think this continues to be an interesting area. This is a study that I've been involved with for a while. This is the sibling study with Diane and Lou Becker at Johns Hopkins Hospital.
She started enrolling siblings, 30 to 59 year olds, siblings of patients with coronary disease with onset at less than 60 years, and then following them forward for incident coronary events. It turns out that their 10-year risk is about 20% overall, so it's a relatively high-risk group just on the basis of their being brothers and sisters of early heart disease patients. The 10-year risk from the Framingham risk score was also calculated at baseline, and these individuals, particularly the men, had a 66% excess risk over what would have been predicted by the Framingham risk score at baseline. Women were closer, only about a 12% increased risk, but the suggestion is that at least in this group the Framingham risk score in siblings really falls short, particularly in men, and there are some additional things there which I think we would suggest would be genetic. So in conclusion, I think family-based studies have been the cornerstone of identification and quantification of familial risk and the heritability of human diseases, and again, they do provide the rationale for getting into larger, more complex, more expensive study designs. Linkage analysis identifies the location of genes using known markers, and we're going to hear about the HapMap and other studies from Terry next. And I did want to talk about trios as a family-based design that's been used both for discovery and for replication in GWAS, and certainly in candidate gene studies. So the family-based designs, I think, will continue to be useful. They've been, again, I think, incorporated with some of the genome-wide approaches, but they still form certainly an important part of the genetic epidemiology literature. Questions? Bill?
So am I correct if I think of heritability as sort of roughly attributable risk for all the genetic exposure? Is that a similar concept or not?
Well, heritability is the percent of variance explained. I think it's kind of the discrete versus the continuous.
I think they're kind of apples and oranges, but for me heritability has to do with quantitative traits and the extent to which the variability in those quantitative traits can be explained by heredity. Attributable risk is what proportion of the cases can be accounted for by that gene, and so I think just mathematically they're quite different from the get-go.
I would agree. Attributable risk is the proportion of disease that can be explained. Really, with heritability you're looking at the proportion of variability in disease, whether they have disease or not, or the proportion of variability in a trait. So while they're somewhat related concepts, I think they wouldn't map onto each other.
Hi, Xiaobing Wang from Children's Memorial Hospital. I just wonder, for twins, besides the utility for estimating heritability, what other utility could there be for either genetic association studies or GWA studies?
Well, I think there are many, many opportunities within twins to study a great number of things. Obviously the extent of concordance and heritability is of interest. Also of interest within the monozygous twins is what some people call discordance, or lack of concordance, because then, if the genome is essentially the same but the phenotype has some differences to it, you can look at the flip side of the coin, at what could have caused that. So in the same way that you're interested in twins because their environment is very much the same, and then with dizygous twins versus monozygous twins you can look at the difference in the genome,
I think within monozygous twins with differences in phenotype you can look at the extent to which there are not only environmental but, I think, some other issues, like epigenetic and other kinds of post-genome things that have been going on, and this could get into things like pharmacogenetics and a whole variety of things, because you're basically stratified by the genome, so you have a complete control.
You can look at, as Tom said, sort of post-genomic modifications. Epigenetic modifications are brought about by the environment, and there have been some nice studies showing that epigenetic patterns in identical twins are very similar at young ages and very, very different in their 50s or so. That's kind of the classic study. There is also a recent study of copy number variants showing that many copy number variants actually arise somatically, so sort of after embryogenesis, and showing differences between identical twins in the numbers of copy number variants, and an association of that with, I think it was schizophrenia, or one of the psychiatric diseases.