Yes, there's no question that this is an issue whose importance everybody recognizes, judging by all the questions and comments over the last day and a half. So I'm going to touch on a few issues, and hopefully others will come up in the discussion. There's no question about the importance of adequate phenotyping data. Genetic variants alone are not going to account for all chronic diseases. We need to evaluate environmental factors as well as, most importantly, gene-environment interactions. If those are not identified, they can mask the detection of a genetic effect or lead to inconsistencies between populations with different environments. Identifying them can also suggest approaches for modifying the effects of the genes by avoiding or modifying the relevant environment, which leads us to the issues of prevention and treatment that we care about so much. And when we talk about breadth versus depth of phenotyping, certainly we're aware that there are logistical trade-offs: the depth of the information we get is going to be inversely related to sample size and directly related to cost. But I don't think we can view a resource as being simply a broad resource or a deep resource. It's really going to be a judgment relative to the questions we're trying to evaluate. In fact, the same cohort can have phenotypes defined broadly enough to allow a substantial scientific contribution for some questions and deeply enough to contribute to other kinds of questions, all in the exact same cohort. So what I'm going to do, if it's all right, is illustrate these principles in the context of a specific cohort.
And I'm going to use the Women's Health Study, because that's the one I'm most familiar with, as an example of what Eric referred to as a cohort contributor to the scientific commons. But the important thing is that there's nothing unique about this particular cohort. It is not like every other cohort, but on the other hand it's not atypical of a cohort. So I think it's a good example just to get the discussion started. These studies have to start somewhere, and then you need to know where they are now. The Women's Health Study was designed as a randomized trial of aspirin and vitamin E in the primary prevention of, and the important thing is, two outcomes. Not just cancer, but cancer and cardiovascular disease. In fact, it was jointly funded by the NCI and the NHLBI. We have about 40,000 participants. It began in 1992. The trial ended in 2004. We then followed them observationally to the present, so now we have a mean follow-up of 18 years. The participants are throughout the United States. The follow-up is conducted entirely by mail from our research office. So it's not distributed: there aren't local sites, there aren't in-person visits; everything is done from a central location. And our staff who work on this are cross-trained, so they move from this study to another depending on the phase and stage of this investigation versus the other ones we have ongoing. From the beginning of the study, though, we actually had a second aim besides answering the trial questions: to maximize the potential of the cohort to be a resource when the trial was over. We wanted it to be used by many investigators over time for a wide range of health-related outcomes. We had no idea what questions would arise in the future.
But we wanted to position ourselves to contribute to the evaluation of those questions in a timely and cost-effective way. We knew our basic strengths would be a large sample size of women geographically distributed throughout the United States, with extensive duration of follow-up and phenotypic data from baseline as well as regular recontact through yearly follow-up questionnaires. So we did three things. First, at minimal cost, we added questions to our questionnaires that were not directly related to our trial: a wide range of self-reported outcomes. Have you been diagnosed by a physician in the last year with arthritis and connective tissue diseases, diabetes, visual disorders, cognitive decline, venous thromboembolism, osteoporosis, neurologic conditions, migraine, and whatever else we could fit on the questionnaire. Every questionnaire we wrote was limited to a total of four pages, because after four pages people don't complete it anymore; but whatever we could jam into four pages, we did. Second, we made sure we had as extensive a core group of demographic, lifestyle, and medical history variables as possible, representing experiences as adults as well as full history variables like reproductive history, smoking, and hormone use, so we could assess prevalence at baseline and then change in risk factors over time in follow-up. Third, we got baseline plasma and buffy coat samples from the 70% of participants who were willing. We had no more money after we got them, so we just aliquoted them, froze them, and stored them in nitrogen freezers in three remote locations, so that if the freezers fell apart we would still have them. But we had no money to do anything more with them. So then we went on and conducted the trial, and meanwhile we tried to leverage the resource. We did ancillary studies that were funded for specific conditions, and through those we got deeper information.
So we got the details of the diagnosis from medical records, pathology reports, tissue blocks. We sent out additional risk factor questionnaires, and we assayed bloods for specific biomarkers. We also updated our assessment methods, such as using accelerometers for physical activity, which we hadn't done before, and there was no problem with the burden of recontact. Over these 18 years, losses to our cohort are through death, not through loss to follow-up because of unwillingness to continue or noncompliance. We also got nonfederal money to support and maintain our biorepository. Our plasma samples were assayed for expanded biochemical markers. We extracted DNA from all samples, and then with additional money from the NHLBI we could add a genetic component. In particular, we did a GWAS on all of the blood samples we had, completed in 2008. Again, we didn't know what the questions of importance were going to be, but we believed a GWAS would be the most flexible approach we could take at that time with the money we had available. So we did an Illumina chip, 360,000-plus SNPs, with an additional panel of cardiovascular-relevant markers. And we did it on the entire sample, not waiting for nested case-control selections, because we thought that would make us ready to go as needed as a cohort over time. Was embedding this genetic component into a mature cohort too late? It wasn't, because we were continuing to get endpoints. As long as you have sufficient follow-up for adequate numbers of new cases to accrue, it's going to be useful. And because the population was aging, we were getting endpoints at an accelerating rate. Just to give you an idea: for the first 16 years of the study, we had 3,735 confirmed total cancers, and then in the last four years we got an additional 1,300. For important vascular events, we had 1,216 in the first 16 years and 378 in the last four years.
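Those accrual figures can be turned into per-year rates with a few lines of arithmetic; a minimal sketch of the calculation implied by the numbers quoted:

```python
# Per-year endpoint accrual implied by the figures quoted above:
# 3,735 confirmed cancers in the first 16 years, 1,300 more in the last 4;
# 1,216 vascular events in the first 16 years, 378 in the last 4.

def annual_rate(events: float, years: float) -> float:
    """Average number of confirmed endpoints per year of follow-up."""
    return events / years

cancer_early = annual_rate(3735, 16)    # ~233 cancers per year, years 1-16
cancer_late = annual_rate(1300, 4)      # 325 cancers per year, years 17-20
vascular_early = annual_rate(1216, 16)  # 76 events per year, years 1-16
vascular_late = annual_rate(378, 4)     # ~95 events per year, years 17-20

# The late-period rates are roughly 39% and 24% higher than the early ones,
# which is the "more endpoints the longer you continue" point in numbers.
print(round(cancer_late / cancer_early, 2))      # -> 1.39
print(round(vascular_late / vascular_early, 2))  # -> 1.24
```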
So again, the longer the cohort continues, the faster endpoints accrue, and the more bang for your buck you get from continuation. How fruitful were the data we collected? Well, we can look at participation in consortium activities, and not just consortia for our primary outcomes, but for other outcomes we were never designed to look at as part of our initial study. For some of these consortia, we were able to contribute fully and had sufficient data for all risk factors needed. That was true for every consortium on cancer and on cardiovascular disease, but also for those on body mass index and mortality, physical activity, and hypertension: we had enough to do everything that was asked. For some others, we didn't have all variables, but we had more than enough to provide the basic set needed to participate. Whether it was consortia on menopause, menarche, parity, migraine, glaucoma, neurologic outcomes, glioma, liver cancer, or pancreatic cancer, there was a small body of needed risk factors that most cohorts were able to contribute, because they had that level of information. The deeper information came from only a few cohorts, but the basic information was in many. And for others still, we were invited but couldn't have provided data without supplemental information requiring recontact, and we decided that using up our limited recontact capacity was not worthwhile for those particular questions; we didn't have enough to offer. Those were cases like a fertility consortium and a hearing loss consortium, where participation just didn't fit with our scarce resources and our goals. So in the last two full years we've had: in 2011, 20% of our publications were consortium publications, and in 2010, 36% were. That just gives you an idea.
And these were not all from consortia related to our stated original goals. So here is an example of one cohort being broad or deep for particular phenotypes. We were studying exogenous hormones and breast cancer. We had a full history of exogenous hormone use at baseline for all participants, and on each yearly follow-up questionnaire we assessed change in menopausal status and change in hormone use. We did a hormone biomarker panel in a nested case-control design for cases of breast cancer. We got pathology reports, pathology slides, and tissue blocks for the breast cancers. Obesity is a confounder: we had weight and height at baseline, we repeated the questions every two years, and we sent a subgroup a tape measure to get waist-to-hip ratio or waist circumference. So depending on the question and the analysis, we could use these variables as broadly or as deeply as needed. Now, we've talked about harmonization, so I'm not going to repeat that, for time. But the important thing to remember is that a limited number of variables is often sufficient to give the information that is needed, and they can be externally harmonized. Harmonization does not mandate that you use the same questions or the same wording. What it means is that you give your actual core information to a central group, and they use it to construct similar, but not identical, definitions of exposure and disease measures to permit the pooling of data. We actually did this in our study: we have data for a million people harmonized so far in the cancer consortium. It didn't take long, it was not costly, and it is doable both scientifically and logistically. And I think it's important to remember that the time and money invested in a cohort study go primarily into setting up the population and collecting the data.
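The external harmonization pattern described here, each cohort keeping its own questions while a central group maps every coding onto one coarser shared definition, can be sketched in a few lines; the cohort names and questionnaire codings below are hypothetical:

```python
# Sketch of external harmonization: each cohort keeps its own questionnaire
# coding; a central group maps every coding onto a shared, coarser definition
# that is "good enough" for pooled analysis. The codings are hypothetical.

def smoking_from_cohort_a(record: dict) -> str:
    # Cohort A asked a yes/no question: "Do you currently smoke?"
    return "current" if record["smokes_now"] else "not_current"

def smoking_from_cohort_b(record: dict) -> str:
    # Cohort B asked a three-level history: never / former / current.
    return "current" if record["smoking_status"] == "current" else "not_current"

pooled = (
    [{"cohort": "A", "smoking": smoking_from_cohort_a(r)}
     for r in ({"smokes_now": True}, {"smokes_now": False})]
    + [{"cohort": "B", "smoking": smoking_from_cohort_b(r)}
       for r in ({"smoking_status": "former"}, {"smoking_status": "current"})]
)
print([p["smoking"] for p in pooled])
# -> ['current', 'not_current', 'not_current', 'current']
```

The harmonized variable is deliberately coarser than either source coding; that is the "similar, but not identical, definitions" point: not perfect, just good enough to interpret pooled results.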
Once that has been done, using the data to contribute to various activities or adding ancillary information is actually reasonable in terms of time and money. So the lessons to be considered from my own experience with one cohort are the following. Many individual cohorts have the ability to contribute to analyses of multiple outcomes, and if a cohort can do that, I believe it really is an added-value situation. A genetic component such as sequencing can be added to existing studies as long as there is adequate follow-up to accrue a sufficient number of outcomes. Maybe financially it can't be done on the entire cohort, but it would be powerful even in just a subset. In terms of phenotypes, I think the best guide to the adequate level, how broad versus how deep, will come if we can actually anticipate what the outcomes and hypotheses to be evaluated are going to be. But even if we don't know what they will be in future studies, and so they can't yet be specified, we can take a middle road. We have experience with a set of core basic variables that we know we're going to need for exposures: anthropometric measures, smoking, alcohol, medical history, reproductive history, family history, physical activity. I think it would be important to build at least that much into every study. In general, if you're going to raise hypotheses, broader is more appropriate; if you're going to test hypotheses, you're going to need something deeper. And finally, we know there are scientific and logistical disadvantages to existing cohort studies.
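The "powerful even in just a subset" point can be made concrete with the standard normal-approximation sample-size formula for comparing two proportions; a minimal sketch, where the 10% versus 15% risks and the fourfold interaction rule of thumb are illustrative assumptions, not figures from the talk:

```python
# Sketch: sample size per group to detect a difference in two proportions
# (two-sided test), using the standard normal-approximation formula.
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1: float, p2: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Participants needed per group to detect p1 vs p2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # e.g. 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# Detecting a main effect of 10% vs 15% risk at 80% power:
n_main = n_per_group(0.10, 0.15)
print(n_main)  # -> 686 per group

# A common epidemiologic rule of thumb: detecting an interaction (effect
# modification) of comparable magnitude needs roughly four times as many.
n_interaction = 4 * n_main
```

This is why a sequenced subset can still be informative for main effects while effect-modification questions drive sample sizes up so sharply.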
Sometimes we can address them by obtaining additional data and specimens, either by recontacting the participants or, for example, now that our Women's Health Study population is reaching age 65, by beginning to use Medicare and Medicaid tapes to assess hospitalizations and diagnoses. So use different data sources as your population changes over time. There are also advantages to initiating new cohorts. But I feel strongly that before we start a new cohort, we need to identify what gaps are to be filled and how the new cohort would address them. By doing so, we will have to consider what sample sizes are adequate and what the nature of the population should be in light of the questions to be addressed. For example, I think it was Eric who mentioned that adolescents are not well represented in studies, and that's true. But are we just going to represent them? Or are we going to enroll them in adequate numbers to be able to evaluate effect modification, of an exposure or a genetic variant on a clinical outcome, by whether the participants are children, adolescents, adults, or older adults? That would have great implications for sample size. So while we're establishing our research portfolio to answer many potential questions in the future, I would like to try to leverage some existing cohorts while developing any new ones, because the knowledge will come from accumulated data, not one study. And we don't really need or want to wait. We don't have everything on all cohorts, but we do have enough, almost, to make significant contributions. So that's just to throw out some thoughts to start a discussion. Thank you. Thank you. Questions or comments? Teri? Julie, thank you. That was a wonderful description of WHS and many similar studies. But I have to admit, I think you're the first person I've ever heard say that harmonization was easy.
So what did you harmonize, and what was easy? Can I turn to the queen of harmonization? So the nice thing about really large numbers is you can be a little sensible and not go crazy harmonizing the last 3%. Sometimes we turn harmonization over to people who might not feel comfortable keeping it easy. But for the BMI study that Julie referred to, which is 20 cohorts, 2 million people, harmonization was really pretty straightforward for BMI, smoking, social class. Let me back up a little bit: harmonization of medical variables is obviously more difficult than smoking. But for the most part, if you visualize the tables you're going to need and the questions you're going to ask, it isn't as bad as we have said, because when we've talked before at many harmonization workshops, we tend to think about making it perfect. And by definition, if you're combining one cohort with another, it's not going to be perfect. That's not the goal. The goal is to be good enough that you can interpret the results. So as long as the scientist who's actually going to interpret the results can put some boundaries around the harmonization exercise, I don't think it's a killer. That's my opinion. It's far, far less expensive and difficult than enrolling people in a cohort and having them keep coming back. Many footnotes to that. But for the variables you talked about, the top 20 that everybody in the room would put on any study we would conduct, it's not brutal. It's not a killer. That's my opinion. Chris? Actually, what you said was it was easy at a reasonable cost, right? So this is one of the things we've seen: harmonization is often never costed into the design of the study. Then when it has to happen, suddenly there's no budget for it. It's not that big a budget you need, but somehow people are scrambling around trying to figure it out, plus the time it takes. So I agree completely.
It's not a difficult thing, but it just has to be factored into the studies. And if I can clarify, to make sure I'm accurate on this: we did not actually have harmonization in our budget. The money came from the funding we got to do the consortium studies, so these were costs that were not borne by the investigator. And that's a very important thing. It's the little costs that just kill you, because you don't have them. But if it's understood up front that these activities are not going to be done on the fly, that they're important enough to actually build an infrastructure to do them, then the money can be built into that part of it. It's not perfect harmony; we are not ready to go perform with the Philharmonic when it's over. But it's harmonious enough. I would say our greater harmonization problems have been among the investigators in the consortium. So we have a question back there. Hi, Lucia Hind... Is this on? Can you hear me? Okay. Lucia Hindor from NHGRI. I just want to amplify the comments that have been made, in relation to a program we have at NHGRI called PAGE. PAGE is an example of a consortium where we actually did imagine that all of these different cohorts would come together in an initial framework of collaborative activities. We envisioned that the cancer cohorts would contribute to cardiovascular risk factor analyses and vice versa, and we set up the structure to give people working groups. And I have to say, it does work well when you set that expectation up front. But you also have to commit the time to actually look across the various variables. A lot of times, cohorts may not even realize they have certain variables that can be harmonized across multiple cohorts unless you actually think to look at all of the different instruments.
Can I ask whether you were able to capture treatment history in your questionnaires, medication use, and what some of the limitations were of that kind of questionnaire for capturing that information? We were not able to get those questions into our regular follow-up questionnaires because of space and timing. We did the questionnaires yearly for a while, and then went to every other year for cost reasons. But we did do supplemental questionnaires. If we got an ancillary study to look at a particular outcome, then we sent a questionnaire after a diagnosis was reported. So we had the medical record, but we still sent a questionnaire out asking for the information. And if participants couldn't report the information accurately, they gave us permission to go back to the physician to find it out. So the answer to your question is that it took an extra step; we don't have access in a way that lets us get it directly without participant feedback. Go ahead. Oh, sorry. So I'm just wondering to what extent the effect size we're looking for could be, let's say, washed out by insufficient harmonization. If we're looking for small effect sizes, say of specific alleles, even a slight lack of harmony might have important effects. So my question is, doesn't the extent of harmonization that has to be attained depend on what effect we're looking for in the first place? You know, I do think that's true. And one of the things I put on the harmonization slide that I passed by was that it's very important to understand the question and what is needed before making a decision.
Take something like smoking. If we were doing any kind of analysis involving lung cancer, for example, where you really need a full history of dose and duration and type, then without it you would be opening yourself up to a degree of confounding equal to the magnitude of the effect. On the other hand, if it's cardiovascular disease, you honestly can accomplish your goals with something as simple as current smoking status. So it's a whole different level of knowledge you need, depending on what outcome and what question you're looking at. So perhaps in situations like that, if you really needed very consistent phenotype data or several pieces of key data, maybe you could go back and do some very standardized collection on a smaller group of people that you've already narrowed down. Absolutely. Go ahead, Mike. Yes, it's obvious that the big expenditure of money is initially establishing the cohort: consenting the patients, getting the DNA, getting the plasma, getting it stored away, and all that sort of thing. And it's relatively inexpensive to gather the follow-up information. Cystic fibrosis has a relatively unique circumstance. After we spent a considerable amount of resources to enroll the original 3,500 patients, and then several thousand more, the Cystic Fibrosis Foundation established a registry where now every clinic visit is entered into the CF Foundation registry database. And so we're able to update our clinical information from the Cystic Fibrosis Foundation on an annual basis for all of our patients.
So once you have the cohort, we have this special circumstance we can leverage for continually upgrading our phenotypes, which obviously helps greatly with questions like who has diabetes and how lung function changes over time. You can recalculate the common phenotypes, and you know when people get infected with different microorganisms if they've previously not been infected. So we have a very special, wonderful opportunity to update on an annual basis. And Mike, who enters those data? Is it the patient, or the clinician? The CF Foundation actually pays the clinical center sites, and they get reimbursed based on how much electronic data they enter on the patient. And even though we don't know who the patients are, they were enrolled from Seattle. The Seattle site sends the patient's CF Foundation registry number to the CF Foundation along with their study number, and the CF Foundation sends us the phenotype data so we can link it to our unique study identifier without ever knowing who the patient is. One of the recommendations that's bubbling out of this conversation is the ability to recontact people based on genotype, for example homozygous loss-of-function variants, and have them come in for very deep clinical and physiologic characterization. Based on experience, do you have a rule of thumb from US cohort studies, since the US is quite a mobile population: A, the ability to find people and recontact them, and B, will they indeed consent to come in for these more detailed phenotype studies? Structurally, yes. As long as the study is still ongoing and we've been in contact, we can, because we're in contact with them every year. So, for example, suppose you're looking at something and you want to know whether we're seeing the same thing.
You call us and you say, do you have any patients with this particular diagnosis, have you seen something like this, and we can look and tell you. And then you say, well, I'd like to examine a few of them, or talk to them, or whatever. We can identify those people; that's not a problem. The other investigator cannot contact them directly. We would have to go through our IRB and ask for permission to say, you have these characteristics, and we have an investigator who would like to speak to you about the following. The IRB will say yes to that, absolutely. And contrary to what some people surmise, that they're so old we can't find them: no, we can, because we keep regular contact with them. We ask them to come back for particular exams as long as it's reasonable, and we have a very high rate of them saying yes and coming in. Yeah. And just based on our experience, the participants like special studies. They like something that has to do with them. So if you start out almost any letter with, you reported that you have this, or experienced that, or we noticed this, and we'd like to look at it in more detail, the reaction is: yes, someone's going to listen to me, and we have an opportunity to do it. Maybe one more quick question down there, Chris, and then we need to go on. Real quick question, just in terms of recontacting for a result like a loss-of-function variant: in your experience with various cohorts, not just the Women's Health Study, can you recontact without notifying about the reason why you're recontacting for a specific result? Because it's a very rare loss-of-function variant. Oh, we want to measure something. Well, why do you want to measure it? I'd have to sit and figure that out; I mean, we can do it as long as we can represent why we're doing it in a way that makes sense.
So we just have to find a reason, but it's workable, it's doable. Julie, can you very quickly comment on your success rate in recontacting and getting consent to do these things, as opposed to, Gail showed that study, what, 20 years ago, where you were getting 50% or so? I think mine is a little bit easier, ours is a little bit easier. Because you see them, I mean, because you have regular contact. Well, we never actually see them, it's all by mail, so we never see them at all, but there hasn't been a break in contact. The hard one is when there has been. Right, so we couldn't find 38% yet, and we're still working; this is only six months in. Of the people we got hold of, it was in total 60% of the study, 50% of those, so it's still the majority: 80% of the people we actually found. But it's the finding them; if you know where they are, obviously you're going to do better. That's why I presented it as the worst-case scenario; it hardly gets worse than that. But just to let you know, after 18 years we have 94% morbidity follow-up and 100% mortality follow-up. Mortality we can get from outside sources, but we can do that. And to be honest, their willingness to participate in an additional recontact or ancillary study is completely dependent on how much you're asking them to do. If it's a questionnaire or something like that, absolutely no problem. If it is, will you please go to the clinic, get fundus photos, and then also go to the CTSC for seven hours' worth of testing, it goes downhill very, very rapidly. Yeah, no thank you. I think we'll probably have to save some of the other questions and comments for the discussion at the end, but thank you very much, Julie.