So I want to welcome Dr. Erin Ramos. Again, she's with the Office of Population Genomics. Erin manages a portfolio of research that includes a collaborative project developing a set of standardized phenotypic and exposure measures for use in genome-wide association studies and related research, and she'll be able to explain what exactly that looks like a lot better than I can. Her research interests include the genetic epidemiology of dementia, genome-wide association studies, gene-environment interactions in complex disease, and ELSI research, including informed consent for large-scale genomic studies. So thank you, Dr. Ramos, for joining us today, and I will turn it over to you.

Great. Thanks, Sarah, and good afternoon, everyone. I'd like to first thank Sarah and the Education and Community Involvement Branch for inviting me to speak with you this afternoon. I learned from Sarah that I have a few custom animations throughout the presentation that some of you who dialed in through web access might not be able to see, but I don't think that will take away from the presentation. OK, let's move to the next slide. In the next 20 minutes or so, I plan to first give you some general background on genome-wide association studies and explain why it's so important to use standard methods for assessing the phenotypes and exposures of interest in your study participants. This will lead into an introduction to the PhenX project. Just quickly, PhenX stands for consensus measures for Phenotypes and eXposures, and the PhenX Toolkit provides a resource of standard phenotypic and exposure measures for incorporation into genomics research studies. Finally, we'll play around with the PhenX Toolkit and explore some of the measures that are currently available. As you know, genome-wide association studies, or GWAS, as we call them, have become a really common tool for dissecting the genetics of complex diseases.
A GWAS is typically defined as any study of genetic variation across the entire human genome that is designed to identify genetic associations with observable traits, such as blood pressure or weight, or with the presence or absence of a disease or condition, such as diabetes. Let's quickly walk through the results of a typical GWAS, just so we're all on the same page. In 2008, Chris Amos and colleagues reported results from a GWAS of lung cancer in Nature Genetics; I've got the reference here. This was a multi-stage design, which is typical of genome-wide association studies. In the first stage, often called the discovery stage, they genotyped roughly 1,100 cases with histologically confirmed non-squamous cell lung cancer and a similar number of controls. They used the Illumina HumanHap300 BeadChip, which interrogates roughly 317,000 single nucleotide polymorphisms, or SNPs, across the entire human genome. They then compared the genotypes at these 317,000 markers in cases and controls to detect any statistically significant associations. In stage two, which we usually call the replication phase, they took their top statistical hits from the discovery GWAS and genotyped these SNPs in two additional study populations. So here are the results from the initial scan. What you're seeing is what we call a Manhattan plot, a common way to display the results of a GWAS scan. The x-axis depicts chromosomal position: the red dots are chromosome 1, the blue dots are chromosome 2, and so on. The y-axis is the minus log10 p-value from the chi-square test of association. The blue line across the top of the figure marks the threshold they set for genome-wide statistical significance.
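The comparison described above, one association test per SNP whose p-value becomes a dot on the Manhattan plot, can be sketched in a few lines of Python. This is a minimal illustration, not the Amos et al. pipeline: it assumes an allelic 2×2 chi-square test (one common choice among several GWAS tests), and the allele counts in the usage line are hypothetical numbers chosen for illustration, not data from the study.

```python
import math


def allelic_chi_square(case_alleles, control_alleles):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Each argument is (risk_allele_count, other_allele_count) for one group.
    Returns (chi2_statistic, p_value).
    """
    table = [case_alleles, control_alleles]
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def manhattan_height(p_value):
    """Y-axis value of a SNP on a Manhattan plot: -log10(p)."""
    return -math.log10(p_value)


# Hypothetical allele counts for one SNP: 1,100 cases and 1,100 controls,
# i.e. 2,200 alleles per group (risk allele frequency 0.40 vs 0.35).
chi2, p = allelic_chi_square((880, 1320), (770, 1430))
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, -log10 p = {manhattan_height(p):.2f}")
```

In a real scan a test like this runs once per SNP, hundreds of thousands of times, and the resulting heights plotted against chromosomal position form the Manhattan plot.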
So any dot that lies above the blue line would be considered a statistically significant association between a particular SNP and, in this case, lung cancer. I circled the 10 SNPs that they followed up in their stage two replication; all of these SNPs were statistically associated with lung cancer in the initial scan. The next slide is a table summarizing the results of their replication analysis. I know it's hard to read, but you can look up the paper; the reference is at the bottom of the slide. In the table, they list the reference allele, the genomic region, the nearest genes, and the odds ratios from the analysis, along with the p-values from the test of association. The red box calls your attention to the two SNPs, of the 10, that actually replicated in the two additional study populations. Both SNPs are on chromosome 15, the odds ratios from the test of association are around 1.3, and the p-values are very low: 3.15 × 10^-18 for one and 7 × 10^-18 for the other. The next slide, which is Figure 2 from their manuscript, depicts this region on chromosome 15. The top panel shows a blown-up image of the Manhattan plot, with the individual SNP-lung cancer association p-values within about a half-megabase region, and the bottom panel shows the genes found within this region. They were able to identify at least three known genes and one hypothetical gene. Again, this is just to bring us all onto the same page about the kinds of information we're gleaning from genome-wide association studies. The next slide, for those of you who are interested, shows the NHGRI GWAS Catalog, which is curated by my colleagues in the Office of Population Genomics, Dr. Lucia Hindorff, Heather Junkins, and Teri Manolio.
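The odds ratios in the replication table summarize effect size, while the p-values summarize the strength of evidence. As a minimal sketch, assuming per-allele counts and a Wald confidence interval on the log odds ratio (a standard textbook method, not necessarily the one the authors used), the allelic odds ratio and the Manhattan-plot heights of the two quoted p-values can be computed as follows. The allele counts are hypothetical.

```python
import math


def allelic_odds_ratio(case_alleles, control_alleles, z=1.96):
    """Allelic odds ratio with a Wald 95% confidence interval.

    Each argument is (risk_allele_count, other_allele_count).
    """
    a, b = case_alleles      # cases: risk allele, other allele
    c, d = control_alleles   # controls: risk allele, other allele
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, (lower, upper)


# Hypothetical allele counts, not the counts from the lung cancer study.
or_, (lo, hi) = allelic_odds_ratio((880, 1320), (770, 1430))
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")

# Manhattan-plot heights of the two replicated SNPs' quoted p-values.
for p in (3.15e-18, 7e-18):
    print(f"p = {p:.2e} -> -log10 p = {-math.log10(p):.2f}")
```

A p-value of 3.15 × 10^-18 sits at a height of about 17.5 on the plot, far above any genome-wide threshold.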
The catalog is a database of all published genome-wide association studies. As of today, I believe there are 471 publications, and for each one there is a very nice display of the SNPs that were identified, the gene region, the effect size and p-value, and the genotyping platform that was used. You can download all the data into an Excel file, and you can learn more at genome.gov/gwastudies. So what are some unique aspects of genome-wide association studies? They permit examination of genetic variation at an unprecedented level of resolution: again, we're interrogating 300,000 to a million SNPs across the genome at one time. They allow for agnostic, genome-wide evaluation. Instead of a typical candidate gene study, where we would have selected a few, up to a dozen or so, candidate genes of known biological relevance to a particular disease or trait, we're scanning the entire genome to see which regions light up, and then we can follow up on them. Once a genome is measured, it can be related to any trait. We're investing a lot of money in genome-wide association studies and other large-scale genomics projects, which are still very expensive. So if we're collecting genotypic information on someone, it would be nice to ensure that, when studies are initiated, investigators collect a useful amount of phenotypic and exposure information, so that we can use these studies for more in-depth analyses and perhaps share data across studies. We've also learned that most robust associations in GWAS have been with genes not previously suspected of being related to the particular disease or trait, and some significant associations are in regions not currently known to harbor genes.
There's a really nice overview of GWAS by David Hunter and Peter Kraft, and in their paper they say that the chief strength of this new approach also contains its chief problem: with more than 500,000 comparisons per study, the potential for false positive results is unprecedented. Thus, the sine qua non for belief in any specific result from a GWAS is not the strength of the P value in the initial study but the consistency and strength of the association across one or more large-scale replication studies. This perspective on the importance of replication is also captured in an excellent report from the NCI-NHGRI Working Group on Replication in Association Studies. It's a great paper if you're interested in taking a look at it. In it, they highlight the important aspects of GWAS study design and what should be presented in publications resulting from your study, and they also focus on the important aspects of a replication study, including, obviously, that a similar phenotype needs to be used. You do your initial study, and then when you do your replication studies, it's extremely important to make sure you're evaluating the same phenotype in the replication population. This example, on slide 12, gives us an idea of why that matters. Karin Hek's group set out to replicate an interesting finding from a GWAS of major depression. Patrick Sullivan and his colleagues had previously reported a significant association between the SNP rs2522833, which is in the PCLO gene, and major depressive disorder. Karin's group set out to evaluate this SNP in a population-based cohort. They initially identified 579 cases with depression or depressive syndromes from the Rotterdam Study, a prospective population-based cohort of persons over 55 years of age, and they identified 912 controls from that same population.
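The false-positive problem Hunter and Kraft describe comes down to simple arithmetic. A minimal sketch, assuming independent tests and the conventional alpha of 0.05: with 500,000 tests, about 25,000 SNPs would pass p < 0.05 by chance alone, and a Bonferroni correction pushes the per-SNP threshold down to 10^-7. The widely used genome-wide threshold of 5 × 10^-8 corresponds to the same correction applied to about one million independent tests.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of tests with p < alpha when no association is real."""
    return alpha * n_tests


def prob_any_false_positive(alpha, n_tests):
    """Chance of at least one false positive among independent null tests."""
    return 1 - (1 - alpha) ** n_tests


n_snps = 500_000
print(expected_false_positives(0.05, n_snps))   # about 25,000 chance "hits"
print(bonferroni_threshold(0.05, n_snps))       # about 1e-07
print(prob_any_false_positive(0.05, n_snps))    # effectively 1.0
print(bonferroni_threshold(0.05, 1_000_000))    # about 5e-08
```

Because nearby SNPs are correlated, the effective number of independent tests is smaller than the SNP count, so Bonferroni is conservative; replication, as the quoted passage argues, remains the real safeguard.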
These 579 cases were heterogeneous: they had either a diagnosis of depression by DSM-IV criteria or the broader label of depressive syndrome, which can include minor depression, self-reported depression, et cetera. When they ran their first-pass analysis, you can see an odds ratio of 1.10 and a p-value of 0.2, which isn't significant. But they do a really nice job in this paper of describing their follow-up, in which they narrowed down the phenotype and focused on a more homogeneous set of cases. In the next analysis, they focused on DSM-IV depression only, and you can see that the odds ratio was 1.42 and the p-value was significant at 0.0025. To further illustrate the point, they then focused on major depression only, dropping a few of the cases from the second row of the table; again, the odds ratio increases a bit and the p-value is 0.0014. I put this example up to illustrate the importance of using the same phenotype in your replication study. In addition to facilitating replication, standard measures are useful because they let us combine data from multiple studies more easily, instead of going through the painful process of harmonizing data from studies that used different methods for collecting phenotypic or exposure data. Also, when we study the genetics of common complex diseases, we expect the effect sizes of SNP-trait associations to be relatively small, you know, odds ratios of 1.3 to 2.0. To detect these small odds ratios, and also gene-gene and gene-environment interactions, we need very large sample sizes, and combining studies with similar phenotypes is probably the most efficient way of generating them.
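Both points on this slide, that a diluted phenotype shrinks the observed odds ratio and that small odds ratios demand large samples, can be made concrete with a standard sample-size approximation for a two-proportion (allelic) test. This is a sketch under stated assumptions, not a reanalysis of the Rotterdam data: the 30% risk-allele frequency in controls, the true per-allele odds ratio of 1.3, and the assumption that half of the "cases" are phenocopies who carry the allele at the control frequency are all hypothetical. The calculation assumes a single allelic test at alpha = 5 × 10^-8 with 80% power, equal numbers of cases and controls, and two alleles per person.

```python
import math
from statistics import NormalDist


def case_allele_freq(p_control, odds_ratio):
    """Risk-allele frequency in true cases implied by the control frequency and the OR."""
    case_odds = odds_ratio * p_control / (1 - p_control)
    return case_odds / (1 + case_odds)


def diluted_case_freq(p_case, p_control, phenocopy_fraction):
    """Observed case frequency when some 'cases' carry the allele at the control rate."""
    f = phenocopy_fraction
    return (1 - f) * p_case + f * p_control


def odds_ratio(p_case, p_control):
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))


def cases_needed(p_case, p_control, alpha=5e-8, power=0.8):
    """Approximate cases (with an equal number of controls) for a two-sided allelic test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_case + p_control) / 2
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p_case * (1 - p_case) + p_control * (1 - p_control)))
    alleles_per_group = spread ** 2 / (p_case - p_control) ** 2
    return math.ceil(alleles_per_group / 2)  # two alleles per person


p_control = 0.30                                       # hypothetical
p_true = case_allele_freq(p_control, 1.3)              # homogeneous case definition
p_mixed = diluted_case_freq(p_true, p_control, 0.5)    # half the cases are phenocopies

for label, p_case in (("homogeneous", p_true), ("diluted", p_mixed)):
    print(f"{label}: OR = {odds_ratio(p_case, p_control):.2f}, "
          f"cases needed = {cases_needed(p_case, p_control):,}")
```

Halving the excess allele frequency roughly quadruples the required number of cases, because the sample size scales with the inverse square of the frequency difference. That is the arithmetic behind both the depression example and the push to pool studies that measure the phenotype the same way.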
Unfortunately, the ability of research groups to combine their data has been limited, because there has been a lack of standard measures incorporated into existing studies. Taking that into consideration, we developed an RFA for a cooperative agreement that ultimately led to the PhenX project. Again, PhenX stands for consensus measures for Phenotypes and eXposures. Dr. Carol Hamilton from RTI International is the PI of the project and has been leading this effort for the past two and a quarter years. The goal of the PhenX program is to develop a useful resource of standardized phenotypic and exposure measures for the genomics research community. We're focusing our efforts primarily on selecting 15 high-priority standard measures for each of 21 research domains. I'll review these 21 domains in a minute, but we focus on just 15 measures per domain to keep this project a reasonable one, and also because when you're collecting data from your study participants, you obviously need to respect their time. By selecting the 15 most useful, low-burden measures to capture a particular domain, we're hoping to give researchers a nicely sized set of measures that they can consider incorporating into their studies. The measures selected through PhenX are made available to the research community free of charge via the toolkit, and we'll walk through the toolkit in a little bit. You can see the URL here, which is phenxtoolkit.org. We're hoping that researchers will visit the toolkit to consider PhenX measures when they're planning new studies, to add PhenX measures to an ongoing study, and to obtain high-quality measures outside of their area of expertise. For example, if I'm studying the genetic epidemiology of dementia, I might be interested in including a few measures from the cardiovascular domain and a few from the diabetes domain, but that's not my area of expertise.
So I know that I can come to the PhenX Toolkit, where an expert panel of scientists has selected a small set of measures that might be useful for me to incorporate into my study, so I don't have to do much work in advance. I can come to the toolkit, take a look around, pick out some interesting diabetes and cardiovascular measures, and then take them back and incorporate them into my study. And again, just to bring home the point, we're hoping that if people across institutions start using some of these measures, their studies will be compatible with each other, and we'll be able to combine studies more efficiently to increase power and our ability to identify genes associated with complex diseases. These are the PhenX domains; there are 21 listed on the slide, and I won't go through all of them. Once the project was funded, we organized a Steering Committee with expertise in the genomics of complex diseases, genetics, epidemiology, and statistics. The committee is chaired by Jonathan Haines from Vanderbilt University. The Steering Committee provides overall guidance to the PhenX project, and they are the ones who selected the domains we would focus our efforts on. Also, the NIH Office of Behavioral and Social Sciences Research committed funds for the 21st domain, the social environment domain, and we hope it will be a nice complement to the psychosocial and environmental exposures domains. Once these 21 domains were identified, we began assembling expert Working Groups to do the hard work of reviewing the measures that are already out there in the community and selecting the most useful standard measures for each research domain. We're on slide 17 now, so let's quickly walk through the process for selecting the PhenX measures. Once we picked the 21 domains, the Steering Committee took a first stab at defining the scope of each domain.
That is, which aspects of cardiovascular disease, or demographics, or environmental exposures are important for genomic studies? Then the expert Working Group surveyed the literature, contacted many of their colleagues in the field, and identified a broad list of measures. After a few meetings and conference calls, they narrowed the list down to 25 measures or so. These measures are then sent out to the community for input, and ultimately, after a process of about eight to 10 months, 15 measures are selected and incorporated into the PhenX Toolkit. The criteria for selecting measures include that they are well established, meaning they're used often, and broadly validated, so we know they're reliable and valid. The measures are low burden to the participant and investigator, so in some cases the PhenX Toolkit doesn't include the gold-standard measure if it is very burdensome or very expensive to administer. The measures are applicable across population groups, and the instruments and protocols are freely available without charge. For each measure, the PhenX Toolkit includes a detailed protocol, with very detailed instructions on the appropriate way to collect the information. It also includes a description of why the Working Group chose the measure and why they feel it's important in genomic studies, along with all the relevant references. There's also a lot of user support: a user's manual, frequently asked questions, links to supplemental information, and various other resources like the cancer Biomedical Informatics Grid, or caBIG. My favorite toolkit feature is the quick start guide.
If you go to the front page, which I'll show you in a minute, there's a red button you can click that walks you through the quick-and-dirty way to get at some of the measures. You can choose your measures and add them to a shopping cart, kind of like Amazon.com. If you register on the website, which you don't have to do, you can save your carts and share them. So if I'm collaborating with someone in California, I can prepare a cart with the measures I'd like to use in my study and share it with them so they can download the information and the documents, and we don't have to send all that paperwork back and forth via e-mail. Once you select your measures, you can generate a nice report, and you can add notes to the measures and the protocols. Something we're also working on is collaboration tools to help investigators who are interested in the same aspects of a particular study find each other. So here's the home page I was talking about. The red button is the quick start guide. There are a few ways to enter the toolkit. You can go right to the Browse button, which we'll go to in a minute. You can also search: if I'm interested in hypertension, for example, I could type "hypertension" and the standard measures for blood pressure would pop up. Again, here's my shopping cart, like Amazon.com, and the aspects of my account that I can customize. Let's go back for a second. If I click on Browse, it takes me to the list of PhenX measures that are currently available. Of the 21 domains we've been working on, we have already deposited standard measures for eight, and the remaining 13 will be available by the end of 2010; that's the plan.
You can see we've got measures for demographics; anthropometrics, like height, weight, and waist circumference; substance use; cardiovascular; nutrition and dietary supplements; environmental exposures; cancer; and oral health. If I click on the demographics link, it takes me to the list of standard measures for demographics, and you can see the kinds of measures the Working Group felt were important to include in genomics research projects. Here I went to the alcohol, tobacco, and other substances domain, and I'm drilling down on a measure called tobacco and nicotine dependence. For this one, the Working Group selected the Fagerström instrument to assess nicotine dependence. You can see the toolkit gives you the definition, the purpose, and some keywords, and then you can click on the protocol associated with the measure to see exactly the instructions for administering the questionnaire and what the questions are. There's also more information about the source, meaning where the measure came from; the language, which here is English, though it's available in others; the participants, so this should be administered to people 17 years or older; and the personnel and training requirements. So you can take a look at it, decide whether it meets your needs, and decide whether to add it to your cart. Here, I went through and decided that for my study I'm interested in incorporating the measures of age, ethnicity, race, gender, health insurance coverage, family history of heart attack, and lipid profile. Once I have my cart, which I don't have shown here, I can click a button to generate a report and also share the cart, as I described earlier. There are also some nice bioinformatics features of PhenX that RTI is continuing to develop.
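The browse-and-cart workflow described above amounts to a small data model: each measure carries metadata (domain, eligible participants), and a cart collects measures and produces a report. The sketch below is purely hypothetical. The class names, fields, and report format are invented for illustration and are not the PhenX Toolkit's actual data model or API; the one concrete detail borrowed from the talk is that the nicotine dependence measure is for participants 17 or older.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Measure:
    """Hypothetical record for one toolkit measure."""
    domain: str
    name: str
    min_age_years: Optional[int] = None  # None: no age restriction recorded


@dataclass
class Cart:
    """Hypothetical cart: an ordered, duplicate-free list of measures."""
    owner: str
    measures: List[Measure] = field(default_factory=list)

    def add(self, measure: Measure) -> None:
        if measure not in self.measures:
            self.measures.append(measure)

    def eligible_for(self, age_years: int) -> List[Measure]:
        """Measures whose protocol allows a participant of this age."""
        return [m for m in self.measures
                if m.min_age_years is None or age_years >= m.min_age_years]

    def report(self) -> str:
        lines = [f"Cart for {self.owner}: {len(self.measures)} measure(s)"]
        for m in sorted(self.measures, key=lambda m: (m.domain, m.name)):
            age = f" (ages {m.min_age_years}+)" if m.min_age_years is not None else ""
            lines.append(f"- [{m.domain}] {m.name}{age}")
        return "\n".join(lines)


cart = Cart(owner="dementia-study")
cart.add(Measure("Demographics", "Age"))
cart.add(Measure("Alcohol, Tobacco and Other Substances",
                 "Tobacco and nicotine dependence", min_age_years=17))
cart.add(Measure("Demographics", "Age"))  # duplicate is ignored
print(cart.report())
print([m.name for m in cart.eligible_for(16)])
```

Keeping eligibility on the measure record means a cart can warn up front when a chosen instrument does not fit part of the study population, such as adolescents in this case.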
They include links to other important resources like the cancer Biomedical Informatics Grid; RTI is mapping the PhenX measures to other resources so that we'll eventually be able to combine our data with many other studies. They are also extending the toolkit's capabilities, for example by continuing to improve the smart query tools and the advanced search. They're also developing data entry forms for collecting the measures, so if, say, I picked a set of measures, I could convert the protocols to whichever format is useful for my study manual. You can also generate a data dictionary for the variables, which will help your programmers set up the database you'll use when collecting these measures. We're collaborating with many groups, both within and outside the NIH. We have a really nice collaboration with NCBI and the database of Genotypes and Phenotypes, dbGaP, where we're trying to map our PhenX measures to some of the data already deposited in dbGaP. We're also working with the National Library of Medicine and the LOINC team; LOINC stands for Logical Observation Identifiers Names and Codes. LOINC is a reference of more clinical terminology, and we hope it will expand our entryway into electronic medical records. We're also working with an international group called the Public Population Project in Genomics, or P3G. I also want to acknowledge the Steering Committee members and our NIH IC liaisons. From the beginning of the project, we've worked really hard to establish relationships with our colleagues at other institutes; these experts have been helping us with every aspect of the PhenX project, and it's been really great to work with them.
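To make the data-dictionary idea concrete, here is a minimal sketch of how a programmer might turn a data dictionary into a database table. The variable names, types, and descriptions are hypothetical and do not reproduce the toolkit's actual data-dictionary format; the example uses Python's built-in `sqlite3` module only so that it runs without any setup.

```python
import sqlite3

# Hypothetical data dictionary: (variable name, SQL type, description).
DATA_DICTIONARY = [
    ("participant_id", "TEXT", "Study participant identifier"),
    ("age_years", "INTEGER", "Age at interview, in years"),
    ("nicotine_dependence_score", "INTEGER", "Total score on the nicotine dependence questionnaire"),
    ("waist_circumference_cm", "REAL", "Waist circumference, in centimeters"),
]

ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL"}


def create_table_sql(table_name, dictionary):
    """Build a CREATE TABLE statement, keeping each description as an SQL comment."""
    if not table_name.isidentifier():
        raise ValueError(f"bad table name: {table_name!r}")
    column_lines = []
    for i, (name, sql_type, description) in enumerate(dictionary):
        if not name.isidentifier() or sql_type not in ALLOWED_TYPES:
            raise ValueError(f"bad dictionary entry: {name!r} {sql_type!r}")
        separator = "," if i < len(dictionary) - 1 else ""
        column_lines.append(f"    {name} {sql_type}{separator}  -- {description}")
    return f"CREATE TABLE {table_name} (\n" + "\n".join(column_lines) + "\n);"


sql = create_table_sql("phenotypes", DATA_DICTIONARY)
print(sql)

with sqlite3.connect(":memory:") as conn:
    conn.execute(sql)
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(phenotypes)")]
print(columns)
```

Generating the schema from the dictionary, rather than typing it by hand, keeps variable names identical across every site that uses the same measures, which is what later makes pooling straightforward.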
And lastly, I want to acknowledge my colleagues here at the Office of Population Genomics, our friends at NCBI, and our colleagues at RTI, in particular Carol Hamilton, Lisa Strader, Debbie Maiese, Tabitha Hendershot, and the others you can see listed. With that, I'll stop and take any questions. Thank you.

At this time, if you'd like to ask a question or make a comment over the phone, please press star one. Please unmute your phone and record your name clearly when prompted. To withdraw your question, you may press star two. Once again, to ask a question or make a comment, please press star one. One moment, please.

Thank you very much, Erin. That was excellent. We'll just wait a few minutes; I'm sure there are some questions coming in. Also, if anybody is having trouble with the phone, you can certainly chat your questions into the webinar portion, or you can email me at sharding@mail.nih.gov if you're sitting by a computer. Erin, I did get one question in while you were talking that I'll just ask over the phone: whether there's information on how PhenX is already being used by researchers, and whether it's improving the quality of GWAS data across the board and reaching the goals you set out to achieve.

That's a really good question. I tried to describe some of our collaborations, for example with dbGaP. When we're mapping a PhenX measure, say the PhenX measure for blood pressure, we're working with dbGaP to identify the studies in dbGaP that have used this particular measure for blood pressure; there are now maybe 40 genome-wide association studies that have. That's one way to show researchers that if they select this PhenX measure, numerous other studies have already used it.
So if I incorporate this measure into my study, perhaps I'll be able to collaborate with them, or use some of their data to expand my work if I can get access through dbGaP. I think that's a really important aspect: we get a sense of how these measures are being used, and we can encourage people who have already used the measures to collaborate with folks who are adding PhenX measures to their projects. I can't see the chat; you said there was a chat portion?

It would pop up, yes.

Okay, so you might just ask me those questions if any come up there.

Yes, I will send them to you.

And I meant to include my contact information. If anyone has questions about PhenX, please feel free to contact me; you can find me on the genome.gov website if you search the staff directory.

There was a question, but it disappeared before I could ask it, so if that was somebody in our audience, please repost it. Operator, are there any questions waiting for us in the queue?

Yes, there's one question in our queue, about whether this can address rare diseases.

Unfortunately, because we had to limit our scope to something we could accomplish in a few short years, most of the Working Groups have elected to focus on conditions that are more prevalent. That being said, an important aspect of studying rare diseases is that you often have investigators spread across the country, each of whom might have a few cases of a particular rare disease. If they're not collecting the standard demographic and other phenotypic information in a standard way, it's really difficult, as I was describing in my talk, to combine those data and have enough power to do some interesting analyses.
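The power argument for pooling small sites can be illustrated with a fixed-effect, inverse-variance meta-analysis of log odds ratios. This is one common way to combine results across sites, not a method prescribed by PhenX, and it is only meaningful when every site defines the phenotype the same way. In this minimal sketch the three sites, their odds ratios, and their standard errors are hypothetical: no single site is significant on its own, while the pooled estimate has a smaller standard error and a smaller p-value.

```python
import math


def fixed_effect_meta(log_odds_ratios, standard_errors):
    """Inverse-variance fixed-effect pooling of per-study log odds ratios.

    Returns (pooled_odds_ratio, pooled_standard_error, two_sided_p_value).
    """
    weights = [1 / se ** 2 for se in standard_errors]
    pooled_log_or = sum(w * b for w, b in zip(weights, log_odds_ratios)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled_log_or / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(pooled_log_or), pooled_se, p_value


# Hypothetical sites: each observes OR = 1.5 with SE(log OR) = 0.35.
site_log_ors = [math.log(1.5)] * 3
site_ses = [0.35] * 3

_, _, p_single = fixed_effect_meta(site_log_ors[:1], site_ses[:1])
pooled_or, pooled_se, p_pooled = fixed_effect_meta(site_log_ors, site_ses)
print(f"one site: p = {p_single:.2f}")
print(f"pooled: OR = {pooled_or:.2f}, SE = {pooled_se:.3f}, p = {p_pooled:.3f}")
```

Pooling only helps if the sites measure the same thing; combining a narrowly and a broadly defined phenotype would reintroduce the kind of dilution seen in the depression example.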
So if you're doing rare disease research, it would be useful to adopt something like the PhenX standards, which would help researchers around the country who each have a smaller set of cases combine their data more efficiently.

Great. Related to that question, I was just e-mailed another one: perhaps not rare diseases, but can PhenX be used for non-GWAS studies?

Absolutely. We come from the Genome Institute, so the focus has been on GWAS and genomic studies, but I'm an epidemiologist at heart, and any epidemiological study could use these measures. Once you collect the information, you can certainly add a genomics component later. The way it's set up, these measures are amenable to almost any kind of study of human disease.

Another question I had is whether you've been part of any conversations about PhenX becoming part of training for up-and-coming epidemiologists or researchers. Is that something you would hope it might get integrated with?

Well, I think PhenX would be a particularly useful resource for younger investigators who might not have the experience, or haven't been around long enough, to know which are the most useful measures for their particular field of study. Coming here might save them some time and help focus them on the relevant measures for a particular trait. And I do know that some of our colleagues have been thinking about using this as a tool for their new grantees, as a good starting place, so they know these measures are already out there. It would be a quick, one-stop shop for them.

Operator, are there any other questions in the queue?

Yes, there's a new question in the queue.
Could you describe the relationship of PhenX items to the caBIG question repository? Or rather, could you describe the relationship of PhenX items to the caDSR question repository? Are all PhenX measures in the caDSR?

I wish RTI were on the call to answer this question, but from my understanding, every PhenX measure has been registered, so you'll be able to find every PhenX measure in the caBIG repository, with a PhenX identifier.

My email is very active, so here's another question: will the measures developed to address these domains be updated?

Yeah, that's an interesting question, and we're thinking about it now. This was originally a three-year project, but we obviously hope these measures are being used, and if they are, we realize that the methods for interrogating phenotypes and exposures change and improve with time. Our hope is that if we can show these measures are starting to be used, it will convince our friends at the NIH to extend the program for a few more years. If that's the case, we'll reconvene our Working Groups periodically to review the measures, to make sure there have been no significant changes, or to include anything late-breaking that is really important. So that's something we're working on right now.

Great.