So it won't be a surprise to many of us in the audience that the U.S. population is aging, and in particular the post-war baby boom is now well into their 70s and 80s. With that aging comes age-associated chronic disease, particularly dementia, and especially Alzheimer's disease. Alzheimer's disease, as the name suggests, is the most common dementia. It has a shared pathology of amyloid plaques and tangles, shown here, but it also has a very heterogeneous, or likely heterogeneous, etiology. Extremely important is that there really is no treatment and no means of prevention at this time. In fact, there is now a mandate for the NIA to develop such treatments and have them available by a certain date, which is a tall order.

It's not a surprise that, to use an old-world term for some of us in the audience, there is a single major gene for Alzheimer's disease: the ApoE locus. What's shown here is age across the x-axis and the proportion unaffected on the y-axis, and you can see that for those of us who carry an E4 allele, there is an increased risk of Alzheimer's disease. Parenthetically, this locus may also harbor one of the first protective alleles: those who carry the E2 allele are a bit protected from disease. Likewise, during the GWAS era, and continuing to today, at least 25 different loci have been identified using case-control studies and international cohorts from France, the U.K., the U.S., and others. So there is substantial evidence, in addition to family data, that identifying Alzheimer's disease genes is tractable and that that knowledge could turn into novel therapeutics.

I'm going to come back to this and focus more on treatment and prevention. The ability to identify drug targets is a major initiative within the ADSP, and in fact there is very good collaboration between the ADSP and newly funded projects tasked with identifying model compounds for treatment and prevention of Alzheimer's disease. All of this builds off the seminal work on PCSK9, which identified individuals carrying loss-of-function variants that lower risk. What's shown here is the PCSK9 gene and various loss-of-function variants; not only do they lower LDL cholesterol, they reduce the risk of heart disease by a substantial amount. That work has spawned a burgeoning industry: multiple pharmaceutical companies now have phase two and phase three trials that look promising for using this target as a potential additional therapy to statins.

There is also precedent, much like PCSK9, for novel variants in Alzheimer's disease genes that lower risk. This is work from the deCODE group and Kari Stefansson that identified a novel amino acid substitution that protects against cognitive decline and Alzheimer's disease. So I think it is a great proof of principle that it will be possible to slow this disease down if we can identify the right target.

That was all background on why we're doing this. The objective of the ADSP, the Alzheimer's Disease Sequencing Project, is first to identify novel risk-raising genes and alleles for late-onset disease, and I want to emphasize that this is common late-onset Alzheimer's disease, not the rare early-onset, familial disease.
The second objective is to identify novel protective genes and alleles, for the reasons I just outlined. Among the early challenges we addressed, which I won't go through in great detail, we first asked ourselves whether a single design could achieve both of these objectives. We struggled with the idea of running two different studies, one to find risk-raising alleles and one to find protective alleles, but we knew we had a fixed budget, so each of those studies would be smaller than a combined study. We convinced ourselves that if we were clever in our selection of "cases" and "controls," we could achieve both objectives. We went through, and here I want to give a shout-out to Gary Beecham in Miami, substantial amounts of modeling, computer simulation, and power calculation to show that a single design could indeed be well powered to achieve our objectives; a toy version of that kind of calculation appears at the end of this design discussion. We also went through a lot of, not arguments, but heavy discussions about whether we should do whole genome sequencing or whole exome sequencing, and as you'll see in a minute, we ended up with a hybrid approach.

So the design has two parts. The first is a large family component using whole genome sequencing. There are 47 Caucasian families, 67 Caribbean Hispanic families from the Dominican Republic, and two families from a Dutch isolate, comprising 502 Alzheimer's disease cases and 82 unaffected relatives, 584 individuals in total. Because of the nature of the disease, the families tend to be fairly flat: you see a lot of affected sibling pairs and cousins, because by the time the disease is diagnosed, the parents of the affected individuals have by and large passed away, and the children are not yet old enough to show disease. This is one of the two families from the Dutch isolate, and you can see that the combination of sequencing and a substantial amount of genotyping will allow us to impute the sequenced loci into many, many individuals, some of whom have good phenotyping. Obviously we don't have Alzheimer's disease phenotyping on all 4,000 individuals across 18 generations, but we have a substantial amount of information.

The case-control design uses whole exome sequencing, and all three of the large-scale centers are involved: Broad, Baylor, and WashU. We aimed in our design for 5,000 unrelated cases and ended up with 5,107. These individuals were selected to be at low predicted risk for disease, yet they manifested disease within an age window; obviously we did not want to rediscover the ApoE locus over and over again, and that was the purpose of that design. We also had a goal of 5,000 cognitively normal controls, and "controls" here should be in quotes, because for protective variants they are the group you're most interested in. We selected those controls to be the least likely to convert to disease, again within an age window. In addition, there were many, many individuals in the multiplex families whom we didn't have the resources to sequence in the family arm, so we went ahead and sequenced those individuals as what we call enhanced cases.
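As a rough illustration of the kind of power calculation just described, here is a minimal simulation sketch for a single rare variant in a case-control comparison. The sample sizes, allele frequency, odds ratio, and significance threshold are hypothetical placeholders, not ADSP parameters.

```python
import numpy as np
from scipy.stats import chi2_contingency

def simulated_power(n_cases=5000, n_controls=5000, maf=0.005,
                    odds_ratio=2.0, alpha=5e-6, n_sims=2000, seed=1):
    """Estimate power to detect a risk allele by simulating allele counts."""
    rng = np.random.default_rng(seed)
    # Case allele frequency implied by the odds ratio (per-allele model).
    odds_control = maf / (1 - maf)
    odds_case = odds_ratio * odds_control
    maf_case = odds_case / (1 + odds_case)
    hits = 0
    for _ in range(n_sims):
        a_case = rng.binomial(2 * n_cases, maf_case)   # alleles seen in cases
        a_ctrl = rng.binomial(2 * n_controls, maf)     # alleles seen in controls
        table = [[a_case, 2 * n_cases - a_case],
                 [a_ctrl, 2 * n_controls - a_ctrl]]
        _, p_value, _, _ = chi2_contingency(table)
        hits += p_value < alpha
    return hits / n_sims

# Fraction of simulated studies reaching the chosen significance threshold.
print(simulated_power())
```

In practice this kind of calculation is repeated over a grid of allele frequencies and effect sizes to map out where a fixed-budget design is and is not well powered.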
In addition to Alzheimer's disease yes/no status, many of these individuals have what we call endophenotypes, such as the MRI brain scans that came up in the previous discussion, so we'll be able to invest in both the clinical phenotypes and the brain MRI phenotypes. I'm particularly enthusiastic about these data for identifying protective variants. This slide shows the result of a GWAS for hippocampal volume, identifying loci, some of which have negative beta coefficients. That is, again, one hook into identifying protective variants for this condition; a toy regression sketch of this kind of analysis follows below. If Richard were here, he would remind us that brain MRI is not necessarily Alzheimer's disease; they are different, complementary phenotypes, so we need to be careful about that, but we should nonetheless take advantage of the resource.
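To make the endophenotype GWAS idea concrete, here is a minimal sketch of a single-variant association test for hippocampal volume, regressed on genotype dosage with age and sex as covariates. All of the data are simulated placeholders; the sign of the fitted beta indicates the direction of the allele's effect on volume.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000

# Simulated covariates and genotype dosage (0/1/2 copies of the minor allele).
age = rng.uniform(60, 90, n)
sex = rng.integers(0, 2, n)
dosage = rng.binomial(2, 0.2, n)

# Simulated hippocampal volume with an allele effect of +50 mm^3 per copy;
# a real analysis would read measured volumes and genotypes instead.
volume = 4000 - 15 * (age - 60) + 50 * dosage + rng.normal(0, 300, n)

# Regress volume on dosage plus covariates; in a GWAS this loop runs per variant.
X = sm.add_constant(np.column_stack([dosage, age, sex]))
fit = sm.OLS(volume, X).fit()

beta, p = fit.params[1], fit.pvalues[1]
print(f"beta per allele copy = {beta:.1f} mm^3, p = {p:.2e}")
```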
Being a good NIH-funded large-scale project, we have a governance structure in place that I will not bore you with: steering committees, analysis committees, and production committees that hold quite a few phone calls. This slide I will go through in a little more detail, because it shows the various pieces that have been put together to accomplish this task. In green, we have multiple representatives from the Alzheimer's disease community who have been engaged to supply samples and intellectual input for the project. They include members of the CHARGE Consortium, the Alzheimer's Disease Genetics Consortium, which Jerry leads, and also NCRAD; they have been brought together to supply samples and keep the data flowing. As I mentioned, all three of the large-scale centers are involved. We've created a shared space in dbGaP, so all of the data, not just sequence data but phenotype data as well, including the brain MRI data and many other endophenotypes, will eventually go into dbGaP and be publicly available. We also set up a lot of pre-sequencing interactions so that sample throughput would move forward in a timely and organized manner, and I want to give a major shout-out to this group for that step. Having been involved, as many of you are, with many, many projects, I would say the ascertainment of samples, the approval of the consents, and the movement of those samples to the sequencing centers went forward better on this project than on any other I've been involved with. We stayed on time and on task, making sure the data would be produced and would go into dbGaP.

This is the stage we're at right now. All of the sequencing is complete for all of those projects; in fact, it was completed a little ahead of schedule, and I would say we're at the stage of QC and the release of very early data sets. In the meantime, NIA has funded multiple analysis centers that I won't go through in great detail; three analysis groups have been funded. One of the things Jerry and I are doing as chairs of the Analysis Coordinating Committee is getting those groups to work cohesively, so that we're all moving in the same direction. They each had to apply in response to an RFA, but now that they've been awarded those grants, we want to make sure they no longer compete, or if they are competing, that they compete constructively and we all work synergistically. Obviously, I don't need to go through this in great detail. We have family-based analyses, for which plans and white papers have been put forward, case-control analyses, structural variants, and obviously protective variants. I put this red dot in to give you a feel for where we are in the project: we're moving out of QC, and the first data set, which we call a tire-kicking data set, has been released to the investigators.

Being a good NHGRI project, we've brought the data together from the three sequencing centers, Broad, WashU, and Baylor, and both Broad and Baylor are putting them through their calling pipelines, which is what's outlined here: creating a Broad-specific VCF and a Baylor-specific VCF that are QC'd separately, and then creating a consensus set of calls. As I indicated, the tire-kicking data set has already been released, and version one, with completed calls and completed QC, will be released March 31st. The BAMs are already available in dbGaP, so you can go to dbGaP and download the raw data today.

In the early days I questioned whether, given the state of single nucleotide variant calling, this parallel calling by Broad and Baylor was really necessary, but I'm now convinced it was a good exercise in getting us all working together. What this shows is the Mendelian inconsistencies: the red on top is from the Broad data, the purple or blue at the bottom is the Baylor data, and it shows the differences between the two calling pipelines. By the way, if you think of the two call sets as a Venn diagram, by far most of these inconsistencies fall in the parts where variants were called by one group and not the other, so this is good information as we develop a set of calls to distribute. The other thing we're seeing is issues in sample handling throughout the process. These are five individuals, two of them plotted on top of one another, who are outliers in heterozygosity. It was simply a clerical error: they were African-Americans labeled as European-Americans, so in QC they stood out as having a lot of heterozygosity; a toy sketch of both QC steps follows below.
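As a toy illustration of those two QC steps, consensus calling across pipelines and heterozygosity-based outlier detection, here is a minimal sketch. It assumes each pipeline's calls have been reduced to simple (chrom, pos, ref, alt) keys and that genotypes are coded as 0/1/2 minor-allele counts; the variants, sample counts, and z-score threshold are hypothetical, not the ADSP pipelines.

```python
import numpy as np

# --- Consensus of two calling pipelines (the "Venn diagram") ---
broad_calls = {("19", 45411941, "T", "C"), ("1", 1005806, "G", "A")}
baylor_calls = {("19", 45411941, "T", "C"), ("2", 2302938, "C", "T")}

consensus = broad_calls & baylor_calls      # called by both pipelines
broad_only = broad_calls - baylor_calls     # discordant: one pipeline only
baylor_only = baylor_calls - broad_calls
print(len(consensus), len(broad_only), len(baylor_only))

# --- Flagging heterozygosity outliers (e.g., sample-label errors) ---
# genotypes: samples x variants, coded 0/1/2 minor-allele copies.
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.2, size=(100, 5000))

het_rate = (genotypes == 1).mean(axis=1)    # per-sample heterozygosity
z = (het_rate - het_rate.mean()) / het_rate.std()
outliers = np.flatnonzero(np.abs(z) > 4)    # candidates for mislabeled ancestry
print("flagged samples:", outliers)
```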
We're putting an enormous amount of effort, early on, into the replication study. I think the keys to our success will be, first, the ability to make the initial discoveries; second, the ability to replicate those findings; and third, functional studies to look at the mechanism of disease association. Our goal is to have 40,000 individuals of European ancestry, that is, 20,000 cases and 20,000 controls, plus 10,000 individuals from other ethnic groups, Hispanics and African-Americans, as part of the replication phase for the early discoveries. In addition, we're using resources from other studies that have been brought together, and we're keeping open the possibility of targeted resequencing versus more exomes; maybe by the time we approach replication, even whole genome prices will have come down. So right now we're keeping our options open about what the replication study will look like. This slide shows you that those are not pie-in-the-sky numbers. Richard Mayeux, Tatiana Foroud, and Sudha Seshadri in particular have been doing a yeoman's job of contacting individuals; in fact, there's an Alzheimer's disease meeting in Washington right now, and my guess is that a lot of the discussions there are to drum up enthusiasm for the ADSP replication study.

I put this slide in to emphasize, particularly to Council, that we're not just turning the crank. We have a job to do on the one hand, but we're also making sure we infuse the project with innovation. One of the innovative aspects of the project is our approach to structural variant calling. I was worried in the early days that we would end up with endless arguments and bake-offs between different callers. We participated in those, and we've put together a group that is running multiple callers on all of the data and then bringing the call sets together, in what we're calling a parliament, to build consensus call sets; a toy merging sketch follows this paragraph. Jerry and Will Salerno are leading that effort, and I think that's one way the product of the project will be more than the sum of its parts, compared with what we'd have had if we had not collaborated and worked together.
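As a toy version of that multi-caller "parliament," here is a minimal sketch that merges deletion calls from several callers by reciprocal overlap and keeps events supported by at least two callers. The intervals, caller names, overlap fraction, and vote threshold are hypothetical placeholders, not the ADSP pipeline.

```python
# Each caller reports deletions as (chrom, start, end).
calls = {
    "callerA": [("1", 10000, 15000), ("2", 50000, 51000)],
    "callerB": [("1", 10100, 15200)],
    "callerC": [("1", 9900, 14800), ("3", 70000, 75000)],
}

def reciprocal_overlap(a, b, min_frac=0.5):
    """True if two same-chromosome intervals each overlap >= min_frac of their length."""
    if a[0] != b[0]:
        return False
    ov = min(a[2], b[2]) - max(a[1], b[1])
    return ov > 0 and ov >= min_frac * (a[2] - a[1]) and ov >= min_frac * (b[2] - b[1])

# Vote: keep an event if callers from >= 2 pipelines report an overlapping event,
# and avoid adding two consensus events that overlap each other.
events = [(caller, iv) for caller, ivs in calls.items() for iv in ivs]
consensus = []
for caller, iv in events:
    support = {c for c, other in events if reciprocal_overlap(iv, other)}
    if len(support) >= 2 and not any(reciprocal_overlap(iv, kept) for kept in consensus):
        consensus.append(iv)

print(consensus)  # e.g., the chr1 deletion seen by all three callers
```

Requiring reciprocal overlap, rather than any overlap at all, keeps a small event from being silently absorbed into a much larger one.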
We already have some early manuscripts that are either well developed or, in a couple of cases, submitted. There will be a design paper, a thankless task for those of us who have written them, yet heavily referenced over time, so the ADSP will have a design paper. We have a linkage analysis paper that Richard's group has led from the Dominican Republic families, and another linkage paper from the European-American families. The CHARGE Consortium and the ADSP have also worked together on best practices for large-scale QC of exome data and are putting together a paper on that. Here, on the last slide, is a sample of the investigators; if I put everybody's name up, the font would be hopeless to read. I want to emphasize, again thanking Rudy for the opportunity to speak, that I was a cheap date, since I just needed to walk from one side of the table to the other, but really I'm a mouthpiece for a lot of people who are working on this project, many of whom are on this slide. So thank you, and I'll be happy to answer questions. Bob?

I want to ask what your thoughts are about the analysis: trying to do association studies using multiple different variants within genes, or within gene networks, and those sorts of tools.

Well, I'll quote a good friend of mine, Peter Donnelly (actually, Terry, I think you may have been at that lunch), who told me that if we're still doing burden tests in a few years, we're in trouble. I think the field is finally moving out of simple burden tests, which are counting, and there's nothing wrong with counting, by the way; many of us are fond of counting. But I think we're moving beyond simple counting toward more sophisticated measures. Some are principal-components based; some are more or less sophisticated. That's true particularly as we move into clever ways to identify protective variants, and clever ways to code, annotate, and analyze structural variants, which really is a wide-open field. There's so much attention on the calling of structural variants; using them in the analysis of a complex trait is a fascinating and wide-open field. Again, because of the emphasis we're putting on structural variants here, we should be able to make a contribution, at minimum a contribution of data so that other, smarter people can address it. Howard?

Thank you, that was excellent, Eric. The question I had was about the variants that are not in common between the two pipelines. Are they going to be available somewhere, or are you going to treat them as untrusted and leave them out of the loop?

A little bit of each. As I mentioned, a consensus, QC'd data set will be released in March; that's the timeline. But the raw data, the data from both groups, in fact from all three sequencing centers, is available. So if you want to re-call, or you want to look at the variants that were called by only one of the two pipelines, that's available in dbGaP. Lon?

Thanks, Eric, really interesting. I guess I'm in the mindset of precision medicine from this morning, and also hearing the lofty goal of coming up with new medicines in a short period of time. I was gratified to hear that the focus isn't just on genetic discovery, which is obvious, but also on the endophenotype side and maybe the genotype-phenotype relationship. I wondered if you might speculate a little further about how far you're going to get with that in practice, so that we might use some of this in a drug discovery context down the road.

Well, the short answer is we haven't analyzed the data with these sequence variants yet, but I'm optimistic that the endophenotypes are going to be a key to identifying protective variants. If you live long enough, a lot of us are going to get dementia, not necessarily Alzheimer's disease, but dementia. So I think the endophenotypes are key to identifying protective variants, for example by finding ApoE4 individuals who are in their 80s, even 90s, but whose longitudinal brain measures, say hippocampal volume, are not declining; a toy trajectory sketch follows this exchange. Second is the ability to use Alzheimer's disease status and/or the endophenotypes as stratifying variables to identify a more homogeneous, purer phenotype set of individuals, which could drive up power for discovery. I think that's another opportunity.

Have you got the right material on all the samples to compare, or on enough of them to draw those inferences?

Actually, Jerry, can you help me: how many of the, let's say, 10,000 Alzheimer's cases and controls will have a brain MRI? Do you have an idea?

It's not centralized, so I really can't answer that.

Yeah, I don't know. There are two parts to your question: do we have the right data, and how much of it. These are walking, talking, living human beings who are largely volunteers, so we have the data that's routinely available, and we're trying to drive up the numbers. I think the other opportunity, by the way, is going to be longitudinal measurements; many of these individuals now have five or six brain MRIs, from age 45 up to 80.
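As a toy sketch of that longitudinal idea, here is one way to flag potentially resilient E4 carriers by fitting a per-subject slope of hippocampal volume across repeated scans. All of the data, the carrier flags, and the slope threshold are simulated placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

def volume_slope(ages, volumes):
    """Least-squares slope of hippocampal volume vs. age (mm^3 per year)."""
    return np.polyfit(ages, volumes, 1)[0]

# Simulate 5-6 scans per subject; most subjects decline, a few stay flat.
subjects = []
for i in range(200):
    ages = np.sort(rng.uniform(65, 85, rng.integers(5, 7)))
    decline = rng.choice([-40.0, -2.0], p=[0.9, 0.1])  # mm^3 per year
    vols = 4000 + decline * (ages - 65) + rng.normal(0, 30, ages.size)
    is_e4 = rng.random() < 0.3                         # placeholder carrier flag
    subjects.append((f"subj{i}", is_e4, ages, vols))

# Flag E4 carriers whose volume trajectory is essentially flat.
resilient = [sid for sid, e4, ages, vols in subjects
             if e4 and volume_slope(ages, vols) > -10]
print(len(resilient), "candidate resilient E4 carriers")
```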
Marilyn?

I just have a couple of things that might be of help. In addition to the specific samples that Jerry has, the CHARGE cohorts have MRI data on a large number of their subjects, and they've already published several papers on those data. The other thing to add is that we're also going to be calling, in harmony, the data from the Alzheimer's Disease Neuroimaging Initiative, which is all neuroimaging data; Jerry is going to be working on that with Eric, and it will double the sample size available for those types of analyses. We have several other ongoing projects that we're going to try to harmonize with these data, so those cohorts will also have those endophenotypes, which are very good.

So I think we're going to be in pretty good shape for endophenotypes, very good endophenotypes. Lon, I think the other thing that's happening that's very good, and it reminds me of the early GWAS days, is this: back then, everyone thought they were going to Stockholm next week or next year, and then reality hit that this was a tough problem. We needed large sample sizes, good analytics, and good phenotyping. The community, fortunately, is laying the foundation for this proposed precision medicine initiative; it really is the cooperation that came out of the realization that we couldn't do it alone, that we were going to have to do it collaboratively. I think this study will benefit from that, and from the timing. Dan?

A couple of questions or comments. Can you comment a little more on the selection of the control group? I thought I was going to hear that they were people who were old, didn't have Alzheimer's disease, and had ApoE4 risk alleles; I don't think that's what I heard. And how do you decide that they are controls or not? That was one question. Second, I like the word harmonization; I think we're going to hear that word over and over again as we go forward with this precision medicine effort. The third question is about the Peter Donnelly comment. Hopefully we'll figure out which variants have function, and then we can do burden tests using those; is that a correct statement, do you think? The problem with burden tests where you just say, here's a bunch of SNPs, and if there are variants I'm going to add up the number, is that that approach is misguided and simplistic. But if you knew there were three or four variants that were all loss of function for a gene, then a burden test makes some sense, to me anyway. I'm not Peter Donnelly, though.

As we have discussed numerous times in this room, we really do not have an adequate pipeline to identify, in an a priori way, which variants are functional and which are not. If we had that pipeline, I would fully agree with you, but we do not. I think this will unfold, particularly as we move, again pushed by this group, from exomes to genomes; identifying functional variants outside of very tightly annotated regions is a wide-open field. A minimal sketch of the kind of loss-of-function burden test you describe appears after this exchange.

Now the control question. First of all, it's been discussed a lot, and you're absolutely correct: you did not hear what you thought you were going to hear, sorry about that. I want to emphasize that these controls were identified as the individuals least likely to convert to cases; that was the definition of a control, and we calculated this very carefully using epidemiologic data. Many groups in the community are sequencing elderly E4 individuals who are cognitively intact; as complementary studies, Alison Goate has a project like that, and Richard Mayeux and I have a project like that, so those will be complementary to, and actually part of, this. But the main ADSP controls are defined as I described. Also, if you defined the cases as E2 carriers and the controls as E4 carriers, it would be a very difficult study to analyze on chromosome 19.
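Here is that minimal burden-test sketch: count loss-of-function alleles per person in one gene and regress case/control status on the count. The genotype matrix, annotation mask, and phenotype model are simulated placeholders, not ADSP data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 4000

# Simulated rare-variant genotypes for one gene (0/1/2 copies, 10 sites)
geno = rng.binomial(2, 0.002, size=(n, 10))
is_lof = np.array([1, 0, 1, 0, 0, 1, 0, 1, 0, 0], dtype=bool)  # LoF annotation mask

# Burden = count of LoF alleles carried per person; here the simulated
# phenotype makes each LoF allele raise the log-odds of disease.
burden = geno[:, is_lof].sum(axis=1)
logit_p = -2.0 + 1.2 * burden
status = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Logistic regression of case/control status on the LoF burden.
X = sm.add_constant(burden.astype(float))
fit = sm.Logit(status, X).fit(disp=0)
print(f"log-odds per LoF allele = {fit.params[1]:.2f}, p = {fit.pvalues[1]:.1e}")
```

The point of restricting the mask to annotated loss-of-function sites is exactly the one raised in the question: counting only variants with a shared, known functional direction is what makes the simple count defensible.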
Marilyn, you have something to add?

I wanted to go back to the functional question. In addition to the sequencing project, we have a huge effort at NIH to find therapeutic targets through the Accelerating Medicines Partnership. We at NIA have made sure that those investigators are talking with the geneticists, because they are working from the back end toward the front and we're working from the front end toward the back; hopefully somewhere in there we'll meet in the middle. The other part is that we have an RFA on the street for the replication phase, the follow-up phase, which is going to have to include some functional components, because we have a congressional mandate to find those targets by 2020.

Yeah. Jay?

I'm sorry, Eric looked at me. Maybe you said this in there and I missed it, but what is the specific target as far as power, related to the mandate question? Is there a goal there?

You're asking me to dig up things from a year and a half ago. Yes, there was a goal, and there were calculations, but I don't remember the specifics. If you go to the ADSP documents, there's an appendix, many pages of appendices, and a summary table, and I can send it to you. It's posted on the public side of our website, so you can find it. I don't remember what the frequency, effect size, and probability were.

Is there a sample biobanking component to this, so that you can go back to samples?

You can see it listed on the diagram. Let me help you out with that, and then you can jump in. NIA has funded, since 2002, the National Cell Repository for Alzheimer's Disease. We have samples in there that go back 30 years; some of these families have been followed for that entire time. We have all of the samples from the Alzheimer's disease studies: every single investigator who has been in my portfolio for the last 15 years has samples in there. I don't know the exact number, Jerry, but we have something like 25,000 samples in there, all phenotyped. There's a huge effort to leverage the infrastructure that NIA has in place. In addition, the CHARGE component has put an equal number of samples into the study, so there's really a lot of leveraging of existing resources here. Carol?

What I'd like to see, and I think this is relevant to this group, with ENCODE, GTEx, and so many other efforts like the BRAIN Initiative, is for some of these individuals, particularly those getting whole genome sequencing, to provide brain donation at autopsy, done in the appropriate way, so we can do RNA-seq and other measures.

I cannot resist saying that this entire discussion sounds like one that is going to be repeated over and over again as the precision medicine initiatives go forward: harmonization across data sets, lots and lots of different kinds of data sets for discovery, and lots of complementary data sets that will finally come to an answer. Unless the idea is that you're going to solve everything with one large cohort, this discussion is going to go on and on, I think, for the next week or the next decade. I'm looking forward to it.

And I think one of the nice things about this discussion is that it's occurring early, with engagement of the community. The precision medicine initiative needs to be sensitive to the fact that each of these communities needs to be brought along. That will take some time, but if we're going to get appropriately consented, appropriately phenotyped samples in large enough numbers, these communities are going to need to be fully engaged.

Thank you, Bruce. Thank you, Eric. Let's move along.
NIH is required, before we can publish an FOA with a set-aside of funds, to get what's called concept clearance in a public meeting, and NHGRI uses its advisory council for that purpose. So Joy Boyer is going to give us a presentation on the ELSI Centers of Excellence in...

Yes. Put it on the other side, Joy, so when you turn...

I feel like I'm going to the prom. Sorry, it's the Centers of Excellence in ELSI Research.