Here I am. No, I'm right here. I hope you all can hear me. Great. All right. We're going to present this concept clearance for the Whole Chip: toward more comprehensive analysis of genome-wide association data. And many thanks to my colleague Anastasia Wise, who's only been with us for about six months, but has really dived into this effort with us. So that's great. We've gotten a little ribbing about the name of this. It might remind you of Doritos or something like that. You might be aware that Doritos have lots and lots of components in them. And if you leave out even a couple of them, you just don't get the whole bang for your buck. So similarly, with genome-wide arrays, what we'd like to do is support broader utilization of the data that have already been generated in genome-wide studies of human disease. The goals then would focus on underutilized information, and we'll show you some data on how underutilized it may be: primarily the X, that is, but also the Y, mitochondrial, and CNV data. We'd like to facilitate more comprehensive analysis of existing data; where necessary, stimulate development and validation of new quality control and genotype calling procedures; and also, where necessary, develop and validate new statistical methods, analytic strategies, and study designs. Some of you may be familiar with this diagram. I understand there was a drinking game at the latest ASHG, where every time you saw this diagram shown, people took a drink. But at any rate, you notice that it's very heavily populated everywhere except in the poor X chromosome. Well, actually, there's also this little guy that doesn't get any at all. But there's quite a difference when one compares the sex chromosomes to the autosomes. And Anastasia, with Lin Gyi, who's busily taking notes for council over there, reviewed nearly 400 of the most recent genome-wide association papers in the GWAS Catalog, from January of 2010 to March of 2011.
And what you can see here is the proportion of papers. This is the number of papers per month, and then the proportion of them that even analyzed the X chromosome, whether or not they found anything, as long as they reported analyzing it. And overall, it's only about 32% of the papers, about a third. And that did not really change over time. So it didn't increase or decrease with time. We're not quite sure what happened in February of 2010. It just seemed to be a bit of an outlier. And if one looks at the number of hits that have been found at the 5 times 10 to the minus eighth level in the catalog, for chromosomes of similar size, there are about 50 hits. Chromosomes, even the little teeny tiny ones, have maybe a quarter to half that many. And the X chromosome has only seven of 1,212 total associations at this level. So quite a difference there. This is kind of an interesting thing to look at. Anastasia pulled out a nice example of data on dbGaP. You may be aware that the dbGaP staff do generate what are called precomputes, where they basically take all of the genotype data and relate it to the phenotype. The phenotype in this case was diabetic nephropathy. And they found an association on the X chromosome, the strongest association in the data set, at 5 times 10 to the minus 11th. When the paper was published, however, the X chromosome was not analyzed at all, and that hit went away. They basically removed the X chromosome as the first step of QC, without giving any reasons as to why they did that. This is kind of an interesting region. The SNP that's involved, rs798159, is very near an ENCODE-defined region of promoter-associated histone marks. So many thanks to our ENCODE colleagues for generating these data. And as you can see here, just kind of spreading this out a little bit, it really is kind of right on the edge of the beginning of this promoter-associated region. There's also what might be a little bit of DNase hypersensitivity activity there as well.
So it's sort of an interesting SNP to look at, and one that might have been worth looking at in this analysis. You'll notice that there are lots and lots of genes in this area, but sort of upstream here, there's one that's a particularly attractive biologic candidate, the angiotensin converting enzyme 2 gene, which is known to be associated with a whole variety of nephropathy-related traits, including estimated glomerular filtration rate, which is the definition of nephropathy. So that's not saying that this is necessarily a causal association that could have been picked up with this particular analysis, but it does kind of raise some questions as to why exclude the X outright. There may be some interesting stuff there. So there are reasons that the X chromosome is a little more difficult to analyze. There is somewhat lower genotyping accuracy due to difficulties with clustering algorithms that have to deal with the poor hemizygous, genomically deficient among us, as well as the pseudoautosomal region shared with the Y chromosome, which can be difficult to genotype. There's also more missing data. 13 of 14 GENEVA studies showed more individuals with a greater than 5% missing call rate. GENEVA, of course, is our large-scale consortium of genome-wide studies of a whole variety of different traits, but all analyzed out of, or at least the data are cleaned in, the same coordinating center. And there are higher levels of chromosomal anomalies, as shown here. Missing call rates for the autosomes were about 0.08% or so. For the X chromosome, they are about seven times more than that; for the Y chromosome, about 20 times more than that. So really quite a difference, but this is the one that we're focusing on. Mitochondrial is also quite a bit more. In the autosomes, about 0.015% had noted anomalies. The X chromosome has about 10 times that, and the Y chromosome a little bit less. So there are more challenges there. Still, we're talking about very low numbers of SNP calls lost, in percentages.
Other reasons are that this is a little bit challenging to interpret. One does have to consider X inactivation. And if you have a SNP that you've picked up, is it on the active chromosome or isn't it? Plus, about 15% of all X chromosome genes escape inactivation, and that's not random. Obviously, there are reasons for that happening. Analytically, this is a challenge. There was a lack of implementation software until relatively recently. There's some difficulty in assigning haplotypes, and there's a need to accommodate different expectations for Hardy-Weinberg equilibrium and minor allele frequency estimation. This is not hard. It just has to be done a little bit differently from the autosomes. We had just sort of an informal poll, asking folks that we knew who had collaborated with us on a variety of programs why they thought this might be, and whether it would be worth stimulating. There were some who felt that we really didn't need to stimulate, that it will happen naturally now that imputation programs regularly include the X chromosome. There were others who felt that we were really too early in trying to do this, that the X chromosome was so difficult to genotype that really we had to wait until sequencing was better able to approach it. Or maybe this is the right time. It's clear that the first genome-wide association papers have really kind of set a precedent of excluding the X, and that hasn't seemed to be questioned a great deal beyond that. The autosomes provide so much data. Perhaps you don't need to bother with the X, and you get around to it when you have time, and you never have time. Clustering algorithms, as I mentioned, are more complex, and there's a little more effort required to analyze them. There's also somewhat lower power, with 3N versus 4N genotypes in these, males and females combined.
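The different handling that allele frequency estimation and Hardy-Weinberg checks need on the X can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the method of any study or software mentioned in this discussion: the function names and the genotype counts are hypothetical, males are assumed hemizygous (outside the pseudoautosomal regions), and Hardy-Weinberg equilibrium is checked in females only with a simple 1-degree-of-freedom chi-square test.

```python
from math import erfc, sqrt


def x_allele_frequency(female_counts, male_counts):
    """Estimate the B-allele frequency at an X-linked SNP.

    female_counts: (n_AA, n_AB, n_BB), two alleles per female.
    male_counts:   (n_A, n_B), one allele per hemizygous male.
    Returns (frequency of B, total number of alleles counted).
    """
    n_aa, n_ab, n_bb = female_counts
    n_a, n_b = male_counts
    b_alleles = n_ab + 2 * n_bb + n_b
    total_alleles = 2 * (n_aa + n_ab + n_bb) + (n_a + n_b)
    return b_alleles / total_alleles, total_alleles


def hwe_chi2_females(female_counts):
    """Chi-square test of Hardy-Weinberg proportions using females only.

    Males carry a single X, so they have no heterozygote class and are
    left out of this simple test (an assumption of this sketch).
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    n_aa, n_ab, n_bb = female_counts
    n = n_aa + n_ab + n_bb
    p_b = (n_ab + 2 * n_bb) / (2 * n)
    if not 0.0 < p_b < 1.0:
        raise ValueError("SNP is monomorphic in females; test undefined")
    p_a = 1.0 - p_b
    expected = (n * p_a ** 2, 2 * n * p_a * p_b, n * p_b ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(female_counts, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    females = (490, 420, 90)   # hypothetical counts for 1,000 females
    males = (700, 300)         # hypothetical counts for 1,000 males
    freq, total = x_allele_frequency(females, males)
    print("B-allele frequency:", freq, "alleles counted:", total)
    print("HWE in females (chi2, p):", hwe_chi2_females(females))
```

With equal numbers N of males and females, the allele count on the X is 2N + N = 3N rather than the 4N available at an autosomal SNP, which is where the lower power comes from.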
We're proposing that we try to stimulate this area somewhat by asking investigators to obtain and analyze existing genome-wide association data, which could be their own, or data from dbGaP, or other similar data sets, for phenotype associations with X chromosome variants primarily. We also would like to see how folks might deal with Y chromosome and mitochondrial variants, and possibly structural variants as well. They would focus on data sets where these data have not really been analyzed, although one might allow them to do some replications using different methods, and develop, validate, and disseminate possibly new user-friendly quality control and analytic methods, again with an open access model. And we would, as is always our mandate in population genomics, try to be sure that they include diverse populations, since those would be an issue as well. We would propose that investigators meet initially to share proposed methods and identify common analyses to be undertaken, and then meet again halfway through to sort of assess progress and how things are going. And then at the end, have a workshop that would be open to folks outside of the investigator group, to report on experiences and explore and disseminate lessons learned. Again, we would ask for a plan to analyze underutilized genome-wide data within this two-year timeframe, asking them to focus on the X chromosome. They would have to have access to existing data. We would propose that perhaps about 10% of the budget be set aside, if needed, for genotyping and/or sequencing of DNA in possibly a limited, high-priority subset of subjects. That's something that we could use your advice on. It might be worthwhile, but it would be something that we would hold in sort of reserve until we saw what kinds of applications came in. We would require deposition of individual-level data in dbGaP if they were not already deposited.
We would propose that simulation studies be allowed as part of methods development, but that an application not be limited to simulation studies. There would need to be analysis of existing data. Suggested criteria for selection would include a broad range of diseases and traits; as always in population genomics, high public health significance and ethnic diversity of the populations; and development and dissemination of new methods. And as sort of a guideline, at least 2,000 participants with existing high-quality genome-wide chip data, greater than 550,000 variants, and the more data that can be provided, the better. We would anticipate funding this at a relatively modest level, $2 to $3 million total over a two-year period, so about 1.2 to 1.5 per year for four to eight awards. We would propose the research project grant mechanism. We don't see that there is a need for a cooperative agreement approach, though we would look for your advice on that. And we would encourage participation of other institutes and centers to fund more applications with a wider range of phenotypes that might be relevant. So we see this as an opportunity for maximizing the knowledge to be gained from existing genome-wide data, for which the research community has paid dearly, to avoid missing important associations with human disease, and to leverage available data at a modest additional cost. So I think with that, I'll stop and ask for any comments.
Thanks, Terry. I'll focus my comments on the X chromosome because that was the primary focus of the concept clearance. I totally agree that ignoring the X chromosome has really been an issue over the last four or five years of GWAS. And I agree with Terry that basically the first few studies sort of set a precedent, and guilty, since mine was one of them, for type 2 diabetes in 2007. Now, the reason that we were guilty is that it was actually three studies that were working together very closely.
In fact, we ended up publishing papers back to back to back in Science. Two of the groups had done the Affy 500K chip. We'd done the Illumina 317K. And basically, we were in the early days of doing this, trying to decide how we could combine our data, because despite the large numbers of SNPs, there were only about 40,000 of them in common. And a happy fact is, at that same time, Gonçalo Abecasis at our place and Jonathan Marchini in the UK were developing methods of genotype imputation. And so those methods were developed just in time that we could in fact impute genotypes across our platforms in those early publications. They did, however, at that time only do it for the autosomes. It was only a couple of years later that that became available for the X, which is at one level a little bit silly, because the X is actually easier than the autosomes. And the notion that haplotyping is harder for the X chromosome than the autosomes, well, we can't really credit that, because a lot of the work is already done in the context of at least half the people. But I think those early studies did set a precedent. And I think people have sort of been comfortable not dealing with the X chromosome, partly because of issues of how you count. Do you count females as having two and males as one? Or, with X inactivation, do you count one each? Or how do you do that? There were early on issues of availability of methods for testing, not methods really, but software. But those issues have really gone away. And I think over the last couple of years, last year and a half really, imputation and analysis methods have now become available. A lot of the work in GWAS at this stage, though, is focused on large-scale consortia, meta-analysis consortia. And those are not little PT cruisers; those are battleships, and getting them to do something different takes some time. And so in the stuff that's been published in the last year and a half, the analysis I think is totally right.
There's still this autosome bias. But if we look at what's going on right now, and I have to confess I'm not able to look at everything that's going on right now, if I look at type 2 diabetes and type 1 diabetes, other autoimmune diseases, lipids, blood pressure, glucose, anthropometrics, all of the GWAS meta-analysis consortia that I'm involved in or have close contact with, all of those are now doing the X chromosome as part of the current stage of analysis. Now, that's not saying that that's true for cancer or mental health. I'm not in touch with those communities. I actually sent out some emails last night to see if I could get some input. And I suspect I have some input, but I can't get out on the internet. So I can't go beyond the earliest returns I got, which were actually reasonably positive, from liking the idea of encouraging the X to saying, yeah, it's really not necessary. So I see no harm in this. But I'm not sure all of this isn't going to happen just sort of naturally. If we were in a climate of lots of funding, I'd say by all means, let's go ahead and do this. Given that we're not, I've sort of mixed emotions about going forward with something like this. I would say in its defense, it is very little money. And so, you know, an alternate approach, I guess, would be to write sort of a brief position paper, see if we can get it in a good journal, and say, you know, this is something people really ought to be doing because now it's possible.
Could you maybe talk about how applicable this is? Because you talked about the X, but you also talked about CNVs and some of the other ways of doing analysis that have also been underrepresented. Maybe, given some of this, if X is being taken care of, you can still move ahead with this, but refocus on some of these other types of analysis.
I think that's a good point. I think we do need to look carefully at whether X is being taken care of.
And I think that, you know, the data that we have today don't look all that great. I mean, things really haven't changed much, at least to March of 2011. And these methods have been available since about 2009 or so, but people are a little slow to take them up and that sort of thing. So one thing that we could conceivably do is basically develop a funding opportunity and stimulate people. We might not even have to put any money into it. We might just stimulate them by having them write an application, which is kind of neat. And then see, at the time that they come in and after they're reviewed, whether things really have moved forward; if so, then probably this is a solved problem, which would be great. In terms of the Y and mitochondrial, and I would ask David and others who do this kind of genotyping to comment, my understanding is that they are still very problematic in terms of genotyping accuracy. And what we were looking for was a small amount of money to stimulate analysis of existing data sets. And I think the existing data sets probably aren't there yet. David, do you want to comment?
I agree as far as the Y and the mitochondria. And on the copy number variants, I wasn't sure if you meant copy number variants across the genome or just on the X.
No, well, copy number variants across the genome. The concern that we'd have there is that that's a very sexy sort of thing right now and that that might swamp out analysis.
Yeah, I totally agree.
Plus, we've had some experience in the GENEVA Consortium that's really pretty sobering in terms of duplicate concordance in these. I mean, you know, a half to two-thirds actually replicate. And when you look at mother-child transmission, only about a quarter to a third of them are transmitted, where you would expect 50 percent or so.
Right.
So we're not convinced that the genotyping there is quite good enough yet to stimulate.
Yeah, I agree with that as well.
So if you want to stimulate some attention to the X, I don't know if I'd have CNVs in that same.
That's very helpful. Yeah, we really were on the fence on that. So that's very helpful to hear. Mike.
One thing I was intending to look up, but didn't, was how many mitochondrial SNPs and Y chromosome SNPs there are on the standard products. I have to confess, I don't know the answer to that.
I think on Y there are only like 28 or 30 or so; it's a very small number. On mitochondrial, it may be a couple hundred, but I honestly don't know.
Yeah, before making a decision there, it'd be important to have some concept of, you know, how large that scope is. And I would totally agree on the CNVs. I think that realistically, we've already spent a lot of effort on trying to identify and genotype CNVs in the context of GWAS data, and putting substantially more money into that, I'm not sure is the right thing to do. And this is just, it's too little if you want to take on the CNVs.
Right, yeah. And the additional problem with mitochondria, of course, is heteroplasmy. And then you have to worry about, you know, where your sensitivity is and how heteroplasmy affects that. So I think that's a murky area right now.
Would you recommend not including the mitochondrial in a solicitation, or including it but looking at it very skeptically? I mean, allowing people to come in; maybe they have really great approaches for using it that we don't know about.
Well, I guess I would say I don't see a problem including it, but I just think the review of that has to be a very, very informed review.
Great. And it wouldn't be the emphasis; the emphasis is really on the X. Howard.
Well, I completely agree with what's been said, except I would maybe emphasize it a little bit more, but review it very hard, because part of the reason why the reagents that we currently have available are not that great for Y and for mitochondria is that there hasn't been any expectation around it. And so if there is, there might be some informatics person who could turn their energy to it and figure it out, or some of the companies may decide they'd better up their game. I'm not going to hold my breath on the latter, but you know, it could happen.
Any other?
I'd just make one more comment, and that is, if one really is thinking of putting this out as a trial balloon, with the intention that a year from now we may decide this isn't a problem, and therefore then not funding these grants that people would go to the trouble of putting in, I think that's a terrible choice. I think one needs to make the decision: is this something we think is important or not? If we're not sure, it'd be a better choice to put this off for a period of time than it would be to put up a trial balloon that I think will get a number of our colleagues pretty upset with us if it goes in that direction.
Although the funding might be about the same as it is now.
Fair enough. So I think we would propose, given the data, you know, the best data that we have right now says that this is needed, and I think that's a fair thing to go forward with a solicitation. Things can always change.
I do think, though, that the situation is a little different. Sometimes you need to stimulate methods when there aren't any methods, and here you already have methods being used in a third of existing studies. And that just strikes me as a different level of priority than a case where methods don't exist at all.
So I would hope the incentive to find strong associations, if they're there, would itself be a reason that people would want to apply X chromosome methods to their existing data sets, particularly if the methods already exist and are working in other studies. So again, it's not a large amount of money, but the level of priority here to me seems less than those cases where methods don't yet exist or are not being included in any studies for an important kind of analysis.
Yeah, and the methods development was sort of a, you know, if needed, but I would agree with you that would be a low priority for this. It's really using the existing data. Sir?
Actually, something like the Genetic Analysis Workshop, one of those types of things focused on this, may get as much out of it, for nothing, as putting out an RFA or an RPA. Because you're right. I mean, I think the problem is not X, because people will analyze that. Y, it sounds like there are not enough SNPs to worry about. Mitochondria, I don't know. It sounds like there could be some hope there.
I would come back to something that Mike suggested, which is actually a publication. I mean, you've got these data together, and I haven't seen any kind of emphasis on these data, and they're really astounding. Particularly if the data are just sitting there and people could do it now.
Yeah. So that is definitely a plan of ours, and Anastasia was busily working on slides for me. And then we'll take over right now.
Yeah, that's right. Yeah, exactly.
Great. Any additional comments? So what do we do about approving this as a concept? Do I hear a motion for approval or disapproval?
Well, I'd move disapproval.
And is there a second?
Second.
Is there additional discussion? Do you feel that without this, you can stimulate this to happen? Because, I mean, David's point is true that this isn't job one, but there is some work.
We are trying to stimulate it as best we can within our consortia. And there is resistance to it, even as that's happened. So getting people to do the imputation on the X chromosome and to do the X chromosome analysis has been a challenge. And, you know, we're paying them. In other studies, I'm not sure that we have any clout. We would like to think that people would see this as an opportunity and would go forward with it. It hasn't seemed to happen in the past couple of years. And so it seemed as though some stimulation would be appropriate. We can wait another year and see how it goes. Mike?
Well, it is actively happening in type 1 diabetes. It's happening in type 2 diabetes. It's happening in glucose and insulin. It's happening in anthropometrics. It's happening in lipids. It's happening. You know, as I say, I don't know what's going on in some of the other areas, but in terms of immunogenetics and in terms of metabolic genetics, it is happening.
Mike, is it just too soon, and we're not seeing the publications?
Correct.
Okay. So it's not that they're not finding anything.
No, the current round of meta-analyses going on is 1000 Genomes-based. So there's a whole new round of meta-analyses going on for all of these different consortia I just named. We're doing it again, because we've got a different set to impute against; we're imputing against a much larger set of markers than we could when we were doing this based on HapMap. So we will have publications coming out, I hope, from all of these different efforts. All of them are including the X. And I wasn't sure about that. So that's a polling I actually did when I first saw the proposed clearance, to see what was going on. And in fact, I got emails back on Friday from each of those different groups I mentioned saying that, yes, we are actively doing that. I thought so, but I wasn't sure, but I did check.
So, Mike, would you have concerns about the phenotypes that really don't lend themselves to these large meta-analyses? You've described traits that, you know, have hundreds of thousands of people.
And that's the place where I think there is an argument for doing this, but I think there's also an argument for writing the short position paper. And I think either approach is reasonable. I've got to believe it's going to happen, but I could be wrong about that.
I can speak a little bit for neuropsychiatric disease. There is sort of a growing interest and realization that one needs to take into account sex differences in incidence, and that many papers pay almost no attention to that. And so I think you're going to see more interest in that, and maybe that will stimulate people to look at the X data.
So can I ask a question before we go to a vote, because obviously there's a motion on the floor in the negative direction. Before we vote on that, I guess what I want to hear from those who are moving not to move forward on this is what your proposal might be for sort of reassessing. I mean, obviously all of us could be right or wrong, right? And so either this is going to have an uptick or it's not. In saying don't go forward with this concept at this time, are you saying come back in four months, eight months, 12 months? Those are sort of the windows that we operate in.
I think four months is probably too soon, because these papers are just getting started. I think eight months would be quite reasonable. Or as I said, you know, this is not a lot of money. One could just say, you know, this is important enough to us that we think we ought to commit the funding to it. But that would be the time range I would be thinking. I think four months would be too soon to know. I think in eight months we'd have a good idea.
Isn't this a question not so much of whether it is or is not going to happen without this funding, but rather more a question of whether it can be significantly accelerated, and just what the benefit of that is relative to the amount of money that's being spent?
I just asked Terry the question whether this is something that could be addressed by issuing a program announcement, which is essentially a statement of the institute's interest. It doesn't carry any commitment of funds with it, but if you get good applications, you can fund them. They're essentially equivalent to investigator-initiated R01s.
I think that'd be totally sensible.
Okay, any other discussion? So with that discussion, we have a motion on the floor and a second. All in favor of the motion not to approve this as a concept for an RFA, please indicate. Opposed? Okay. Thank you. So we will go ahead and look at these other approaches to trying to achieve this goal.