 I was asked to discuss the questions that might be answerable. And the one thing, I guess, I'll do two things I'll say before I launch into this. One is that a number of the questions have been here today about is it envisioned that X, Y, or Z. And I can tell you, from at least my experience of being on the planning group, that this is not one of those cases where someone thinks they have the answer and you're trying to read the tea leaves and pull out of them what they already have decided. This is a case where there's a problem and we need to solve it. And the other thing I would say about that is that the genomics community has done a very good job over the years, I would argue, of saying what problem needs to be solved and then going out and solving it, even when the answer was not in sight, as opposed to saying what's possible. But sometimes when these types of issues come up, we say what's possible. And I think that obviously we're going to adhere to every rule and regulation and commitment we've made, but we should be thinking outside the box. So what I thought I would do in thinking about this, which is not a standard talk, is actually try and address not only the what are the questions, but who are the audiences? Who are the communities we're trying to serve? And I think the reason that's important is that this tends to be a discussion, I think, of us talking about how we serve ourselves. And I'm going to try and, at least those of us who are members of the genomics community, and I'm going to try and at least destabilize that. So what I mean is some of us are gene hunters. I guess it would be fair to put me in that category and say, you know, I have a disease I'm interested in and I want to, and it varies in the population, it's heritable, and so I desire DNA sequence data and I want to compare it to controls, and my success is measured as like, do I find a gene? Okay, that's a valid activity. 
And you could argue that over the last few years we've made some progress, because it used to be that this was done in an incredibly fragmented way, with everyone's attempts to study disease X done separately. We've done a good job of pulling it together in terms of each disease. You do see papers where in effect, however it was done, however elegantly or inelegantly, we look across a lot of the available studies. But actually we seldom or almost never make that data available to other people studying other diseases and traits. Statistical geneticists (I'm going to go through five different categories of human being) tend to develop analytical and computational methods to analyze genetic variation for a variety of good purposes. And they desire access to data so they can develop and test and apply their methods. And their success is measured as, is my method good? They often talk about, did I explain the variance? Because explaining variance is a lot of what statistics can be about. And they publish papers. And again, you could say that dbGaP has provided some route to access for that community of people to apply their methods to data. But it's certainly cumbersome. And in particular, each time you have a new question or you have a new data set, you have to go back and get a new data use. So it's not very flexible, it's not nimble, but it does exist. Nonetheless, I would argue that while some attention and effort should be dedicated to the communities of which many of us are members, it should not be our main focus here for the next two days. Because that's actually how we serve ourselves. And I actually think we're doing a much better job of serving ourselves than we are of serving the 90% or whatever of the biomedical research community who could potentially be informed.
If we're ever going to deliver on the promise that we all say, we're going to inform biology and medicine, then we have to actually have some impact on biology and medicine. And I think that we're not doing that very much, I would argue. And it's because we're not actually answering the questions those people have in a language that they understand. So I'm going to just sort of, at the risk of, I'm like way out there, right? Because I'm vastly oversimplifying, but nonetheless, to be provocative. So I trained as a biologist. I didn't train as a geneticist or a statistician or an epidemiologist, I trained as a doctor and a biologist. And what most biologists, I would say, in my department do is they study a process, and in particular they study a gene, or they study a pathway. They don't actually want much from human geneticists except that we don't consume too much of the NIH budget. But to the extent that they could care at all about what we do, their question is, I have a gene or pathway that I study. Does human genetics in any way... I could be flip and say, does it in any way confirm what I already know to be true? And if not, I don't really want to hear about it. But one way or another, they want to connect their gene to human biology. So I would ask you the question, if you were such a biologist and you didn't know what you know, how would you possibly answer that question? That's the phone call we get a lot, actually. So I'm less concerned about the phone call of the statistician who says, where can I get more data sets, than the biologist who says, which I hear all the time, because I work more in a community of biologists, well, I study this gene. And I studied the mouse model of X. Is that true in the human genetics? And if you wanted to, you might get a postdoc who could spend the next month and a half trying to cull through all the literature to write a review article for them.
But there's certainly no simple answer that we have offered yet to that question. Let's think about doctors for a second. I'm one of those too. So at least I was once, I'm defunct. But if you have a patient in front of you, your question is (and there are people in the room here who are currently, and very much, leading in this domain), a patient comes in and there's some question about predicting the course of the disease, diagnosing the cause of their disease, recommending some intervention. And increasingly we hear about the idea that we're going to have genome sequences in the clinic. So if you had a genome sequence of a patient and you wanted to know, how would I even annotate this with regard to the world's current knowledge of relationships between DNA variants and disease, or even just frequencies? Has this variant been seen before? You need access to the world's data in a form other than, again, go read 100 papers and try and summarize for yourself what's going on. And many of the efforts I hear of to do this are like hand curation, where we think about individuals going and doing this. And that's important and necessary, but it would certainly be nice to be able to provide, especially if we imagine going from what we have today to 50,000 to a million genomes, some sort of at least summary of what's been seen to date that those expert committees could work on. I also just sort of gratuitously wanna note the concern about ascertainment bias. I'm very concerned that a lot of Mendelian so-called mutations have only been studied in people with Mendelian phenotypes. And that if your patient came in and had an incidental finding, so they didn't actually walk in with a leg growing out of their head, and you found the gene mutation associated with the leg growing out of their head, it's not surprising they don't have a leg growing out of their head. Unless you are such a Mendelian that you believe that all mutations are fully penetrant.
But that's the danger we're in right now. And so in some sense, the potential of all these genome sequences we have is to provide a rich data set in which to say, well, in people not selected for having a leg growing out of your head, that's actually Drosophila, not humans. But if you're people not with that phenotype, did you ever see this mutation and was there a phenotype so that you could say your patient who didn't have the phenotype walking in the door to the Mendelian Genetics Clinic, what's to be expected? And we don't have that right now, okay? And then finally, and I'm gonna be followed by Jeff Trimmer who's gonna talk with much more authority about this, but my sense of what most people in the pharmaceutical industry think about is developing therapeutics that might actually help patients, that's their job. And to the extent human genetics is relevant, it's either because it could help you predict in advance what the effect would be of modulating a target with a drug, because there might be a genetic perturbation in it, or maybe selecting patients for inclusion in a trial. Their success is not p-values and statistics and papers, it's a drug that works. And again, if you worked in a pharmaceutical industry and you were not a statistical geneticist, how would you answer the question, like if you're working on a drug target and you're gonna inhibit it and you wanna say, does any experiment of nature inform to me what might happen in the patient, where would you go to look? Because the answer is our community has failed to deliver something with that sort of simple level of clarity that might inform that activity. And so I think that this is actually, we should, as we think in our usual genomics community way is to how to realize the impossible, because like $1,000 genome just five years ago, how likely was that? 
Or sequencing the human genome in the first place, or any of these things. Actually just organizing ourselves to answer whatever questions turn out to be the most important should be more achievable than going from $100 million to sequence a genome to $1,000 in a space of some number of years. So we should be able to tackle this. So just to close, and then we can have discussion, what are some of the kinds of questions? And again, this is just an individual view, this is my point of view. I'm not saying these are the right questions, I'm just trying to provoke some discussion. Things like, given a phenotype of interest, we should make it very easy. And some of this is like OMIM, right? Some of this is what used to exist for OMIM. It's just an OMIM in a world where there's actually a million genome sequences and it all could be integrated. But it's questions like, given a phenotype of interest, identify the complete collection of genes and mutations that have the property that genetic variation is associated with your disease of interest. So you should be able to go somewhere and ask a question: show me all the genes. I study heart attack, or I'm a medical student thinking about what I want to do with my life, and I really care about heart attack and I like human genetics, so could someone please tell me, what are the genes that are actually involved in heart attack? And I don't want like a committee writing a summary article, I'd actually like to see some analysis of data. Given a gene of interest, flipping it around, or orthogonally, what are the set of phenotypes that have been associated with this gene? So this could be like a drug company saying, we're developing drugs that target endothelial lipase. We want to do that because it might affect HDL. Are there any phenotypic associations of mutations in that? Another thing would be, we have a variant now. We've gotten from the gene to the variant, because they won't always be the same.
CDKAL1: an intron in CDKAL1 is associated both with Crohn's disease and also with type 2 diabetes. It turns out not to be the same genetic variation, although they're right next to each other. It's different haplotypes. You might want to know if it's the same variant or not. And here you might want to say, there's a variant in my gene I found that is associated with HDL. I'm doing that because I want to actually affect heart attack, so is that same variant associated with heart attack? So some of you may have seen, Sek Kathiresan and a cast of international collaborators actually published this paper, of which I'm an author, that came out a couple of weeks ago, that said, let's look at variants associated with HDL cholesterol and the risk of heart attack. And so the answer to the first question might be: here, across some of the premier epi cohorts in the world, are the effects of this variant on HDL, and here, in 116,000 people, is the utter lack of any effect of that variant on heart attack. But why is this not a lookup? Why does it have to be the case, because there are a lot of other questions like this, that each one of them has to be a year-and-a-half-long effort and we start from scratch every time? This is a phenotype-genotype matrix. In theory it should be possible to simply have the data available, so someone could walk up and say, I'd like to take all the variants associated with glucose and ask if they affect heart attack risk. Or, oh, you know, I believe in the inflammatory hypothesis of type 2 diabetes; there are a lot of variants that are associated with inflammatory diseases; do they have any effect on type 2 diabetes? I'd like to note, just so it's not unstated, this will change the workforce that we need, because many of these things are what people might imagine a crew of hundreds of people writing papers about for the next however many years. But that's good, right?
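To make the "why is this not a lookup?" point concrete, here is a minimal sketch of what querying a phenotype-genotype matrix could look like. Everything in it is an assumption for illustration: the `Association` record, the function names, the significance threshold, and above all the rows in `RESULTS`, which are invented placeholders and not results from any study mentioned in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """One cell of a hypothetical phenotype-genotype matrix."""
    variant: str     # hypothetical variant identifier
    gene: str        # hypothetical gene label
    phenotype: str
    beta: float      # effect size, placeholder units
    p_value: float


# Invented placeholder rows for illustration only; NOT real study results.
RESULTS = [
    Association("var_A", "GENE1", "HDL cholesterol", 0.30, 1e-12),
    Association("var_A", "GENE1", "heart attack", 0.00, 0.85),
    Association("var_B", "GENE2", "fasting glucose", 0.10, 1e-9),
    Association("var_B", "GENE2", "heart attack", 0.05, 1e-3),
    Association("var_C", "GENE2", "type 2 diabetes", 0.20, 1e-10),
]


def genes_for_phenotype(results, phenotype, alpha=5e-8):
    """Given a phenotype, list genes with a significant association."""
    return sorted({r.gene for r in results
                   if r.phenotype == phenotype and r.p_value < alpha})


def phenotypes_for_gene(results, gene, alpha=5e-8):
    """Given a gene, list phenotypes significantly associated with it."""
    return sorted({r.phenotype for r in results
                   if r.gene == gene and r.p_value < alpha})


def cross_trait_lookup(results, source, target, alpha=5e-8):
    """For each variant significantly associated with `source`, return
    its stored row for `target`, or None if that pair was never tested."""
    hits = {r.variant for r in results
            if r.phenotype == source and r.p_value < alpha}
    by_key = {(r.variant, r.phenotype): r for r in results}
    return {v: by_key.get((v, target)) for v in sorted(hits)}


if __name__ == "__main__":
    # "Take all the variants associated with HDL and ask about heart attack."
    for variant, row in cross_trait_lookup(
            RESULTS, "HDL cholesterol", "heart attack").items():
        print(variant, row)
```

The point of the sketch is only that the three questions (phenotype to genes, gene to phenotypes, variant across traits) become the same kind of query once the results sit in one place; it says nothing about how such a resource would handle access control or data harmonization.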
First of all progress is good if it's good that it costs less to sequence genomes it should be our goal to figure out how to automate analysis but also it won't actually put us out of work any more than sequencing the human genome put people out of work because what we'll do is generate a large number of high quality hypotheses that then could be the subject of future study and future genetic analysis and so I'm just gonna close by saying to my mind the question is how do we take what is the incredible amount of money that the society has invested in us to sequence the genome to perform all these clinical studies to perform genetic studies to sequence all these genomes and deliver answers that are not just think about how do we get more access to data so our lives are easier we have less bureaucracy and we have less difficulty getting high powered things to write our papers but how do we actually answer questions that the rest of the biomedical community really needs and think more about that perhaps and the other thing I wanna be clear on is this is obviously incredibly complex okay so I'm just trying to put out what I was asked to what are some of the questions but it obviously calls for organizational and cultural change it calls for some regulatory change again I don't think our goal coming out of this meeting should be to find the solution that would be a huge mistake right even if we weren't let out of the room until we'd spent a week here and came away with our consensus view of what to do I'm sure it wouldn't be nearly as good as actually some period of years in which there was a lot of innovation and diversity of approaches and trying to answer the questions so as much as budgets are tight we have invested a huge amount in getting this far and what we should try and think of is actually doing some experiments in how this could work because over time I'm sure then this community will converge over time on maybe not one solution maybe multiple solutions 
maybe different solutions for different kinds of questions or different audiences it might be there's one kind of audience to enable the one kind of approach maybe to investigate the sophisticated computational analysts to have access to data and it might not be the same as the one for the biologist or the pharmaceutical investigator has a different set of questions or maybe it is the same I don't know but let's not try and come up with a consensus opinion but rather come up with some approach that will begin to make these things possible so with that there's some time for discussion hopefully that will provoke some yes Lincoln I think you missed one audience type and that would be the researcher trying to organize a study and looking for suitable patients say go to a database find all patients with high cholesterol and a relatively rare mutation and pull out 5,000 people that could be recontacted and recruited so David Cox and I were just talking about that before the meeting and I think that I totally agree with that I think that as long as we are and I think we are being flexible as we don't conflate the idea that we do need to do that and do need to pull in new patients based on a genotype or a phenotype with all the retrospective data because if we say that those have to be the same thing and this did happen early in this discussion the whole discussion became hung up on like solving every problem with every patient you know all data should be in and all data should be called backable so I think you're absolutely right there should be a you want to design a study and you'd like to pull people in based on some combination of genotype and phenotype and such a system would certainly help you formulate those hypotheses and if there were some genotyped individuals or phenotyped individuals who were consented for recontact it could be obviously a direct you know sort of suggestion of what the next steps were Eric. 
So David, building off of that, and also combining your talk with Adam's, it'd be interesting to know, in the histograms he showed, how many of those individuals are measured for many, many, many traits and are currently being followed, so as they develop new diseases or new conditions, those conditions could be entered into this database. So the wealth of information would continue to grow, as opposed to static case-control studies. They're important, but I think they don't have the value that these deeply phenotyped studies would have.
So I would agree, and the only other comment I would add to that is, I think they're particularly of value. In my mind, case-control studies have always been an efficient way to generate hypotheses, and longitudinal population-based studies are a great way to characterize the effects and follow over time. And to the extent we had them in one analysis environment, then that leap would be very easy, you know, if you wanted to ask, is the effect different or the same, or look at other phenotypes. And so I don't think they're in conflict, you'd probably want them both accessible. But I agree with you that the samples you might want to invest more in would be those that are going to be a gift that keeps giving. Carlos?
So David, do you think it would be useful to think about, in terms of the set of questions that you asked, you know, what are the set of things that we believe we could achieve that get us 50% of the way there in a year, right?
Like open up all the big studies in a way that would make it really easy for analysts to barrage on the data and then what are the sort of the medium term versus the long term goals and you know in thinking about I think you made a very articulate argument about the role of you know industry that could come in here right I mean you are talking about creating new sectors of the economy that are going to be enrolled in how this information and data is going to be managed so so I don't want to I mean again I'll just give my own opinion it may not be I'm being intentionally a little iconoclastic or whatever in terms of how our community thinks I think it's harder but I might get pushed back in this I might well be wrong if you say and I don't want to quote you but you know if we say how are we going to get it so every analyst can go I think you actually barrage on the data but you know can have free access to the data that to me is actually a fairly complicated task because there's a whole set of issues about what are they going to do with it how are you going to act control it how are you going to regulate it all of which need to be addressed I think in the long run we will get the best answers from having a diversity of approaches going to work something that having a limited number of approaches applied to a large number of data would actually be easier to achieve you know I'm saying that would be to have take all the data and make it accessible to everyone for everything so as we think about quick wins and I'm you know the question of whether our initial goals should be achieving so all the analysts can do their analysis which in particular the analysts will be enthusiastic about you know versus what are a set of questions that we need answers to across data sets and can we pilot how we get those answers so we can learn what things are needed they're not in conflict it's just they have different challenges there's a comment over here yeah I just want to comment 
about the re-contact based on genotype. I think even that one has to think about governance. If you take BRCA1 and 2 as an example, there have been dozens of studies, and many centers have really found you have to limit the number of times you re-contact people with the latest hot genotype. So I think even that, we'd have to think about the governance.
I totally agree, and I actually think that's so clearly correct that while re-contact, as Lincoln was saying, is a very important thing to enable, I personally would hold it as a separate goal, because it raises so many different issues than just bringing the data together and doing certain kinds of analyses. We shouldn't ignore it, we should focus on it, but we shouldn't think one size fits all. Pearl, did you want to add something to that?
Yeah, on that same note, I think, sitting here listening, data access versus access for re-contact are so wildly different. They both have their place, but I think if we're looking for a 50% solution, I think the re-contact deserves another meeting.
So we might actually say there's three levels of that we're talking about. In my mind there's access to results, then there's access to data, and then there's access to the patients who gave the samples to generate the results and the data. And again, we should just keep them separate. I think Aravinda is next.
You know, in hearing you and a number of other people speak and comment, one thing that this is likely to very effectively lead to is to know not only this question of studying health disparities but where the disparities are in the studies, meaning we don't even know, across all of human morbidity, what we are studying. I'm not saying we should make it proportional to, you know, mortality and all of that, but just where the gaps are is currently very, very difficult to know. And this is particularly true for studies in children, as to who is being studied for what and how, and this might be very, very important in
trying to figure out where the holes at least the big holes are I think a couple of things we might keep they're important but I'm just hearing some of the threads that we might keep separate in our minds even as we attend to all of them one is how do we create a computational regulatory organizational environment in which such things could be certain things could be done it would be a different environment for different such things then there's what you guys are both raising is what data has gone into that database and this is the meeting last week that Francis and Michael Dolson other organizers a lot of discussion of what would you want in my mind it was described in a different way but it's really like what samples would you want in such a database in order to ensure that you found the most important things but you still would want in the database you could look is what Eric's saying you know you found something you'd like to then look at the studies a lot and you guys are saying maybe you wouldn't have the right ethnic or ancestry mix so there's an organizational and software platform that's secure and responsive to enable things there's what's in it and then there's how we serve the answers including answers like that you would it would not be allowed to re-contact someone yes yes speaking about communities that you might not have included the patient community might be one exactly for the reasons of perhaps not needing to re-contact if they donate up front and then also that they donate a phenotype that is progressing over time because you don't want a snapshot only you want to know what happened 10 years from now so that model since you are in the business of breaking yes the establishment I think it would be very important I think you're right I think that there's an even fuller view of what we're talking about where the patients could have if I'm hearing you correctly and patient involvement in an ongoing way is this what you're saying? 
well I think people are legitimately out there trying to donate blood donating data donating what they can to make health better and that by itself would remove a lot of the barriers that we have right now so again I totally I personally resonate very much with what you're saying and I think again as we might want to keep track of what are the different threads because you know a system that could allow you to do such things would be powerfully used in the social environment you're describing where patients are engaged in an ongoing way contributing but at the same time the system will be useful even without that and you know we want to keep each of them separate threads but because they're each important and they each have separate solutions yeah and I think computationally we can put data out there but allow permissions at different levels yes all right I assume somebody is in charge of the time I am watching how long you've been up here there's more time yeah and a few comments to throw out so I think when you when you talk about say databases you know database are very attractive for parts of the problem that you identified say if you want to ask your question about what are the genes associated with MI what's the current best view you know it'd be nice to calculate our best answer not to have to go look at you know 10 different papers published you know two or three years apart and try and digest and none of those would be the best answer because the best answer would be taking the data together from all of them but you know there's other things where I think the idea of having a large database and think that it will solve our problem sounds actually quite scary to me you know if you think about say processing sequence data you know and saying you know we need to figure out a way to put the sequence data in one place and analyze it once with some set of I think those are the kind of problems you know it's more of a compute problem than a database problem it's 
a problem that rapidly becomes obsolete. You know, so you're gonna build some white elephant, and two or three years from now you're gonna have to build it again, right? So I think that, you know, if you take Carlos's approach, how do I make the data easily available, that data-easily-available solution lasts for a long time. You know, the single compute place could easily be a white elephant. It could cost you more than...
So I think that I'm not surprised to hear you say that. I think that that's one of the reasons, you'll note, my slide said innovation and diversity. Because I think that if you wanna avoid a white elephant, on the one hand, certainly one way to ensure you have a white elephant is to have one of them, because then you'll have a lowest-common-denominator, committee-driven process, and you would like competition among these things. However, I'm gonna respond in kind by saying that even in the environment I work in, which has a high proportion of really great analysts who can do things, it's still the case that often there's someone in the next room over who doesn't have access to the answer. Actually, in our group meeting yesterday we were talking a little bit about this, and someone in our group meeting who's a very sophisticated person computationally said, I don't understand why this is a problem. You just go grep this file, and then you go do this and you do that, and you get the answer. And even if that person knows that, even if it happens to be true (and this was not a go-back-to-the-raw-data thing, this was sort of some available data), even if she knows that, it's not clear that anyone else in our group meeting knows that, let alone that anyone outside of our group would have access to the answer. So I agree with you, Gonzalo. I totally agree that we do not wanna create a white elephant, one-size-fits-all thing that's out of date immediately. That would just mean we did a really bad job and we were wasting the money, and we shouldn't
do it. On the other hand, the fact that individual analysts have access to data and can write their papers does not solve the problem of the other three communities.
Yeah, yeah. There now is only a fraction of the data we're gonna have in the future, and so I think it's also important for some of these questions, if we can sit here and Adam can go and say, you know, here are things that I think are high value, you know, we have ability to contact participants, or there's few restrictions on use. You know, it'd be nice to have a model consent, for example, that means that, you know, when we have this meeting five years from now or 10 years from now (at least some of us might still be working in this field by then, you know, maybe you'll find new fields) we don't rehash these questions. And so I think it's important to think about what's the ideal consent and can we use it more broadly.
Or the equivalent of, you know, sort of standards, right? Saying, okay, consents that meet this set of standards will then be allowed going forward in the way you said. I think, in the same way that you're correctly pointing out, and I do totally agree with you, we should not have a single solution, we want competition between different things, we also don't want a single consent, because there's gonna be one consent that'll be for free data sharing, and it'll be a very good thing, but only some set of people will sign up, and we'll have different things. Chris O'Donnell was waiting earlier. Did you want to... okay, sorry. Chris O'Donnell, yeah.
Hi David, two comments. One is to amplify on the question of communities that should be on your list. I think the populations providing the studies, the actual participants in those populations, are very important, I mean, because by engaging them I think you have an opportunity to get buy-in by those populations. So I think that's gonna be terribly important. The other part is, you mentioned the HDL study, and one thing that I'm
worried about is that through all the thinking on how to get the data together and knit it together, the actual end product of the results of the analyses would just be put in a shoe box. It would be a really big shoe box, but it would be potentially a shoe box. The only thing that would come out is the low-hanging fruit. And like many of these GWAS that we are all involved in, some of those data are shared publicly, and in many cases they're not. And it would be, I think, a really good thing to think about how to make the data very widely available, so a lookup could be performed and the effort would be maximized.
Yeah. All right, so I think, unfortunately, it's up to you if you wanna do