The committee has been working to capture points as they've come up in the discussion, and for most of the next two and a half to three hours we're going to go through a document that we've created, which we think will serve as a discussion draft: something to put out there for you to react to and refine, to see if we're actually getting to the key points of this meeting. But before we go ahead with that process, we wanted to get back to a point that has come up. It's clearly an important point, and we wanted to make sure that we took a little bit of time to discuss it. This is the issue of what kinds of information sources we need for clinical care, and how the process that we're talking about today relates to that. I think Heidi is here and maybe could comment a little bit about that.

Sure. I'm over here. So I direct a clinical laboratory and have also been working in the area of software development for clinical laboratories. In listening to some of the conversations, the one thing that I think about is the laboratory workflow. There are a couple of distinctions. One is supporting the physician in interpreting variants for the care of their patients, which is a step after supporting the clinical laboratory geneticist, who is interpreting data that will be put into a report, which is then given to a physician who has to care for their patient. I think it's a pretty big extension to try, today, to think about supporting the physician. But I think it's reasonable to think about supporting the clinical laboratory. That's what Richard and others have been talking about, where we use these same resources. We do have scientific backgrounds and understand the complexities of the data. But we often have a very different workflow in terms of how we use the data.
And as David pointed out, we do have to use it in a comparative environment, where we can only make use of our patient's data by comparing it to data sets that are out there. And we do that all the time. The ESP cohort has been incredibly valuable, with actual specific genotypes and quality filters applied to the data, and we use that daily. The difference is that we use it daily, not in large-project mode. Every day we're accessing variants one by one as we interpret patients. So the dbGaP environment, where you put in a request for access to a large data set to analyze for a specific project, is entirely not how we work. Every day, one by one, we look at variants to write specific interpretations at the variant level. So it's worth thinking about what kind of environment we could use to support the laboratory geneticist in accessing data outside an IRB-approved protocol, but within a HIPAA-compliant environment, to access patient data and phenotypes to support that sort of individual variant interpretation process.

The other topic I'll bring up relates to the conversations about providing IT tools for data analysis to assist clinical laboratories. Of course, we run into a lot of regulatory challenges in that space, because clinical laboratories have to have versioned, QC'd, and validated tools. They can't keep changing; they have to be locked down and validated, and that often requires a lot of infrastructure. And although we can work in the LDT environment, the laboratory-developed test environment, from a regulatory perspective, where tools don't necessarily have to be FDA-approved devices, if you create software that you're actually distributing to multiple labs, as opposed to software developed within one lab, then you enter the FDA medical device environment. And those tools have to be, at a minimum, a Class I exempt medical device with the FDA.
And so that's what we've done with our software, but that requires a whole other level of quality systems to be put in place on top. So my guess is that most clinical laboratory environments are going to have to either have their own software in an LDT environment or use commercial services that have gone through that sort of FDA process to support those workflows. At the same time, we do still want access to the data that's out there, which we need to use to interpret the variants in our patients. It's just a different workflow. Those are just some of the key things that I thought were worth mentioning.

No, that's obviously right. I think the question, though, which may be beyond the scope of this but which we do want to talk about, is that the data are only comparable if all the things we talked about apply, not so much on the phenotype side, but on the data processing side. The absence of a variant could reflect different filtering or whatever. So although there are all the issues you described, learning how to do this in the research setting, and creating the conditions under which it can be done, will in my view be necessary to what you're describing. Because if there's a situation in which a researcher can take all the data, process it, get reliable results, and do the N-plus-one genome, then whether it's that system or some other system built for the clinical labs, it means we've figured out how to do it. But if we don't move down this path, then it's just going to have to be created anyway, without the benefit of the entire research enterprise. So I think what you're saying is that, on the one hand, the needs of the laboratorians just underscore the importance of data aggregation.
I wonder if it also underscores the importance of bringing certain kinds of data analysis into public view, making possible the public availability of certain kinds of results, aggregate results or summary results and things of this kind, where they are possible to produce. Does that make sense too?

Absolutely. We have a big effort that's been started through the ISCA consortium on CNV data, which Evan Eichler and many others have been involved in, that we're trying to extend to sequence variants, to really capture a lot of the data coming out of clinical laboratories that can also be put in the public domain and break down some of those barriers. Because, while maybe not quite as much in discovery mode, certainly after these genes have been discovered, a lot of the testing transitions into the clinical laboratory environment and out of the research environment, particularly for Mendelian disorders. So there's a huge amount of data that's been collected over many, many years in the clinical labs after those initial research studies that just isn't getting into the public domain. As for creating environments that facilitate and support the clinical labs in doing that: in the effort we have underway, in talking to clinical labs, it's actually the case that, with some exceptions, and we probably know the exceptions, most of the clinical labs are willing to submit their data. The issue is that they don't have the time, resources, and infrastructure to do it. That's the major barrier for most laboratories, Myriad being the exception. So I think if we create the tools and the process to enable them to do that, and the environment to put that data in, we will be successful in getting a lot of that data out there.
And maybe we should acknowledge the importance of that kind of database, but I think what I'm hearing is that there certainly are different constraints around the data submission process, and perhaps the structure of the database, than what we're talking about in terms of aggregating large databases for research. Is that fair to say? Yeah. To the extent that that's the case, it seems really important to list that as a separate but really important data aggregation challenge. Is that fair? Other comments?

All right, I think we're ready to go ahead, at least if Mike is ready. So let me just explain that the organizing committee got together and we have put together a document that we think works as a summary document. We want to really emphasize that this is a draft, and we are actually going to put the document up and go through it piece by piece. We've tried to capture what we think are the main discussion points emerging from this meeting, and what we're hoping with this draft is to generate discussion, make sure we've got this right, find pieces that are missing, correct things that we haven't captured right, and hopefully bring this discussion to a close. You ready? Any other comments? We probably have another couple of minutes before we can start.

Maybe I'll just ask one question of Heidi and others around this issue of clinical needs, and that is about the idea of identifying users who would then have access to aggregated data. I would assume that when we're talking about creating a data repository that provides up-to-date, definitive information about genetic variation and its clinical implications, what we're talking about is creating a publicly available database that anyone could access for the purposes of accurate interpretation of genetic tests. Is that correct?

Yeah, that's exactly right. It would be freely accessible. Labs would submit their data along with their clinical classifications of those variants.
And so people could see what various groups have said about a variant through a clinical annotation process, because we do this on every variant we report on already in our normal workflows. But then the improvement on that process would be to have expert groups do curation, evidence-based curation, looking at a lot of the data that's submitted. This has been done through the ISCA consortium: groups of experts, through a consensus-driven process, have then marked variants as disease-causing or not based on a larger data set.

And in terms of what's publicly available, and I know you've been involved in discussions, would the format be such that we're not generating concerns about confidentiality or disclosures in terms of individual databases and others? This would be sort of summary data around particular variants?

Yeah. The way the ISCA consortium has done this to date is that they have put on their requisition forms, their website, and their reports, in essence, an opt-out notification that basically says this data will go into a publicly accessible, de-identified environment unless you tell us otherwise. From the point when they established that policy, they have been submitting full cytogenomic array data sets, which we intend to extend to exome and genome data, into dbGaP, and then the variants flow from there into dbVar or dbSNP, so that the variant-level data is fully in the public domain, while the full case-level data, with its set of variants, stays in the dbGaP environment. So far that has been working for a number of laboratories, allowing them to take clinical lab data on patients without IRB consent, with simply an opt-out mechanism, and enable that data to be shared. Now, that's the current environment, and it has been supported by many IRBs. They do have IRB approval for this process, but the individual patient doesn't have to actively consent. It's an opt-out.
I just again want to stress the importance, though, of something that's not a regulatory issue but a technical issue: the lack of comparability of all that data. There's an assumption, not on anyone's part in this room, that sequence data is sequence data and it's always right. The good thing about that approach is that it's better than not sharing the data. The bad thing is that it's really not comparable to take your lab's sequencing data, from your vendor or your machine with your pipeline, compare it to some database, and imagine that you're necessarily measuring the same thing. So the model we're talking about, applied in whatever way, where you could actually have the data available, process it together, and analyze it, would be vastly superior technically to what is being done now, where the assumption is that your sequencing and my sequencing are the same. Anyone who actually does this for a living knows that's a very bad assumption. So over time, if a system such as we're discussing here allows much more aggregation on a much larger scale, with careful attention to data quality, harmonization possibilities, et cetera, we ultimately generate information that can provide stronger support, a stronger evidence base, for the clinic. Doing it in silos, and then assuming you can summarize it in an expert website, is I think actually naive about the nature of the data, especially as it turns to next-generation technology and people collect more and more data; there will be artifacts, there will be problems. And you'll also want to know what a variant meant across a thousand other patients, not just the ones who came in with the same phenotype.
I think that's an important point, although I will say, in terms of the data dictionary that we have helped NCBI with for the ClinVar project, the goal is that all of that data comes in with what platform it was run on and whether it was Sanger confirmed, because most of us in the clinical laboratory are not only getting these variants through NextGen or other platforms, but are also Sanger confirming them. But it is an important point that the data should come with the platform, quality scores, and other parameters for each of those variants. Now, as you get into exome and genome data, it's a slightly different world, where a large proportion of your data is simply designated with the platform it came off of, without any Sanger confirmation.

No, and also, when people read about why this is all going to happen, the magic thousand-dollar genome that's going to be a hundred dollars is what's always suggested as the reason. That's not individually validating every variant by Sanger sequencing. So if the logic train is, oh, it's going to be so cheap, oh, it's going to be done on every patient, then the logic train has to include how we build the evidence base and the technical systems that support using that cheap technology for clinical decisions, not the historic technology that might be more robust but is not actually what we're talking about for the future of magic thousand-dollar genomes. Arvind, do you have a comment?

No, I just want to remind people, not that this is not the right thing to do, but there's a whole history of doing this kind of thing in Mendelian disorders, and there are these other mutation databases and locus-specific databases. In some sense, I'm not saying they shouldn't be replaced, and not all of the data are great, but this is also about trying to convince many other communities that do business very differently.
So I agree that we should do it, because this will be much better data on which to do clinical testing, but you've got to convince a whole bunch of people who, I mean, they'll require convincing.

Can I just amplify very briefly on that, because I think Arvind is exactly right. There are many other efforts, nationally and internationally, to aggregate these data, but I think the point to be made is that maybe part of our meeting's mission should be to comment on and address that, because a nice outcome would be something that reduces this almost comical parallel development of exactly the same thing, development that is dealing with today's problems, not tomorrow's.

Well, I think what we need to do is move on and go back to the research database, but these are very important points to keep in mind as we discuss. So again, I want to thank everybody for coming to the meeting and for what I think has been a very lively discussion, with some very nice presentations, on issues that really matter to a lot of us. The organizing committee came here with a notion of different things to discuss. As someone said, there was not a pre-compute on how this was going to end up, and that has something to do with why, when we go through this document, you may find it's not the most polished document you've ever seen. What we've tried to do is capture some of the key threads of the conversation and give ourselves a basis for going through and talking about these different things, doing our best as a group to come to places where we do have agreement and feel we can make recommendations, while identifying other places that may simply require more study. And I think it's important to keep in mind something that either Eric or Francis said at the start: the goal of this meeting is not to get down to the fine details of implementation, but rather to talk about what our goals and options, and the relative value of those different options, seem to be.
Okay. I'd also say I'm not sure why I'm the one up here doing this. I think I probably stepped back the slowest when it came to volunteering for presenter, but I'm happy to do it. I certainly encourage David and Wiley and Lisa and Adam and the other people who were involved in this process to please speak up. And of course all of you should speak up, because this is not meant to be a long-winded presentation. We have about two hours here, and I think we can use all of it on this particular part of the discussion.

So the way that we chose to organize this document is to talk first, briefly, about the goals of the meeting; then about what we would think of as the likely outputs of this meeting; then about three major recommendations, which come in many parts but are three major recommendations; and then additional ideas that we think are worth putting forward. So what I think would make sense is: I'll go over the goals, and we can see if anybody has any disagreement. Then I'll go to the outputs, and we'll talk about that a little bit, because there are a couple of questions there. Then I'll talk through the full set of key recommendations, and we'll go back through them one by one. And then we'll go through the subsequent recommendations one by one as we have time. Of course, if people have additional things to bring up, we're delighted to have that happen. Does that seem like a reasonable process? Okay, I'll take silence for assent in this case.

Okay, so basically, meeting goal: we have a major investment in DNA sequence and phenotype and exposure data. These data have the potential to inform the understanding, diagnosis, and treatment of disease.
At present, there are significant organizational challenges that limit our ability to learn and benefit from these data optimally: the data sets are generally difficult to define, cumbersome to access, and stored in a variety of databases; the underlying DNA variation, phenotype, and exposure data have not generally been harmonized; and there is some lack of sustainable governance structures for what's going on. The workshop aimed to discuss the scientific questions that these data could address, the current challenges to obtaining and analyzing multiple data sets, options for dealing with these problems, and the costs and trade-offs of those options. And as I said a moment ago, later, more focused workshops or working groups can address the details of implementing the recommendations we come to.

In terms of outputs for the meeting: certainly our set of recommendations. Lisa tells me we're meant to be doing a report, so you may all have worked hard getting ready for this meeting, but there will still be more to do after it. I think that's as it should be; we certainly don't want this discussion to go away without being captured in a useful way for the NIH. One thing the working group did discuss, and was curious about people's opinions on, is whether, rather than just writing a report internal to the NIH, we should consider writing a short report, perhaps a one-page report, perhaps to a high-level journal, perhaps even to a journal represented here, that could communicate to the community the issues around the use of data, the challenges, and how we see them being overcome. Comments on that? That's a yes from David.

Wasn't there a point yesterday where people said we should write this report and keep it updated, and lots of people said we should do that? And I think that requires being written and published.
I don't think this is the place to mention that, but for me that's going to be a brilliant act, but a very practical one. I certainly agree with that personally. I mean, I think that's really good to say. I don't know that it fits in with this document, though. It seems like a separate issue. It's not a complete answer, but I wouldn't want it to be lost as a... I promise it's in what we'll be talking about. It's one of the issues that is written down at this point. We will get there. Anybody want to argue that we should not be thinking of writing such a paper?

I think if the short paper is to go to a much wider community, it would be better if it were followed up, or if somebody were able to download something much more detailed, you know. Just to be clear, Arvind, this is not an either-or. This is in addition to the report that we'll be writing; there might be something shorter but more pithy that could go to a broader audience. Okay.

I just have one thought, which is that if we're going to do this, then there might be an opportunity to solicit input from the broader community about what they would like this resource to look like. We represent who we represent, but there may be others out there who have other ideas. Yeah, I can certainly see widely circulating such a draft document to get feedback from people. Or maybe put a website up, link to a website, comments requested. Okay. All right.

So that was the page that figured to be the easiest one. Let's now talk about... so we suggest at the end of this that some follow-up workshops or working groups be constituted. We suggest some particular topics they might take on, and we'll certainly be asking the broader group if there are other things people feel should be addressed. We have particular topics, but we haven't identified particular individuals for those topics. I totally agree with that.
I mean, I would ask Eric, I don't know, and this reminds me of another meeting some of us attended last week: if we could get from this broad set of issues to agreement as to what the goals are, that would be remarkable progress. It doesn't always happen. To actually get to who's going to implement what just seems unimaginable. We don't know yet if this room is going to agree with the goals, which haven't yet been described, based on digesting what was said here. So how could we be at the point of action items and deliverables, when we don't know what the outcome is yet? Otherwise the meeting is a charade, because we would have known the answer in advance. I just think that if we could actually get to agreement as to what needs to be done, that would be remarkable progress.

And actually, I'd like to say that I think the process that Lisa suggested, and I know really made happen, Lisa Brooks, of having these position papers, one- or two-page documents, written in advance of the meeting, was incredibly helpful from my perspective in framing the meeting and taking us forward in a very useful way. So I think that by itself was a very useful starting point and output for this meeting.

Okay, so we came up with, I think, three high-level messages, although you'll see there are multiple different points; that reminds me of writing a grant proposal with David, who always wants three specific aims. Well, we also have three primary statements here. Yeah, never four and never two, okay. I have to confess I once had a grant proposal with eight specific aims, and it did get funded. Okay. So, the current system sets the stage for broad utility of the data. I can't remember whether it was Francis or Eric who talked about how we're sort of victims of our own success. We have done tremendous things.
And I think we as a genome community should be deeply proud of this. That said, we have more that we need to do. Much data can be found in dbGaP, but it is inefficient to get the data and to aggregate across studies. This inability to aggregate data limits efficient and effective use. dbGaP's governance model lacks public input, transparency, and clear consequences for misconduct. A large investment has been made in the production of experimental data that has not been matched by adequate investment in data aggregation, analysis, and effective governance, in effect resulting in less-than-optimal stewardship of the national investment in genomic and clinical research. So that is our statement of the problem.

In terms of what we recommend, and again this is high level, we'll get down to lower levels later: we call for regulatory change and investment to encourage the creation of entities that aggregate and harmonize available data and that support multiple levels of data access and querying of results. We talked about the idea, I think that was Wiley's term actually, of zones of aggregation: non-centralized approaches to data aggregation, supporting innovation and diversity of approach in solving this problem. We talked about each data set being characterized in terms of access and use permissions, so that we can use different data differently, as we should. We recommend tracking of users to provide accountability and the possibility of recourse for violating data use; as part of that process, defining criteria for registered users, defining a registration process, and establishing a unique user identifier to allow tracking. And notice the term we had before, certification, has now become registration; whether that's the right choice or not, I think it gives a better flavor for what we've been talking about, but that's something we should discuss. And then the third recommendation is the long one.
Aggregation is critical not only to basic research, of course, but also to translation, to drug development, and to developing data processing and a knowledge base for clinical medicine. Under that, we talked about the four different models to facilitate data access and analysis presented at the meeting, and there was consensus that these actually represent four different versions or layers of the same sort of thing, based on bringing the data together and having processes that allow us to access it. So we endorse open access when the data are appropriately consented, and I should say this is again the small group, and we'll need to decide whether we agree with this: endorse open access when data are properly consented, for example with the Personal Genome Project, and believe we should provide clear guidance and support for such data in any new data aggregation system. Endorse streamlining the dbGaP data access process so that it does not become a choke point: standardized rules and procedures, methods for standardizing and communicating data use and user criteria, assuring consistency and rapidity, and attention to principles of responsible governance. And then creating conditions that will allow multiple approaches to zones of aggregation and analysis, which would include regulatory and policy frameworks, funding incentives, pipelines for data processing and phenotype harmonization, and encouraging innovation and diversity of methods. And finally, endorsing registration of users, with consequences for failure to comply with the relevant rules. So: bringing together data in a regulatory environment that allows research to be informed. Some users might use this in the context of a research commons; others might use it in the context of an analysis server, providing the opportunity to upload individual genomes and projects for annotation, in a manner comparable to a knowledge base. We've already said that.
Again, this is editing quickly on the fly, so apologies for that. Different rules for data access were already there. And then the one additional point: any entity creating such a zone of aggregation would be responsible for having appropriate governance structures to ensure stakeholder input, transparency, and accountability. Okay, so those are the three main messages. Should we go through them one by one? Ewan, you can go ahead; you have a loud voice, Ewan.

No, totally right. And in fact, this is the danger of presenting this piecemeal; we'll see whether this was a good idea, and maybe we'll keep going a little further. But we do have a point on that later on. That clearly is an issue that needs to be dealt with. And the notion, the possibility, of doing this internationally, say DACs across different countries, for example, I think is a beautiful example of the advantage of some of what we're talking about. Maybe. Thank you very much.

Yes, yes, totally right. And I think that's a really important point. Maybe what we were trying to recognize is that there are some procedures that can be developed, some steps that can be taken, some governance innovations that could be done all around dbGaP now, but they're sort of developmental stages. I think your point is that there's lots of data out there, and when we create the right data aggregation opportunity we ought to incorporate a broad range of data. Yeah, I think in the way we describe this we need to distinguish exactly the two things you were talking about, Ewan. One is that we do have, internal to the US, an audience and set of stakeholders we need to be dealing with. But then we need to put this in the broader international context, and that's absolutely critical. Mark?
Ewan's point is a great one, and I wasn't going to make it here, but it certainly is one of the things that plays into how we think about the questions with the GWAS data and identifiability, because in fact the majority of our meta- and mega-analyses come from data not in the US, and so that does play into who's in charge of making the rules. What I'm going to say is, I wonder if, and maybe this goes without saying, but in part three you sort of introduce both the raison d'être of the whole thing and dive right into the technical details, and I wonder if it's worth breaking those two apart. Such an important part of the background is all of the different use cases, as it were, in a technical sense, that could be enabled by this activity, and that's something we would want to highlight up front before getting into the technical models for facilitating access. The reason being, possibly, that we've engaged a huge set of experts to think about the technical details, but in line with what Chris was saying, the area where we might want much broader community engagement is: what are the possible uses, for biologists, for pharma, for clinicians, for public education, and for other purposes, that could be facilitated by such a utility? We could engage a larger audience for that question, but not so much for the technical part.

Yeah, I totally agree with you, and as I was reading this I was thinking: what's worse than a document written by a committee? A document written by a committee in 35 minutes. So clearly I think that kind of restructuring is entirely appropriate. I think we want those bullet points at the start to be pithier than they currently are. We'll still, I think, want that information there, but organized a bit differently; I totally agree. Actually, we had a...

Yeah, just building a little bit on Ewan's comments, I think that the vision for the research commons has gotten buried a little bit.
Here, I think that the research commons vision needs to be brought out. Its key characteristics are that it's a tier of data released for general research use under a common set of usage restrictions, that the research commons can extend across multiple national and international databases, and that registered users, the vision is, can have cross-database and cross-national access to the commons. I think those are important key concepts. I don't know if everybody endorses that, but I personally think that if there is a consensus, they should go into this document as a separate vision. So can you clarify, only because I actually don't, if you go back, Mike, up to where number two is, what we were aiming for? And the question is, do people agree, and if so, how do we craft the document? Is the idea, and this does come below, not to make the research commons, the server, the free data accesses, as will come, distinct and very separate entities, but to see them as different data access criteria, different ways of accessing the data? And so maybe we can wait till we get to that, but the question is, do we want to call out one particular slice of how that might work, or describe the attributes of the systems we want to have, which different users can then interact with in different ways? You know, rather than highlighting one particular way of interacting, highlight the conditions under which people could do these different things. My personal feeling is that the commons is distinct enough that it should be brought out as a specially named tier. So I think this is where I'm not sure. Rather, as currently written, the model is very general; it supports hundreds or thousands of tiers. Why don't we wait until we get to what we'd like to call for. I think this is another example.
A distinct one that is the default, more or less open, playing field that people can work on. So can I suggest we save that one again until the end, until we've seen the rest of this? See if the verbiage you see later on is comfortable from your perspective or insufficient, and then we can talk about that further. Others? Yeah, one question and a comment. First of all, the use of the word governance, I fear, is lumping too many things together. And what I'm not reading from it so far is more of the public engagement issue, the stuff that Wiley talked about yesterday. I think some of it's in there, but I fear just the term governance doesn't adequately address that. Yeah, we'll talk about that one more also, and we would welcome a different term; this was an active area of discussion. And then my question is, you say call for regulatory change. Could you drill down on what regulatory change you're looking for? Mike, maybe you should go through the whole document, because a lot of the questions are, if not adequately addressed, further explored. Everybody comfortable with that? I was hoping we could maybe make this more bite-sized, but I think David is right. I think there's value in going through the whole thing, and then we'll come back. Okay. So now, as I said, those were sort of the key overarching messages, and then some additional ones, actually quite a few yet to come, that in many cases relate very much to those initial ones. So, additional messages. Where possible, data summaries should be available for public access. These might include genotype counts or allele frequencies and p-values. This should certainly be done for nonsensitive traits. One way to think about this is in the context of the kind of thing that you might fully expect to be in a journal article, and why should that not then be available more generally? We strongly support NCBI's deployment of a general use platform to allow access and query of data.
It's useful to think of it as a pilot of broader efforts in which multiple parties would innovate in how to best manage, process and analyze the data and serve the results. Encourage international coordination of policy and requirements and a global inventory of studies. So I sensed very strong support for what Steve and Laura were talking about. I think it's a great step, one of many we need to take, but certainly a nice one to be able to point to as something that can be accomplished right away. We talked about the creation of portals. Question? Yes. I'm really sorry, I just wanted to point out, in your statement about encouraging NCBI and the GRU thing, which is great, we're happy to hear that. But it's not really up to us. There's an advisory committee, the GWAS steering committee, that makes the policy decisions that enable those things. Eric is a co-chair. So any such encouragement needs to go there as well. Okay. And we'll certainly send the information to the appropriate group, Steve. Just one point on the previous thing about the p-values. I think we want the words comprehensive data sets in there, to indicate that it's not just slices of small size; we really want the whole thing, because you never know, for people like Heidi who are trying to do interpretation, what measures are important. Totally intended, yeah. Yeah, and basically what we were hoping to do was to turn this around from "where can it be done" to "where can it not be done." The presupposition is that in general it should be done and there will be some exceptions. And that was really what we were arguing for. I think that's really quite important. And I agree totally with what you said earlier, Ewan, that this would be a near-term win that we should really go for. Okay, so then, creation of portals that provide simple answers to common questions for non-geneticists: for any given disease, what are the genes associated with that disease at some threshold?
For any given gene, what diseases and traits are associated with it? For any given variant, what's the association to all available traits? And for any given genome sequence, what variants are seen, with annotation with regard to frequencies and association? So, again, making data available to a broader class of users than just those of us who are actively working on this kind of stuff. Simple answers to common questions by geneticists, too. I mean, there's some use of the data by geneticists that right now is cumbersome and that should also be easy here. It should be easy to take all the GRU sequence samples and use them as a reference panel, but it's incredibly hard, so why shouldn't a portal make it easy? I totally agree with that. I think we were focusing on these other groups because we typically feel we don't serve them well, but you're absolutely right. Okay, then we wanted to talk a little bit about siloing. Given the value of aggregating data across multiple disease areas, segregating data by disease or institute should be avoided. Instead, data should be centrally aggregated, and disease-specific portals could then be used as a useful way to capture and disseminate information. So aggregating the data, but making the information available in a disease-specific way as different groups would design. So avoiding fragmentation and yet still giving the advantage of the different looks. Mike, this is a small point, and I understand exactly what you mean, but I suspect that it might be misunderstood by many. I would simply point out again the benefit, which many other audiences don't perceive, of aggregating data across large numbers of samples irrespective of disease. Sure, yeah, it was a misstatement to say it's specific to diseases. It's partly diseases, but it's more broadly just putting all the data together. I totally agree with that. Yeah. Okay.
Ah, then: mechanisms should be developed to ensure that both phenotype and genotype data from NIH studies are deposited in central databases in a way that's maximally useful. We should, however, note that there will be exceptional cases where such central deposition is not appropriate, and robust governance will be responsible for identifying those cases. So there is a clear good here, but we don't want it to be suggested that it should be for all data. At the same time, we don't want the portion of data for which it's not appropriate to get in the way of it happening for the large majority of the studies. Okay. Okay, so given the value of datasets that are shared broadly, models should be developed to encourage broad data access when appropriate, providing advice on formats for broad consents. Acknowledge the need for stewardship components as part of a system that utilizes broad consent, and acknowledge, as we said before, that there will continue to be data sets that require more specific consent, with resulting limitations in data access and use. So, sort of a forward-looking direction here. And then just a general guideline: we should be developing plans on two levels, the vision and goals of where we want to go, and also shorter-term steps or pilot projects to start the process. Okay, and here's a short point where Ewan's international issue is alluded to, so we make sure we don't forget it, because it is very important: to make sure that as we're having these discussions, we focus on international cooperation, that we focus on underserved and underrepresented groups, and that we focus on very rich data sources like large longitudinal cohort studies, where the work we do can be particularly valuable in terms of the aggregation of the data. Okay.
Then we wanted to talk about what I thought was a very interesting discussion yesterday on the issue of engaging the public, and we sort of phrased this as: there is a social contract between people who give their data and those of us who do research to address questions with the potential to advance science and medicine. We thought it was important to focus on how the research we are doing can enhance science and medicine, and to talk about why it's important, and in particular to give concrete examples of the value of something like dbGaP or, more generally, data aggregation; that we should, as someone suggested, collect good information on adverse events and recognize that they are the exception. And I really liked the comment that we've been sort of in a defensive crouch; it's time to push back with the public. The track record is very good; certainly the risk of what we're doing is not zero, but the gains are important and vastly outweigh the harms that have occurred or are likely to occur. On this issue, Mike, I just want to add, and I think it didn't quite get in, that the social contract is not just with those who give the data but with those who pay for it, the general public. Fair point. And then the difference between public perception of risks and actual risks, please. It's a general comment, and it's people such as yourself and Molly and others who are going to look at the final document and wordsmith it. This is a personal opinion, but to me right now, the science is much more important than the applications, because often the applications aren't clear or are in very early stages. And so when we talk of gains, much of the gains, for example, are really huge scientific gains. And I just want to make sure that we titrate the expectation correctly. Being too closely tied to clinical ends has its own set of problems. So how you write this, I'm not going to give you any words, but as a general principle it may be important to point out.
I think that's really important, Aravinda, and I think it really only underscores the importance of getting to this public engagement and acknowledging that there's a certain amount of research involved in how to do it right, because we want to really explain, make more clear to people, what the research process is, including how much time it takes and how uncertain the outcomes are. So of course I agree with you, but I see this as an opportunity actually to right a wrong that's already been committed, because the public reads endless hype articles from companies and the like that suggest once we sequence a genome, all good things will happen. And if we craft this message correctly by saying there is a research agenda and we can't yet interpret genomes, but we could build a knowledge base in which we could learn, we might be able to not hype, but actually take away this thing we keep reading about, that someone sequences their genome and measures a bunch of chemicals and somehow saves their life, for which there's no real evidence. A page up, there's a statement that our concluding statement is that we should develop a vision and some goals. I think it's disappointing to have a meeting like this and come out with a statement that says that we ought to have a meeting to develop vision and goals. Well, that's what it says up there. Richard, we may have misphrased that; it was intended to be poking at us as we're thinking about this, to be thinking at both levels. That we should be actively working at both of those levels, not that we need to develop some vision. I think we've got a vision, or at least I like to believe in it. Let's walk out with a vision. Okay, then we agree.
So I think that is really pertinent to Mark's point and this last point that Aravinda made: early in the document it should, I think, very clearly delineate this clinical-research boundary and the expectations, and the false expectations if we need to put them there. Let's use a little bit of real estate to really carefully lay that out, and not be shy about projecting some time into the future. Overall, I think there's nothing not to like about this whole document, except that it's not particularly futuristic. Okay, now I think that's a fair point. I'll just drop back to the original defense: there's a limit to what you can do in a short period of time, and getting this kind of input is very relevant. That was the point: to say there should be a vision in the document that we come to, not to say we should come up with one later. There just wasn't time to write it down in words, whatever it was. No, that's what I was trying to say, Richard, that we are meant to be looking at both levels as we draft this document. The public engagement issue. I think it is good to have gotten to the point of rectifying perceptions of risk and so on. But I think Wiley made this point yesterday, and it might be worthwhile highlighting again, that there's been a lot of conversation around how it shouldn't be about the risks, it should be about the actual harms and so on. But if we look empirically at where the breakdowns in trust have been between publics and science, and in particular between publics and biobanks, it's often not been around risks or harms; it's been around respect and public dialogue. I think that might be something worth highlighting. No, I think that's a really good point. I just wanted to add one other risk, which is that having bad data, data that's not reliable, poses a tremendous risk to the public, and they should recognize that.
I've had at least half a dozen cases in the last year of incorrectly called molecular genetic testing that could have resulted in inappropriate surgery, including thyroidectomies on a seven-year-old and a five-year-old, and that was prevented only by very careful evaluation of a report; if this data were properly aggregated and accessible through these portals, we'd prevent that. So I think we should stress that there is that risk to the public as well. I just wanted to add to the comment about being prepared for the inevitable data leak and how we handle that: it's something which, given the scale we're working at, will affect a lot of people, so pointing out what the risks to those people actually are likely to be, what the consequences will be for those who cause the data leak, and basically being prepared for this in advance. I think that's a really important point, and I think it speaks to one piece of the issue that Pearl brought up, the need for laying out more of the different pieces of issues that we might include under governance. I believe if we use the right kind of model, one that includes involvement of all stakeholders and a very transparent approach that says here are our procedures, here's what we're doing, here's who was at the table, and we can clearly identify all the stakeholders, and here's what we came up with as reasonable procedures to protect the data, and here's why we think they're reasonable. If one has done that work up front, it's much easier to accept that mistakes will happen, adverse events will happen. It's not that there was an irresponsible process that was sort of asleep at the wheel; people were trying their best, and of course they do have to try their best. They do have to provide strong justifications, but I think there's a huge protection to the research process in bringing all stakeholders, including members of the public, to the table as you define what those reasonable procedures are.
One particular set of participant groups that we need to think about are the multi-ethnic and minority populations, which traditionally in our genome research have been the last to have the research done, and here there's an opportunity for those populations to be knitted right into these commons. But I fear, for example with the general research use data set, I would be curious how many non-white populations are included there, and my guess would be very few; and therefore there may be data use limitations that are a little bit more strict that we need to consider. But at the same time, this may be the population where we have to be most concerned about the risk of a misuse that would get onto the front page of the Wall Street Journal or the New York Times. So I think we have to be thinking about that population both for science, for notifying the public, for communicating with the public, and for how to bring them in early on. So I think engaging is a really important piece, and I think there is clear mistrust, and that's evidenced by lack of participation; and yes, in dbGaP there's a relatively small number of minority samples. I think we have to accept that the steps in the process may be more incremental for some databases, that there'll be more restrictions early on, more oversight perhaps, more involvement of the communities in the oversight; but if we can create an environment with restrictions that enables trust to be built, then there's a possibility of moving forward. And while a number of the things we've discussed, like the research commons and the shift from specific analyses to approved or registered investigators, are great ideas that I favor strongly, those of us who work with cohorts have taken responsibility for making some decisions on behalf of our communities in depositing our data in dbGaP. So I think it's important, before those changes are made, that we engage the cohorts in dialogue and let them propagate the dialogue to their communities.
I think the document may want to comment on the return of results issue, if only to move it off the table. I think, as Wiley had highlighted, a lot of folks who participate in this kind of research expect results back, but at least from my perspective that's where the risks are entailed. For data sitting in repositories, which is what the document speaks about, the risk is pretty close to zero. Once somebody decides to give a result back, which may be wrong or unwanted or of uncertain clinical significance, that's when risks are really entailed. So I think that's a whole can of worms that we probably don't want to get into in the context of this document, but you might want to acknowledge it as such. And something about the structure. This goes back to an important point that Richard Gibbs just made. I think the nature of our discussion is such that we should delve into the three major points and issues. But the more I hear our discussions, the more I agree that we need to spend some real estate explaining what the state is, what we're trying to accomplish and why, and even the very general overall landscape of where we're trying to get to. This may be obvious within this audience, even though there are many questions just with this audience, but outside it this may be much, much less so. So I think articulating why and what. In terms of writing a report for the NIH, we have the basis of these different white papers to start with. I think we're in a good position to take what's happened here, add to those documents and get to a reasonable point. I think the harder challenge is going to be that one-page paper and talking to a broader audience, and certainly we will be actively talking with people about how we best take that forward. Richard? But Mike, you might consider all these other parallel efforts and how you can at least point to them and point out complementarity.
I think that's all to do with the mission of some statement out of here saying what the overall landscape is, stepping back from just taking the source documents that they generate. A bit of a different topic: I think it'd be important to also include some discussion, particularly à la the registration idea, of what is the individual researcher's responsibility versus the central server's versus the institution's. I don't know what the answer is, but I'm a bit fearful of institutions taking on more of this responsibility. And if they do, I think it just needs to be very specifically spelled out. So, to be clear, are you thinking of this being part of the document that we all generate, or is this something that is left to a committee that takes this further? I'm happy either way; I wasn't sure. I think once you come down to specific models, just really walk it through and say, what are you asking the institution to belly up for and take liability for? Even here, when you talk about registration, would that be a one-off investigator, or would that be a sign-off by the institution or by a central agency? So whatever final models you look at, I think it would be a good exercise to just walk through who does what and who's responsible for what. Okay, so you would argue that it should be in this document. I would just say it's an issue, maybe not go through those details. It sounds to me, yeah, what I was interpreting you as saying, Pearl, is that being really clear on who's responsible for what is a necessity for an effective system, and therefore as a system gets built, that has to be dealt with very specifically. Is that right? I agree; I think something general, a promissory note. Yeah, so we're willing to consider a range from really detailed to a promissory note that this is something that is going to be actively dealt with by the next group.
Surely one of those two is what we should do, and which one is not obvious to me. Eric? To comment on the surrounding discussion, and also to make it fit into what this meeting's been constructed around: we gave ourselves a charge at the beginning, that we have 70,000 exomes; how can we better use them, and how can we bring them together into a central data resource? The point that needs to be made is that indeed we're continuing to sequence more and more exomes, and in fact 70,000 is not enough, particularly for many disease classes or many groups of people. So we continue to collect more data. We continue to do this in more and more variable settings, the clinical setting for example, and the value of all that data will exponentially increase if we can bring it together into a single, homogeneous (probably not a great word) data resource. And so I think it addresses that comment. It also pushes us more to the forward-looking side, and it brings in the clinical side. Yeah, I think I was hearing, and it's certainly something I expressed yesterday at the end of the day, and I've talked with a bunch of people who seem to be in the same place, but we should actively talk about this: clearly we want to do what we can with the existing data, but I think the bigger emphasis honestly should be on what comes in the future, and we should use what we've learned so far, what's happened so far, what things we need to do based on what's happened so far, to help us plan for the future. When a process is starting, it's very difficult to plan so well for what's going to happen, because you don't know what's going to happen; but now we have 70,000 exomes or something like that, and we have a pretty good idea of at least a lot of the issues, and so if we don't plan now for the future, shame on us. That phrase is something that appears in any organizational statement, from climate change to geophysics, you name it, and it really has become something of a platitude.
I would urge you to try and make it more specific, either by saying what kinds of transparency you have not seen, what kinds of stakeholders have not been included, something so that the general reader or policymaker doesn't look at it and say, oh, it's one more of the same. I totally agree with you. I think all of us around the table totally agreed with you. I think this is just reflecting that we were truly doing this on the fly. That is meant to be expanded, absolutely, and it would include the public. But I think we can start by saying that the governance structure available for dbGaP is lacking, very obviously lacking: no input from the public, a data access committee solely operationalized by the feds, and it is not reporting out what it's doing in any kind of consistent way accessible to the public, et cetera. So I think we could start there. David. So I want to call something out just so that we can actually discuss it and hopefully move it forward, which is: when I talk to geneticists who want to access data about what's wrong with dbGaP, their answer is, it's cumbersome, it's slow, it's hard to get the data I want. In other words, it doesn't do a good enough job of getting the data out. The discussion that Wiley was just prompting, and that I think some of the other ELSI people in the room would raise, is that there are problems with dbGaP that are orthogonal to that. It's not that it shouldn't be streamlined, but that there are issues beyond just how well and quickly it works, issues with the governance model. And I think what we want to talk about as a group is whether we can understand and embrace those, so we don't end up, and they could be misunderstood if we write the document wrong, as one group saying speed it up and the other group saying slow it down, as if we're in opposition.
But that's not what's happening, and I'm worried that we'd misunderstand what's actually being said. We tried to wordsmith some stuff in the five minutes available: there are some governance issues that, for sustainability, need to be addressed so it's a fully robust process, and if we really want it to work as well as was said, they have to be addressed. That's not saying slow it down, but I'm worried that if we're not careful it'll be that usual perception of the geneticist saying go fast and the ethicist saying go slow, which is not actually the case. Just following up on that point, I think the ELSI community would definitely not be interested in stalling these processes, so I think that's the point we're making. But perhaps another point that's worthwhile flagging here, to get away from having yet another platitude, is to highlight just how non-trivial a problem this really is. I mean, there are people around the world who've been hitting their heads against the wall trying to figure out a biobank governance mechanism, and the literature is growing on this, but it still is a non-trivial issue. And I think what's being proposed here is essentially, in a way, the mother of all biobanks, so it compounds many of the problems. Laura said yesterday that informed consent just can't handle these issues, and I think most ELSI people would agree with that. This can't be done on the side of the desk; this is a real problem, and the trick is to do it in a way that doesn't add inertia. No, I totally agree, and that was part of the reason for the talk; there's one point we had in there about going forward, having some requirements, in terms of when people get funded studies from the NIH, if we're talking US here, to build this into the process of their studies. But that's only one aspect. Steve?
I take a little bit of exception, actually some great exception, to the idea that dbGaP isn't governed well or doesn't have a lot of transparency. Echoing this point, it was built as a completely new model, trying to envision data access to what has historically been restricted, and it might be imperfect; so something that acknowledges progress where progress has been made, and deficiencies, is I think a better statement than saying it isn't working. So I want to be clear, because I was the one that made that statement. What I'm trying to say is, I actually think that the governance procedures around dbGaP were created by a wholly trustworthy group of people with very good intentions and have achieved wonderful things in terms of data sharing. But I would also say that the ground rules that have informed how the governance procedures were created, and by that I mean who makes decisions about how we interpret the data use limitations and access for data sets, how we make data access decisions, how we handle non-compliance problems, and how we handle, let's call it, the right of the public to know what's going on, all of those things have been designed in a way that is good work by good people, but it's behind closed doors in a way that is not likely to inspire trust over time, and I think that's a problem. So could we counter that maybe it was a version one of an attempt to get us quickly to a place with GWAS? No, it's just important. Just go to the beginning of the document, because it doesn't say it. Let me also say, I appreciate your comment; I'm very happy with that interpretation. I think where I get a little bit concerned is that we say this is version one, we did it to get going, and now we're going and we don't want to slow down; so even though these governance issues are important and we're going to address them, we're not going to slow things down.
I don't want to slow things down; I agree with that completely. But if we don't bring attention to these issues and to the need for resources, then we're going to keep saying, okay, that's important, but let's not slow things down, and we're never going to address it. So I think it's really the red flag that I'm concerned about. Actually, I would just add something that I thought was a useful analogy that Wiley made when we were having this very quick conversation, and that was: when you remodel a house, part of your responsibility is to bring things up to code, and that's what we're doing here. It's not that the process was a bad one; it's just that we can do a better one now, and I think that's really what was being intended here. So in that analogy there's framing of the problem, and if you've got a plan of what you want the house to be, which is what I'm hearing Richard say, talk about the future vision and what this is going for, and I think we could make a construct. I don't know, I just reacted to "it lacks public accountability and transparency," and it sounds like there was a failure in some way. We can craft that a little more carefully. A suggestion, maybe, to address that point is to emphasize that this is uncharted territory. In terms of ELSI governance, this is really uncharted territory, and failure implies that there's a standard against which one hasn't measured up, but the standards haven't been defined. The governance issues: most ethics oversight is still based on oversight of research projects, but with the whole move towards biobanks we're dealing with platforms rather than projects, and in many ways ethics oversight is still trying to grapple with that. Ewan's and, I think, Wiley's point is that these commons, and I appreciate you agreeing, are going to be immortal in a way. The data will be in there forever.
The purposes to which they will be applied are not known to us today, and maybe the transparency and governance problem, the oversight problem, is that we can't articulate everything we're going to do. You can't tell people what we're going to do a hundred years in the future, but that's the uncharted part of this. Well, it's been discussed, but it hasn't been finalized, and in fact I think many people would argue there's a bit of scope creep that happens. I think the issue with the broad consents, and there are now a number of people making this argument, is that from how we originally conceived informed consent in research ethics, and for very good reasons, you know, the Nuremberg trials and all that sort of stuff, that kind of idealized version of informed consent has been diluted, and it's increasingly being diluted with nothing to back it up, and as we go further down the road we're seeing this dilution happen more and more. Now the question is, okay, what can we bring in instead, because we don't want to go back to where we started, where we don't have any meaningful protections anymore. Another comment and question. Following up on this last discussion, it's not just the broad consent, it's really how to deal with the central resource itself: you have a broad consent, but if you truly don't understand what you're sending it into and how it's being used, I think there's a lot of discomfort there, and I just think we need a better way of looking at that. My question is, I still don't know what regulations you're talking about changing. You suggested I'd see it in the next slide, but I either missed it or... It didn't appear; there was a bit of a switch, I'm afraid. I think it was in an earlier version. Would you want to speak to that? Okay, I'm with you on that one; I did not see it. I don't know if that's what the other questioner wanted to ask.
I think maybe we're using the wrong word; maybe the word should have been "policy" rather than "regulatory," in the sense that we certainly are talking about policy implications of structuring this entity, or these multiple entities, for aggregating data, but they may not trigger changes in, for example, IRB regulation. I would suggest that as part of this, then, there is a place for at least engaging the regulators, because we talked a little bit yesterday about potential HIPAA changes, which would have a major impact. I think we have to watch what happens with the advance notice, so again, maybe just a "watch this space." I just want to bring out, in response to Ewan's comment, something that Wiley said, and I think it's meant to be in the document: these issues of governance and stakeholder engagement and all these things are not actually limited to the consent and the IRB. The idea, which I think is somewhere in the document, is that any entity engaged in this kind of data aggregation and analysis should have some framework in which there are opportunities for governance and transparency and accountability and stakeholder engagement, not just at the level of who shook the patient's hand and took the sample, and not just at dbGaP. If Nancy Cox in Chicago set up some portal, a research commons or whatever it is, they would have some obligation to explain what the governance is. And so I think I heard you say something like, that's different from how we make it maximally useful, but what I heard from Wiley was that actually, as part of the idea, we can create a situation in which there's ongoing transparency and engagement, not as an onerous thing but as a facilitative thing. Catherine?
One issue: the feds didn't decide they wanted to set up all these DACs because we all had day jobs that didn't involve serving on a DAC, but at least we got some ruling that basically said if it's a federal database, the feds have to make the rules on who gets access to it. I hope that can change, that would be really good, but that may be something that's in either regulation or policy, I'm not sure which. That covers any data repository like dbGaP or anything that we set up, and that is the reason why it's federal-only. There are ways that we have external experts in particular areas, but the actual decision process is a federal decision. So yeah, I think you're right, Laura, I think there is a law that states that, and it has been interpreted to mean that the decision process can only include federal employees. But I think that's very different from creating a structure, as you say, that involves advisory input but also involves engagement with the public. To the extent that we have policy-making processes that have to do with how the data sharing is structured, how we categorize different data sets in terms of uses, how we register users, et cetera, I think there's lots of opportunity for input beyond people who are federal employees, and for the trustworthiness of the process I think those opportunities should be taken advantage of. As we think about creating a sort of decentralized opportunity for data aggregation, for accessing a zone of aggregation where you can use data, you clearly have opportunities for multiple different institutions potentially to be involved. You have to address Pearl's problem of who has what responsibilities; it's a complex structure, but it also creates more opportunity for involvement of stakeholders. There are policy questions as we develop new models.
Speaking of that, one of the things I didn't mention, which has been up here on the screen for a while so probably people have seen it, is that we did think additional working groups were going to be required to provide more specific, more granular advice on some of these issues. We identified three, certainly not meant to be the full extent of it: one on the dbGaP streamlining efforts, though maybe we've done enough there in terms of advice; one on setting up zones of aggregation, which maybe a committee should be talking about, and actually I think it's being actively talked about; and then one on this investigator registration process. So I'm curious what people think about those ideas, and whether there are other places people feel such a workshop or working group could be useful. We should get to what we mean by zone of aggregation, so Mike, can you go back to the place where we talk about the four models, up higher in the document? So the idea, and again this is something we really should talk about, and Lincoln may or may not agree, is this: one way of looking at it would say there's a research commons that's completely different from a central server, which is completely different from open public data. The other way is to say that what we really want to come out of this with is not a design for the one thing that someone's going to go implement. What we heard, though again others may not agree, is that there's a need to bring data together, and there are different levels of access, which can range from some of the data being open and in the clear, and that's a good thing and could be available to anyone; some of it could be a research commons where you can come in and use it; and on top of that, or alongside, you could have a central server. They're not actually mutually exclusive; they're different levels of access. And so that's what was referred to as zones of aggregation: creating the conditions in which people could, in a diverse and innovative way, think
about how to do it. You could create a regulatory and funding and other policy environment where ten different groups in this room could figure out how they want to bring together the data, how they want to create such a thing, and we benefit from innovation, as opposed to trying to decide which one is the right answer. So here's a suggestion: there are three classes, and different services can say, right, I'm a service, I'm going to be a cloud service based in Michigan, I'm going to be open to the mental health community, and therefore I know I'm going to serve public data and produced data, and therefore this is the service I'm going to get right, and I'm going to have agreements to download datasets X, Y, and Z to serve my community. And a certain type of service may handle all of those, in my case. But by saying there are multiple different types, we are not saying that every one of them should do all the things; we're trying to create a framework. I think Ewan is doing a very good job of channeling me, in that there is no true conflict between what Ewan and I are saying and, David, what you are saying, via the zones of aggregation. It's just that I believe Ewan agrees with me that the open-access and broad-access tiers should be brought out explicitly as particularly named zones of aggregation, which will then be uppermost in people's minds, because those are the ones that are going to get high use. Totally fair, and I totally agree; I think there was strong agreement on the committee on that point, and maybe we should take it up more generally, because I think there's strong agreement across the group today that making available that higher-level aggregated results information is something we strongly support, and that it should be the default rather than requiring justification. I'm not sure if I'm off base here, but I have the impression from looking at the document that there hasn't been enough
discussion about ongoing and existing cohorts versus future ones, and to what extent there will be a prioritization, because we don't have unlimited money. So is there a process for triaging, and what would the principles be for deciding how much time and effort to expend on current, existing cohorts and how much to really look to the future? I totally agree we should come back to that point; I'd like to finish this other one first. But is there anybody who would argue against the notion that for this results information, the standard should be that it is available unless there is a clear reason why it shouldn't be, rather than the opposite, which has been the pattern so far? Which is not to say we as a group have the right to make this happen, but we as a group can advocate that it should happen. Is it fair to say that we're then agreeing on something like Ewan's three categories, open, broad, and restricted, in a very simplistic view to keep things simple, and that every sample should have a footprint in the open one, even if it's highly redacted? And then you would drill down, if you're registered or appropriately qualified or researching a particular question, to get deeper information. In a sense, there should be a footprint or a summary of everything at the very highest, open level. So I think my point earlier this morning was that to achieve that, we might have to do some re-computing across data to fill in the missing footprints, and that's part of a backfill goal. But we could describe, in that context, some basic things the community says could be done descriptively for each dataset and made available, right?
Right, that's the community side. No, I'm just saying we need both, but it would be nice to have some standard things that we could put out. See, to me this is where standards make me nervous, in the same way as for other people: standards are not innovative, they're a committee-driven process of least common denominator. I like the idea, personally, that there's opportunity for everyone to work on the data because it's available, and there's the opportunity for individuals or groups to compute stuff on it and make all those results available. So rather than a committee deciding what the pre-computes are, people go compute things and make them available, and then there's innovation in what gets computed, because anybody can show up and compute things. So when things are computed and published, should they be remembered? That's an interesting question, actually. If part of using the system is that you record the results, you could imagine, the same way that on certain websites you can see whatever the last query and answer was, one version of this could be that you do some analysis and others can see the results. I mean, in a similar way, in our experience in the UK, there's all this business of dropping rows and columns, which people get obsessed with, and I understand why they're obsessed, things like "don't use this name," and that seems very detailed. Rather than capturing the process to redo the compute, you actually capture the output of the computation, and you store that as a first-class database object. As long as we can then do something with it, that's good, but frustratingly we can't do anything with just the list, and that would be a better story. Okay, Mark, then Eric. No, and just continuing that point, I think that's where there's a great opportunity: if we agree that it is reasonable to make those sorts of summary statistics available, then the diabetes group will
agree among themselves that this is going to be the really bang-up diabetes analysis, where we use all 70,000 or 170,000 samples, and then that can be available to every clinical researcher and every pharma researcher who wants to look up their genes. I mean, this is where we really start to enable the things that we can't enable today, which is where we sort of started, so that's fantastic. And not only that we agree to it, but that we write a compelling enough argument that the powers that be agree that we're right. Exactly. Eric? Mike, the last discussion brings up that this document is relatively silent on Mark's talk, and I think it was also part of Steve's; I forget, did you leave it out because of the earlier discussion? There's not a lot in here about centralized compute versus distributed computing, and we spent a lot of time on it. That's the central server part of it; we're just serving up data. I think you do get back to that part of the document: central serving, data serving. But it's not exclusive, right? I mean, you can have both, right? And that's the point, is that we don't want a committee which decides what is the right answer. Right, so we want both, and I'm with David on this one.
So personally, for me, I find it difficult to use "zone of aggregation" to mean both the classes of data and the different ways that we access those data. For me, though, and I'm totally happy, and I suspect the others on the committee are happy, although I should ask, I'm disaggregating those two issues: the type of data, and then the things you might do with it. But the key thing is that it's not only about classes of data or about accessing data; it's actually meant to imply, and maybe we should be clear about this, creating the situation in which entities, individual labs or centers or whatever, can themselves get all the data and offer some offering, which again could have different attributes. The idea of "zone of aggregation," which someone said and which just got into the document, maybe a bad choice, is not just that you're accessing something someone else has under some criteria, but that you yourself, your entity, can go get all the data and offer something. Right now only a couple of groups can do that, and that's actually the key to this whole thing: if we can create the conditions in which lots of people can get the data and do stuff with it, we will have won, and all the details don't matter, including the governance model and the transparency and accountability and all these things that should come through the document. Maybe "zones of aggregation" is a terrible phrase, but that's what we're trying to enable. And so I think, Eric, to answer your question, the answer is yes, we think both these things are really good things, or have the potential to be really good things, and it's a question of how we make this happen. I apologize if this was on your slides already somewhere, I couldn't see all of them from here. You talk about these three different categories, but coming back to Felsenfeld's talk at the beginning, I think we should also think about encouraging groups to aim for the more open categories where possible, and one thing that stood
out to me was that a very large fraction of all the data is coming from some very, very large projects, where you need to influence relatively few people to have a big impact on the broader release of data, and NIH probably has a disproportionate influence on those larger groups as well. So that's something which I think might be thought about. Yeah, I think clearly it's something that the smaller committee endorsed, and I had the impression, although we should actively talk about this, that the broader group felt as well that a push towards, when appropriate, broad access, or as broad an access as is consistent with the active conduct of the study, is something we should be aiming for, and that will then allow us to do all the rest of these things we're talking about. And it gets to what Bob was asking about earlier, in terms of how much we should be forward-looking and how much we should be talking about where we are with current, what you can think of as legacy, data; we should get back to that one. I would just say, with a special emphasis on the very large groups, because that's a small number of people you need to consult with to make these decisions, and it accounts for a very large fraction of the data. So shall we get back to that? So, the issue: we've certainly spent some time talking about it. I expressed my own personal opinion, not that of any other group, at the end of the day yesterday, that for places where we have clear wins on legacy data, without too much difficulty or expense, too much treasure of one sort or another, we should absolutely take advantage of those. And I think what we talked about a little bit ago, in terms of the summary statistics, is a perfect example of, oh gosh, how could we not do that?
That's such a clear, very valuable thing; it takes some policy work, but hopefully that can be dealt with. My personal view is that we should be actively working on things like that which we can do with legacy data, but for the most part, my personal feeling is, let's invest in the much larger set of data that we're going to be having very soon, because even though 70,000 sounds like a lot, it's really a very, very small fraction of what we expect to have in the not-distant term, and certainly in the term in which we're going to take some time getting these things going. So that would be one person's view; I'm curious what yours is, Bob. I guess I was just going to say that if I look at expending money on trying to extract material with great difficulty from an existing cohort, versus spending that same money on trying to improve the portals so that people who are not the small in-crowd of users can gain access, I would vote for the latter. For me that would be a better use of the money. One could, for example, think of some sort of triage method where you'd look at the existing cohorts and ask how much money was spent on the phenotyping, what the quality of the phenotyping was, and whether value has been added by a harmonization effort, which of course adds a lot, and that might then identify more of the existing cohorts as being valuable to go after. But this needs to be done, I think, with a pretty hard-nosed, green-eyeshade approach. Yeah, one thing that I felt very strongly didn't get into this document, just for lack of time, is the data harmonization work. That's clearly a hard problem, and it's clearly very important: when data harmonization has been done by careful, thoughtful people, we need to capture that. I mean, the notion that Leslie Lange, I can't see Leslie right now, had to go through this process essentially twice, on largely overlapping, largely the same data, we just shouldn't allow that. For one thing, Leslie is my friend and I don't
want her to have to do it twice. But beyond capturing what people are already doing, the question of how much we want to invest going backwards is really a good one. One of the nice things about the cohorts is that you're seeing them again, and so one can think about how to take up measures of data harmonization going forward, and PhenX or other things like that have a real place there, I think. Yeah, I'd like to just say Bob's right, and it's worth looking at that in terms of cost versus benefit. I would give one example: the Framingham data set, which always comes up as a flagship cohort study, actually has a slightly different process for getting access than the other studies. So one question one could ask is, if you pick a couple of big heavy hitters like that, what fraction of that study would potentially fall into the general-research-use category, because those are the ones you can use; and then, are there other issues with that study which would not allow it to be, sort of, an easily equivalent member of a commons; and then make a decision about whether you push to change that or you decide to exclude it from any research commons. No, it's a very fair question. Chris, do you want to come in? I agree with what you're saying. One problem that comes to mind, related to Framingham, Jackson, and some other cohorts, is that with the shift to investigator registration, some of those cohorts have required IRB approval of the project for which data will be used, and that's an obstacle we're going to have to figure out how to address. That seems like an incredibly fundamental roadblock. No, no, I think, I mean, what we're saying here, while everyone was singing Kumbaya, is that we want to get to a place where people can aggregate the data and be able to work on it without having to say, oh, I had an idea, now I need to go ask permission because it's a new project, right?
So we should identify that; we don't want to leave this room having spent 24 hours doing this only to discover that some of the most valuable data sets can't possibly participate. Is that the case? We should identify that, because otherwise this whole thing is happy talk. I'll just say that the reason I brought that up is that there is an inventory which was done; we have done some work on identifying where a whole study is general research use, and we could report which fractions of existing studies are potentially general research use and which ones have additional stumbling blocks like this. Let me ask you specifically: that's available information? Jim and I work together. And let me ask a question, though, because I think sometimes these terms just get confusing for those of us who are not experts. Could you envision going to the Jackson Community Advisory Board, which I have visited myself, and other such bodies, and saying: here's a use, here's a project. There is a set of diseases that are high-priority diseases for which Jackson has great data. We want to create a server or an environment or whatever we call it, a zone of aggregation, terrible phrase, in which lots of people can come learn about the genetics of diabetes in African Americans, but we're not going to ask them to write for permission every time they want to look at something new. Could the board go for that as the project? Could the project be what we're talking about? It could be. I think what we're going to have to do is present the shift from project approval to investigator approval to the community, and show them why that's a more secure, better model, which I believe it is, and get them to decide to lift the IRB requirement for project-based analysis.
Yeah, I think that's an incredibly important point, and this is something that Debbie, I think, commented on earlier, and others as well: this community engagement as we're dealing with this process. Who are the different people who need to hear from us that this is happening, to help us persuade them that this is a good thing? We talked about IRBs; this would be a very relevant context for that. Steve Rich? Yeah, I think that one of the benefits of some of these large projects we've been involved with, whether it be CARe, the SHARe projects, or the Exome Sequencing Project, is that there's been a movement toward sharing, and I know that for MESA, right now we have all the GWAS data in dbGaP and we're putting all the phenotype data in dbGaP. These things are ongoing, and the number of individuals with general research use is large, so I really don't see that as going to be a hurdle. There may be some cohorts where identifiability issues from the past have to be addressed, but from that standpoint, addressing David's concern, I think we've made a long push to get this going and I don't see it being a major hurdle, in fact. But I think we need to keep actively pushing, because those data are not as readily accessible as some subsets of them perhaps could be, I think is the point. So, Pearl, and then Eric. Can I just comment on that? I think that opportunity for engagement is an extraordinarily important one, and it's really important to make full use of those kinds of efforts, not just to get a sign-off, which was perhaps the reason the conversation was precipitated, but to get engagement in what the rules of the road are, what constitute fair processes, how we justify what we're doing. Because I think the more we have those kinds of conversations with communities, representatives of cohorts, managers of cohorts, the more we're building together a kind of rules of the road that everybody considers fair.
I think there is a lot of room to look at the IRB space on this. In many places the IRB is overused. Even at NIH, we have certain institutes who do not allow us to call something not-human-subjects research and still want an IRB approval, so we have some internal education to do. As well, many of these advocacy or patient-centered collections want to have some review, and they use the IRB as a surrogate. So if you get rid of the IRB, you've got to talk them into not having everybody and his brother set up a separate and different review process. And then the final comment I'd make is about community risk. Right now we've pretty much been looking at individual risk, which IRBs are more comfortable cogitating on, but I do think we've never been very good at the community risk issues, and as some of this research goes forward, just to keep that in the back of your mind, there might be a point where the IRB or others say, no, we need a different kind of oversight for that risk. In a word, no. That's an institutional issue. And we, yeah, no. Really, no. So, if I heard correctly, I think some of the discussion that we had in the planning group about this registration system may not have been captured, or maybe I missed it. The idea, and I don't think we really pulled this out with the group here, was that the registration system of users would be some sort of centralized thing, presumably at NIH or some such entity, such that when you were registered, it's not a certification, it's sort of a unique identifier, and if you were approved to do some sort of thing, it would actually cascade out to whoever offered the data.
So first of all, you wouldn't have to go get approved separately: if you were allowed to use a data set, and Gonzalo's version let you do one thing and David Haussler's version let you do another thing, you could go use both, because you were approved once in that system. But there's another meaning to certification, which sometimes I think I'm hearing and sometimes I'm not sure I'm hearing, which is something that had been proposed a while ago: rather than just having your institution be willing to certify you, which is awkward in the case of some people, the idea of a sort of licensure. You would actually, like getting a medical license, get a license to be a genetic researcher, get licensed at different levels, and that would have to be through a professional society or something like that, but that could be interesting. That's quite different from sort of piggybacking on this. So we explicitly, and I'm not sure everyone will have to go through that requirement, but that being said, I do see that it is time now to actually have the discussion again, even back with the Framingham group. This is a decision that was reached by the IRB at Boston University and by NIH working together, so it's not like one person decided; this was a community thing, and similarly with some of the other cohorts. I think the cohorts that have the biggest problem are those where they actually have a consent statement that says, you know, use of your data will occur only after an IRB approval has been registered, and there are very limited numbers of cohorts where that exists. But the issue of privacy and of identifiability has to be addressed in the conversation. It can't be sort of, we think the IRB requirement is a really cumbersome thing and we'd like to take it away, but we're not going to address that other issue, which is the reason why it's there in the first place.
No, what would be ideal would be a full revisiting of this at a particular point in time: what do we think is possible? Because those are incredibly valuable data sets, but it's not easy to get to them in comparison to some of the things we're talking about here. And it shouldn't be easy to get to everything, but surely, or perhaps at least, there's a subset for which it could be made a lot easier. Catherine? So I think the idea of engaging the public is wonderful, but the public has a wide variety of levels of understanding. I tell you, I deal with researchers all the time, I deal with IRBs all the time, and even they don't understand controlled access versus open access. Anybody who's been on a DAC with me knows that the research use statements that say "I want the public data" go back for a precise statement that this is controlled access, and a lot of IRBs don't understand it either; I've talked to them. Once they understand the process, it's different. So I think we are sometimes our own worst enemies in not using very precise language about how we are protecting patients, and I think we need to be careful about that. I wanted to make a suggestion which is only slightly facetious, but actually mostly serious: going to the Jackson Heart Study or to Framingham or wherever, and going to the governing board and explaining what this change would mean and why it's important, might be something very good for the great communicator to do, and I'm not talking about Ronald Reagan. I think this would be an excellent thing for Francis to do. I'm serious. I mean, if he can go on Colbert, he can do this. And I actually think that he would do it; he would do it in three different places. And I also think this would be an excellent opportunity for us, on a more national level, to explain: what are we doing here? Why is this of benefit to the public? It should be videotaped. I think that's also a great idea.
I just want to link this back to the earlier conversation about governance and roadblocks. I think there's a way of looking at this where appropriate involvement of community boards or publics in governance can actually enable data sharing rather than restrict it. In the example where you have original consents that simply do not allow for sharing, if you stick to the letter of the law and you really view that as the be-all and end-all, you're going to really struggle. But with more dynamic governance involvement, you then have the opportunity to go back to a particular cohort or community and say, here are the changes; we think this is really valuable, and we think this would be a benefit to you. So in this way, properly designed governance is enabling rather than restrictive. Just one other point: I think if there is a hardwired consent issue that's going to be very, very difficult to deal with, just triage it. They're out; don't worry about it. I just don't think it's worth the effort if it's at the expense of what we're going to do going forward. Yeah, I think that was something there was general consensus on also: we're satisfied with solutions that cover a large fraction of the data, and it's worth it to us if it is otherwise useful. Pearl and Jim. Yeah, I just wanted to point out that while IRBs are certainly spending a lot of time on genetics, you ain't the only trick in town here. In the system I oversee, we have 7,000 research protocols going on, and they're not all genetics. And I'm wondering, for the ones who also have huge databases, a lot of the epidemiologic research, whether it might be worthwhile reaching out to some of those, because we can't have a system for genetics that is not transferable to systems for non-genetics big data or tissue banks.
As soon as we exceptionalize, everybody does everything wrong. Jim.

Just a quick comment on the engagement with the community. In the original dbGaP design, actually in response to some of the participant communities, there is a requirement that for every study on the dbGaP site we show publicly all the awarded research requests for the data, because the participants said they wanted to see what was being done with their data and convince themselves good things were being done. There's also a requirement that those people report the publications they published on that data, so those could be listed there as well. That part, I'd say, is not fully implemented, because you have to follow up with the people that published. But as part of trying to really open this up and make it more public, I would suggest we put a little bit of focus on showing people the benefit, not just promising the benefit, but actually seeing: oh look, I can read all this stuff. Advertise that more widely, and enforce it harder on the site that if you publish on this data, you post it there, so people can see that results are happening.

No, Jim, I actually think that's a great point, and it illustrates how dbGaP offers an opportunity for some developmental work, because what is now being done is a good starting point. As you say, listing publications would be a good addition. I think there is a tremendous opportunity for creating information that is more easily accessible to the lay public about the research process, about what dbGaP enables, maybe even summaries of the papers that are forthcoming, et cetera. So there's a platform there that could be enriched greatly.

Okay, so Debbie, and then George, and then Eric, I want to call on you to see if you have any comments.
So I was just going to suggest, and I don't want to cut this off, but time is marching, and I just wanted a couple of take-home points or action items. Maybe we could just make a list of action items. I don't know if others feel the same.

I think it's a great idea. Just one last thing, and maybe a transition to action items, since you did recognize me and I haven't said anything yet. On your list of suggested working groups to provide more granular advice, and following up on what Jim and Wiley were saying about public outreach: it may seem too easy and too obvious, compared to all the breathtakingly difficult problems we've put here, to put on that list as the fourth bullet the open-access possibility that you had in the earlier part of it. But I actually think it's not trivial, and it does require a working group. And if it is considered relatively easy, that group could be tasked with such things as future cohorts and public outreach and public education as well, because I think this is where the public is going to hit dbGaP first. It could be a honeypot, attracting the public to dbGaP in a way that they can understand it and even validate in their own minds whether the aggregation is working or not, because you can have the same data set in an aggregated and unaggregated form.

It looks like David has a comment, and then let's go back to action items.

So I agree with George, and I think in particular there is a sea change needed, and I think it's being prompted. A few years ago, the idea of open data access was greeted by many as a bad thing, however you did it. Clearly there's now a move from some quarters to, not push it, but embrace the idea that people may want to do this.
If we were to come up with an agreed-upon set of standards for how you should do it and under what conditions, and also, in my mind, what some of the concerns are, like a bias in who can sign up, and how we address that to make sure we don't limit it to certain people and certain views of what they want to share, that would be a very positive outcome. Because we could then embrace it and say it's a good thing, as opposed to it somehow being suspect.

Okay, for action items, I would personally argue for the issue of the results being made available, and working hard to make that happen. I think that's a potentially short-term, potentially big win that will help us in lots of different ways.

What is the actual action item for that, though? How do we do that? Talking about the data summaries. Who needs to officially make that decision, and how does this run?

Do you mean how do we change policy, or how do we decide as a group there's an action item? We can decide as a group that an action item would be to write something or convey something that addresses this point of view; how it's then taken up by the government to follow through is another matter.

Okay, but as long as there is a concrete action item, because I think changing policy isn't an action item.

Right, let's go. So I think there's a clear sense that moving forward with these different ideas for zones of aggregation, or whatever you want to call them, defining these and giving people the opportunity to propose these kinds of processes, is logical. What do people think about that one?

If changing policy is not an action item, there are a lot of GWAS people in this room. I think if we all went back and made sure, in our consortia, that we made all those results publicly available, it would be a great step forward.

I don't think there was any disagreement that we should be doing this, but we can't unilaterally do it.
What we can do is make a strong statement, from this rather impressive group of people, that this is something really important. I assume, David, your point was that we can't do it ourselves; we can just say that it should be done.

Yeah, but if we make a subgroup that makes a position statement that everybody gets behind, that would be great. It goes and meets with that committee, whatever.

I would add that for the people in this room that have published GWAS papers, that data could be assembled, the p-values sent to the archives, and positioned for distribution.

It's not all there. We would love to do that. We haven't done it because there wasn't a way; we're doing it ourselves in some cases. But absolutely, it should be done.

I think Steve's point is there isn't an open way to do this now, but you can do it through controlled access, because it's considered the same as the sequence data, primarily. So your recommendation could be that you want to change the policy on that. We could facilitate the process by getting things done. And you could start the process by just rolling the data in under controlled access, and change the policy with a letter to the...

I wanted to second what Jim just said. That's something this group could recommend now, that it could all go under controlled access, because it simply isn't there. And then, when these new commons achieve more streamlined access to the controlled-access data, you'll have the results right there for use, and it can roll out from there.

I think that's good, but I think we take it just one step further. Yes. And so there are some committees that are saying that. Eric Green is the co-chair of this committee, so he can speak for the committee. That's very useful tonight. David.
So I'm a little concerned that the intention of this document is not being clearly communicated, in that people were asking, what are the action items? The idea, if we go back to the top, was to write a report, write an article for publication, that would generate a lot of attention to these issues and push hard. And as for exactly how we push hard, it would seem a little presumptuous of us to tell the government how to do all these things. But in very concrete ways: A, streamline dbGaP and put in place the things that are listed here; B, move aggressively towards creating the conditions in which we could encourage lots of groups to get all the data and be able to do these things, some of which are enumerated. This was written in 35 minutes, so it has to be wordsmithed, and presumably the report and the article would be passed around to this group, but it would call for a set of very specific actions. You know what I'm saying? It would state that vision. I don't think we need a committee to do it; it's been discussed here. We just have to write it down and wordsmith it so everyone can buy into it. But it's an aggressive pose that says we would like to see that in a year some things have happened, and that we're moving towards a future where, in three to five years, we live in a world where you can do all these things. Not just to say, let's have a committee, and then we all go home. We're not gonna be able to do it ourselves, but we can call for it very precisely. And that means documents have to be written, people in this room have to respond, and then we have to push NIH and work with NIH to implement these things.

I agree with that. Eric?

Would it have more impact if it was written by a journal editor, who may be here?
Otherwise it's: we set the agenda, we gave the talks, now we write the paper. It sounds like Rupert Murdoch. It seems like it should be a more objective piece.

We're much nicer and not nearly as wealthy. Barbara or Magdalena, do you have an opinion on that?

I think for an editor that would be a pretty difficult thing to do, because as an editor I'm not actually entirely neutral in this discussion; whatever the community here comes to decide, this is something that I would have to help implement. So I can imagine an editor being a co-signatory, but being the author, by him- or herself, would be rather difficult, I think.

I would also say, certainly as an editor, that I would kibitz, and I strongly agree that any kind of document you put out ought to have in it: what do you feel needs to be accomplished within a year? What do you feel needs to be accomplished in three to five years? Otherwise it's very fuzzy and feel-good, and it's not going to accomplish things. However, I too could not see being an author on it.

Okay, thank you. So, Eric, have we done what you wanted?

Obviously I've bookended this meeting, so there's a middle piece that I haven't heard; staff will tell me how great it was, or that's what I hear. But I listened very carefully to this last part. Mike, let me put you on the spot, since you're standing there. Of the document, and of where you think we are now, we being this effort, this workshop: how much of it is softening the ground through changes, policy changes, philosophically coming to a consensus? What percentage of it is that, and what percentage is ready to build something new? Because there's a practical issue here. I mean, both things need to be done, right?
You need to be prepared to start building whatever infrastructure it's going to be. I mean, these words you use, commons or whatever, that's gonna require building something, and that's gonna require resources. But before you can get there, from what I've heard, especially this last discussion, you probably have to write a position paper and have some philosophical changes and policy changes. Unless you'll give us the money to do it without that.

Well, no, actually, David, I think you would argue that would be a bad way to do it. If you're not prepared to do this in a rational way, it will fail, and you don't want to get money for something that's gonna fail.

So I think we're in a really good position to do these things. Some of these things require policy changes, and that's a little bit more complicated. But a lot of these things we have the ability to do right now. The stuff the NCBI guys were talking about, that's something that can happen right away. The sharing of the summary information, that could happen right away if we got a policy change, and that doesn't necessarily require a ton of money.

No, but actually, I get it.

Let me finish, yeah. Then we've got a lot of different things, methods development, database creation going forward, that will require some money. But think about the money that's required to do these things in comparison to the overall investment we all have made in sequence and GWAS.

So my question to you is: of what was described here, what I just heard the last two hours, what fraction of it was the former, and what fraction the latter? Is there a lot still to be discussed about what needs to be built, in terms of money?

I think a minority is policy change, and most of it is building things.
And to be clear, we did say, and I'm not sure if you were here for it, we discussed the phrase: it may be poor stewardship of the national investment in genetics to invest so much money in data collection and then be so penny-pinching on making it useful. So I think it's a small amount of policy change and a lot of investment in the broad community creating a diverse set of such things, not one but a bunch of them, so we can really have competition and innovation in how you make this data useful. But we need first to create the conditions in which everyone is allowed to bring the data together and to do it.

And just let me add onto that: it's not like we have to find people. Obviously we're always looking for people, but we have a lot of people who can address these questions right now, who are primed to address some of these questions right now. So it's not something that would have to take forever; it could be a relatively short-term thing.

So I saw Gonzalo's hand, and Nancy.

I just wanted to say, there is a fear about it: when people are trying to answer a specific scientific question, it will be pretty threatening to jump through the obstacles that are there now, and to pioneer how you bring these disparate data sets together and what you can gain out of that.

Agreed. Nancy?

Yeah, my point was just that, in the absence of some clarification around these policy issues, when we've talked about these things before from a scientific perspective, it was not clear that they could be done. That is, I can request access to all the data in dbGaP to do this and want to serve the results publicly, but it wasn't clear that that would be allowed. And so it's essential to cover the policy issues in order to get to the things that everybody wants to see.

And that makes total sense to me. I'm trying to be very pragmatic, and my role here is very practical, because I'm gonna be the fundraiser here.
And there's not a lot of places to go for those funds, and on timing, you gotta strike when the iron is hot. It's usually cold these days, so when I see something that's hot, I gotta move. Do you think the scientific plan could be articulated to somebody who's gonna write a check, which means it's gonna be a high bar, you're gonna have to be very convincing scientifically? Could that be ready in concrete form in one month, two months, six months?

Yes. It was all said here; it can all be written.

No, no, but I need a document.

I know that. So the happy fact, Eric, is that there are lots of position papers that Lisa encouraged be written. We'll write all those. We've got a good starting point, and I think we can really push it. And based on the discussion, those could all be the same. If you tell us a month is way better than two months, we'll do it.

Okay, that's very helpful. All right, I think we're running up on three o'clock. Let me make one other point, although I've made it to some of you before, because I think I even heard in the discussion here a little bit of confusion. Whoever's gonna be writing this, whether it's this group or someone converting any of these reports into something concrete: I do think you need a glossary and a clear description of some of these phrases. You get very comfortable with these phrases here, but believe me, the people we have to raise the money from are not gonna be familiar with some of these things. They may barely know the word cohort. They're not gonna know what you mean when you say research commons or analysis commons; you've gotta define these. So please be sure to do this; it's gonna be very important.

It is now three o'clock. I thank everybody for coming, and for all the hard work that went into this. Thanks particularly to Lisa and Adam and the other NIH folks who helped organize this. The meeting is adjourned; thanks very much. You will be hearing from us.