For now, thank you. What I'd like to do in this session is bring a high-level view to the computing requirements. We've talked about several approaches in detail: the types of analysis we'd like to see, the kinds of data we want to ensure are available, and the challenges in getting to them. So rather than go through a lot of slides, I just wanted to stay at a high level and talk about process. There are a lot of questions in the white paper that were raised by the writing group; I encourage you all to look at that. I don't think we're going to find the answers to all of them here, but I was going to try to poll for some consensus and raise some points about how we're seeing this strategically at the archives. This is work I've talked about a lot with Paul and others.

So I wanted to spend a few minutes talking about what I think is the shape of the data, because that will drive what's available, and then talk about the framework for getting access to it. Conceptually, we can think of all of this data as a giant virtual two-dimensional matrix. Everyone around the world who has been typed or sequenced — we can think of them as columns. The nice feature of genomic data is that it gives us an organizing principle: in this case, the rows are positions in the genome. We could rank this out for three billion bases, and at the bottom we could add space for alternate loci, which are going to be variably present in different individuals, and unplaced sequence or rearrangements that might be specific to cancer cells, tumors, whatever exotica come up.

Each cell in this little cartoon is actually a complex object, and I think this is where a lot of the engineering conversation will happen, because what you're trying to save in this area is a space that contains sequence information, genotypes, and haplotypes. Here I'm just blowing it up: at that position, for a person, in a genome, we still have questions about the underlying sequence
evidence. There's a conversation — I think Lincoln raised the question yesterday — about whether it would be possible to see reads, and yes, we want to keep the reads. We want to keep the qualities for the reads in some lossy yet informative way, along with information about alignment quality, mate pairs, and the production tags — who they came from, what instrumentation, etc. That's all being done right now in SRA or CRAM format, or BAM, which has been optimized for storage at EBI or NCBI; others can think of ways to compress the data locally as well.

In the most extreme condensed space it might just be genotypes, where you have a simple case with a couple of alleles or differences from reference. You might also want to keep information about ancestral states, allele counts, properties of population genetics, all stored down here.

This other third space, which we're just starting to work with as a community, is the haplotype information. I think we have imputation down, and we can talk about imputed genotypes, but we're going to be quickly moving to a space where we can start looking at physically as well as statistically phased haplotypes of longer length. They're going to have frequencies; they're probably going to have phenotypes, and a compound mutational sense that's different from the phenotypes or the variants' functional consequences expressed individually. So collectively this is the space we think we're working in over the next three to five — X number of — years.

All of this needs to be connected to publications and clinical assertions, or aggregate data sources such as dbSNP, db-whatever. So you have individual measures that sit in the controlled-access elements here, and then you would have aggregate summaries.

The columns in this virtual global matrix are the natural unit of collection.
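As a rough sketch — names illustrative, not a proposed schema — each cell of this virtual matrix (one individual at one genome position) might be modeled as a complex object holding read evidence, a condensed genotype, and phased haplotypes:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ReadEvidence:
    """Lossy-but-informative summary of the reads at a position."""
    depth: int
    mean_base_quality: float
    mean_mapping_quality: float
    production_tags: dict = field(default_factory=dict)  # instrument, center, etc.

@dataclass
class Genotype:
    """Most condensed representation: alleles plus population-genetic annotations."""
    alleles: tuple                       # e.g. ("A", "G")
    ancestral_allele: Optional[str] = None
    allele_counts: Optional[dict] = None

@dataclass
class HaplotypeCall:
    """Physically or statistically phased haplotype spanning this position."""
    sequence: str
    phased: bool
    frequency: Optional[float] = None

@dataclass
class MatrixCell:
    """One (individual, position) cell of the virtual global matrix."""
    evidence: Optional[ReadEvidence] = None   # may be absent: the matrix is sparse
    genotype: Optional[Genotype] = None
    haplotypes: list = field(default_factory=list)

# A sparsely populated cell: genotype only, no read-level evidence retained.
cell = MatrixCell(genotype=Genotype(alleles=("A", "G"),
                                    allele_counts={"A": 812, "G": 188}))
```

The point of the sketch is only that a "cell" is not a scalar: different studies will fill in different subsets of these fields, which is why queries have to tolerate sparseness.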
These are individuals that are being sequenced, and they are typically housed in repositories that are partitioned by archives. It might be a central archive such as dbGaP or EGA; it might be a cloud-provisioned service; it might be hospital health-care services. So we can imagine that what we want to reach through and access globally won't be housed in any one place, or even in a few places. There are going to be stewardship responsibilities across the world, and some sort of minimal ethical oversight. We talk about DACs — and I'm using that in the shorthand sense of oversight — but I think we've all agreed that in the special case of pure public access we can maybe isolate that and permit free access to the data. That's in play, but it's probably not the global solution. Especially for health care and other systems, there are going to be legal requirements of countries and oversight groups that we're trying to harmonize. We know that national legislation governs some of this, but having a body that can organize conversations at this level is important if we really want to think about queries that slice horizontally through all the possible answers you could get across the world.

In reality, the data that come to us are going to look more like this. Not every study is going to examine every base in the genome. Not every study will look at placements on alternate loci or haplotypes, or remember unplaced data. So when you do queries, you're going to have to expect that in some places the space will be richly covered and in other places it might be sparse.

The data right now are being transported in BAM format to the archives and in VCF format for the genotypes. We don't have robust standards for phenotype data, for epigenetic or environmental data, or I'd say even for haplotypes right now. That's an area where it will profit to try to standardize, because tools work best when they
know what the data are going to look like.

So, stepping back from the shape of the data and looking at process flow: we've talked about several designs for access to the data, and computing requirements really follow from requirements and specifications, which haven't been articulated. So I can't speak much to what the right solution is, because I don't think we have a handle on the problem or the right data flow — other than maybe "all of the above," which Ewan and others have proposed. So what I'll talk to, maybe in a kind of time order, is what we have right now. We have the approved user, working through a DAC system, with access to repository data — and I'm saying "gated" in the sense that it's controlled; it's not public access. We've talked about how this can be partitioned by consent into categories that maybe have exceptional requirements and care because they're sensitive in nature. And then we have the typical study whose consent is either very broad use or maybe narrow use — individual disease, or non-profit use only — and sometimes studies will span this: part of the individuals consent to broad use, and another subset of the individuals in the study elect a more restrictive consent. So when we talk about creating access for broad-use data, we're talking about just those individuals who have given broad reuse consent, and it might not be the full complement of a study.

Something NCBI and NIH are doing to try to expedite access to aggregate data for this broad use, and to create a lightweight system in the absence of showing all p-values — and I'd like to say I agree with Mark Daly's comments last night: we really need to think about the risk question of getting access to aggregate data — but in the absence of that, is the CADA, or what we're calling the combined aggregate data analysis set. Laura, did I get that right?
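The packaging-by-consent idea here — broad-use aggregates computed only over the individuals who actually gave broad reuse consent, which may not be a study's full complement — can be sketched as a toy. The consent labels and the allele-frequency statistic are illustrative, not an actual dbGaP schema:

```python
# Toy sketch: only individuals who gave broad (GRU) consent contribute to
# the broadly shareable aggregate, so the summary stays matched to the
# individual-level data a broad-use requester can actually access.

study = [
    {"id": "S1", "consent": "GRU",          "genotype": ("A", "G")},
    {"id": "S2", "consent": "GRU",          "genotype": ("G", "G")},
    {"id": "S3", "consent": "disease-only", "genotype": ("A", "A")},
]

# Partition the study by consent category, not by study boundary.
broad_use = [ind for ind in study if ind["consent"] == "GRU"]

def allele_frequency(individuals, allele):
    """Aggregate computed only over the consented subset."""
    alleles = [a for ind in individuals for a in ind["genotype"]]
    return alleles.count(allele) / len(alleles)

freq_g = allele_frequency(broad_use, "G")  # 3 of the 4 GRU alleles are G
```

Recomputing the aggregate over the GRU subset (0.75 here) rather than the whole study (0.5) is exactly the harmonization step described later: the people in the values are the people you can get.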
This is going to be a new expedited-release study at NIH, where we're packaging the broad-use consent for rapid access: with one request you can get a year's access to all submitted GWAS results, all p-values. That's still going to go through the DAC system — I'm losing my pointer here — but it will hopefully be a faster process.

We are also looking at data compression services, to help remove the burden of the transaction and of downloading data when you have approval to use broad-use samples — thousands, tens of thousands of samples. So this is going to be BAM slicing and compression services that make the footprint smaller for what you want to download. We're also, as discussed, trying to create centralized review for large projects and to minimize the impact of the DAC governance layer.

What we need, I think, in this broad-use category is to recalculate the aggregate data on just those individuals. Then we will have a harmonized set of individuals and their aggregate measures that can be broadly used, and it will be lightweight — we're trying to minimize the barriers to uptake. If there were agreement, or discussion, on how to do the recalculations — calling the variants, computing the p-values for GWAS studies on a select number of traits — that could be done this summer, and we could actually have data as early as fall for the GRU studies, where we'd have both an aggregate data set and individual-level data that have been matched and are harmonized and interoperable, so you have the individuals that go into the values. And that's something we could do today without a lot of engineering discussion; it's just will and planning.

The other thing we've talked about — now moving a little farther out — is the idea of the certified user. This is what Ewan introduced last night: removing, or reversing, the approval, so a person who has been duly certified is pre-approved for access to this broad-use research commons area.
Again, this could come with services such as slicing, compression, and optimization, so it's still a lightweight transaction, just like what we're planning to do over here. The advantage is that you don't have to go through the DAC process each time, and this might open things up to serendipitous discoveries.

A little farther out — because this requires a bit more planning and architecture — are cloud-based storage and solution models. Here, again, we're thinking that certified users would have access to the broad-use data: this is the research commons in a cloud. The advantage is that you have compute access next to the data, so you could bring up local data, compute on it alongside what's available in the research commons, and do it there rather than dragging all the data down to your local environment.

Our optimization proposal is that you don't want to store in a cloud everything that's ever been published on a broad-use sample, because storage is an expensive component: you pay to push data up, and you pay per month for it to sit there. So it's better to store the data centrally and cache what's been used or requested by users. That could be done with a process that grows a cache of useful data, as defined by community requests. And all we need for that is to modify some of our policies on security, so that we can create single footprints of data that are reusable by lots of people, rather than the personalized encryption copies we're making right now. That's just a nuance in NIH policy, but I think what we're saying is that there are equivalent protections that could allow us efficiencies of scale and cloud solutions.

And then finally, there's a discussion we've touched on, which is separating the analysis services and abstracting them into a cloud layer. This could be operated in a third-party cloud, or the analysis services could be extensions of the repositories themselves. It could be a virtualized website or
someplace you go to — so I'm just drawing it in the abstract. But what's going to happen is that users are going to make requests to these services, where tools can run; so this is a platform environment. On the back end, they're going to be having transactions and access to data. If the data are in the cloud — maybe these two clouds are the same thing — then this dotted line is simply reaching across the firewall. If this cloud is sitting at a service, maybe this dotted line here is actually reaching across an interior firewall, and you're just reading off the local disks.

But what we're missing in all of this — and I think there was some discussion last night, and thoughts about it — is how we construct the payload for the request. It's going to have to have properties of the query, properties of the requesting user, and properties of the data itself. And there's the logic of how you square all three of those together to say: do I have a well-formed request, with an answer that I have permission to ask for, and that the data has permission to give? If we solve that problem, we have an analysis function.

Finally, there might be a subset of that analysis function that is safe enough that we could call it public — you don't need to be a certified user. So now you could have analysis services that students can run, that postdocs can run, that junior people and the public and educational systems can actually use to interact with the data. Deciding which tools, services, and functions are in this category versus the certified-user category is a risk calculation; there needs to be a body like this one, or somebody, to do that decision-making.

So again, from a computing perspective, what I've drawn here are a lot of arrows in time.
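The "square all three together" logic — properties of the query, of the requesting user, and of the data — could be sketched as follows. This is a toy: every field name is an assumption for illustration, not a proposed standard:

```python
# Toy three-way check: do I have a well-formed request that the user has
# permission to ask and that the data has permission to answer?

def is_authorized(query: dict, user: dict, data: dict) -> bool:
    # 1. Properties of the query: is the request well formed?
    well_formed = "region" in query and "operation" in query
    # 2. Properties of the user: public data, certified user on broad-use
    #    data, or an explicit per-dataset approval.
    user_permitted = (
        data["access"] == "public"
        or (data["access"] == "broad_use" and user.get("certified", False))
        or data["id"] in user.get("approvals", set())
    )
    # 3. Properties of the data: is this operation one the data may answer?
    data_permitted = query["operation"] in data.get("allowed_operations", set())
    return well_formed and user_permitted and data_permitted

query = {"region": "chr7:100000-200000", "operation": "allele_frequency"}
user = {"certified": True, "approvals": set()}
data = {"id": "study42", "access": "broad_use",
        "allowed_operations": {"allele_frequency"}}

ok = is_authorized(query, user, data)
```

A certified user asking an allowed aggregate question of broad-use data passes; drop the certification (or ask a disallowed operation) and the same request fails, which is the whole point of carrying all three property sets in the payload.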
This is done first, I think; compute and cloud could be done second, or certified users possibly second, depending on the governance issues; and then finally this could follow, as long as we have standard platforms and the tools are running in a platform environment. Requirements are needed for all of this before we can talk about costs or timelines — this is, I think, more conceptual.

So, final slide. I think the elements we would need for a general platform for computing on data are five-fold. First, I think we need standards for security that will permit many people to access source copies of the data and move away from the personalized encryption copy. There are many technical possibilities here — Don has mentioned several — and it's just a conversation with engineering, if there's a will to accept that for some slight elevation in risk from having a central encrypted copy, or a virtualization of access, you can enjoy a lot of benefits and savings on space, because you only have to keep a few copies of the data.

I mentioned before the data presentation standards. Right now there is an organic method for how they evolve.
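The single-source-copy model mentioned above — one centrally encrypted copy with per-user access, instead of one personalized encrypted copy per requester — can be illustrated with a toy bookkeeping sketch. This is not real key management (a deployment would wrap the data key under each user's public key); it only shows why storage stops scaling with the number of users:

```python
import secrets

# One encrypted footprint of the data, shared by everyone.
data_key = secrets.token_bytes(32)   # the single key protecting the source copy
encrypted_copies = {"source": "ciphertext-blob-encrypted-with-data-key"}

# Only the small data key is granted per approved user, instead of
# re-encrypting the whole multi-terabyte dataset once per user.
user_keyring = {}

def grant_access(user_id):
    # Stand-in for wrapping data_key under the user's own key.
    user_keyring[user_id] = data_key

grant_access("alice")
grant_access("bob")

copies_stored = len(encrypted_copies)  # stays 1 no matter how many users
users_served = len(user_keyring)       # grows with approvals
```

The "equivalent protections" argument in the talk is exactly this trade: the cost that grows with users is key distribution, not data duplication.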
They've been well crafted in the leading projects like 1000 Genomes and TCGA. I think either these projects move into the challenges of representing haplotypes, phenotypes, and environment, or other groups will — but until there are standards, it's really hard to get tools to work, because you don't know what to expect. So I think we're halfway there on that problem, and it's an area for maybe some investment by funding agencies.

The messaging structures I've mentioned: if we're going to talk about these global distributed analyses and queries, then this is the ATM analogy. Why do ATMs work? Because the engineers got together 15 or 30 years ago and came up with a standard encrypted payload — that's how the machines talk to each other. Not everyone can use it; you're in the system if you do; it's part of what defines the platform. We need to have that conversation to realize David's dream of a vigorous ecosystem. I think that's doable, but there are a lot of details in it, because of the flexible nature of consent — it's an unbounded problem, in a way.

Then, international coordination for policy and requirements. This isn't a computing issue, but the requirements directly govern computing.
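Following the ATM analogy, what defines the platform is an agreed message shape that every compliant node can emit and parse. A hypothetical request envelope — every field name below is invented for illustration, not a proposed standard — might round-trip like this:

```python
import json

# Hypothetical standardized envelope carrying the three things every
# request needs: the query, who is asking, and which data it targets.
request = {
    "version": "0.1",
    "query": {"region": "chr7:100000-200000", "operation": "allele_frequency"},
    "requester": {"id": "researcher-17", "certified": True},
    "target": {"dataset": "research-commons", "consent_class": "GRU"},
}

wire = json.dumps(request, sort_keys=True)  # what travels between systems
received = json.loads(wire)                 # every node parses the same shape
```

In a real system the wire form would be signed and encrypted, as in the ATM networks; the sketch only shows the interoperability point, that one agreed schema is what lets independently operated services talk.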
So I think you want an international body that can have conversations to harmonize policy and requirements to the degree possible. If we really want to imagine query systems that can reach across countries and aggregate data, they have to follow the same principles. ATMs, again, work because internationally we're all using the same messaging system — within Visa, within Mastercard, within pick-your-flavor.

Finally, the point of special attention for the research commons studies: if we really want aggregate data that is well matched to the individual-level data, and commensurate with it, then we might need to do some immediate recalculation of the data — calling the variants, computing the p-values, whatever we want — so that we have a harmonized set. And I'll finish by saying that's something we could do now; we have a framework for distributing it. It just takes the will of the group to say that this is important and will be helpful to the community for the work involved — because in some ways it's a redo, but we think it's an important redo, so that we have controlled-access data with all of the properties we have for 1000 Genomes and public-access samples. And with that I'll stop and, I think, open this up for discussion.

Thanks. So my first question is: how many subjects do you have in that broad-use data?

Laura, do you know how many studies are in GRU? It's something like 19 studies in the first pilot. So I want to say tens of thousands, but don't hold me to the number, because I haven't actually seen it.

Sorry — and is that genotype or exome data?
It would be a mixture; it would be anything that's general research use — the way the list is being constructed is by the consent category. Things that are exome now are expected to probably turn into full genome in another couple of years. It would also include some studies with just array-based data.

So in order to answer your question, we would need to know what phenotype is associated with those tens of thousands of samples, right?

Yes. The other thing I was very intrigued about is your diagram, and how the different users — or use cases, really — have been thought through in that context. Since I'm from industry, I'm especially interested in how this whole access policy and structure would work for industrial researchers.

So right now the approved-user model does permit industrial research. There's no exclusion for that; people from industry can make application to access the data in the system. What I'm imagining — and I did not draw the line — is that the approved user could put their data in an isolated system. I think it's permissible to do it in a cloud if it's configured correctly. I'm looking to Laura, because I don't know that we've actually solved or answered that question.

It depends.

Okay. You can certainly put it behind a firewall at your own institution. But it will also be very important to really understand how it works, because we do team science rather than individual science. So how would that work? Because it's not one approved user, right?
It's multiple people working on a project, or multiple projects within one company. So how is that handled?

So right now, companies are the agents of agreement for approved users. Every organization or company would have to have an agreement in place to use the data. Then it could be pooled; they could all access it and compute on it — but essentially everyone's covered by a use agreement saying that they're agreeing to the terms of use.

I think — the ambition — we can discuss this further. Yeah.

I'm just saying, I think the ambition is that we would imagine centralized compute environments that support international collaboration. Increasingly, that's where science is going, and that's where you get the power. So it's getting the policy conversations in place to make sure this can work, and then, below that, getting the engineering in place so that we can ensure everyone has efficient access to the request system. I know that in some countries, requesting data as an approved user has been difficult — nuances of the eRA system and whether your institution is registered — but that could be solved, and then the certified user could be solved. I saw questions over here — Richard?

Yeah. Steve, if you do all your development through summer into fall, at that point how does your model overlap with what Mark presented for the central server model? What are the key differences? I guess the applications layer is an obvious one, but apart from that?

I think the key difference here is that we're talking about the data — the actual data that would be served in a pre-calculated way. The things we're going to be distributing now, and that we show on browsers, could be done as a special data set that is keyed to just the research commons data.
So essentially, if "research commons" is what we're calling broad use, we would want to make sure we have aggregate data and displays that are matched to those samples — so that if someone did a calculation, they'd get the same answer. Right now you're not guaranteed to, because you're missing 20% of it. So it would be, in a sense, a data-content difference with overlapping functionality, apart from the application.

Yes. And it's probably restricted in the content space, meaning we're not talking about redoing all analyses. But I think you would want to call the variants and get their support scores, and possibly p-values for select traits — not all traits. I think that's the conversation to have: what would be in scope for the calculation. But it could all be pre-prepared.

And I also think — there's a clear question, because of the technical complexity of these things: does it not make sense to have multiple instantiations with different goals or values? Not that this isn't a good model, but I wouldn't want to have just one.

These are all meant to be plural.

Yes. So then I guess it's a question of — when you say that, this seems very much like what you've built now. Are you saying you would imagine instantiating entirely other models similar to this, outside of it? I'm concerned that we're trying to figure out what's different between things that are the same. So what I'm seeing here — maybe I've got it wrong — is that what I saw on your slides was policies, regulation, standards that would make it possible to have these kinds of interactions. Yes. You might be responsible right now for a big data store; you might yourself calculate some things; but you're providing a framework in which research commons could be created by others.
Yes — and they could have an analysis server, or not. So what I'm saying is, I think it's the same; it's just that exactly who would do what isn't exactly clear. I'm not hearing that it's that different; I'm hearing this is a framework in which such things could be done.

That's right. I'm trying to say we're missing a few pieces of infrastructure to let the parties talk — exactly the problem. I don't think it's actually different; I think it's the same.

I think we could break out what exactly these pieces are, and I think you're pretty much right about it. One is a policy framework for getting at a common data set — separating out the general research use, and a streamlined access mode to get to it. The second step is providing the pre-computed summary data. We ourselves would not necessarily do that; some group would do it and provide it, or two or three groups could do it different ways and provide it. But you could access it, and it's keyed to that data set, which solves the sort of impedance mismatch that's there now. Then there are also a few API tools here: you can get it all, and there are tools to give you vertical slices through it based on genome coordinates. We're not offering rationalized phenotypes at this point, but if there were the will to pick some phenotypes and recompute associations, that could be done — again, not necessarily by us — and deposited back. But none of that is on-demand analysis tooling.

The other piece Steve is describing, looking a little further out, is moving the compute environment out a bit, with this idea of a commonly encrypted file, some technologies to make it accessible to a cloud environment, and the security around that. But that's sort of platform-as-a-service; it's still not software-as-a-service.
Other people would be building software on top of that. And then there's the issue, at the software-as-a-service layer, of controlling access to that software — who can see the results of it — and that's kind of his little model in the corner. That's even further out, I'd say: how you decide that.

Right. All I'm trying to say, to get to "all of the above" — Ewan's point, and the point people made last night — is that if we want to imagine that all of these have a role, because they're going to serve different user communities, then there's just some dotted-line work we have to do, and policy around making standards, so that the tools have a place to compute and the queries are interpretable.

I'd like to re-emphasize the timeline here. The timeline is that, starting in July, a step could be taken: to have some general-research-use data available, streamlined access, and a group such as this one doing some of the deciding — what the appropriate summary computes are, doing them, making them available. Even if it's not the greatest number of individuals, the ultimate number you'd want, it's basically your step forward — a toe in the water.

Is that the 50%?

Kind of, and we can summarize what will be in the general-research-use group and send that around.

This is your instantiation of a particular set of ways of doing this, right? For instance, you've decided how you want to recall variants, how you want to do the processing, and I think it's important to allow —

We're not deciding how to —

Then I don't understand. What are you saying?

We're trying to say a 50% win might be: can we package the world's data
— okay — for general research use, in a way that you can, in one stroke, get access to individual data and the useful pre-computes, however the community defines them, and then that's available? We don't have the tools or the platform, but what we could do now is make the data collected — not have to do a hundred million different requests through different committees. Can we do that? And is that an incremental win?

These guys have said it multiple times — I just think that in this room, in all rooms, people hear things that weren't said — they've said multiple times: we might not compute it ourselves. They just said it would be computed, right? So it could be some pre-computes, whatever the right way to get the computes is — it could be you doing them, it could be outsourced, it could be four different versions, and someone has to agree which one's best. But there would be something that could be used to query and access the data.

I think we're actually in violent agreement here. Yes.

Maybe the conceptual way to think about it is: right now, data are packaged by studies, and studies are submitted. We're talking about inverting that and packaging by consent, globally — saying, these are the people with whose data you can do anything responsible; there's no restrictive consent.
That's an inversion of the packaging. It might require international coordination, if we could get it, so that in one stroke we could replicate the request to all the parties of this consortium, and you get approval through one decision — that would be even more awesome. One can hope. Ewan, you've had your hand up a couple of times.

So — I do think the place to concentrate is this broad-use, certified-use zone. I think we've got to leave the narrow-use people in the old-school model — that's my own view — because if we start trying to protocolize the narrow-use process, it's just going to drive us bananas. And internationally, I don't think we would have one international DAC — that's too much to ask of the communities — but what I do think is a totally feasible scenario is this acknowledgement of reciprocity of each other's certification processes. So broad-use DAC A would honor the certification that comes from broad-use DAC B, across different countries. Effectively, you apply to your own country's broad-use DAC —

Mm-hmm.

— and then you're done. But globally, we could also have an inventory of the individuals that you have access to for broad use —

Certainly, certainly. The studies can be flagged as to whether they're broad use or not.

Well, not just the studies, but the individuals and the variants. I mean — it's just slightly beyond me to think about whether it should go to the individual level or the study level. And the Wellcome Trust has worked in this effectively.
It's broad use; they've only turned down, I think, two people in the history of applications to the Wellcome Trust DAC. So, yeah, that would be the place to start.

Right. And the reason I'm pushing for individuals is that if we're going to grow this list to the right, and add new sources of data through hospital systems or whatever, we need to think about the process for bringing new people into broad use.

Yeah, I think that's exactly the way to think about it. It's not all the studies that exist now; it's what's coming — the fact that it's going to grow, projected tenfold a year for the next few years, right? So the problem isn't the narrow-use people who are there now, but rather making sure that, going forward, consents conform to broad use — because if they're all narrow use, then this is going to be really problematic.

Again, I think we're in agreement; it's just details. Eric?

Steve, I like what I hear, but I want to make sure it's not residuals from the bar last night.

I didn't drink.

Are you saying that, basically, by July we're going to have the ability to have certified users?

No.

And the ability to access a centralized data set?

No. What would be ready by July, I think, is that through the current system we could have access to a standard set of data.

But you're still going to have to go through a gazillion approvals.

No — one. One. It'll be a single NIH-wide DAC for this data set. And I'm just saying it will have both the GRU pre-computed p-values — everything that's known — and the raw data: the sequences, the genotypes, the haplotypes. That's what I'm saying. We could make July, or this fall, an ambition to package that, coordinate it with all the repositories in the world that want to be in, and then work out a policy, or think about how we keep extending it.
It's also the phenotypes — the phenotypes for the broad use.

Lincoln?

This all sounds great; July sounds wonderful. Just in terms of the practicalities: you know that all the DACs for each of these studies are going to be willing and able to relinquish their approval to a broad use?

Yes. That's been discussed; this has been in planning for several months, internally at NIH.

Well, that's great news.

Just repeat what you just said?

Yes — this has been in conversation for several months within NIH, with the DACs.

Figuring out who's on that list — I just haven't seen it.

Yeah, but the permission isn't only with the DACs, right? The permission lies with the individual studies.

That's what I talked about too. Yes — the individual studies, over the next several months, are going to allow the single NIH DAC to take this on. They have been talked to; those conversations have already happened.

And I presume this is basically for existing data?

Yes. And for the next sets of data to be collected, this will get negotiated ahead of time, so that it becomes part of the registration system: when you submit a study to dbGaP, you can say yes or no, whether you want to be part of this. And that's being engineered now.

Mike in the front, and then Mark in the back.

So — does a single DAC exist, or is this the plan for creating it? Has it been staffed?

Yes.

So — a little bit on the security issues around this. Is the goal to prototype this when you move to a cloud — would the goal be to prototype it on, say, an EC2 system, do it over at Amazon, and then potentially migrate toward your own cloud-computing environment at NCBI? Because I heard you say there were issues associated with the cost of growing and shrinking.

So, for right now, we've put 1000 Genomes data into the AWS cloud — the Amazon cloud — for the community to use. It's meant to seed, and
become the human genetics data that's consented for open use. It didn't have the security problems around it, so that was attractive. We could try to see if there's community uptake, just to do the computation, you know, and then we can look at what the right security is. There is another working group, I think it was mentioned, with Vivian Bonazzi and Don Preuss, trying to look at protections and encryption standards. I don't know if it'll be at NIH; I don't know if it'll be at NCBI. That's an open question. There are a lot of moving parts in the cost framework.

Aravinda? I don't know whether maybe it's morning and the enthusiasm is not coming through, but I would say go for it. Meaning, I think there'll be things to be learned. Maybe there'll be impediments; maybe there are some challenges we don't recognize, technical and regulatory and otherwise. I mean, July, you know, can be patient; just go for it. And it may be very useful to see the specific experience and uses. You can always add other pre-computed data.

So I think, as part of this, there's a communication plan. And we're mentioning this in a lot of detail now because you're the target audience. I think almost every one of you would be people we would want engaged and using this, and to understand that this is a different, centralized approach. We're going to have to work out processes, like how do we add new studies in, and that's still in the conceptual planning stage. So it's not completely delivered whole cloth yet, but it's coming.

Right. Now, again, I want to echo everybody's enthusiasm for this; I think this is great. And I imagine that the other piece of this that will be really, really important is communicating with institutions about the certification process. Right? Because I can now imagine, once this happens, the next time I want to submit something to dbGaP, Stanford will say, well, your consent never said that it would go into the broad DAC
that was just created last week. Right? Because the consents are two years old, even though the consents did say what they needed to say to go into dbGaP two years ago. So I think there's this continuing sort of issue with the institutions, which are ever more reticent to certify into dbGaP precisely because they feel like they don't know what they're certifying sometimes. And so I think this is awesome; I'm all for this, because of all the experience that we and others have had. I just think it's important to communicate to the institutions and say, look, this is okay. Right? You're just certifying it, and because of the consent, nobody's going to now go out and, you know, the government's not going to turn around and sue you or whatever. Because I feel that that's really where a lot of the impediment is going to come from: not so much what's already in there that you can repackage, but the next time you want to submit, there's going to be, you know, a hindrance there. They'll say, you know, your consent, aside from being 15 pages long to include all the information for freedom of information, is now going to have to add this whole other layer going forward, and stuff like that.

I think it's an initiative to think about. Debbie?

I kind of agree with Carlos, because they back down and change their minds all the time. But, you know, it might be worthwhile just to get together some of the people who are certifying, from some of the major places that you've gotten data from, just to get them on board, even in a phone conference, that this is a change that will occur.
It's, you know, what the meaning of it is and how important this is. And they'll even see the benefit, because many of them are just single people, even if lots of data is being certified to go into dbGaP. So I think it is important to have a discussion with them about the benefits of this and how it's not a big change.

Yeah, I think this is great. And what I would like to do is have you give us something so we can do a preemptive strike, so we can take it to our, you know, VPs of research, whoever it is that signs off on this, and tell them: this is what's happened, it's all fine, and the next time we put something in, it's just going to be fine. Because I don't want them suddenly saying, why didn't you tell us this was happening, which might raise more problems. So I'd just like to do a preemptive strike at my institution, so that it'll make it easier in the future.

Yeah. So all of that, again, as Steve mentioned, is part of the thinking that we have in our communications plans: the materials that we will need to make available, and who we will need to contact, so that when this is actually publicly announced, that will be available. This, again: Steve wanted to, and we talked about this in advance, give you all a preview, because it was directly relevant to what you were going to be talking about, and to let you know, instead of you being surprised in two weeks when you hear about it, because it is so on point with what we're doing. And we had not talked about having a conference call with some of the individual people that were doing the certifying. We've talked about, and I'd been thinking about, going to the IRB meeting in December and having this as part of the other general updates that we hope to have by then about things going on with GWAS and with genomic data sharing more broadly. But certainly that's a great idea and would be easy to do, so I will take that back to our group.

The other thing is, if you know that there are big
users, you could cc them on the emails that you send to the university, so if they have questions, they know that these people have been in conversations about this and could provide advice too.

Right, and that's something I think would actually come through a different pathway, in the sense of working with the ICs and having our IC program folks reach out to their big users across the different disciplines and the different institutes' portfolios. NIH central can't really do that for everyone, so I can help facilitate making sure that happens for genome investigators, but for the other ICs it would be great for them to do it as well.

One last comment. I think another way we could potentially get this out, Laura, is PRIM&R, which is kind of the mothership for IRBs, Public Responsibility in Medicine and Research. They're always looking for webinars to put on, and those also get archived. So if this is big enough, it might be a good opportunity, as it comes down the pike, to do something like that as well. I did one last week, but we couldn't talk about it. So we can possibly... I think the timing is something we'll need to think about.

It would also be useful if you could prepare some brief slide sets that we could present to our own cohorts, because to the extent that they can hear it from us first, they're going to like that better.

Okay, I think that's a great idea. Chris?

Getting back to the conversation yesterday about Homer et al., and what can and can't be posted on the web: do the research use requirements include that you cannot post the results that are actually going to be provided on the web? And maybe this is some way that we could help address this issue about secondary posting, or not, of the results.

So what's imagined right now doesn't change any of the terms of use.
It's just packaging the data so it's simpler to get. So in this particular GRU repackaging, there are still the same terms of use: you're not going to identify, you're not going to redistribute, etc. I think where we want to talk about risk, and what people can see, is in this conversation about browsers and analysis and what we can show. And I think it's helpful to keep those separate, so that we can try to fast-track investigator access to the data in a lightweight, fast way, and then have the conversation in parallel.

We need a couple of parallel conversations here. One is on the risks and harms: how do we adjudicate that, and try to think about, you know, what are our cohorts comfortable with seeing? We've got different groups deciding different things; maybe we can leverage some of that experience and have some, you know, cohort-invested participant discussion. I don't know; we need to move that forward. We have this idea about putting computation into cloud environments and security, and how do we take the next steps in that. I think we're learning with public data, but we're going to need to expand that. We have this idea of messaging and distributed analysis, and if that's restricted to just the broad use data, like Ewan said, that does simplify part of the triangulation on who can do what. So that's the certified user. Who's... I think Laura mentioned this last night: who's going to do the credentialing, who's going to own the process, how do we stand that up, and then, you know, have the international conversations so that we have peers around the world. I think those are all parallel tracks.
Yeah?

So you'll have the data set in July, but it will be a while before you establish a mechanism for user certification, and for a period this would still be based on project-specific data access.

One project-specific access through the approved-user channel will get you to all of the data here, and so that would be this packaged set, the Cata, and the broad use.

But that could be a very broadly worded project?

Yes, like "I want to analyze everything." Yes, I think so. I just wanted to reiterate that we might be blending initiatives here. So what will happen in July is that aggregate data from the 19 existing sets which have appropriate consent and data use limitations will go into a new data set, the Cata data set, and you can make a request for it through existing procedures. Just say: I want to look at aggregate data for the studies in Cata to do XYZ. That will go to a DAC; it will be reviewed and approved exactly according to all of our processes as we do them now. And then what we are thinking about for the future, based on this meeting and based on, you know, other things that we have heard, is how can we take this further? That would include all of the outreach to institutions, to cohorts, to everyone, to think about how to bring more data sets into what we have, and how to make this kind of access possible for individual-level data as well, if they're approved for general research use. But those are all future steps. What we have right now is this first iterative, or incremental, access, which is big and important. We're looking at it for a period of six months and at how we're doing, and then we'll come back and reassess.

Yeah; for anything more about it, you have to go to this DAC.

But at least the existence of a variant at that point of interest, would that be possible?

Yeah, so variants can be done now.
I mean, they just go to dbSNP or dbVar/DGVa, and there they can be listed by study. If you want p-values, the significant SNPs, the important ones, I think that's the conversation about risk, because, you know, non-significant p-values might be one thing, but rare variants might have a different level of risk. And that's what David was saying last night.

Yeah, and I don't think we've settled on that yet: the question about what's the threshold for being able to post all the p-values, and then there's another question about rare variants.

Right. From this process, all about creating a kind of meta-project which is going to be accessible, which will have all these projects... I would really be positive about us also not forgetting about that. Right, it's orthogonal, but very important.

Yes, right. Okay. Okay, so maybe we should go on. Steve, that was great. Thanks very much.