Hi. I'm Pearl O'Rourke, and I am not a geneticist and not an IT person, so you can imagine how much help I was to this model. But I'm very happy to answer any questions. My task was to assume this model made good sense (and I love my group, so yes it does) and to look at it through the lens of ELSI. I think it's a good exercise that maybe can also be used for some of the other models.

So this, in my simplistic way, is the model; there's a rough sketch of the flow below. You've got stuff coming in from the outside: data collected pursuant to informed consent, coded, with no HIPAA identifiers, going into the central analysis server in de-identified form. You then have people on the outside who request that the central analysis server do an analysis, and it spits out the analysis without individual-level data. So even if that's wrong, for the rest of my talk, assume it's right, okay? And I think that's what all that cloud stuff was talking about.

What I decided is that the more I asked these folks questions, "I don't understand this, I don't understand that," the more this seemed very much like dbGaP to me. So, since we all know dbGaP to some degree, I'm using it as an example. I don't know if this analysis server is dbGaP lite or dbGaP heavy, but let me compare a few things. An informed consent form for the source material: yes for both. Limitations of use, within the system or by recipient investigators, set by the informed consent form: yes for both. Neither one has identifiable data. But here's a big difference: dbGaP will disclose individual data, not identified but individual, whereas this system would not. Coming from the ELSI and IRB world, that sounds really good.

There are other issues, which I don't think are as obvious. The quote-unquote "local certification": local certification is required for putting anything into dbGaP, and I'll use that as an example in a second. For the central server, I would presume the answer there is yes as well; whether it would be the same certification or not, I'm not sure. Review of uses: with dbGaP we have all the DACs. It's not quite clear yet what system would be in place for this model. Would people just ask for any analysis, or would there be some limitation on the types of analyses? And, just because I had to put it somewhere, return of research results: within dbGaP it's noted as being very unlikely, and in this system I think it's even more unlikely, which, from the ELSI perspective, is very good.

So with that as background, the vulnerable points from an ethical and oversight perspective are four: the server itself, the collection of data, the analyses, and then what you give back to the requesting folks. The central analysis server, I would submit, is a research repository of de-identified data, the same as dbGaP. I would also suggest it's time for NIH to come clean and admit that this is an IRB protocol. dbGaP is not IRB-approved; the local institutions take all the liability and all the heat, and I think that's unfair. So it's time to see the light and go the right way: make this an IRB-approved protocol. It would be IRB-approved at virtually any other institution. The IRB protocol would include business rules for all of the things that you see up here. Okay?

In terms of ELSI issues for the primary data source: the data will come from tissue collected under an IRB-approved protocol and informed consent form, under the auspices of an IRB.
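To make that flow concrete, here is a minimal sketch. The names are hypothetical, not an actual dbGaP or NIH interface: coded, de-identified records go in, and only aggregate results come back out.

```python
# Illustrative sketch only -- hypothetical names throughout, not an actual
# dbGaP or NIH interface. Coded, de-identified records go in; analysis
# requests come back as aggregate results, never individual-level data.

from dataclasses import dataclass, field

@dataclass
class Record:
    code: str                                       # study-assigned code; no HIPAA identifiers
    genotypes: dict = field(default_factory=dict)   # e.g. {"rs123": "AG"}
    phenotypes: dict = field(default_factory=dict)  # e.g. {"diabetes": True}

class CentralAnalysisServer:
    def __init__(self):
        self._records = []          # de-identified submissions from local sites

    def submit(self, record: Record) -> None:
        """Accept coded, de-identified data from a certified institution."""
        self._records.append(record)

    def allele_frequency(self, rsid: str, allele: str) -> float:
        """Answer a request with a summary statistic only; the individual
        genotype rows never leave the server."""
        observed = [r.genotypes[rsid] for r in self._records
                    if rsid in r.genotypes]
        if not observed:
            raise ValueError(f"no data for {rsid}")
        total_alleles = 2 * len(observed)            # diploid: two alleles each
        count = sum(g.count(allele) for g in observed)
        return count / total_alleles

server = CentralAnalysisServer()
server.submit(Record("S-001", {"rs123": "AG"}))
server.submit(Record("S-002", {"rs123": "AA"}))
print(server.allele_frequency("rs123", "A"))         # 0.75 -- aggregate only
```

The requester only ever sees the summary number; the genotype rows stay behind on the server.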
This is what dbGaP says (I have the reference here): from an ethical standpoint, the informed consent process and documents should make it clear that participants' DNA will undergo GWAS and that genotype and phenotype data will be shared for research purposes through the repository. That translates into this in our informed consent form. We put hours into where to put the commas and the semicolons. The reality: most people don't remember signing it. Most people don't remember they're in research, and they certainly don't understand what's in there. So when we're talking about what the public needs to know, we got a huge wake-up call here. But that's what we require now.

The other issue for the primary data source: I assume there would be some kind of certification that the data going into the server was appropriate. And again here I think dbGaP is a wonderful place to look. Institutional certification: GWAS data will only be accepted into the repository after appropriate certification by the institutional officials, not by the IRB. That's really important; it's your institution signing off. And when you look at what they're signing off to, it's so many things I had to put it on two slides.

First, that the data submission is consistent with all applicable laws and regulations, as well as all institutional policies. Our own institution has changed this: we are not taking on all state laws and being responsible for those, so we say only Massachusetts law and federal law. I'll probably go to jail after saying that. Second, that the appropriate research uses of the data, and the uses that are specifically excluded by the consent documents, are delineated. (I have no idea what that five is there; I must have slipped.) Third, that the identities of research participants will not be disclosed.

Then here's where the IRB, or a privacy board (remember, privacy boards are HIPAA constructs), has reviewed and verified, one, that the submission of data to the repository and its subsequent sharing are consistent with the informed consent of the study participants from whom the data were obtained. "Consistent with": we as an institution have changed this, and we say "not inconsistent with." To be honest, we think that gives us more wiggle room. To say "consistent with," you have to have a positive statement that sharing is okay and broad use, or whatever, is okay. But for those consent forms from before DNA was discovered, and we have a lot of those, as long as the form doesn't say that only this one investigator will have the data, we felt "not inconsistent with" allowed more. Next, that the investigator's plan for de-identifying data is consistent with the standards in the policy, which is HIPAA. And that they've considered the risks to individuals, their families, and groups or populations (community risk) associated with the data submitted to the NIH GWAS repository. We have also changed this one: we say we will only assume the risk for the submission, and we will not take any risk for downstream use, because we have no control over it. Yet this is what you're asking the local institutions to do. And finally, that everything is consistent with the Common Rule, the federal regulation that is the mothership of IRBs.

So I think the server has to look at how the data is going to get in, what the informed consent process is, and what the institutional requirement is going to be. As for the central analysis server and the analyses themselves: to my read, it's de-identified data, hence it is not human subjects research, and you do not need an IRB action.
If you're going to come back and say genetic information is identifiable, the game will change. I also think there's a responsibility here: the central server has to have some level of review as you do larger and larger data sets, namely community risk. We do not do very well at community risk. But I think this would be one of the business rules; dbGaP does it through the DACs, but any other central site would have to do it as well. (One illustrative way those business rules might be written down follows below.)

Return of analyses to the requesters: it's de-identified data, therefore it is not human subjects research, and you don't need an IRB action. I did have to put this in again, though. Return of research results: this is from the dbGaP website FAQs. Although it says it's not likely to happen, and that you've got to be really careful because there are a lot of concerns about returning research results, the bottom line is that it says it is possible. Again, what I would highly recommend as we go forward is: close this door. But it is there.

My very final comments, and then I'm happy to take any technical IT questions about the model. First, I think there are a lot of lessons we can learn from dbGaP, not only for this model but for other models, both what has worked and what has not worked. Second, I think this model, if you can get buy-in to it, is actually more streamlined than dbGaP, and I think it's safer from an ELSI perspective: you are not giving out individual data. And for all the reasons that Mark talked about, there's more control; you can set it for diabetes, say, and that's the only thing the analysis is run on. And I think we must constantly reassess this, not only against the advance notice of proposed rulemaking, which could change the Common Rule, as Laura had spoken about before, but also against potential changes to HIPAA. Right now, HIPAA basically says that genetic information is not identifiable unless it's linked to other identifiers. I think it's a matter of time before the Office for Civil Rights, which oversees HIPAA, starts moving toward treating genetic information as identifiable, at which point you're going to be talking about a whole other set of regulations. So that's my last slide, and I will leave it there. I can either say there's no time for questions because it's time for Wiley, or take some questions. Take some questions. Who looks nice? Okay; I've never been called that. Thank you so much. You're so welcome.

I'm just wondering: you're articulating a view where there's no doubt that these ELSI issues can drive what we do, and of course they do. But I think you said that there is no access to primary data, although there are several classes and areas of scientific questions where we will need to access the primary data. Meaning there may be, and I don't think this is inconsistent with Mark's talk, ways in which you go in and reanalyze the primary data in some way, for example re-calling. So I just want to make sure.

But I guess my understanding is that the primary data lives at the local site. They send it in de-identified, so coded: genotype, phenotype. And the server itself just has the coded data, without the link. Mark or David?

Yeah, I'm just raising the question that this is going to be useful so long as I can go in and, for a certain class of problems, reanalyze the primary data according to my choosing. That's all I'm saying. I think we have to make a distinction between the people who are running the server and the people who are serving up the data.
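As a purely illustrative aside on those business rules, here is one hypothetical way to pin them down. None of these field names come from dbGaP or NIH policy; they just restate the points made in the talk.

```python
# Purely illustrative -- made-up field names, not dbGaP or NIH policy.
# They encode the business rules discussed: certified submission,
# consent-limited use, reviewed analysis types, aggregate-only output,
# and the return-of-results door closed.

SERVER_BUSINESS_RULES = {
    "institutional_certification_required": True,
    "uses_limited_by_informed_consent": True,
    "permitted_analysis_types": {"allele_frequency", "association_test"},
    "individual_level_output": False,
    "return_of_research_results": False,
    "community_risk_review_above_n": 10_000,   # large data sets get extra review
}

def request_auto_approved(analysis_type: str, n_subjects: int) -> bool:
    """True only if a request clears every rule without human review."""
    rules = SERVER_BUSINESS_RULES
    if analysis_type not in rules["permitted_analysis_types"]:
        return False
    # Bigger and bigger data sets raise community-risk questions, so
    # anything above the threshold is routed to review instead of run.
    return n_subjects <= rules["community_risk_review_above_n"]

print(request_auto_approved("allele_frequency", 5_000))    # True
print(request_auto_approved("individual_lookup", 10))      # False
```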
So in many ways this began out of the very inane situation where you cannot query an rsID in dbGaP and get back the allele frequency. How can that be? How do we not have a system to do that? And the point I would add to the discussion we were having before is that there isn't, I don't think, in this vision, a single monolithic server. Rather, the question is how we go about building ways to interface with dbGaP data, so that different groups could potentially build different servers to allow people to make queries. Then you have a proliferation of potential ways of doing this, and what we need to agree on is a set of standards, such that if you meet these standards, this will be okay, and we agree it's an activity that doesn't violate the underlying data use agreements. Is that a clarification, David? I'll let others go.

So I just wanted some clarification on one of the points you raised, that there wouldn't ever be data going back to patients. If you recall, as Mark described, one really attractive aspect of this is the ability to make next-generation sequence data quality better by being able to go across all existing studies, across very large numbers of individuals. And this matrix he talked about, of variant by subject, would inevitably mean that at some point new information could emerge, on incidental findings alone, that the investigators might want to take back to people. So I think it's a lot to assume that no information would ever go back to patients. And that's before you even get to the apps and the analyses that people want to put in. If part of the rationale for such a thing is improving the quality of the sequence information by being able to go across studies, part of the reason for wanting to improve it is to see better what's there.

I think you're absolutely right to call me on that statement, which felt kind of good coming out. My point is that right now, the return of research results in any research, and of incidental findings versus anticipated results, is just such a quagmire. Add on to that a bank, a de-identified bank. In that way I think...

But I also think the people who put in a given data set presumably have the right to look at the results from their own data in that matrix as the quality is improved. And so again, they're...

I think that would be a business rule the server would have to look at: for the people who originally submitted the data, would they be able to look at it, and would return of results be a possibility? That then has to be in the consent form. I mean, it just balloons.

Nancy, what I would imagine happening is that somebody runs some analysis on the server and they have a biological insight. Then, based on the biological insight, they have permission from their research subjects to return results to participants. So it's not the server that's returning results, but rather the investigator who holds the data. That's sort of saying: I did an experiment in the lab, I found something out, and now I feel morally bound, or I'm bound by my business rule, to return results to participants. In that sense, it's not the server returning it but the investigator.

Yeah, I think for the submitting, originating investigator, there's room there. David?

So I'm wondering if you, and maybe others, want to take this, but you're as good as any.
What happens (and it's partially because you're up there that I'm asking) if it were to be decided that DNA sequence data is a HIPAA identifier? I'm asking you to project into the future. Let me be more specific. There's a way we have of doing business right now. One of the things I worry a lot about, working in one of the places where a lot of this research goes on, is that tomorrow a letter goes out saying DNA sequence is a HIPAA identifier. I actually think our current way of doing business would have to shut down, and we would have to stop. Everyone here is worried about, oh, we might throw a wrench in the works. I'm worried about what happens to us if we don't come up with some HIPAA-compliant way of dealing with these data and tomorrow we get a letter saying there's a new policy, not from this group of people but from some higher group, saying DNA sequence is a HIPAA identifier.

I'll give you one example. The permission form under the Common Rule is the informed consent; for HIPAA it's the authorization, and there are different rules on what each has to contain. One of the really stymieing elements of the authorization is that it has to be specific to a use; HIPAA doesn't like broad uses. Now, you could say maybe the use is just putting the data in, but once it's there, for every analysis that is done I could imagine the HIPAA police saying you've got to get specific authorization, because we don't allow broad use. Now, if HIPAA were to say that whole-genome sequences are identifiable, hopefully there would be discussion regarding the ramifications, and there would be some workaround, or an allowance for broad consent. I think that's probably the biggest example, and it would really make things completely different.

So a question that follows up on this: to what extent would you extend the central analysis server to be a distributed analysis server, such that the data never comes to NIH and the query just gets run at each of the sites? That doesn't address the consent issue with respect to specific versus broad, but it does address the submission of identifiable information to NIH. Is this something that NIH should be open to? (There's a rough sketch of this distributed pattern at the very end of this discussion.)

From the ELSI perspective I would say yes, again provided you had the same rules for what each decentralized node did, and once data was in a server it played by the same rules. But like anything, once you have multiple decentralized sites, it's hard to maintain that consistency.

For what it's worth, we have talked about that in our group, and if Gonzalo thinks it's complicated to build one such system, imagine building an ecosystem in which everyone who wants an analysis is somehow going to run all these jobs and make them all comparable. But caBIG has been doing things like this for years. I will not comment on that. I'm not saying it's perfect, but it's a potential. Okay. Bang, bang, bang. Bang.

Yeah, I just want to comment. Your matrix comparing dbGaP with the central analysis server was very well displayed. The one role that I would include is the role of software upload, because that's what you don't do in dbGaP and what you do here to make up for the fact that you're not returning results.
So I would submit that instead of a data access committee, you would have a software upload committee, because what you are vetting is the software, the analysis that is returning results. And if you don't watch it carefully, it could be a simple algorithm that returns results that allow you to reproduce the original data, and you don't want that. So I think we can come up with a better acronym, but that's what it is: the reversal. (A small sketch of the kind of output check such a committee might require follows below.)

That's a good point. Sharon?

I guess I had two questions. One is to echo what Nancy said. I think for many genetics researchers the idea of a quality-control exome, where you could compare your data, analyzed the same way, against large data sets, would be extremely attractive; but people are going to expect that they get that data back and can share it. So as to that part of the model, I'm still unclear between the two of you about how you're thinking of it working. The other question I had, though, is: are you going to allow clinical labs to have their data analyzed this way for a fee? Because I can imagine that will be one of the first requests. There are a lot of patients now having exomes done as a clinical test, and if your exome isn't analyzed the same way as this magic server does it, there may be conflicts in the interpretation. So I could see that being a very early interaction.

I would dump that one to the boys, but I do think it brings up a huge issue: right now, clinical and research sequencing have very much overlapped. I think there would have to be some accommodation for that.

I would argue that if we allow this to happen, and make it possible for people to take in dbGaP data, reprocess it, and allow queries, then you could imagine someone building a virtual machine that could simply be sold and tacked onto the analytical labs' pipelines with each of the data releases. There's a set of standards; you meet them, you sell it, and it just becomes part of the pipeline for doing the analysis. Right now, I would say the big limitation is that there's no way to repackage dbGaP data. If we view this as a way to package pools of dbGaP data so that people can query against them without getting back individual-level genotype results, then that is what I think is really going to liberate all the data. I also think it helps a little bit with the concern that a lot of clinical labs are starting up and trying to develop expertise in a pretty complicated data-processing challenge. We all hope they do a good job and get results consistent with the standards we set, but we're not really providing them an easy mechanism to reach those standards, and this would be one such mechanism. That would be good for all of us, because if people's first experience with sequencing is exposure to clinical sequencing results implemented by people who are not experts in the area, that would be very risky for us.

Debbie, and then Richard. Richard?
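On that software-vetting point raised a moment ago: a minimal sketch, with a made-up threshold, of one standard kind of output check (small-cell suppression) such a committee might require, so that a vetted analysis cannot return results fine-grained enough to reconstruct individual-level data.

```python
# Illustrative sketch with a made-up threshold. The concern: a vetted
# analysis must not return output fine-grained enough to let a requester
# reconstruct individual-level data. Small-cell suppression is one
# standard disclosure-control answer.

MIN_CELL_COUNT = 5   # hypothetical threshold; set by policy in practice

def release_counts(counts: dict) -> dict:
    """Return only the aggregate cells large enough not to single
    anyone out; everything below the threshold is suppressed."""
    return {cell: n for cell, n in counts.items() if n >= MIN_CELL_COUNT}

# A query stratified finely enough to isolate one person loses that cell.
result = release_counts({
    ("rs123:AA", "diabetes"): 412,
    ("rs123:GG", "rare_phenotype"): 1,     # would identify an individual
})
print(result)   # only the 412-count cell survives
```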
Yeah, I've just listened to all the discussion today, right from when Ewan mentioned that in the UK we're starting to look at the medical record rather than at the research result, through to this discussion just now. What I'm seeing is that in talking about the dichotomous clinical-versus-research situation, we're talking about preserving the research domain and its integrity and worrying about bleed-over. And yet we've all got to face the fact that the clinical domain is just going to be enormous, and it's going to dominate the discussion within a year. So I think it's being underserved in the discussion we've had so far, and we'll come back to it: maybe we approach David's question not with "what on earth do we do if the HIPAA rules change" but with "how can we build infrastructure to take more advantage of the opportunity" that, even in this nation with its imperfect healthcare system, will be there.

Debbie does have a question.

So Richard brings up a good point. It is very interesting to think about the fact that you could allow clinical samples into the system as long as you can keep the data in the system. Right now, I think research is hampered by the fact that many individuals sent for clinical sequencing, for genetics for example, go to private concerns; those concerns have the data, and researchers don't. I think that is hamstringing human genetics research. So it probably is very useful to think about letting in clinical laboratories that charge for things, if the data is left behind.

Can I suggest: I know we've just taken a turn that I think is a really important one, and I just wonder if it's worth, even though we're running late and there's a Celtics game to watch and all that, asking how this might enable, empower, relate to, and benefit from what Richard says, which is absolutely right. A year from now, two years from now, it'll be ten times as much, a hundred times as much. We could see this kind of thing we're talking about as profoundly enabling at the beginning, because it allows the samples, as they come in, to be interpreted in light of the research data, and it could also provide access to different processing pipelines that don't exist now. And then, over time, the learning environment becomes the clinical data. We haven't really talked about that, and whether we talk about it now or tomorrow, I think it is where this heads, but we should make sure we really talk about it.

I agree it's very important to consider the clinical aspects of this, and I think there's a fundamental problem with what was just being discussed: they're two very different things. You're talking about a way of taking the data in this centralized server and spitting out summarized statistics that would be safe for anybody to see and, in some sense, not identifiable. That's one kind of computational task that you could learn how to do and package up. It's very different from the clinical task. If I have a clinical situation and I want to analyze that person's genome, I want to analyze the whole genome. There's no reason to whitewash it into a bunch of summary values that anybody can look at, and no reason to try to mask all of the identifiable things. There's a separate computational process that will use the full power of all of the information to maximally benefit the clinical needs of the individual patient. And the only way we'll develop that is if we have a set of unrestricted data that people can develop those methods on.
But David, I don't think there's anything in this setup that would prevent people with permission to access the data in that way from doing so, from developing methods that have total access to the data. And the idea that you're going to interpret the genome without any regard to all the other data in the world is probably also wrong.

No, I'm not suggesting that. I'm suggesting you wouldn't interpret the data just based on routines that are designed to summarize the data and eliminate...

No, this is a way to pull in, for example, 1000 Genomes, pull in data that's in dbGaP, in such a way that you could process it with your own data. You obviously have access to all your own data, and to any other data you have permission to use. If you were on the server, for example, and you had dbGaP access to the raw reads, you could continue to do that, and then package that in and slurp in any other data you have permission to use.

So then we still have people developing systems with the express purpose of eventually producing data that anybody can look at. That's a completely different purpose.

I think it's just that there are more options. Let's say you develop a method to do that based on that approach, and now somebody else wants to apply it, but they don't want to give you access to their data. They could still run the method, leveraging what's in the server plus their own data.
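That last exchange, together with the earlier distributed-server question, is essentially a federated-analysis pattern. A minimal sketch, with hypothetical interfaces (not an actual NIH or dbGaP API): the analysis is shipped to wherever the data lives, and only summary counts are combined, so the raw data never moves.

```python
# Minimal sketch of the distributed pattern from the discussion: ship the
# analysis to wherever the data lives, combine only summary counts, and
# the raw data never moves. Hypothetical interfaces, not a real API.

def local_allele_counts(genotypes: list, allele: str) -> tuple:
    """Runs at each site (or on a requester's own data) and returns
    (allele count, total alleles) -- summary numbers, no rows."""
    count = sum(g.count(allele) for g in genotypes)
    return count, 2 * len(genotypes)

def federated_allele_frequency(sites: list, allele: str) -> float:
    """The coordinator combines per-site summaries into one estimate."""
    summaries = [local_allele_counts(site_data, allele) for site_data in sites]
    numerator = sum(c for c, _ in summaries)
    denominator = sum(t for _, t in summaries)
    return numerator / denominator

# A developed method can run against the server's pool plus a group's
# own local data the same way, without anyone exchanging genotypes.
central_pool = ["AG", "AA", "GG"]
my_own_data = ["AG", "AG"]
print(federated_allele_frequency([central_pool, my_own_data], "A"))  # 0.5
```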