I introduced myself earlier, but just to repeat, my name is Mark Phillips. I currently wear two hats. I work at a small research centre at McGill University in Montreal called the Centre of Genomics and Policy, where I act in my capacity as a lawyer, though in an academic context. I do a lot of writing on cutting-edge genomic health research data sharing; I'm the main privacy and data protection person there, so I do a lot of work around that, along with some other ethical and legal issues. My other hat is as a practicing lawyer: I advise clients on privacy and data protection issues, including people in this field, such as rare disease patient advocacy organizations, on compliance matters like privacy policies and so on. I also have a former life as a software developer, with a computer science background, so I sit across two fields at once. I don't have as much formal background in the life sciences, so my path is probably quite different from yours; it sounded from the introductions like most of you are researchers in medicine and related areas. But since things are so interdisciplinary these days, a variety of backgrounds seems helpful. So that's a bit about me. What we're going to talk about here are, of course, legal and ethical frameworks. Because you're researchers, scientists, healthcare practitioners, and so on, you're obviously not going to be experts in legal compliance, but I think it's good at least to supplement what you know by being able to flag issues when they arise, seek out more advice and help, and think in a general way about how we structure these ideas.
So just to talk a bit about the cloud context, the technological side, to start. As you may or may not know, there's been increasing interest in turning to the cloud in genomic research, the life sciences, and so on. In Europe, a relatively recent project called the European Open Science Cloud is trying to tie together different infrastructures across the continent to serve science. I'm not sure if it came up earlier today, but out of Chicago the NCI has its Genomic Data Commons, and similarly, and I believe this is the platform we're going to be working with all week, the Cancer Genome Collaboratory is another cloud project. It's a collaboration, but it's essentially based out of Toronto.

Some specific considerations come up in the cloud, so first, the intro-to-the-cloud material, which people may already be aware of. What is the cloud when we talk about it? Its characteristics are set off against traditional approaches to computing, like buying your own machine or using an academic computing centre that might have quite a bit of power. On-demand self-service is the first characteristic: you can go to Amazon Web Services, buy whatever you want, and have it instantly, without having to call anyone and talk to them on the phone. Broad network access means you can reach the resource from wherever you are, usually over the open internet, without having to be physically close to it. Resource pooling is maybe the main characteristic: the resources you're using are out there somewhere, probably in a server farm, and generally it doesn't matter much to you where those servers are or how they're working; through virtualization, we can fire up virtual machines with our desired environment wherever we are. At some point some of these abstractions break down and we do start to care, as we'll discuss a bit more later, about where the servers are located. Rapid elasticity is the fourth characteristic: the idea that we can scale up our projects really quickly. The traditional approach would be that if you have a project, you have to actually buy the hardware or have it around; here we can scale up and down as needed and, in theory at least, keep costs tailored to the project we're working on. The last characteristic is measured service: you pay for the amount of processing, storage, memory, and so on that you actually use, and no more.

We often talk about a few different service models when it comes to the cloud. Infrastructure as a service is the lowest-level one, where you're essentially buying processing power, memory, or raw storage. The other extreme is software as a service, things like Gmail or any end-user application, where all the work happens in the cloud but it's transparent to us; we just fire up a browser. The in-between middle ground, and I'd be interested to hear whether the other instructors agree, is I think what we're looking at with something like the Cancer Genome Collaboratory: platform as a service, where you can fire up virtual machines and so on, but a lot of analytic tools are built in, so a lot of repetitive tasks are avoided rather than having to build something up ourselves from scratch.
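To make the measured-service and elasticity ideas concrete, here is a toy cost sketch. The function name and the per-unit rates are invented purely for illustration; they are not any real provider's pricing.

```python
def monthly_cloud_cost(vcpu_hours, storage_gb_months,
                       vcpu_rate=0.05, storage_rate=0.02):
    """Pay-per-use bill: compute time plus storage, and nothing more.

    The rates are made-up illustrative numbers, not real pricing.
    """
    return vcpu_hours * vcpu_rate + storage_gb_months * storage_rate

# A burst month: 2,000 vCPU-hours of analysis over 500 GB of data.
burst = monthly_cloud_cost(2000, 500)   # 110.0 under these toy rates

# A quiet month: the VMs are shut down, so only storage accrues.
quiet = monthly_cloud_cost(0, 500)      # 10.0
```

The point of the sketch is the contrast with buying hardware up front: in the quiet month the bill collapses to storage alone, which is the "scale down as needed" half of elasticity.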
A few other buzzwords come up around the cloud. We talk about the public cloud versus the private cloud, and "public cloud" might be the reverse of what you'd expect: normally when we think of "public" we think of the state owning something, and "private" means corporations, but here it's somewhat reversed. "Public cloud" generally refers to a cloud service provider like Amazon Web Services, though I shouldn't single them out, any provider whose services are essentially accessible to the public, where anyone can sign up. A "private cloud" is generally cloud infrastructure within a specific organization. A "community cloud" is similar: a group of organizations sets something up together. You can think of these last two as being close to the Collaboratory idea; part of the motivation was that it's nice for researchers to have a cloud environment, and especially the ethical and legal considerations, tailored to their own needs, rather than always relying on a stock, out-of-the-box approach that might not work as well for a specific context. A "hybrid cloud", as you might imagine, combines elements of the earlier deployment models.

How does this look in practice? We're seeing more and more centralization of genomic data because of its huge value. This slide shows how the idea behind the Cancer Genome Collaboratory is imagined: we might see a whole lot of genomic data from patients or participants stored in North America, but also separate clouds in different parts of the world. Some of the reasons for that are technological, but often they're actually ethical and legal considerations. There is more and more concern about sharing massive amounts of data. This especially came out following the Snowden revelations: there were concerns about what was going on specifically in the US, but there should be concerns everywhere. How is our data being used? Do we lose control over it when it's sent to some other part of the world? So the idea here is that there may be some inescapable amount of fracturing of these clouds around the world. But from a researcher's perspective, and this is based around the ICGC, the International Cancer Genome Consortium, it's nice to have a unified view when running analytics, so that we don't have to worry so much about gathering data from all over the world, even when we want to run robust analytics on various cohorts at the same time. What we're aiming for is to have the prepackaged tools I was talking about before. Of course, some of the cohorts will require data access committee authorization; if people are familiar with the research ethics process, a similar but different process has emerged in this new space: data access committee approval. The idea is that you'll get authorizations based on your project from the various cohorts. ICGC has actually done a really good job of unifying these down to two separate authorizations, one from TCGA in the US and the other from ICGC's DACO generally, so there isn't a whole pile of authorizations to get. The idea is that a researcher should be able to easily run basic analytics across all of them; for example: run this analytic pipeline across all donors with primary and metastatic tumor data available.

That's essentially what we're aiming for. I've got a slightly older graphic here intended to show the benefits of the cloud versus even a combination of academic compute centres, based on the raw amount of data we're dealing with, the need for large cohorts, and the size of genomic data in many contexts. The theory was that things were going to be cheaper, faster, and better in the cloud. It's taken slightly longer to get there than we expected maybe five years ago when we were starting, but it seems to be the direction things are moving. As may already have been discussed this week, the infrastructure here is based on OpenStack, which is consistent with the open data, open science, open source ethos of the whole sector, and of bioinformatics.ca in particular. So that's some of the rapid technological background, and now for the second part of my presentation, jumping into the landscape of privacy and ethics issues when using genomic data. Yeah, go for it.
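Before moving on: the goal just described, getting one or two DAC authorizations and then running one pipeline across every cohort whose consent covers your purpose, can be illustrated as a consent-aware filter. The cohort names and consent codes below are invented for illustration; they loosely echo the style of the GA4GH Data Use Ontology (e.g. "GRU" for general research use), not any real ICGC metadata.

```python
# Hypothetical cohorts, each tagged with a coded consent category.
COHORTS = {
    "cohort_eu":   {"consent": "DS-CANCER"},  # disease-specific: cancer
    "cohort_na":   {"consent": "GRU"},        # general research use
    "cohort_asia": {"consent": "HMB"},        # health/medical/biomedical
}

# Which consent categories are broad enough to cover a given purpose
# (a gross simplification of what a real DAC would assess).
PURPOSE_SATISFIED_BY = {
    "cancer-research": {"GRU", "HMB", "DS-CANCER"},
    "general-research": {"GRU"},
}

def eligible_cohorts(purpose, cohorts=COHORTS):
    """Return the cohorts whose coded consent permits this purpose."""
    allowed = PURPOSE_SATISFIED_BY[purpose]
    return sorted(name for name, meta in cohorts.items()
                  if meta["consent"] in allowed)
```

Under this toy coding, a cancer project can draw on all three cohorts, while a general-purpose query is restricted to the broadly consented one; the "federated DAC" idea discussed next is essentially about making this kind of filtering trustworthy across many committees.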
Yeah, so from what I've seen there are two overarching responses to that issue. Because this field is so increasingly internationally collaborative, dealing with different groups and cohorts, different parts of the world are, to some degree understandably, going to have different ethical and legal concerns. But if everyone's consent form looks slightly different, it becomes tricky when you're trying to say that everyone using the data has to comply with the original consents. So there are at least two approaches I've seen put forward. The one I've mostly seen in practice is the ICGC one: finding a way, by doing a lot of work on the back end, to centralize and have one DAC, so that even if there are a whole bunch of member projects within ICGC, there's one centralized place you can go to get your authorization, filling out one fairly quick and simple application form. Yeah, exactly. The other approach, which I haven't actually seen in practice but which a lot of people have been trying to get off the ground, is some kind of interoperable or almost federated DAC system, where instead of one big DAC you have many smaller DACs, and you try to figure out ways for them to trust each other, and ways to code the consent so that, say, if you want to use data for a specific purpose, you can add to your query which data has been consented for that purpose and draw all that data in. It doesn't make as much sense to me, but I've also seen some approaches proposed that use blockchain somehow to do this, because apparently you have to use blockchain for everything. So far I haven't seen that kind of system in practice; I think it's on the horizon, and we'll see, when it comes out, whether it's a solution.
A trusted scientist status, basically: once you have it, you have access to all the data sets, because we know you're a trusted scientist, you're not going to do anything bad, and we know what you're publishing. But with a generic trusted scientist status, you might be able to do it, but it would be hard for it to get you access to every prospective dataset that could be out there. OK, well, we can take that up in discussion if there's time.

So, as people may or may not be aware, over the last several years there have been a large number of breaches and concerns, especially in the healthcare setting, that have been covered in the media, and much less so in the research context. The concerns I've read about in the research context, and I don't know if I would call them breaches, are thankfully more theoretical: other researchers will try to, say, re-identify participants in a cohort, and I'll get to this more later, and then be able to say, we weren't trying to do anything malicious, but we were able to undo the security practices you put in place, so you should probably consider putting in stronger protections. In health care, though, the problem has been quite large, especially in parts of the US, but really all over, and there's been a variety of forms of breaches, not all of which are as concerning as others. You see things like denial-of-service attacks, and it's not even clear you'd call that a breach: it's flooding the organization's network so they can't use their resources to the degree they want, which can be frustrating to the people involved and can really paralyze a project, but it raises different concerns. I don't know if people read about it, but a couple of years ago there was an explosion of ransomware attacks, which are similar in that usually your data isn't compromised in the sense that someone else has access to it, but it's compromised in the sense that it's suddenly encrypted, and some malicious actor is asking you to pay a ransom to be able to read your own data again. A number of large institutions have been hit by these. Unauthorized access is of course a large category; we'll talk about some of those a bit later, and then some of the re-identification work I was mentioning will also come up again.

Now, just to go quickly over some of the overarching legal and ethical frameworks at play here. I'm not sure to what degree people have looked at or worked with these before. I've grouped them into three overarching categories, but I'm not sure if people have an idea of what legally prevents you, as a researcher or clinician, from doing whatever you want with your data; do people have ideas about what might be governing what they're doing? HIPAA definitely is one; it'll actually come under the second category I mention here. It's a US law, and I'll talk about it a bit more later. Was there an answer over here too? Yeah, exactly, so that's the first category I've put up here: a broad research ethics category. "IRB" is especially US terminology. In Canada, the main document people see, and I don't know if people have heard of it, is the Tri-Council Policy Statement, a document put out by the three research funding bodies here in Canada. People here would mostly be going through either CIHR, the Canadian Institutes of Health Research, or second NSERC, and probably less so SSHRC, the Social Sciences and Humanities Research Council. But in theory this policy applies to all research; it's very much not specific to genomic research. The considerations might be different, but generally this is what applies, though specific institutions might have their own variations. The analog in the US is the Common Rule, which is similar, but there it's a law rather than a policy, governing what they call human subjects research. There are also some common law duties. I'm from Quebec, where the common law doesn't actually apply in this context, but in most jurisdictions we'll be talking about, there are common law duties that apply specifically in research. For example, there are duties to disclose to participants and be transparent with them. Even the way informed consent works is interesting from our perspective: the way the common law tends to interpret it, if you fail to get full informed consent from research participants, then their consent to the research was not proper, and you've essentially committed the tort of battery, which is roughly the private law analog of criminal assault. That's going to be less relevant to us when we're working in the data sphere: we're not physically intervening on participants or patients, we're making secondary use of data, so some of these frameworks are going to apply less, or at least less clearly, in our context.

The second category I put up here is the one that HIPAA, mentioned before, falls under, and it's going to apply somewhat more directly to data-driven research. It's a field that's really on the rise right now, although it's existed since about 1970; you can pretty much pinpoint it to that year. This is personal information and data protection law, which in the US, and sometimes in Canada, is often just called
privacy law. In Europe, there's a much clearer distinction between data protection law and privacy. In the EU, the big piece of legislation, and I'm not sure if people will have heard of it, is the General Data Protection Regulation, which made pretty big waves when it came fully into force last year. I'm not sure if people remember back this far, but in May of 2018 you probably got about a million privacy update notices from all the companies you deal with; that corresponded with the date this regulation entered fully into effect, and companies were panicking. HIPAA was just mentioned: it's a US-specific law that's specific to the healthcare sector, so it has ripple effects on research, but it primarily targets the healthcare context. In Canada we've got a bit of a weird situation where things really break down jurisdictionally, and it can get quite confusing, because not only does each province often have its own data protection law, but the field is also subdivided into private sector versus public sector. It's quite different from Europe, where there's one law for pretty much the whole continent; here, in a single country, you've got a whole bunch of relevant laws. The main one, or at least a main one, since it's hard to name the main one, is PIPEDA, the Personal Information Protection and Electronic Documents Act, which I'm not sure people will have heard of. It's the federal private sector law, applying to private sector organizations; my assumption is that OICR here would fall under PIPEDA, although I'm not a hundred percent sure; I'd have to talk to their in-house counsel about that. To add an additional wrinkle, in Canada provinces have the ability to adopt their own data protection laws, and if those are deemed substantially similar, they can override PIPEDA; so far I think it's just Quebec, Alberta, and BC that have done that in an overarching way in the private sector. In the public sector, some universities, for example McGill University, where I'm based, are subject instead to provincial public sector laws; in Quebec it's not actually called FIPPA, but in other provinces it's often the Freedom of Information and Protection of Privacy Act, so you can see the privacy context there. And then, to add another wrinkle, many provinces also have health information protection laws sitting outside both the private and public sector regimes. In Ontario, for example, which people might be familiar with, there's PHIPA, the Personal Health Information Protection Act. These throw things into even more confusion, because they often override some of the public sector context, since they apply to hospitals and the like. PHIPA, in section 3, defines "health information custodians", setting out a long list of people in both the public and private sectors who are subject to this act and then not subject to the other acts. So whatever project you're involved in, or whatever institution you're working on behalf of, it's good to first figure out which law applies to you, in Canada or in general. And although the duties aren't identical, obviously, since they're not the same laws, there is a large amount of overlap in the way they look, think, and feel, even between these two areas. The hallmarks of research ethics are often thinking about informed consent, and about whether you're doing human subjects research or not: if you're not doing human subjects research, you generally don't have to comply with research ethics duties, although there are some exceptions. For our purposes, that's kind of what we're
thinking about. Interestingly, and maybe we'll get to this later, under the Common Rule in the US, which I mentioned, until recently, and I think maybe even now, if you're working with what they call de-identified data, even if it's rich genomic data, a lot of the IRBs in the US would say you're actually not doing human subjects research, because you can't identify the humans involved, and therefore you're not subject to review. It seems like that may be changing in the future, but perhaps not yet. The last big grouping I've put here is basically contractual obligations. When you get approval from these DACs, the data access committees, you generally have to sign a legally binding agreement; often your institution will have to sign off as well, to show that someone is going to be responsible if something goes wrong. Those contracts add other legally binding obligations on you that supplement, or are sometimes similar to, the other obligations that exist.

One question you might have is: if we're here for bioinformatics.ca, why are we talking about things like the EU's General Data Protection Regulation, or even HIPAA in the US? It's because there's not only a huge indirect influence, but also, increasingly, legal structures are struggling with the fact that traditionally you could piece things off into jurisdictions, but with a global internet and international collaboration it's increasingly hard to do that, and it's tough to see ways around it. This is Article 3 of Europe's General Data Protection Regulation, which says who the law applies to. What we see is that processing personal data of data subjects who are in the European Union, by a controller who's not in the European Union, so basically anywhere else in the world, is still, this law says, subject to the GDPR if the controller is offering goods or services to those people or monitoring their behaviour. So this law is interesting: it says it can apply to you even if you're sitting in Toronto, if you're offering goods or services to people in Europe or monitoring their behaviour on a significant scale. It's not that someone who shows up at a hotel in downtown Toronto and happens to be European automatically makes you subject to the GDPR. In any case, because the idea was that this general regulation applies to the public and private sectors alike, it's mostly thinking about the tech giants context, not the research context. You can immediately start to ask yourself: is a research project offering goods or services, or monitoring people's behaviour? The current thinking is that normally, especially if you're just receiving transferred data, no. But because of all these difficulties with the extraterritorial application of this law, I've been involved, with some others, in a project through the Global Alliance for Genomics and Health, which people might be familiar with. They work on a variety of areas, including data security standards and genomic file type standards, but I'm more involved on the ethical and legal side, and we've been trying to clarify some of these issues. Last year we started a series of monthly briefs aimed at the research community, to explain some of the basic ideas. One of the briefs, put out by the co-chair of this project along with me, Edward Dove, asked who the GDPR applies to; he worded it in the more grammatically appropriate way, but essentially, from what we can tell, because we don't have a lot of case law on this, "offering goods or services", and that other category I mentioned, monitoring behaviour, probably don't apply in the research context. But there's another article that
essentially says that if you're collaborating with people, some of whom are inside the European Union and some of whom are not, the GDPR probably does apply to you. So in a context like the ICGC, the project I keep coming back to, which the Cancer Genome Collaboratory is heavily connected to, we do have projects all around the world, including in Europe, and it seems likely in these contexts that the GDPR would ultimately apply. From the Canadian side, one way to comply more easily if you're receiving data from Europe goes back to the predecessor of the GDPR, set up in 1995. It imposed an essentially blanket prohibition on transferring data outside of Europe, because of worries about what might happen to it there; but, still wanting data to flow freely, it let countries around the world have their legal frameworks certified by the EU as adequate. PIPEDA, the Canadian law I mentioned before, was approved as adequate by the EU early on, about 20 years ago now. So if you're receiving data from Europe and you're subject to PIPEDA here, the sender from Europe has essentially satisfied their obligations under the GDPR. In the US there's another mechanism, called the EU-US Privacy Shield, that works in a somewhat similar way. So that's one thing to keep in mind.

I want to jump now to some of the concerns, why we care about this, especially in the cloud context. When we first started talking about the cloud a few years ago, some of the concerns were about losing control of data: you're sending the data out somewhere else, someone else is controlling it on your behalf, and it's not on your computer anymore. One way this manifests itself is in having to sign standard form contracts, especially with the large cloud service providers; the larger the provider, the less favourable the terms are probably going to be to you. It often goes to the degree that they say they can unilaterally change the terms of the contract just by notifying you, and you can't really do anything about it other than end the contract right there. Early on, some bigger genomics projects were able to negotiate specific versions of their own terms of service; that's tougher and tougher to do now, although there is more and more flexibility in the out-of-the-box forms of agreement you can have with cloud service providers. For example, they'll generally let you require that data only be stored in one specific region of the world, and not processed or stored elsewhere, if legal restrictions prevent you from doing otherwise. In some cases there's a risk of data loss; we saw this early on, especially with some of the smaller cloud service providers, who either went out of business or just weren't protecting their data properly, which causes problems for people relying on that data being there. In some cases the contracts are also pretty unfavourable as to who is liable when something goes wrong; a lot of the liability is often put onto the client rather than the cloud service provider. And then, going back to the Snowden revelations of 2013, there were worries after Edward Snowden revealed that a lot of these companies were essentially handing over data to the government; some mechanisms to mitigate this have come into place since then.

I'll go over these quickly, because I'm sure people are aware of them, but there are certain harms to the data subjects, the participants or patients. One thing that comes up a lot is the risk of discrimination in insurance and
employment people might be familiar with there's a law designed to protect this in the US called Gina the genetic information non-discrimination act that essentially says that or at least limits the ways that insurers and employers can insist on collecting genomic data that they can use it etc in Canada we've kind of been an outlier in terms of being one of the only countries that are heavily involved in genomic research that haven't had a law like this there actually there was one passed at the federal level a few years ago now but more recently just in the last few months there was a case that went to the Quebec Court of Appeal and they decided that the law was unconstitutional for more of a technical issue of division of powers so the federal parliament in Canada had put this law into place and they said actually the provinces I'm over simplifying quite a bit but they essentially said this was within provincial jurisdiction federal government didn't have the right to do this and so it becomes more difficult to see if we especially we want to reassure participants that their data won't be misused if data is shared widely because I think it's I think it's reasonable to expect that if insurers can use this data to try to make predictions and there's nothing to prohibit them from doing so they generally will so in Canada there's still a bit of a gap there and there's also risks of disclosing sensitive health information and so there can be things like through genomic data of people's susceptibility to disease but also if people's data is stored in terms of you know say a case control kind of context you can know you can disclose kind of who's part who's a case who has a certain disease and who doesn't paternity information this kind of came up early in the genomic research years of being surprised about paternity and specific cases which can you know cause people to stress in some cases is maybe getting more more theoretical but the risk is that there could be 
identity theft, especially if genomic data begins to be used as a biometric identifier. I'll generally skip ahead to future uses here. One of the risks, even if certain things haven't materialized yet, is that, unlike say a credit card number, where a compromise can bring significant financial risks but you can always cancel the card and get a new one, if your genomic information is somehow breached you can't really order a new set of DNA or chromosomes. So if there are future things that can be done with the data, you can't unring the bell. The last set of potential risks is to researchers. There's of course the risk that participants lose confidence in the field and it becomes harder to recruit people for specific studies. There are risks that could affect people's careers, such as losing funding, or perhaps access, if they violated the terms of an agreement. And some of the privacy laws themselves impose things like fines; in Europe those have gotten more serious, and in some cases they at least in theory provide for criminal penalties. Some Google Italy executives were actually sentenced to six months in jail for a violation in one case there, although I think they ultimately worked that out some other way and avoided actually going to jail; initially it looked like it was going that way, a few years ago. The other difficulty for researchers is that these novel risks make it harder to get informed consent, because in general you have to inform participants of all the risks that are out there, and some of these become more and more far-flung and hard to predict in advance when data is being shared all over the world. And then the
other risk is that regulation, when it's overly cumbersome, difficult to comply with, and sometimes not thought out that well, can stall research. So that's the balance, and it ties into the next section I've got here, which is the flip side: obligations that increasingly sit on researchers around open data, which I talked about before as more of a movement, but which is also increasingly becoming an obligation. This has existed, at least less coercively, since the start of the genomics revolution. As early as 1993, in the Human Genome Project, there were guidelines put out essentially recognizing the public character of genomic data, and the idea that, especially given that public funding was largely paying for the Human Genome Project and that things were quite expensive, data reuse was really key, and there should be incentives, maybe even obligations, on researchers to share their data as widely as possible. A number of guiding documents came out around this; I've got some of the early key ones here, like the 1996 Bermuda Principles, and these have continued over time. For example, the NIH in the US has had a series of genomic data sharing policies, the most recent large iteration being in 2014, and in Canada the Tri-Agencies have their own open access policy on publications; I've got a link to it here. Some of the main requirements are that not only do you have to ensure your publications are available through open access, which is probably well known to you, but you also have to deposit your molecular data, etc.
And I'll also note that there's an obligation to retain data sets for a minimum of five years, which I'll come back to pretty soon. Some of the main principles around genomic data sharing are things like releasing data rapidly, as I just mentioned, publishing in open access journals, and respecting publication embargoes, which again goes back to the early days. Eighteen minutes left, okay, so I'm not going to get through all my slides, but I'll try to highlight some things. Essentially there's a bit of a conflict here with the principle of open data. If we look at a definition of open data, the idea is that, to comply, the data should be provided as a whole, it should be free to use for any purpose, and redistribution should be free, which is the polar opposite of a lot of the techniques used to protect privacy: often we won't provide the data as a whole, we'll especially avoid sending out identifying information, we'll impose restrictions on the ways and purposes the data can be used for, and we won't allow a person to redistribute it. So there's a real tension that has come to the fore recently between these conflicting obligations that researchers are going to be subjected to, or already are. One example, mentioned in the previous talk, is the new Human Cell Atlas project funded by the Chan Zuckerberg Initiative, which I've been involved in more or less tangentially. This is a case where the tension really came to the fore: the European Commission is funding the project under its science research funding arm, and they insisted that data from the project be made available open access, following open science principles, while on the other hand, from the data protection supervisor arm of the European Commission, at one of the meetings there
was a representative who said: if you do that, if you put all your participants' data freely available on the internet, we're going to take you to court, sue you, and stop you from doing it. That's a bit of an awkward position for the project to be in, where two arms of the same organization, one of which is funding it and one of which is an enforcement agency, are telling it contradictory things, and I think it highlights some of the tensions. So the next section is how we address some of these conflicting concerns. The dominant previous approach, until five or ten years ago, was the idea I've already alluded to a few times: that anonymizing or de-identifying data was the way. Essentially, if we remove the sensitive bits of people's information, their names, birthdays, and other identifying information, there's no real sensitivity left for the people involved in the data, and we can still use the underlying medical or health portions of the data to do the research we want to do. This is reflected in the frameworks I was talking about by differentiating what is, to use a very American term, personally identifiable information; in Canada we tend to say personal information, and in Europe they say personal data. These frameworks tend to apply only to personal information, and there's an implication that there's some other non-identifying information, or that you can anonymize information so that it falls outside the scope and you no longer have these obligations, like in the context I was talking about of the Common Rule, where even rich genomic data without identifiers is not considered human subjects research. I'll skip ahead for time purposes, and I'm probably going to end up skipping quite a few slides now, but over the last 15 to
20 years we've found that some clever people have come up with a lot of novel ways to re-identify information that was thought to have been de-identified. In one case, with name, address, and so on removed, but with US ZIP code, birth date, and sex still included, researchers were able to re-identify de-identified medical data by combining it with data in a voter list to figure out who people were, and I believe it was something close to 90% of people who were uniquely identified by their ZIP code, birth date, and sex. You might think this is pretty general information, but it ends up being quite identifying. So there's been a real loss of confidence in de-identification; it's still very important in some circumstances, but as a general solution to these problems there's much less confidence in it now. As for DNA, the thinking went so far as to include it too, and as I was saying, in the US they often still seem to think this way, but there's been a string of published, at least theoretical, re-identification attacks. One of the big ones was the Homer paper in 2008, which surprised people quite a bit by showing that, even from publicly published statistical aggregates of genomic data, if you had the DNA of someone who was a participant, you could actually tell whether they were a case or a control, and so whether they had the disease in question or not. This sent a bit of a shockwave through the field at the time, and it's what led the US dbGaP database, which was formerly fully open, to suddenly become controlled access. There are still arguments about whether they overreacted in making that big change, but they haven't stepped back from it since then.
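The ZIP-code, birth-date, and sex result above is really a point about quasi-identifier uniqueness, and the underlying check is easy to sketch. This is a toy illustration with fabricated records, not the original study's method or data:

```python
# Count how many records share each (ZIP, birth date, sex) combination;
# a group of size 1 means that record is unique on those quasi-identifiers
# and so is vulnerable to linkage with an outside dataset like a voter list.
from collections import Counter

records = [  # fabricated "de-identified" records
    {"zip": "02139", "dob": "1965-07-01", "sex": "F", "dx": "asthma"},
    {"zip": "02139", "dob": "1965-07-01", "sex": "F", "dx": "flu"},
    {"zip": "02141", "dob": "1980-03-15", "sex": "M", "dx": "diabetes"},
]

def quasi(r):
    return (r["zip"], r["dob"], r["sex"])

group_sizes = Counter(quasi(r) for r in records)

# Records whose quasi-identifier combination appears exactly once.
unique = [r for r in records if group_sizes[quasi(r)] == 1]
```

In k-anonymity terms, this toy dataset is only 1-anonymous; generalizing the quasi-identifiers, say truncating the ZIP code or keeping only the birth year, is what raises the group sizes and makes linkage harder.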
I'll quickly highlight a couple of others. The Gymrek paper in 2013 was another attack people thought could not be done: with only DNA and trace amounts of other data, the authors were able to actually come up with certain participants' names; the idea was really to see whether you can figure out who someone is from just their DNA. And the most recent here, though it's not that recent anymore, was a 2015 paper essentially showing high identifiability based on as few as 25 randomly selected SNPs in certain contexts, so you only need a small number of base pairs to be able to identify someone. As part of our briefs, I put out a short brief on this issue. I'll skip this slide; it's just a quick snippet of a paper trying to react to this reality and figure out what to do next. These researchers looked across the whole genomics research pipeline to figure out where technical solutions can exist; their hypothesis, which I think is pretty compelling, was that wherever a technical solution exists it's probably preferable to use it, but in some cases there just aren't technical solutions. For example, when the data is coming off the sequencing machine, we can't really encrypt it or do anything else to protect it there, so we probably need legal protections instead. They're putting forward some ideas here, but it's still something the field is grappling with. I'm going to skip this whole case study. There have been a few different technological methods that have attempted to come up with a replacement for what we thought anonymization could do in the past, and a few have been quite promising in the cloud context. Homomorphic encryption in particular I find really amazing; I'm sure people have encountered it, but the idea is that it's almost perfectly suited to the
cloud context: you can store encrypted patient data in the cloud, so if, say, you don't trust your cloud service provider but you want to be able to use their resources, the data can stay encrypted there, and you can actually send encrypted analysis instructions into the cloud. Through the magic of homomorphic encryption, which I don't purport to fully understand, though I've tried to read a few mathematical papers on it, the provider can perform the encrypted operations on the encrypted data and return an encrypted result without ever understanding the data it's processing. From the scenario we looked at at the start, of the researcher on their laptop working in the cloud, it makes a lot of sense; unfortunately, although there have been tons of proofs of concept, so far it hasn't scaled super well to very large applications. I've got a few others here. The main approach, and really there's a bit of a menu of approaches that have arisen in response, is the controlled access versus open access dichotomy. Looking specifically at the ICGC project again, the idea was that there are certain types of data we think are not sensitive or not identifiable enough that it's still safe to keep them open access, freely available to everyone, but there are a lot of other types of data, more identifying or more sensitive, for which, for now, we ask researchers to submit an application to a data access committee and to use the data only in accordance with an agreement they'll end up signing. For ICGC, I'll go through this quickly: if you want access to the ICGC PCAWG data, there's an access form you can fill out pretty easily online through their website; you create an account, and then you've essentially got to put in some basic information.
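The "compute on data you never see" idea above can be made concrete with a toy additively homomorphic scheme in the Paillier style, where multiplying two ciphertexts adds the underlying plaintexts. The schemes actually used for genomic workloads are far more involved; the tiny primes here are purely illustrative and completely insecure:

```python
# Toy Paillier cryptosystem: E(a) * E(b) mod n^2 decrypts to a + b,
# so an untrusted server can add values without ever seeing them.
import math
import random

p, q = 1009, 1013            # toy primes -- far too small for real use
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
# L(x) = (x - 1) // n; mu is the modular inverse of L(g^lam mod n^2).
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:   # r must be coprime to n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

a, b = 12, 30
c_sum = (encrypt(a) * encrypt(b)) % n2   # homomorphic addition
```

Here `decrypt(c_sum)` recovers `a + b` even though the party doing the ciphertext multiplication never held the key or the plaintexts; fully homomorphic schemes extend this to arbitrary computation, which is where the scaling problems mentioned above come in.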
That includes the principal investigators, the people who will have access to the data, the title of your project, and a short description of it, and I believe you have to state whether you require or have ethics review. When and if you're approved, you agree to the data access agreement, which has a bunch of different requirements, including not trying to re-identify the data, using appropriate safeguards, etc. The agreement actually incorporates a number of documents that you undertake to follow, including ICGC guidelines put out in 2008. They link to certain privacy and security guidelines covering everything from firewalls to all the network security measures you might imagine, and they also include data sharing obligations, I think specifically pointing to the later Fort Lauderdale and Toronto principles, as well as some intellectual property provisions, and encryption especially. There was a bit of a report back that was done on this. If I have a few minutes, I'd like to quickly end by talking about something that might seem a bit further afield. I'm not sure if people read about the data security scandal that happened about two years ago with the credit reporting company Equifax. Again, this gets further from the genomic research context, but I think it's helpful, because just in the last few months the Office of the Privacy Commissioner of Canada put out some decisions based on complaints about a breach that essentially happened in 2017 and was in the media a fair bit. It's the first time I've really seen the two brought together: even though they're linked, the privacy sphere and the data security sphere have people with different expertise in them who don't necessarily talk to each other; it's not even so much that they disagree, they're just really separate, and
so it's been hard to understand even what your legal obligations are around information security and information technology. Laws like the GDPR will often just say it's your obligation to have appropriate safeguards in place and leave it at that, which in one way is something to be thankful for, because if lawyers were setting the information technology safeguards it probably wouldn't be ideal, and things are moving so quickly that really encoding them in law would be a problem; but if you want to have any idea what your obligations are, it does become pretty tricky. I do think this recent decision from the Privacy Commissioner in the Equifax case moves things forward. To say something about the breach: it was mocked a bit, because it was a cascade of failures; everything went wrong here. It essentially arose from a vulnerability in a small piece of software added on to the Apache web server, called Apache Struts, that wasn't patched properly by the IT security team; there weren't proper processes in place. That vulnerability was then exploited by attackers, who were somehow able to gain access to a file share, essentially like a Dropbox, that it turned out was being shared between Equifax Canada, Equifax US, and others, and that held a whole bunch of data I think they were just conveniently sending to each other. And it cascaded from there: they put up a web page where you could enter your information to see whether your data was included in the breach, but essentially 95% of the time it would say "you may have been affected," so it gave you no information, and there was something about the way it was implemented, if I remember properly, that allowed phishers to generate a parallel site that looked like the Equifax site and get people to put their information in there. Everything went wrong
from start to finish. So, in any case, I'll end quickly with this recent decision, which I think is helpful. Obviously, if you're, say, a sole researcher working from a laptop in your basement, or a master's student or something, you're not going to be held to the same standard as a giant credit reporting agency like Equifax, but as the size of the projects you're involved in increases, it could get closer, and the sensitivity of the data can be similar; genomic data can be quite similar, as we talked about before. The other thing is that this gives you an idea of the kinds of things regulators are going to be looking at, which might be good to have on your radar. You can scale them down; this is obviously not legal advice, but in general I think a sound approach is to scale these expectations down to your own operations. As far as safeguards, one of the things they looked at was vulnerability management: the idea that there should be systems in place to make sure you're always getting the updates for the software you run, which often include security fixes. This is what happened with the Equifax vulnerability: an email was in fact sent to the IT department saying please patch this vulnerability, it wasn't done, and the vulnerability was left there for months. There was an attempt, especially by the company, to blame one rogue employee, but the consensus was really that if you're a company this large, you should have robust systems in place and not just rely on an email to one person to make sure something this critical gets patched. Network segregation, again, is one that might be less relevant if you're working alone on your laptop, but as people start to work in international consortia it becomes more important; the general principle is
what they call in IT security the principle of least privilege: information is shared on a need-to-know basis, different pieces of sensitive information are separated out for only the people who need them, and you make sure one can't be accessed from the other. This is what happened with the file share between the two organizations that let everything out of the bag. Basic information security practices were found not to be in place in this case; they specifically point to the existence of that file share, which was set up to facilitate things but did a bit of an end run around the security protections that were in place. In general, these are things you'll also find in the ICGC agreement: setting up proper firewalls, and, increasingly important in the virtual machine context, making sure you don't have stray services running that you don't need. The other important one here was oversight: there were a number of things where, even though proper policies were actually in place, no one had followed up to make sure they had been implemented properly. The two things they suggest here, which I think are also in the GA4GH (the Global Alliance for Genomics and Health) privacy and security policy, which I'll skip here, were internal and external security assessments. One thing they seem to say is that there's an ISO certification, ISO 27001, that is probably appropriate for these purposes; Equifax actually did have this, which was the external security assessment, but the internal security assessments should have been happening more often. It's also important to note that there are actually ISO health informatics security certifications, so those might be more relevant to your work.
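The least-privilege and segregation idea above can be reduced to a deny-by-default access check. The role and dataset names below are made up for illustration, not drawn from any real system:

```python
# Deny-by-default access control: a role can reach only the datasets it was
# explicitly granted, so compromising one role doesn't expose everything else.
ROLE_GRANTS = {
    "ca_analyst": {"ca_reports"},
    "us_analyst": {"us_reports"},
}

def can_access(role: str, dataset: str) -> bool:
    # Unknown roles fall back to the empty grant set, i.e. no access at all.
    return dataset in ROLE_GRANTS.get(role, set())
```

Had the shared file store been behind a check like this, a credential from one side of the organization would not have exposed the other side's data, which is exactly the segregation failure described above.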
Apart from the assessments, they also insisted on pen testing, or penetration testing, which is at the opposite end of the spectrum: rather than auditing the system to see what's there, though in a sense it's still looking at the system, you simulate an attack. You get people to use what are known to be current information security attack techniques to try to get into your system and carry out various breaches, and you see what they can do and whether anything is vulnerable. Retention was another issue: the idea is that you shouldn't be retaining data beyond the amount of time you need it. In the DACO context, the agreement will usually tell you how long you may retain people's data, although that can run up against the Tri-Agency requirement I mentioned before to retain data for at least five years; so you should make sure your data sharing obligations don't conflict with your other retention obligations, and retain data for the appropriate period. The second-last issue they brought up was accountability for international data sharing, between Equifax Canada and Equifax Inc. in the US, which again might be relevant if you're involved in a consortium: how are you ensuring that you're still accountable for data that's shared elsewhere? And the last main issue they brought up was consent, and especially consent to data sharing. In this context, a lot of this is handled when you apply to a data access committee for access; they'll be looking at whether the data will be used in compliance with the consents. But it is important, for the reasons I gave before, to make sure in general that the consents obtained from participants match up with what you're doing with their data.
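The retention point above, checking that a data access agreement's destroy-by deadline doesn't undercut a funder's minimum retention period, is the kind of thing worth verifying before signing. The dates and policy parameters below are hypothetical:

```python
# Flag a conflict between a funder-style five-year minimum retention
# requirement and a hypothetical destroy-by date in a data access agreement.
from datetime import date

collected_on = date(2019, 1, 1)
funder_min_years = 5                   # e.g. a Tri-Agency-style minimum
daa_destroy_by = date(2023, 1, 1)      # hypothetical DAA deadline

must_keep_until = collected_on.replace(year=collected_on.year + funder_min_years)
conflict = daa_destroy_by < must_keep_until  # obligations clash if True
```

A real check would also handle the edge cases a simple `replace` misses (a February 29 collection date, for instance), but the point is that the two obligations can be compared mechanically rather than discovered after the fact.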
I did have a couple of other brief sections, but just in the interest of time I think maybe I'll wrap it up now, if that makes sense. If there are any questions, I'm happy to talk more afterwards as well, but thanks, and thanks very much for your attention.