All right, so thanks for coming, guys. I'm Erin, and we're going to be talking about preservation and privacy, especially in the context of when privacy happens cross-border. I have my slides and my session notes available online. I'll eventually replace this bit.ly with the OSF link. Sorry about that. But the session notes are very detailed. If you want to enrich them or add to them, please feel free to do so. And if someone wants to record the Q&A, that's wicked too, or I'll try to reflect it there after the session, if I can remember. One of the key purposes of this discussion is to actually continue the conversation, because privacy and data protection is unsurprisingly quite large and intimidating, and it involves a lot of engagement with specialized staff and professionals who probably bill quite a bit per hour. So I think that we can probably stay in touch and share resources and information. I think that would be a really good thing. So Twitter is one way. My email will be up at the end as well. So this presentation is just going to include a quick introduction to myself and to DuraSpace, and a problem statement about digital preservation, privacy, and data protection, including some observations from my day-to-day work. I'll give an update on this project I'm working on for DuraCloud. It's a content and preservation storage project, and it's actually what launched me into this information-gathering project about privacy and data protection. And of course, we'll just do a summary of lessons learned, next steps, that whole deal. And I really would like to have a discussion and figure out where we can continue this discussion. So I'm Erin. I'm a librarian. I've had pretty amazing opportunities over the last six years to work in project management and business development around open-source software repositories. And I'm not a lawyer or even a legal expert. But I do have some experience doing legal research.
I was invited to do some testimony at a parliamentary standing committee in Canada in 2012 based on some legal research I did. But that was a while ago. And it's not, like, my shtick. So that's just a disclaimer to say that I've referred to or consulted with lawyers on some of these topics. And I'm by no means an expert. And that's why I'd like to discuss it. Also just to say that I'm here representing DuraSpace. And I'm one of a few people here representing DuraSpace. And just to say that we're a nonprofit organization and we steward open technologies. You may have heard of DSpace, Fedora, and VIVO. Those fall under the DuraSpace umbrella. One of the technologies I'm going to talk about today is called DuraCloud. And it's one of the lesser-known technologies under the DuraSpace umbrella. And we'll talk a little bit about why that is. So DuraSpace operates in the library and archives community. So we're pretty familiar with digital preservation best practices. And I feel like the best-known and best-understood one is just to have multiple independent copies of content in different geographic locations. People understand why that is a requirement or a recommendation. However, this requirement can get kind of complex if your jurisdiction has privacy and data protection regulations, and even more complex if your jurisdiction doesn't have a lot of hosting or storage options. And I've seen that in a couple of contexts. So for example, I'm a Canadian. And I actually observe some real reluctance to use US-based cloud storage providers for digital preservation, even if they have data centers on Canadian soil. I've personally observed similar sentiments from librarians, archivists, and IT security professionals in Germany, the Netherlands, and the UK, and Aviva says Africa too. So these are some general observations just from day-to-day work. I don't have stats to back this up, unfortunately. But I do have a concrete example just from this last year.
So I review tenders, like RFPs, all the time. And so we received an RFP from a UK-based organization in the summer of 2017. And so it was inbound. They asked us to respond. They said it's all the normal requirements, 200 terabytes of content for digital preservation. It included the requirement for at least three copies of content in two geographic locations. But it wasn't noted anywhere in the information security guidelines that storage outside of the European region would not be considered. So this all happened through private communication and inquiry. So what is OK for you guys and what is not OK? And so this is something that is pretty consistently left out of IT security requirements. And I think it should be added because people should understand what they can and can't do and where they can and can't store their content and make it very explicit. Excuse me. So in the UK, there's a Data Protection Act that says that data cannot be transferred outside of the European region without adequate protection. And I'm sure interpreting that is lots of fun. So the crux of the problem is data transfers. They create legal issues, particularly to a third party across borders. In Canada, for example, there's lots of Canadian examples because that's where I live. Sorry about that. So sorry about that. I just realized. In Canada, the responsibility of privacy and data protection is assigned to the organization that is actually collecting the data. And that responsibility persists through the entire lifetime of the data. So that has some really interesting implications when you're talking about long-term digital preservation. This isn't like we're going to delete this after three years or seven years. We don't actually know how long we're going to keep it. And we don't know if our data protection practices today are going to be any good in five years. 
So it has a lot of organizations erring on the side of caution because legal advice is expensive, and sometimes not available at all. And they have to conduct their due diligence. And sometimes they don't have the resources to actually do that. So in some instances, they do nothing. Or in other instances, they just err on the side of caution, sometimes to the extreme. And so understanding these kinds of legal issues is particularly important when partnering with organizations internationally. DuraSpace is an organization that has members all over the world. And so international partnerships are something that we do. It's part of our business. And it made it very clear that we needed to be better informed about the parameters in this discussion. And so this discussion became really relevant when we were talking about creating international partnerships around DuraCloud. So like I said, DuraCloud is a content and preservation storage suite of open-source software tools. And it essentially can connect to storage in commercial cloud systems, national preservation networks, and academic storage systems. So it has a lot of flexibility in what it can connect to. So it's going to help you manage content across those different types of storage options. And it's going to handle the duplication and the bit-level integrity checking. But despite being open source, DuraCloud doesn't really fit the mold of the other projects under the DuraSpace umbrella. It's probably why you haven't heard of it, or if you have heard of it, you may not be as familiar with it as you would be with DSpace or Fedora. So DuraCloud is not supported by a membership. It doesn't have a membership model. It does not have a diverse set of contributors. Instead, it was structured as a service and developed almost exclusively by staff at DuraSpace. But it does have an Apache 2 license. We're working on changing this. We really want to attract new contributors.
We want to develop more welcoming and accessible documentation, and encourage and support new installations. We know that preservation storage, the location, is really important. The jurisdiction where the data is stored is very important. So we know that we are offering storage only in the US, and that just won't work for a lot of our members, a lot of people in our network, in our community. And so toward the end of making those changes, we've joined up with the Mozilla Open Leadership Training Program. We're participating in Global Sprints. There's a two-day sprint in May. We've worked with the Texas Digital Library to help them install and deliver DuraCloud services. And the international connections have kind of happened over the last year as well. So we started working with 4Science. They're a really well-known DSpace provider. They've been awesome open-source citizens in the DSpace project, huge contributors. And so we have been working with them to improve the installation and deployment documentation, because it wasn't very good. We audited their install to make sure it looked OK, and we provided training so that they could actually launch DuraCloud Europe, which they're doing this spring, which is really exciting. So the whole goal here is to create international partnerships that will enable organizations to deploy DuraCloud, offer those services, make them more accessible in other jurisdictions with support in more languages, make our products internationally supported in as many ways as possible. And we're also in discussions with a number of other organizations in other jurisdictions. So it led to a lot of questions. As we're building up this community internationally, people were asking us, are we flexible? Can we change the dependencies of DuraCloud? So right now, DuraCloud has dependencies related to AWS. Some jurisdictions are uncomfortable with using AWS, so we were asked about that. 
We were also asked what our obligations were under US law if data stored through our software or services was requested by the US government. And so it just underlined the need for us to be better informed, and it kick-started this process of gathering information. So to begin with, and these inquiries came this fall, we spoke with our general counsel. We talked to our hosting provider, AWS. My colleague Bill Brennan was at the Amazon Web Services conference, I think in November. He goes every year, so he was talking to them about it. They directed him to some policy documentation, that kind of thing. And Deborah asked around: who has access to resources, people I can ask questions to, specialized counsel and the like? And that was the most fruitful avenue for information gathering, because we ended up connecting with these guys. So these are, I guess, the second group of attorneys that we consulted with, and we spoke specifically with Joe McClendon, and he focuses on commercial and technology transactions. He was in IT for over a decade. He's a young guy, super smart and very knowledgeable. And we also spoke with Matt Lee, and he advises on compliance issues related to the Patriot Act and many, many other things. We had a fruitful discussion. On some of the things, they just pointed us to other resources, and they helped us clarify our own thoughts around what kind of information we need to get and what we need to put out publicly, just to make people more aware and more comfortable. So we learned that there is obviously a global trend shifting to greater transparency. And we all have heard about GDPR. It's the biggest change in privacy and data protection in 20 years. The compliance deadline is coming up fast. It's in about five weeks. And so there are a lot of resources being allocated to making sure that European organizations and organizations doing business in Europe are compliant.
We also were directed toward the policy documentation from large US cloud-based storage providers, just saying that anytime they are provided with an inquiry, whether it's from the federal government or otherwise, it has to be valid. And they're never going to disclose customer information unless there is a valid court-ordered subpoena. And so we're gonna get into that because there's a lot of really timely court cases happening right now. But before I get to that, I'll just say that the US Department of Justice says that they're entitled to data that hosting providers store in other jurisdictions. So for example, if AWS has a Canadian data center, the DOJ can issue a warrant for information if there's an ongoing criminal investigation. And then if a search warrant isn't going to work, there's this other method, which is called a Mutual Legal Assistance Treaty, or an MLAT. And it's essentially an agreement signed between two or more countries that just creates obligations under international law for governments to assist one another in criminal investigations. So it means that a warrant or subpoena may not be necessary. The MLAT process takes a lot longer. It runs much more through diplomatic channels, from what I understand. But it is a way to get around needing to issue a valid search warrant. Who knows about the case between the DOJ and Microsoft about data hosted in Ireland, anyone? Okay. So this was really interesting, and I really wasn't aware of this case until these questions were raised and the attorneys pointed us in this direction. So for background, Microsoft is also a US-based cloud storage provider, and they have 100 data centers in 40 countries, and they were the first American company to challenge a domestic search warrant seeking data held outside of the United States. It was held in Dublin. So essentially what happened was there was a person who was residing in Ireland. There was information, I think, in his email account.
It was stored on an Irish server owned by Microsoft. And then after that happened, there was a criminal investigation that began in the United States, and the United States wanted access to his email content. So Microsoft says that they challenged this search warrant on two kind of levels. There was concern that the US government could access data without taking into account the privacy and data protection regulations in place in the country where the data actually resides. And of course, there was concern that their customers would go elsewhere if they felt like they had no avenue to protect their customers' data. And so because of that, they had a lot of support from other US-based providers like AWS, like Google. And actually in 2016, a New York court sided with Microsoft in this case, but it was appealed, and it went to the Supreme Court. The justices heard arguments in February and we were expecting an actual result in June, but it might not happen. So federal prosecutors actually asked that the case with Microsoft be dismissed because just a couple of weeks ago, I think it was three or four weeks ago, the CLOUD Act came to be. And the CLOUD Act, it stands for Clarifying Lawful Overseas Use of Data Act, and it clarifies that the DOJ can issue warrants for data stored in other jurisdictions, and it clarifies that companies can object if it conflicts with foreign law. So it provides an avenue for those conflicts to be addressed. So that was pretty recent. The privacy debate is developing, and it underlined the need for us to stay abreast of these developments, to gather information and share resources if possible. And we definitely need to formalize our thinking around efforts to build the DuraCloud community. So when we first started this initiative, we didn't know these things, and so we need to pivot a little bit to include information about privacy and data protection and the needs of potential partners or existing partners in those initiatives.
And so we do want to develop public-facing documentation that shows how we'll handle data requests, however unlikely that might be. When we were talking to general counsel, they said, this is super unlikely. Are you sure you wanna go down the rabbit hole? And we said, yeah, we've had the question posed to us, and we didn't feel that it being unlikely was an adequate response. We also wanna develop a privacy and data protection policy that outlines our obligations and recommends best practices. We're partnered with 4Science in Italy for DuraCloud; they're already working toward compliance for GDPR, and they actually have knowledge and expertise that they can share with us in our initiatives. We've gathered some information that we can share with them, and I feel like developing this policy documentation is going to be a collaborative effort, or it would be better if it were a collaborative effort. And we're gonna continue the outreach, but we wanna make sure that we're erring on the side of transparency. We wanna be pro-consumer, we wanna be pro-user. And so this is where I'm interested in hearing about the experiences that you guys have had. This is a very specific kind of case for us, where DuraCloud is a very specific application and we're talking to very specific organizations, but our experience in this field is quite narrow. And so I'm interested in your experiences either dealing with this or trying to deal with this. Have you had difficulty getting information? And the other facets: I know that there's an entire landscape that I probably didn't even touch on here. So I wonder, what are the other areas that are worth discussing? And what's not up on the slide here is, where do we discuss this? I could post this information to the DuraCloud listserv, but would that be effective? How are we going to actually discuss this in a meaningful way, so that we can share resources so costs are lower, things like that? So with that, I'll open the floor.
Thank you. All right, thanks for coming guys. Thanks.