Thank you all for joining this session. My name is Daniella. If we haven't met before, I have a couple of different roles. I'm at California Digital Library, I lead product for Dryad, I'm the PI for the Make Data Count initiative, and I am now very proud to say I'm a working group lead for the Research Data Publishing Ethics Working Group held by FORCE11 and COPE. And I'm excited today to walk through what this work is and hopefully reach a new audience here, all of you at institutions.

So, Research Data Publishing Ethics could mean a lot of things. There's a lot out there around data ethics and data governance, so I want to be very clear about what we're referring to today. There are a lot of issues that can come up at institutions holding data; think of all the things that have happened with ransomware attacks and the like. This is not that. What we're referring to here is that, with the broad open data movement and how much data is now being published, ethical challenges are coming up with data that are openly available, published on the web. Here are some images from retractions and other incidents that have happened because of publicly available data.

And so, thinking about our roles as institutions, as data publishers, as data owners, the onus is on us to be as responsible as possible when we're publishing data. As a data community, we've focused a lot on being responsible in terms of making data usable: thinking about FAIR, making sure data is preserved and archived with best practices like CoreTrustSeal. But as far as being responsible publishers in terms of research integrity and the ethics of publishing, to date we did not have guidance around this. So with that in mind, we got together a group of folks during the pandemic to think about what it would look like if we had what we were really shorthanding as a "COPE for data." Can I get a raise of hands: who here has heard of COPE? Wow, great. Okay.
Makes it easier. For anyone who doesn't know, COPE is the Committee on Publication Ethics. They have always had broad guidance for journal publishers in the community on what to do when publishing ethics issues arise, but we have not had this for data. So we were really thinking: what would practices and guidelines look like for repositories and publishers and institutions, if we could get this together? We brought together folks from disciplinary repositories, federal and otherwise, general repositories, institutions, and research integrity officers, a small group of us, to think about what it is we really need. And we decided that we really needed a recommendation guide.

So we put together a strawman proposal and then launched it through FORCE11. A lot of people questioned, why FORCE11? There are so many communities we could have gone through. And the real reason is that these publishing ethics guidelines have to be broadly interoperable across the stakeholders. We couldn't just do it through RDA. We couldn't just do it through the libraries. It had to be clear that everyone who could possibly touch a published dataset is involved. So the goal was always to make this as broad a group as possible, but also for COPE to come in and endorse and own this as well. And we are very lucky that that is what happened.

We got together a group of over 70 people. I think there are one or two of you in the audience who are in the working group, so thank you. And we started to think through what it is that we want. We started anonymizing every case we could think of that's come up around this, and it became clear that there were four categories of cases. But really, what we had to do before we started working was make sure that we had the right stakeholders in the room. It couldn't just be repositories and publishers.
We needed to make sure we also had research integrity officers, technical folks, library folks, and researchers themselves weighing in on these conversations, so that we weren't building recommendations about a stakeholder who wasn't there. And for repositories, we had to make sure it was broad enough that we weren't just building for a well-resourced general repository or just a disciplinary one. So we were very lucky to have many institutional repositories there as well.

And lastly, on this working group in general, we had to be very clear in scope to be able to get anything done. There are so many things in the data world we could tackle, and sometimes we felt like we got pulled: data citation, data policies, all of these things. But for us to be as efficient and move as quickly as we did in six months, it had to be focused specifically on the research integrity and ethical responsibility of being a data publisher.

So we spent about four months with working group members joining a call once a month, split into four categories. Those categories being: risk to participants and to external communities; rigor, so the validity of the science; authorship and contribution; and legal. We worked through the trickiest things that we could and came up with this general recommendation guide. We were very proud to have the COPE trustees endorse this and take on ownership, as well as the FORCE11 board. And shortly after posting it, we received endorsements as well from RDA and ISMTE. So we're slowly building toward this becoming a normalized, accepted community set of practices.

So, what does this actually mean? Here's a screenshot of it posted on Zenodo; it's all publicly available in the documents. I'm just going to go through briefly what the categories are and what this looks like. The first is authorship.
This one seems to be the most straightforward, but it also might be the one we see the most. If we think about repositories publishing data, we do not actually write to all co-authors and ask them if they agree to publish. This is something that's common in journal publishing, but in repositories, especially those that don't have curation, it's just submit and posted, right? Like a preprint. So there are a lot of issues we could see here, where when something is published we get these examples: a grad student was omitted from the author list and writes in, a deceased author, a name change, all these different things. How do we handle it? Obviously, we could just go in and change it. But what's our responsibility when making these changes to a published, archived piece of work?

Going from maybe the most simple to the most complicated: legal and regulatory, of course, being the most difficult, also because it was the hardest to find expertise in the working group around this, having lawyers who could understand and contribute to these conversations. The thing that makes legal so tricky is that it really can come down to where the author lives, what institution owns the data in what country, and what country published the data. There are so many different avenues where these issues could come up, and we broke them down by license, policy, and regulation, with some examples here. You'll see in the documents many, many examples that we know have already happened around this. I think this one is especially important for institutions, and for all of us here at CNI, as owners, as the stewards of the data being made public: what happens if there is a legal dispute over it?

Next, we have risk. Initially, as we were talking about it, we thought there were going to be five categories: there would be risk to participants, and then risk to communities.
Participants being human information, clinical data, things like that. Risk to the community and society could be other things, like historical sites, or information about national security. The risk here is: what if this information is public and it should not be, and what happens once it's on the internet and broadly available? These might be the most obvious cases in the data world. Anyone who works at a repository here has probably, unfortunately, seen a dataset that has human information in it. This is of course really important for curation to catch before a dataset is published, but not all repositories have that privilege. So what is our role once this kind of slip-up does happen?

And then lastly is rigor. Rigor refers to any issue around the validity of, or trust in, the science published. In these cases, we've often found they're tied to a journal article, where something comes up during peer review or post-publication. But I think this is a really interesting one to think about from the data perspective, because whose role and responsibility is it to look at the validity of the science? Curation at a data repository is looking at: is the data usable? Does it have enough metadata? Is it something that should go online because it can be reused and indexed? Peer review is what could find some of these issues. A repository is putting that data out publicly, but what happens years later if someone finds out that the data were falsified, that the data have gaps, that the data were derived from images in an article that were themselves falsified? The chain of responsibility can really get complicated for rigor.

Okay, so I'm going to use rigor as an example, because I think it's a really interesting one, and I want to walk through the questions and topic areas we were thinking about while working through these recommendations.
The first is: what actions should be taken if the dataset is not yet published, and who needs to be involved in that decision? Thinking about rigor specifically. For this one, we're saying: ask the author for updated files. There are gaps in this data; we heard because the journal contacted us; can you give us new files? If they say no, the repository should have the onus to say, we will not publish this. If there are issues about the integrity and that has become clear, the repository should also make sure to contact the institution. We need to be more responsible about pushing these issues to the research integrity officers who are at every single institution here. A publisher, as well, should be thinking about what their role is if it's found in peer review, and how they can find where the data are and contact that repository and institution if necessary.

So again, that's when the data is not yet published. But what happens when the data is published, and who needs to be involved then? Well, the same as before, but the dataset is now a published public record. It can be anywhere. So if the repository can't get updated files from the author, the repository should take a big next step, which is to put a notice on the dataset. And this is something that we haven't really seen before this. As a note, when we were discussing this, we found that there are so many parallels between data and article publishing, but one big difference is that we didn't feel that "expression of concern" or "retraction" was the right terminology for a dataset. So really think about what the harm is, versus what it is you would want to flag to future users of that data if there was a rigor concern about it. It's also a great place to point to a published article that could have that expression of concern.
And this is a good illustration as well, because publishers have COPE flowcharts that walk through exactly what to do in a situation where validity is called into question, but repositories have not had that. So who does it need to be reported to? Well, the big thing is: how do you let the public know in a way that's not inflammatory? We could put a big flashy thing on a repository, but why would we do that if we didn't want to call attention to it? Think especially about something like risk. If a repository has published human information and we put up a notice saying we've taken this down because there was human information in it, one can go to the Wayback Machine, find it, mirror it at another archive, and get it. So you really want to think about the perception of how it's being reported and how people could find it, and really minimize any risk while still making sure that researchers who are going to reuse this data have the proper flags to understand what it is.

And this is, again, another big difference between article publishing and data publishing. Data up there that are flawed, or have a gap, or something else, are probably still important to keep public. These data are things that people should still be using when doing analyses and trying to understand trends that are happening and what people are reporting on. Pulling a data record down is something we should only do when it is harmful to society for it to be public. So this is something that I think is a little new in repository thinking: yeah, we're so nimble, it's so easy, we can take down files, we can do this. We shouldn't be taking down data just because we can put in new files. We should really be thinking of datasets as published records, making it clear publicly when there's been a change, updating it in the metadata, and doing it safely. That was a bit of a mix of categories there.

But the big thing, too, is: what happens if we get unresponsiveness?
What if the author never comes back to us? Do we just sit there and hold it? No, we should move forward and be responsible, saying: we are taking action, we followed these best practices, and this action is to put up a notice on this dataset. Again, this is specific to rigor; if it were authorship, we probably would not put a notice up. But for something like finding gaps in the data while the authors are unresponsive: always try the corresponding author, then try all authors, just as COPE would say. And if there's no response, take the action we need, and always escalate to the institution. Of course, only the institution can act through a research integrity office. But we should start thinking of the complete cycle of being a data publisher as responding fully to these events.

So that's, well, it's hard to do this as a presentation because these documents are 40 pages long, but that's a preview of what we went through for each of these categories, which you can go through, look at, try to understand, and see the examples. We're very proud to have this, to be able to iterate on it and see how people respond to it: what makes sense, what do we need to change, how do new cases arise and how do we respond to those?

But there's also more investigation needed that came up just by doing this work. One is the range of resources available at repositories. We know not all repositories have curation staff, or the right staff for making changes as necessary. So we had to think about the minimal thing that we should all do as responsible data publishers, and then work up from there. A question that came up about this in the past was: how do we resource this? And I think the answer is that this has to become the norm. Even with minimal resourcing, at repositories and at publishers that say they focus on data, we have to incorporate these best practices as responsible publishers.

The next is complex legal cases.
Like I said, it's really tricky. These cases have come up, and we have been stuck even with recommendations. We heard about an anonymous case recently where the data were being asked to be taken down because the country decided it would no longer allow open data. So what do you do about a retrospective dataset like that? Hold on, water. Public speaking again. So we need to start thinking about what resources we can share, anonymized, and pass around to repositories that are facing this. A case comes up; let's think about how we can make it something other repositories can use in the future. Something like that requires a lot of resourcing, being able to have legal counsel and so on. So how much could we think about making interoperable guidance around that?

Another is common terminology. What do we even call it when we put a notice on a dataset? If we're not going to call it an expression of concern or a retraction, what is it? What is that notice? How does that become a normalized term that publishers understand, that repositories understand, and that a researcher, if they see it up there, knows is something they should pay attention to? There are a lot of these terminology things we need to keep working through.

And then lastly is communication between publishers and repositories. Of course, I can speak to this being at Dryad; that's something we've always been lucky to have, being connected to journal articles. But what about an institutional repository that doesn't have those connections? What can we do for a repository to understand where related articles based on that data may be, if we find out post-publication that all those articles are based on falsified data here? And vice versa: how do publishers know to always go find the repository where the underlying data is? They should have that with the data availability statement. But how do they know whom to contact? How can they raise this?
What's the escalation pathway on their side as well? So that's to say, there's a preview of a lot of things that need to happen. These are just some ideas that really came up once you get all the people in the room thinking about it.

I want to share, since it's just me, two perspectives from the working group. The first is the perspective of a research integrity officer. Jörg is at Berkeley Lab; he's in the group. He's saying that the reason they're a part of this is because it has to be part of the institutional stewardship of the data. We asked Jörg, when thinking about this: what do you think is the biggest risk if research data publishing ethics is not a priority at institutions? And he said it will be the downfall of trust in, and the standing of, an institution if it becomes clear that the institution does not take these things seriously. We know that publishing practices vary wildly across the board. We know datasets are coming in under all sorts of different standards. So what is the institutional role? The institutional role is to have better training and to respond at the integrity office level. So again, not the library, but from that perspective: if it gets raised, it should have the same repercussions and consideration as if it were raised about an article.

Another perspective is from the research integrity folks at PLOS. Renee was saying that a big issue here is just understanding that this is part of a data policy. So if PLOS is going to say, we have a data policy, we support open data best practices, then this needs to be part of it, because they're actually part of this process now. The data going to the repository doesn't mean it's now repository land; the institution owns it, and we're all still part of this process. But we have to start connecting the pieces more. So if journals hear about an issue and they're going to post an expression of concern, that doesn't always mean that the data have an issue.
And when there's an issue with the data, that doesn't always mean a related article has an issue. But there should always be that investigation, and then that communication should be made. I think that's a big switch we're going to start to see as we connect all the stakeholders a bit more around this.

So what are the next steps? We spent six months getting the recommendations together. It's really important that we start to get more awareness of them, that they become things that libraries, institutional repositories, everyone, are looking at, using, thinking about building trainings on, et cetera. But we also know that we have to make it easy. So right now, we are focused on doing COPE-style flowcharts for each category, really training style: start with a case, follow it through. What are the steps that need to be taken? Who do you need to speak to, et cetera.

We also need suggested policy text. This is an interesting one we're thinking about: what would policy text look like at a repository, at a publisher, and at an institution, for saying this is a priority for us, we abide by this, we will take action on this, without having it be so specific to what the case may be? And also having this policy text in the terms of service, around the licensing, and all this other information, so that we can be upfront and honest, and then lean on that policy to try and simplify these cases as they come up.

And lastly, I put "your thoughts" here. The working group is growing, and it would be great to have more institutional uptake. Please join if you're interested, or let me know after this. But we're really interested in hearing: what else would be helpful? What have we missed? What should we consider going forward to make these as useful as possible? And with that, I know I wanted to race through, because I'm hoping we can just talk about it. Questions, please? Microphone, that'd be great.
Also, if you don't have questions, I'll take comments that look like questions; just talk.

Hi, Joan Lippincott, CNI Emerita. I'm curious if you could tell us a little bit about the landscape. There are so many types of repositories, from multi-institutional, to sponsored by some governmental group or a society, to small colleges that have developed them. Can you give us an idea, say from one to ten, of where repositories are on these overall recommendations? Are most of them at the one-to-two level, or are some of them at nine to ten, in terms of having implemented these kinds of statements and policies? Can you just give us an idea of what it's like out there? Thanks.

Yeah, it's a good question that we actually did not look into at the beginning, because we didn't know what it was we were looking for. This question you raised just came up at our last meeting when thinking about policy. We thought, okay, let's go look at the whole landscape, grab policy text from everywhere, see the similarities, and put it together. And then we realized that the policy that's out there right now isn't referring to these things. So what is it we're actually looking for? I don't have an answer for where people are, but I think that most folks, and this is why we got such broad capacity in the working group, had seen the cases and said: what do we do?

Hi, Daniella. Thank you so much for this. This was wonderful. Just a comment and then a question. My comment is really that, while I agree data publishing is a really good analogy, I want to also acknowledge that a lot of this comes out of the archival world as well, a lot of the best practices that we apply to our digital archives. So also look to that community to bring in some of those best practices and existing procedures for some of these challenges. So that's that.
And then my question, which I don't have well formed yet: I'm really intrigued by this issue of how we as data stewards, data owners, handle it when our data is taken and placed in another repository. To use the Internet Archive example, the Wayback Machine; I'm also thinking of harvesters like CORE, which has actually harvested all of my repository. That's a challenge for us when we do have to take down a dissertation, for example; it doesn't get reflected in those other resources. We want to make all this work open, but there is a licensing issue here when this data is aggregated elsewhere. So any thoughts on ways we can communicate: we want this work to be open and used, but we also want to have control and stewardship over it, for these reasons?

It's absolutely a great question and point, which is that if we're making data available through a CC0 waiver, the data could be anywhere. It could be mirrored on ten different repositories, and we have no way of knowing where. So when something does happen, how would we even let people know? I think it's something we have not thought through as a community yet, and there isn't an answer right now. But what is the responsibility if we post a notice on the original citation? Or what if we have to remove a file? How are we supposed to find out who else is posting that original file that shouldn't be there? I think it's a big open question that we need to begin thinking through. So again, if you can join the working group, we can start thinking it through. But this is a great example. What do we do? I don't know. Thank you.

Tom Cramer from Stanford University. I think you just answered my question with a no, but I will ask it anyway.
Did the working group come across any emerging best practices, or established best practices, with regard to data usage agreements, and specifically registering people who are using the data in case follow-up, or any sort of contact after the fact, is appropriate or desirable?

We did not cover data that needed restricted access; we only covered data that was made publicly available. But it is an interesting thought: how much of the recommendations could be applied to something where it is part of the data use agreement, and this could be upfront in the language about what would happen if you found out the data were being misused later?

Hi, Daniella. Congratulations on the work. Thank you. Fantastic work. Two things. The first is with regard to retractions: NISO just launched a project called CREC, which is now getting underway; the working groups will start in January. It's a follow-on to work done by Jodi Schneider at the University of Illinois to develop, as you said, some terminology around retractions, which is sorely needed not only in the publication world but also, as you pointed out, in the data world. It's also looking at the infrastructure to support notification that there is an issue of some concern about a scholarly object. That could be a paper; it could be a dataset. And how do we connect to that entire ecosystem? Because there might be a paper that is suspect that is associated with a dataset, or vice versa. We're exploring all of that. There's also work at the Notify project on connecting different repositories to send JSON notices between them. That's also some really interesting work that's probably worth highlighting here, too.
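As an editorial aside, the "JSON notices between repositories" idea can be sketched roughly. The example below borrows the general ActivityStreams 2.0 shape that the COAR Notify work builds on: an announce-style payload one service might send another when concern is raised about a linked scholarly object. The endpoints, DOI, and field choices here are hypothetical illustrations, not the actual COAR Notify payload specification.

```python
# Sketch only: the rough shape of a machine-readable notice a publisher might
# send a repository (or vice versa) when concern is raised about a linked
# object. Loosely modeled on ActivityStreams 2.0; not a real Notify payload.
import json

def build_concern_notice(sender_url: str, receiver_url: str,
                         object_doi: str, summary: str) -> dict:
    """Assemble a minimal announce-style JSON notice about a scholarly object."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Announce",
        "origin": {"id": sender_url, "type": "Service"},
        "target": {"id": receiver_url, "type": "Service"},
        "object": {
            "id": f"https://doi.org/{object_doi}",
            "type": "Document",
            "summary": summary,  # e.g. "Expression of concern posted for related article"
        },
    }

# Hypothetical endpoints and DOI, purely for illustration.
notice = build_concern_notice(
    "https://journal.example.org/inbox",
    "https://repository.example.org/inbox",
    "10.5061/dryad.example",
    "Validity of the underlying data has been questioned during peer review.",
)
print(json.dumps(notice, indent=2))
```

The point of a structured payload like this is that the escalation pathway discussed in the talk (journal finds an issue, repository posts a notice, or vice versa) could be automated rather than depending on someone knowing whom to email.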
Yeah, that's a great reminder. Jodi Schneider, someone who's in the working group, has been leading a lot of efforts; a few folks are interested in the life cycle of retractions and the ethical issues around them, and her work on this, funded by the Sloan Foundation, is very interesting. It sounds like it's continuing through NISO. Any other? We've still got ten minutes. Any other comments, thoughts, things that are interesting or not interesting to you about these?

Hi, I was thinking about this in terms of the resourcing needed, especially at an institutional level: the resourcing this requires of us as library publishers, data publishers, or repositories; the gaps between that and what we have available to us; how much additional burden this puts on resourcing; and the ethical considerations of having repositories without these types of standards. I was just wondering if, within the working group, there was any discussion about resourcing itself, not just at institutions but in general. As you developed these, what are the resourcing needs, and where are the gaps?

Yeah, it's a great question. I think we need to figure out what we would say a policy should always include, to then understand what resourcing it takes to enforce that policy. It'll be really great to have different institutional, disciplinary, and generalist repositories, and publishers, and research integrity officers in the room going through: well, we tried it here and this is what it took for us; we tried it there and this is what it looked like for us. It'll be especially interesting to compare what that looks like at scale. We have folks from NIH repositories and NSF repositories in the conversation, but we also have smaller institutional repositories. Is it possible that it's actually similar resourcing, or what does it depend on? Is it scale?
It's not just how much you're publishing, because you're really thinking about what issues could arise. So it's probably going to be a lot about curation and what that resourcing looks like, how many checks are happening pre- versus post-publication. Of course, a lot of these, like rigor and legal, can't be caught that way.

Hi, Lisa Hinchliffe at the University of Illinois Urbana-Champaign. It's really great to see this work; it's sorely needed. One of the questions I have for you: obviously there's still work to even get the documents rolled out and the like, but are you anticipating some sort of pledge? Like, we as a repository pledge that we are following these guidelines? Because, I mean, imagine in the best possible world you then have researchers, and maybe grant agencies, saying: we expect you not to just deposit your data, but to deposit it in a place that has, blah blah blah, preservation, blah blah blah, and is a member of such-and-such. So, not the individual repository policy, but the bigger policy apparatus that you're moving towards?

Yeah, that's a really good thought. It feels hard to answer right now, because we released this all in September, and right now it's about how feasible this is for folks to build workflows around, and what it looks like to start building more connections between publishers and repositories that never had those connections before. A larger pledge, I guess, is really what we're trying to get at in the per-repository policy, which we've been going back and forth on. Would suggested policy text be three pages, or two sentences? The two sentences being: we take it on as our responsibility to be responsible publishers and follow this guidance. What does that look like, and how specific does it need to be? But going forward, I would love it if every conversation about FAIR included this.
And every time people talked about data, publishing it or archiving it, the ethical and research integrity issues around it were involved. So maybe we can start now and start getting there, but it's a great point to be able to start that trajectory.

Hi. This is an interesting topic. And I'm probably going to sound like a stick in the mud, but when you create a repository or an archive, maybe you have funding for it and interested parties, but then as time evolves, maybe you lose the funding or you lose the leadership. What happens to the data? Which is the big question of the sustainability of repositories and published information.

I would go back to what Lisa just said, which is that it's just got to be part of that. It's got to be that if you are taking on the publishing of information and making it available, then whatever happens after your organization, you will still be responsible for being the publisher, publisher being a broad term for making that information available. And yeah, what does that look like in practice? I don't know. But we all say as repositories, anyway, that we're stewards of the data going forward; this needs to be part of stewardship. Last minutes, any last comments or questions?

Craig Van Dyck from CLOCKSS. This is kind of obvious, I guess, but there's tons of research data, and as a preservation service, one question among many is: what is the line above which data should be preserved long-term in a system like CLOCKSS, and what is below that line? And how do we agree on where that line is? At the moment, I don't think there's any understanding about that.

Yep, these are all larger questions around being data stewards and doing it responsibly. But I think it comes back to remembering the point of why we're making information open in the first place, and then what the risk could be.
I think a big one we were talking about when we started all these conversations was: what if, as a repository, a dataset goes through curation, the data looks great, it's clean, there's tons of metadata, and then we find out that it was climate-denier data? To them it's perfectly good, but you don't find out until the community went and reviewed it after the related article was published. That's the one that really sticks with me: what is our responsibility in making that information open? So, I guess, just pushing more on: if we're going to make data available and reusable, what are we going to do if we find out that it's data that has some issues?

If that's it, we can end early. I know everyone wants to socialize. So thanks so much, and I hope to chat with you now.