Hello everybody. Welcome to webinar number four, the second webinar of our second module on the FAIR data principles. My name is Liz Stokes. I'm from the ARDC skills team, and I'm going to talk to you about what the FAIR data principles have to say beyond protocols: what repositories can do to make research data accessible to their users. I would like to acknowledge the traditional owners of the land on which I'm standing today, the Gadigal people of the Eora Nation. Sovereignty was never ceded, and I pay my respects to the traditional owners, past and present, and welcome any First Nations people who are joining us today. So let's get straight into it, with a little bit of front matter, I suppose I'll call it. There's a link at the bottom of the slide to the code of conduct. Please have a read of that and let us know if you have any issues. You are more than welcome to put questions or comments in the chat module in GoToWebinar as I'm talking, but I won't really respond to them until the question-and-answer time at the end. If your question is urgent, say so, and one of my awesome ARDC team will no doubt respond promptly. And of course, I encourage you to take any thoughts you have and share them in our Slack channel this afternoon, after this webinar. Okay, I accidentally moved on. Great. Well, let's keep going then. The overview for today is that I'm going to recap the role of repositories in enabling FAIR data, look at some examples of how different repositories mediate open and closed access to data, and then we'll have a little Q&A session and finish up with any questions you might have about the activities, quiz and community discussions for this coming module. Okay.
Okay, so the FAIR principle that I'm going to be covering today is really this one, A2: metadata are accessible even when the data are no longer available. This principle is really the backstop for the data repositories that we know and love. It cycles back to the work repositories do under the other principles, findable, interoperable and reusable, but today I'm going to concentrate on access to data via those repositories. I'd like to add a little disambiguation here: in the FAIR principles, "accessible" means access to the data. It's not necessarily about the Web Content Accessibility Guidelines, although those are certainly part of best practice in facilitating access to anything on the internet. It's really more about who can access what data under what conditions, and because every repository and its use cases can be quite different, there is no one standard to manage all of this. Okay. Let's get into answering some practical questions. What can repositories do to enable FAIR data? Looks like the slide's not working. That's interesting. You'll recall a few repositories that I shared, for example Zenodo and the Australian Data Archive, which featured in a nice little introduction in our Slack channel last week. Thank you to Marina. I guess we'll just wait for that slide to keep loading, or I might move forward. Often, when we're looking at what repositories can do to enable FAIR data, we're looking for examples of best practice: what are the exemplary data repositories out there?
One reason I chose some of those repositories, such as the Australian Data Archive and ICPSR, the social science data archive hosted by the University of Michigan, is that both have been benchmarked: they have gone through the CoreTrustSeal certification process. Benchmarking against trusted data repository requirements is certainly one way you might approach enabling FAIR data, by pursuing that certification. It's not necessarily easy, and it certainly takes a certain amount of time, but my colleagues at the ARDC have assisted a few repositories through it. Hey, the slide's working now. Great. Okay. I've included a link at the bottom to the CoreTrustSeal. I encourage you to follow that link and have a look around, because there's a nice little map that shows the physical address registered against each certified repository. It's a nice way of understanding which Australian repositories are certified and where they're based. But I'm going to leave that there and not go deep into the certification process. Another thing that repositories can do to facilitate access is to implement a mechanism for authentication and authorization. Now, those are two multisyllabic words starting with A; what do they really mean? For example, on slide 11 is the ALA, the Atlas of Living Australia. As you can see on this slide, they provide a range of different ways to authenticate. To authenticate as a user, you can sign in with the Australian Access Federation, you can choose social media and corporate accounts such as Google, Facebook and Twitter, or you can create your own account, so they also provide a username-and-password option.
The point I'm making is that this is all about facilitating authentication, which is the part the machines take care of, in keeping with the FAIR data principles' idea of humans and machines working on the same data together. Authorization, on the other hand, is something for humans to decide, and I'll come back to that later when we discuss mediating access to data. Another thing that repositories can do is expose the data and metadata through a well-documented API. Who remembers what API stands for from Matthias' lecture on Monday? I welcome your answers in the chat. I'm going to show you an example from the CSIRO Data Access Portal. This is a screenshot, but if you take your cursor up to the top right-hand corner, where it says API next to Help, you'll see some pretty splendid and thorough documentation on how the API works and how you might get automated access to the data that CSIRO provides. This is really where I wanted to start talking about the different methods of facilitating access to data. As I mentioned before, a lot of this comes down to understanding the needs of the researchers and of the people providing access to the data. Mediation, for example, is all about respecting the wishes of the data generators, the content owners, or the people responsible for providing that data. But it's also about navigating any legal frameworks we operate in, which sometimes place a high value on individuals' rights over their intellectual property. And of course, there are cases where a researcher, or certainly a repository manager, may wish to provide more or higher security for data and restrict access to it, while the people who gave or provided that data may want more openness.
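To make the authentication/authorization distinction concrete, here is a minimal Python sketch. It is a toy, not how the ALA or any real repository works: real sign-in is usually delegated to an identity provider such as the Australian Access Federation rather than a password dictionary, and the names `AccessLevel`, `Dataset`, `authenticate` and `authorize` are hypothetical. The point it illustrates is that authentication only establishes who you are, while the authorization rules (open, mediated, closed) encode decisions that people make.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set


class AccessLevel(Enum):
    OPEN = "open"          # anyone may access the data
    MEDIATED = "mediated"  # a human custodian approves each requester
    CLOSED = "closed"      # kept for the record, never released


@dataclass
class Dataset:
    identifier: str  # hypothetical identifier, for illustration only
    access: AccessLevel
    # Users a data custodian has approved (a human decision, recorded here).
    approved_users: Set[str] = field(default_factory=set)


def authenticate(username: str, password: str, accounts: Dict[str, str]) -> Optional[str]:
    """Authentication: establish WHO the user is (toy password check only)."""
    return username if accounts.get(username) == password else None


def authorize(user: Optional[str], dataset: Dataset) -> bool:
    """Authorization: decide WHAT this user may access, per human-set rules."""
    if dataset.access is AccessLevel.OPEN:
        return True
    if dataset.access is AccessLevel.MEDIATED:
        return user is not None and user in dataset.approved_users
    return False  # CLOSED: nobody gets the data, whoever they are
```

For example, `authorize(authenticate("alice", "s3cret", {"alice": "s3cret"}), Dataset("hypothetical-survey", AccessLevel.MEDIATED, {"alice"}))` is `True`, whereas an unauthenticated user (`None`) is refused mediated data but can still reach open data.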
For example, in an oral history, the people sharing data, perhaps on behalf of a certain community, may actually want to be named even if they are discussing something quite private. So the mediation that occurs between the repository managers, the researchers and the people providing the data is: how many people is it okay for us to share this data with? Another example, on the other hand, is medical data and access to it. People might never want to be identified, but they may be very happy for that information to go out widely and be shared with relevant researchers and research groups to advance medicine and combat things like a pandemic. It's also an interesting point to note, thinking back to how the ALA provides access through social media platforms, not only through the AAF. I do apologise for my incoherence right now. The way that social media platforms restrict and enable information to reach people in your network and advertisers is also an example of mediation. It's one that we have perhaps already signed on to in theory, even if we haven't read all the details of the terms and conditions we had to accept. But I'm moving into analogy territory, so I'll pull it back a little. Another concern people might have is that the data might not necessarily be digital. Many of the researchers here might be familiar with needing to organise paper forms for discussing consent with their research participants: consent for collecting the data and for what might be done with that data after it has been collected. This negotiation over access and long-term consent is not uncommonly done on paper, and that's certainly something for repository managers to take into account if they're concentrating largely on a repository for digital objects only. There's potential to branch out into physical holdings as well. We could also look at commercially sensitive data, where decisions need to be made about controlling who might access that data. This kind of mediation might happen via legal instruments, such as a memorandum of understanding between different partners. It's really all about ensuring that there is clarity about the mechanism for entering into negotiations to access that data. Or, if we're talking about the commercial sector, we might actually be talking about data science initiatives. So that includes... Sorry, that was my daughter. So that includes access to the software and any code, algorithms, pipelines and workflows. Awesome. Finished that sentence. And of course there are collaborative research centres, where a university department might have a partnership with a commercial organisation, so they might have federated agreements to share their data. These are only a few examples, but as you can appreciate, some of that data might always need to be closed. It's really about having clarity about what data, or rather what metadata, is available, so that people have a record of it. Okay. So deciding on access can be... well, that was all pretty heavy. It's a veritable minefield when you're talking about access to sensitive data and what you can enable to be open or closed. Hang on a moment. I just need to be talking right now. Thank you.
I wanted to highlight the Coalition for Publishing Data in the Earth and Space Sciences, which is what the COPDESS acronym on that slide stands for. The coalition formed in 2014, and I wanted to show what it committed to doing to facilitate the FAIR data principles. Okay, on to slide 17. They wrote a commitment statement and encouraged individuals and institutions from all kinds of organisations to sign on to it. The signatories include researchers, publishers, societies, institutes, infrastructure providers and repositories, all coming together as a community to implement these principles. And of course, if you're in the earth and space sciences, you could sign on to this too, so I'll just put that out there in case you're looking for something to do after the webinar. Now it's time for me to move on to the final part. Okay, let's get back to the metadata: making the metadata available even when the data are no longer available. What does this mean? Curating metadata indefinitely, which is ultimately what we're expecting our data repositories to do, requires quite a lot of effort. So it's quite important for us to consider the end goal, or what might happen if, for example, the metadata need to be moved. Maybe your repository is changing platforms or infrastructure, so you need to migrate your metadata and your data and make sure they stay together. Or perhaps the project for which the repository was created closes. Or, I don't know, even universities that have been around for centuries may need to close. So developing an exit strategy for how the metadata will remain available, even if the data need to be moved, is important. Moving on to other reasons why the data might no longer be available: file formats or standards may change.
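As one illustration of "make sure they stay together" during a platform migration, here is a small Python sketch that checks, after a move, that each exported metadata record still finds its data file on the new platform with an unchanged checksum. The record fields (`identifier`, `file`, `sha256`) and the function name `check_migration` are hypothetical assumptions, not any repository's actual export format. Records that fail the check are flagged for follow-up rather than dropped, so their metadata stay available either way.

```python
import hashlib
from typing import Dict, List


def sha256(content: bytes) -> str:
    """Hex SHA-256 digest of a file's content."""
    return hashlib.sha256(content).hexdigest()


def check_migration(records: List[dict], files: Dict[str, bytes]) -> List[str]:
    """Return identifiers of records whose data did not survive the move intact.

    records: metadata exported from the old platform; each is assumed to carry
             hypothetical 'identifier', 'file' and 'sha256' fields.
    files:   file name -> content, as found on the new platform.
    """
    problems = []
    for rec in records:
        content = files.get(rec["file"])
        # Missing file, or content changed in transit: flag it (keep the metadata).
        if content is None or sha256(content) != rec["sha256"]:
            problems.append(rec["identifier"])
    return problems
```

Recording checksums in the metadata before the move is the design choice here: it lets the metadata themselves vouch for the data, which is useful whichever platform ends up holding them.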
The examples on this slide are all likely things that may happen. The published data itself may have been withdrawn or retracted. The creators might have moved on; they may have changed institutions, for example. Or the research project was a giant con... or rather, it concluded. Sorry, I don't mean to cast aspersions on our research community. So, the research project concludes. Or, and maybe this has happened to you, a government department changes its name, and then it's very hard to find the data where you thought it was in the labyrinthine structure of its website. So, moving on from all the things that could go wrong, how does it help to have the metadata available? Here are a few examples of where it's good to have access to that metadata. It gives you contextual information to follow up: you may want to get in touch with the original data creators, or find out what other research outputs the data was related to. It also supports meta-analysis and citation, because we don't want to break the citation chain. Especially if the data was retracted, at least you can look up the citation to it, and you have some kind of provenance trail for the work that went into it. And indeed, meta-analysis here is not really a pun, although I do love that, and it helps me understand it. Having metadata available can enable distant reading, by doing analysis on the metadata available for a certain discipline, subject or field. It also supports meta-analysis proper, a field I personally have only a cursory understanding of, although I appreciate that meta-analysis is an excellent research methodology, I should say. I have a librarian background and can gloss over things. Moving right along.
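Principle A2 is often implemented as a "tombstone" landing page: the persistent identifier keeps resolving to the metadata even after the data are withdrawn. The Python sketch below is a minimal illustration under that assumption; the `Record` fields, the `landing_page` function and the `/files/...` download path are all hypothetical, not any particular repository's schema. The metadata (title, creators, related outputs) are always returned; only the download link depends on whether the data are still available.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    identifier: str             # persistent identifier (hypothetical values below)
    title: str
    creators: List[str]         # who to contact, and who to credit in citations
    related_outputs: List[str]  # e.g. identifiers of papers that used the data
    data_available: bool = True
    withdrawal_reason: Optional[str] = None


def landing_page(record: Record) -> dict:
    """What a hypothetical repository returns when an identifier is resolved.

    The metadata are always present (FAIR principle A2); only the download
    link depends on whether the data themselves are still available.
    """
    page = {
        "identifier": record.identifier,
        "title": record.title,
        "creators": record.creators,
        "related_outputs": record.related_outputs,
    }
    if record.data_available:
        page["download"] = f"/files/{record.identifier}"  # hypothetical path
    else:
        page["status"] = "data no longer available"
        page["reason"] = record.withdrawal_reason
    return page
```

For a retracted dataset, the page still carries the title, creators and related outputs, so the citation chain and provenance trail survive even though there is nothing left to download.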
If you would like to take access conditions further, I encourage you to join the Sensitive Data Community of Practice, which is convened and looked after by our colleague, and we'll put these links in the chat. You can also continue the discussion around access: maybe you have some examples, or thorny issues of moderating or mediating access to research data, that you'd like to discuss with your fellow participants. There's also a link to the resources on the ARDC website, where we have material on managing sensitive data, including a guide for publishing sensitive data and a flow chart for sharing sensitive data. So in summary, these are the four FAIR data principles that we have covered in our Accessible module. To wrap all of this up: consider access to metadata, as well as data, over the long term. The idea is that data should be archived long term and made available in such a way that they can be easily retrieved by humans and machines, or used locally, with the help of standard communication protocols. So I guess it's time for a Q&A. I'll give you a few moments to ask some questions. Okay. Okay, so we do have some questions. Oh, and a lot of people did correctly answer what API stands for: application programming interface. Good memory, everyone. Okay. Also, some people are commiserating about the intrusion by your daughter; one person says their 18-month-old is now all up to speed on the FAIR data principles. And we do actually have a question: can you think of an example of a closed repository? Um, yes. I personally can't think of any repositories that are fully closed, because, well, I've not seen them. But there certainly are a number of repositories that store closed data you can't get access to.
And I'm pretty sure, although I might be corrected here, that the ADA, the Australian Data Archive, is one of those. It has open datasets, but it also has closed datasets that are stored for archiving and can't necessarily be accessed by anybody else. Um, okay, a procedural question... Oh, sorry, Liz, did you have more on that? No, I was just going to give another example of closed data. Do you mean fully closed forever, or closed until somebody asks for it? I ask this thinking about medical data collections and government departments: for example, the mortality and morbidity data collected across hospitals, which is curated by jurisdictional data custodians, usually organised across the states and territories. There are processes for applying to access that data, generally governed by advisory boards and ethics clearance. So that's my model for data that is normally closed but can be opened up where it's appropriate, if it falls under a research project. In fact, there have been a lot of comments on this popping in while we've been speaking. For example, somebody has shared that the Australian Geospatial-Intelligence Organisation has a fully closed spatial data repository; given that it's geospatial intelligence, I'm not entirely surprised. And we have a clarifying note from a representative of the ADA: they would store part of the data as closed where it maintains the complete record of the project, but generally they only accept data where at least part is intended for sharing, whether mediated or open. I think that brings up an interesting discussion point in itself: from a single project, some of the data can be made open, some is mediated, and some will always remain closed.
And I suspect that one piece of data that will always remain closed, from my personal experience, is the names of participants in anonymous research. That is, after the data have been anonymised, you still need to keep a list of the participants. You can't share that list, but you could share the anonymised data. Now, another question, nothing to do with accessibility: Matthias, has your beard grown since the last session? It has grown. I haven't cut my beard for quite some time. Same with my hair. That's isolation life, really: who has time to go to the hairdresser? Other than that, we have no more questions, but certainly more compliments about how well we've done despite the work-from-home situation. So I think we should probably leave it there. Liz, did you have anything more to add? In fact, sorry, you have more slides. Let's keep going through them. Oh, do I? Oh yes, it's the feedback slide. Thanks. Don't forget to share your feedback on today's webinar. Often I don't think of the question I really want to ask until at least two and a half minutes after the speaker has finished and packed up. I look forward to your discussions on Slack and to chatting away next week. So thanks, everybody.