Marjan is a senior policy advisor at DANS, and she is really the right person to ask if you have doubts about FAIR data and repositories for data. Marjan, thanks for your time today. I'm now making you presenter in case you want to share your screen. Yes, thank you, Ilaria. I would like to share my screen, so let's see what happens. It was working. I saw your presentation for like a second. Yeah, and it says that Katherina Barboza is now presenter. Okay, no, that shouldn't be the case, so I'm making you presenter again. Okay. Okay, again. Okay, cool. I see now that Lucia would like to share her screen, but I'll just go on with yours. Okay. Okay, thank you, yeah. That was indeed my intention, to present a very brief recap of what is already in the movie for those who haven't seen it. But first, thank you, Ilaria, for the introduction, and welcome to all who are here in this session on FAIR data in trustworthy repositories. So let me ask you, by way of introduction, since we're a rather small group, maybe you can add some input in the chat box. Can you please indicate if you have ever used FAIR data, or if you plan to use FAIR data? Yeah, please don't be shy and participate. You can use the chat box for this. Okay, there's one yes, from Crystal. Making data FAIR in your publications, okay, that's also a nice one; maybe we can get back to that one. And indeed, not everyone is in a position to use data. If you're not an active researcher at the moment, maybe there's no reason for you to do so. Which of you uses trustworthy digital repositories? It's very quiet in the chat box. Silence. I hope that some of the participants had a look at the recording of the movie. But I think that... maybe it's good if I just recap a little bit of what I think is important. Okay. OpenAIRE-compliant repositories are being used. Yeah, certified repository use. Okay, that is all very relevant. Let me talk you through a few of the slides now.
I'll make sure it won't be a full presentation, so stop me if it gets tedious. One thing I really would like to address is this slide, and I'll make a bit of space. Last summer, we saw the recommendations from the High Level Expert Group on FAIR data, and you probably saw them too. Among the recommendations, there are two that specifically address repositories. Recommendation 10 says that repositories should be encouraged to achieve CoreTrustSeal certification, and recommendation 29 says that repositories should publish assessments of the FAIRness of data sets. Now, I work at DANS in the Netherlands, and one of the services we provide is a long-term digital repository, so of course I'm biased when I read these recommendations. I like them very much. And I especially like the notion that repositories should not just encourage everyone to deliver FAIR data to the repository, but also that the data sets already in our holdings should be as FAIR as possible, and that anyone should be able to measure the level of FAIRness. So if you like, I can tell a bit more about that. Another slide I want to show you again is this one, with images and information from the European Commission. In their guidelines on FAIR data management in the Horizon 2020 programme, they mention that they prefer that you deposit data, documentation, metadata and code in certified repositories, and of course ideally in repositories which support open access when possible. So they already connect the notions of FAIR data and repositories. But actually, I think we should speak about stakeholders, or shareholders even, in data management. There is the research data lifecycle, and in several stages of that lifecycle it's good to think about making data FAIR; particularly in the preservation and sharing stages, repositories can help you distribute and provide access to FAIR data. So basically that is my hope and my ambition in this area.
So I think that should be it for now to recap the movie. Maybe, Ilaria, you can share the questions that were posed ahead of the session. Yeah, so just let me share my screen. There were a few questions on Mentimeter to, well, survey research data usage and practices. So the first question is: have you ever made use of other people's data? And well, the average answer is "sometimes", but we only have three respondents, unfortunately. Then you see that the opinion from the public is that getting good quality data into a repository is the responsibility of the researchers who created the data. And I'm curious about this one: is this the usual best practice, or are there usually librarians or data support staff who really help researchers? I think it's a bit of both. There are at least a good number of domain-specific repositories that are very explicit in guiding, and even training, researchers and research communities in what good quality data means in their field. And then it is typically the responsibility of the researcher, or the research team, or maybe the data manager working at the institute, to make sure that the data set meets those criteria; for instance, when it comes to the right amount of documentation, or to making sure that sensitive data is properly guarded or properly anonymized, whatever fits best. So it is give and take: the researcher is responsible in general, but we sometimes provide a lot of information and support to help them. On the other hand, you have very broad and generic repositories; as Zenodo was mentioned in the chat box, they typically don't have staff to deal with that kind of work. So then it's really left to the researchers and the scientists, and then there is sometimes not even a real check on the quality of the data. So it differs widely. The domain repositories are usually quite good and take part of the responsibility.
But even then, they assume that the researchers and the research community itself, the people who generate and collect the data, are best placed to do that. Thank you, that's a nice answer. We have two other questions in the chat box. The first is from Joy, who is asking: is it feasible for the assessment of FAIR data to be carried out by repositories? It is quite time consuming and might depend on the real users. Yeah, it is quite time consuming in general. Could I share my screen once again? Yes, just let me enable you as a presenter again. Okay. I've prepared a few slides on this, just to explain what it might mean. So I'm going to move this one away. When it comes to assessing the FAIRness of data in a repository, the question is basically: okay, what is FAIRness? Does the data fit my purpose? For me, if I have a specific research question, FAIRness might be something different than for someone who wants to use data in education, for instance, or who wants to use data as a journalist. So FAIRness is a kind of fitness for use. And of course this is not the same definition of FAIRness as you find in the bulleted list you probably all know, or in the underlying article that was published a year and a half ago. So FAIR is still a shifting notion in a way, but at least "can we trust the data?" and "who should do the assessment?" are important questions. And then there are a couple of initiatives to come up with measurements for FAIRness. One is by the GO FAIR Metrics Group. I realize that the image on screen is too small to really read, but the notion they are working from is that different entities play a role in measuring FAIRness: so not only making data FAIR, but also measuring it. Communities would play a role in the metrics that are relevant to their domain; this may concern, for instance, typical file formats or typical flavors of metadata. But there can also be more automated parts of the measurement.
Some elements of the different principles can be measured by tools and would not need human beings. That is an approach that is being tested and explored. And then, again, I'm biased: this is the model that we are developing into a prototype at DANS at the moment. Is it possible to really capture the items that you know from the bulleted list of FAIR aspects in a kind of star system, as we all know from Amazon, bol.com and so on? So assuming, for instance, that if metadata doesn't even have a persistent identifier, it shouldn't deserve more than just one star; but if it has a persistent identifier with very limited metadata, perhaps two stars, and so on. This would build up to a kind of scoring system with weights. You could do this for all of the elements of FAIR. The results might then be visualized like this, for instance, as a kind of badge with a level of FAIRness. What we noticed in this experiment so far is that, of course, we tried to make the 15 underlying FAIR principles very black and white, because if you want to have them measured or assessed automatically, it can only be black and white; I mean, machines deal very badly with subjectivity. And then we found that it's relatively easy to define findability, accessibility and interoperability levels, but not so much reusability levels. So we came up with the idea that reusability is more the result of the other three. Like I said, it is a prototype, and we're working on a new version. So you can go and have a look at the first prototype, but be aware that it's under development. But we also asked the question of who should be the assessor, the question that Joy asks. During the period we ran the first experiment, the first prototype, we thought, okay, maybe it can be the data users, but we found that the prototype was still perceived as quite heavy: it is still a lot of work to assess a data set on all these 15 items. So then the idea came that it might be an archivist or the data manager at the repository.
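The star-and-badge idea described above can be made concrete with a small sketch. Everything here is illustrative: the checks, thresholds, field names and the way reusability is derived from the other three are this sketch's own assumptions, not the actual DANS prototype.

```python
# A minimal, hypothetical sketch of the star-based FAIR scoring idea:
# each of F, A and I gets a discrete star level from black-and-white
# checks, and reusability (R) is derived from the other three.

def findability_stars(metadata: dict) -> int:
    """Score findability 1-5; no persistent identifier caps it at one star."""
    stars = 1
    if metadata.get("persistent_identifier"):   # e.g. a DOI
        stars = 2
        if metadata.get("title") and metadata.get("description"):
            stars = 3
        if metadata.get("keywords"):
            stars = 4
        if metadata.get("indexed_in_catalogue"):
            stars = 5
    return stars

def accessibility_stars(metadata: dict) -> int:
    stars = 1
    if metadata.get("access_protocol") == "https":
        stars = 3
    if metadata.get("licence"):
        stars = 5
    return stars

def interoperability_stars(metadata: dict) -> int:
    return 5 if metadata.get("open_file_format") else 2

def fair_badge(metadata: dict) -> dict:
    f = findability_stars(metadata)
    a = accessibility_stars(metadata)
    i = interoperability_stars(metadata)
    # Reusability as the (rounded) result of the other three.
    r = round((f + a + i) / 3)
    return {"F": f, "A": a, "I": i, "R": r}

dataset = {
    "persistent_identifier": "doi:10.5555/example",  # hypothetical DOI
    "title": "Survey data", "description": "A small example record",
    "keywords": ["survey"], "indexed_in_catalogue": False,
    "access_protocol": "https", "licence": "CC0",
    "open_file_format": True,
}
print(fair_badge(dataset))  # → {'F': 4, 'A': 5, 'I': 5, 'R': 5}
```

The point of the weights-and-stars design is that every check must be automatable, which is exactly why subjective notions like reusability end up being computed from the others rather than measured directly.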
What we're doing now is redesigning this checklist for researchers, basically for at least two stages in the process: a very early stage, when they are generating data or looking at a repository to find relevant data to work on and reuse, but also the later stage, when they are preparing the data for deposit in an archive or a repository. So we hope that this kind of measuring tool can play a role. And like I said, we're still not sure who should do the assessment and whether it is feasible for repositories to do so. Thanks, Marjan. There's another comment from Joy on this, and then there's a question from Gareth. Yeah, so basically Joy's comment asks whether there's a link that can be shared about a FAIR assessment, and I think you already put it... not in the chat, sorry, in your slide. Okay. And then there's this question from Gareth: the process to gain CoreTrustSeal certification involves the submission of a set of self-audited answers to a number of questions establishing whether the repository meets particular requirements. While many of these requirements are aligned with FAIR, the principles themselves are not explicitly mentioned as part of the certification process. Is this likely to happen in the future? That's also a nice question. The requirements are indeed very much aligned with FAIR, and let me show them for those who may not be familiar with them. So this is still... yeah, you can still see the screen, I think. Yes, it works fine. Okay, good. So what you see here is the majority of the requirements in the CoreTrustSeal certification scheme, and all the blue words relate to making data FAIR or assessing the FAIRness of data. Now, the requirements are written basically for repositories that want to acquire the certification or that have acquired it; I think there are around 160 repositories worldwide that have it. And indeed, there is no mention of FAIR in these requirements.
But I'm not sure, personally, if we want that. I mean, lots of researchers, data supporters, information professionals, data librarians and repositories worked in a FAIR manner before this buzzword came about. So I wouldn't be surprised if at some point, in a couple of years, this notion of FAIR is no longer as hot as it is now. It's more important that we live up to the ideas and the concepts behind and beyond it. Speaking for repositories: we have always been in the business of making things findable. We didn't call it that, perhaps, but of course the focus and the stress we place on good metadata and good descriptions is compatible with the notion of findability, and so on. So there is no one-to-one translation, I agree. But I'm not sure if we should call everything FAIR, although I know this is a session about FAIR. And I don't think it's very likely to happen in the near future for the CoreTrustSeal requirements, because they are very new: they were established last year, of course building on other certification schemes. So I don't think the FAIR principles will be directly incorporated in the current requirements. Yes, I would agree that the alignment should be stressed. Although OpenAIRE is my sponsor for this Q&A session, this should be part of what the CoreTrustSeal board is disseminating and communicating as well, that's right. Yeah, you can wear several hats, I don't mind. You are more than welcome. Are there any other questions from the participants? If not, I would like to ask you a question, Marjan, because I've been asked several times something you might help in addressing. It's something like: what is the best license to apply to data to make them really FAIR? Is there only one, or can there be a combination of licenses? Ah, that's a nice one. So, okay, I think this is a good moment to say that FAIR is important, but that the A in FAIR does not mean the data are open.
Again, you may all be aware of this, but sometimes it's confusing: accessibility doesn't mean open to the world. Okay, having said that, the best license nowadays for declaring data open, as far as they can be open, would be a CC0 waiver. Technically, it is not even a license, but okay. That gives everyone the best opportunity to reuse the data for any purpose whatsoever. And even with this very open kind of license, of course, we are all still subject to good academic behavior. That means we should still cite the researcher and the research that produced or generated the data. So let there be no concern that CC0 means you can't be credited, because that's just not true. Another thing that comes to mind: we talk about data, and I hope the awareness will grow that data is very diverse and can also include code and software. Lots of research projects have software as a project deliverable, and that should be as open as possible as well, of course. And for software you typically wouldn't use licenses from the Creative Commons family, but you could think of licenses such as the EPL; there are also different flavors in terms of more or less open. Whether there is an explicit license for FAIR, you ask, Ilaria? I'm not sure, because I don't think there are licenses that say how interoperable, for instance, something is. No, not as far as I know. So basically licenses can cover just a part of the FAIR requirements, but not all of them. No, they will typically address the accessibility: what you can do with the data. Yeah, it would be interesting to see if something can be fostered in this direction, to make them more comprehensive and targeted to FAIR in general. Thanks a lot for your answer; I will report back to the people who asked me this. And also another thanks, from Joy, for the work you're doing at DANS. You're welcome. Thank you.
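Since a license only covers part of FAIR, it usually lives as one field among many in a dataset's metadata record. The sketch below shows how a CC0 declaration might look in a machine-readable record; the field names are loosely modelled on DataCite-style metadata and are illustrative assumptions, not an official schema.

```python
# Illustrative dataset record with a machine-readable rights declaration.
# Field names are assumptions for this sketch, loosely DataCite-flavoured.
dataset_metadata = {
    "identifier": "doi:10.5555/example",          # hypothetical DOI
    "creators": ["A. Researcher"],
    "title": "Example survey data",
    "rights_list": [{
        # CC0 is a public-domain waiver rather than a licence proper,
        # but it is declared in the same metadata slot.
        "rights": "CC0 1.0 Universal",
        "rights_uri": "https://creativecommons.org/publicdomain/zero/1.0/",
        "rights_identifier": "CC0-1.0",           # SPDX identifier
    }],
}

# Even with CC0, citation remains good academic practice: the waiver
# removes legal restrictions, not the norm of crediting the creators.
def citation(meta: dict) -> str:
    return f"{', '.join(meta['creators'])}. {meta['title']}. {meta['identifier']}"

print(citation(dataset_metadata))
# → A. Researcher. Example survey data. doi:10.5555/example
```

Note how the rights entry addresses accessibility and reuse conditions only; nothing in it says anything about findability or interoperability, which is exactly the limitation discussed above.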
Just out of curiosity, how many trusted repositories do you have right now at DANS? Sorry, how many? Yeah, sorry, I'm reformulating the question: how many repositories are trusted according to the CoreTrustSeal? I think it's about 160. Let me check if we can show you the map on the screen. So this is an indication of what's there in the world; you see a strong dominance in Europe and North America. The screen is freezing. That shouldn't happen. Yeah, no worries. You also see, if you can read it, let me make this slightly bigger... it was okay. So this is the website of CoreTrustSeal, which is the current default scheme for getting repositories certified. There are some more advanced certification schemes as well, and they have smaller numbers. But even with the CoreTrustSeal, because it is so young, you can also see here a couple of repositories with the Data Seal of Approval and the World Data System certification. The Data Seal of Approval and the World Data System were independent initiatives, and they joined forces, also using a dedicated working group of the Research Data Alliance. Together they came up with what is now called the CoreTrustSeal: they picked the best common requirements and updated them, made them more up to date. Currently, when a repository applies for certification, it can only apply for the CoreTrustSeal and no longer for WDS or DSA. But a certification usually holds for a couple of years, and this explains the mixed view of these different flavors. Yeah, okay, thanks, that's very clear. And referring to Gareth's observation about the fact that it took a while to prepare all the documents to apply for the CoreTrustSeal: how long does it usually take for a repository to be assessed and certified? Oh, good question. Gareth, what do you think, from your experience? Well, that's not the best motivation. Yeah, okay. It's a lot of work, but it's rewarding, because you are displayed on this nice map.
I can give some indications, I think. When a repository enters the assessment procedure for the first time... so, perhaps a few words about the procedure. CoreTrustSeal has a board and a set of 16 requirements; you saw the majority of them, it's not much more. And of course there is some guidance on these requirements. The process is that the repository writes a self-assessment based on these guidelines. And part of the self-assessment is that the repository should provide written evidence to support its position on each of the 16 requirements. Now, this may be a stage that takes a lot of time for a repository, because even if a repository is very good, it doesn't mean they have all their processes in writing. So when the DANS archive applied for certification for the first time, we had to write a lot of documents; not just for bureaucratic reasons, but it also helped us to consider our policies, to make them more specific, and to connect the high-level policy to the day-to-day workflow. And I think that's an instance of "it's worth it", as Gareth writes. It helped us to really think about our processes and to bring the people from the archive together with the people from the policy department and the technical department. And yes, that took a lot of time, but it was worth it. When we applied for a renewal of the seal a couple of years later, we did it, I think, in 15% of the time. So there are initial costs involved, you could say, and these may be high, but the next version, three to four years later, is typically much easier and much faster. Okay, so it's mainly because the effort needs to be an institutional effort in the beginning, and then it's just a matter of keeping all the processes up to date and making sure they really keep sticking to the principles. Yeah, that's mainly it. Because writing your self-assessment is not that hard, but you have to provide written evidence, and that may be hard.
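The procedure just described, a self-assessment covering 16 requirements, each backed by written evidence and reviewed by two independent reviewers, can be sketched as a simple data model. The class and field names below are this sketch's own assumptions, not an official CoreTrustSeal schema.

```python
from dataclasses import dataclass, field

@dataclass
class RequirementResponse:
    number: int            # 1..16
    statement: str         # the repository's position on the requirement
    evidence: list[str]    # links to written, ideally public, policy documents

@dataclass
class SelfAssessment:
    repository: str
    responses: list[RequirementResponse] = field(default_factory=list)
    reviewers: list[str] = field(default_factory=list)

    def ready_for_review(self) -> bool:
        # All 16 requirements answered, each with written evidence,
        # and two reviewers (assumed unaffiliated with the repository).
        return (
            len(self.responses) == 16
            and all(r.evidence for r in self.responses)
            and len(self.reviewers) == 2
        )

application = SelfAssessment(repository="Example Archive")
application.responses = [
    RequirementResponse(n, "We meet this requirement.",
                        ["https://example.org/policy"])  # hypothetical URL
    for n in range(1, 17)
]
application.reviewers = ["Reviewer A", "Reviewer B"]
print(application.ready_for_review())  # → True
```

The `evidence` check is the part that captures why a first application is slow: the statements are easy to write, but every one of them must point at an actual written document.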
At the same time, I'm one of the reviewers for CoreTrustSeal. CoreTrustSeal has a lot of reviewers; each self-assessment is reviewed by two people, of course not related to the archive that wants to get the certification. And we see very good examples of repositories that have all these policy documents openly accessible on their website. That is very good, because it also raises the awareness of reviewers and researchers, and of the people using the archive for reusing the data. So it is an ongoing process of communication, dissemination, awareness raising and so on, and I think that's very valuable. And also, if all these policies are available to everyone because they are open, they could work as a driver for other repositories to learn more about how their workflows could be improved to get the CoreTrustSeal. Yeah, definitely. You can learn through the websites, and you can also find the underlying documentation on the CoreTrustSeal website itself. So you can get inspired by many good examples. Yeah, that's very good to know, because you sometimes take it for granted that your institution is doing well and that all the processes are in place. But then, if you have to apply for a specific certification such as the CoreTrustSeal, you really have to rethink all the workflows sometimes, and all the processes. Yeah. Okay, thanks a lot for this answer. Is there any other question? It doesn't seem so. Can I ask a question of you? Yeah, let's try. Okay. It's like one of the questions you started with, Ilaria, from the Mentimeter. You've now all seen these ideas about measuring FAIRness, and I mentioned what we think of it. What do you think: who should measure the FAIRness of existing data sets? Let's make life easy and say either the data users or the archive. And you can use the chat box again. That's an interesting question. So the first answer is the archive.
I think maybe the archive can work as a kind of control body, to ensure that everything is compliant and respected. Yeah. Curators, yeah. Well, if after this session you have some ideas about it, or maybe if you play with the prototype, let us know, because we don't know what the right way could be, or whether there is a single correct way at all. Yeah, and I think that in this stage of prototyping, all the feedback you can get could be very useful to fine-tune the prototype and to learn what the expectations from the users are. True. And for users it may feel like a huge burden. On the other hand, if we can make it as simple as, for instance, filling out a brief survey after you've booked a hotel on Booking.com or bought a book at Amazon, there is not that much burden. So we'll have to see if we can make it as lightweight as that. Yeah, that would help, definitely. If there are not any other questions from... oh, yeah, no, thanks, Gareth, there's another observation. So, I saw recently that the most important aspects of FAIR are the F and the R: if you can get there, your data can be considered almost FAIR. So, findable and reusable. I tend to agree. I think we're doing all this for reusability. And that's not just because this is what DANS arrived at as a conclusion with the prototype, but also because I think interoperability per se is not enough, and findability per se is not enough; reusability gets you almost there. So yes, I think so too. And the focus on the F and the R also reflects the current state of affairs. In the original working group that came up with the notion of the FAIR principles, the Lorentz workshop here in Leiden, there were quite a number of participants from the life sciences. I'm not from the life sciences, but my understanding is that they have good protocols and practices when it comes to interoperability: deciding on definitions, deciding on protocols, deciding on measurements and so on.
So in that field, or in those kinds of fields, interoperability may not sound too threatening. On the other hand, I know from the social sciences, and especially where people work in very small teams, sometimes one-person teams, that findability is obvious, but accessibility and interoperability may sound somewhat scary. Especially interoperability tends to make people anxious, we noticed. From that perspective alone, I would happily promote the F and the A rather than the full set of FAIR principles for the next, let's say, two years, and then move on to interoperability and so on. Of course, that is not black and white, and of course the world is changing, but I think it is mistaken to assume that all domains can make their data FAIR at the same pace. Yeah, I think it's reasonable to think that communities are at different levels of maturity in terms of interoperability and, in general, compliance with the FAIR principles. Yeah, I mean, that would be ideal, but it would probably be too much to deal with immediately. And part of the concern, I think, is raised by the European Commission, because in their template for data management plans they suggest interoperability should also work across disciplines. I like the ambition; I think it is quite far off for any field at the moment. Yeah, you called it the right way: it is an ambition. Maybe something to work towards in the long term, but let's start small and then grow. I would support that, yeah, thank you. Other questions from the audience? While you're thinking about something else to ask Marjan while she's here, I would like to remind you that this webinar has been recorded. The recordings are available on YouTube, and this Q&A session is being recorded right now; it will be available on the OpenAIRE website very soon, together with the links that Marjan provided during the presentation. There's another comment from Gareth: it's one of the reasons that the DANS badge scheme is very interesting.
When translating FAIR into repository requirements, having this benchmark to measure against is useful. Yes, I think so. Thanks, Gareth. Okay, if you don't have any other questions, I will close the webinar here. Thank you, Marjan, very, very much for your time and all the good work. And thank you, everybody, for joining and for being so active. Thank you very much, everyone. Thanks for the questions and the discussion, and I hope it will be continued. Yeah, yeah, I'll keep a note of that. Thanks, all. Bye. Bye-bye.