 of research data in the social sciences and humanities. The presentation focuses on research data management. You have already received a briefing paper; we are responsible for that. It will be a subject in this afternoon's sessions, and I will come to that later. This is the outline of our presentation. There should be an introduction to the Open Research Data Pilot, but this morning Karine already told you everything I wanted to say in that introduction. That's better for you, because this presentation is between you and the lunch, so it will save some time. I will skip some of the text slides. My colleague will tell you more about research data management: preparing for responsible research, and also the roles and responsibilities during the research project. Then I will come back to you for the data services. You already heard a little bit about Zenodo. Where you can find a repository for your data, I will elaborate on that. I will give you a short summary and later I will tell you more about the afternoon sessions. I can skip the next slide. We will do this, wait on this. I want this, but it doesn't work. It doesn't work. We had a given. Yes, you can do this. Back, please. I understand that, but if on the computer you just type number six, it will go to slide six. Okay. OpenAIRE supports the open access policy of the European Commission. Then the National Open Access Desks, the NOADs. There are a lot of people involved in the National Open Access Desks here today. Who are the NOADs for their country? Maybe it's nice to see how many NOADs there are here at this moment. Ah, lots. Very good. We also have the intermediaries between the researchers and the European Commission: the National Contact Points. We also have them here. How many National Contact Points are there?
It's nice to know for your afternoon sessions. The OpenAIRE website has a lot of information, as we already heard, including information about research data. So if you need information, you can find it here. And now the next part. It looks like the other slides. Okay: preparing for responsible research. Marjan?
We're going to start with a brief video clip. So Eddie didn't say much so far; I'm going to say less. You'll find the link to this clip, and to the remaining half of the clip, in the information that was already e-mailed to you. It will also come up in the breakout session this afternoon. The clip was about the why and what of data management planning. Let's talk a bit more about the how. In the earlier presentations, DMPonline was mentioned as the preferred tool at the moment for finding a template for the plan. It is preferred because it was already there. The colleagues of the Digital Curation Centre, the DCC, have done a very good job by building this application, which you could consider a generator for data management plans. There is a clip on their site as well, but I'll show you a couple of screenshots just to talk you through it. I know many of you are not active researchers, and still I'm addressing you as if you were researchers, because this might be the kind of information they expect from you. On the first screen, you can select the European Commission as the funder of the research for which you are going to write a plan; that selects a particular template. Next you select your organization, if it's in the list. For instance, the European-funded Anexia Project in the divine sciences has made its own version of a data management template. There is no problem if your organization is not in the list; you just leave it open. But the thing that I would recommend is to always check the box for some extra guidance provided by the DCC that is not fine-tuned to Horizon 2020.
It's general-purpose, common-sense guidance, which is very good to have in addition to what's in the templates. The program then summarizes what you have selected so far, and it also presents some information that I entered earlier. The identifier that you see at the top is one that I assigned myself. This is not the so-called project identifier, grant number, persistent identifier or what have you. This is an identifier for your convenience, to share the information with your colleagues. Inside the application you can work with colleagues on this plan. You can share the information and collect it. At the bottom it says that the plan is not a fixed document, so data management planning is working on a living document, which is a very pleasant statement of course. You should deliver it within the first six months as a project deliverable, but you are supposed and welcome to change things, expand information and so on. The summary here below the orange bar says: okay, these are the questions and topics within this template. Remember that there are a lot of templates in the tool, so they are different in that respect. Basically, what's in the blue field is what you are supposed to provide information on, and it goes through the whole process of carrying out research. This is what it looks like when you're starting to fill out the plan. This was the place where I entered my fictitious identifier for this fictitious plan. And it's a setting of questions and answers. On the right-hand side you see the guidance. In this particular field it is the guidance provided by the EC, from this famous Annex 1. The Annex 2 description of the data management plan is there too, but it could also be the guidance provided by the DCC. Another template I'd recommend looking at in this application, even if you are still writing your initial plan, is this one, which shows what is being asked of you in the later stages of the project.
Basically the information here is about making your data reusable, so it asks questions like: how will the data become discoverable for others? How will others be able to assess the value of the data for their own purposes? And so on. So although you don't have to fill out this template within the six months, it's sensible to at least look at it. There are some initiatives within the OpenAIRE community to see if these two templates could be integrated into one template, where you have to answer only a couple of questions initially and you flesh it out during the project, but that's ongoing work. That was the easy part: there is a template. The more complicated part, of course, is what to write in the particular case of your project with your consortium. Open science is about collaboration, and it's about collaboration between lots of organizations of different natures. So this is roughly the team of stakeholders that you will be dealing with when you carry out a project. The researcher is quite central here. Next to the researcher is what we call a front desk or front office: a help desk within the organization, the knowledge institution, or within a discipline, which would be even better. Typically the front office in the discipline or in the organization would provide information on legal issues (who is allowed to do what with the data, and what should I write in the plan) or practical IT information (what kind of facilities do we have for safely dealing with the data while we are working). And that's also the institution that might have its own data policies or data management policies. It's not just funders which are very active in developing policies in this area, and policies could be very incompatible. That's just a fact of life that won't be solved within a short time, but it is good if front office people within the organization are aware of that and can tell the researchers: yes, that's how it is.
Don't worry, it's a pilot, we'll learn. So the front office is a clear stakeholder. Front offices might also want to collaborate with what we call back offices. Long-term archives, for instance, could be considered a back office. Organizations that provide secure data transfer or high-performance computing could be so-called back office services. They need not be within your discipline. They need not be within your organization. So that's the front office and back office separation of concerns. Publishers have data availability policies, or they don't have them yet but they might get them. So they will impose some restraints and constraints on the researchers to provide them with data. And of course within projects there are many private organizations as well, and they can have their own traditions and considerations on what could be done with the data and what could not be done with the data. All these stakeholders should ideally be involved in writing the plan. The plan itself is not what is important. It is the planning, as was said this morning, and it is an instrument for communication. It's not just administration, please. Okay, let's recall the goal. We are still in this early stage of preparing for the real projects. We are in these first couple of months. The idea is that at the end of the project the data will be well curated and preserved. Now here comes my negative slide. Storing data on a hard disk or a USB stick, or even on your organization's server, is not necessarily curating and preserving. That is just storage. Curating is typically meant to include activities like enriching the data. If data has been reused after a couple of years, it is nice to refer to the publications based on that reuse, and adding that information about the later publications to the data is itself an instance of curation. Preserving includes activities like making sure that the data files remain usable over time, so it might include converting data to another format.
That is not done when you only store the data somewhere on a hard disk. Now, if you have archived the data properly, that in itself doesn't mean that the data will be found. It is up to the researchers to provide good metadata, good descriptions, good information, good keywords and so on, in order for the data to be found in a repository. If they are findable, it does not necessarily mean that they are also accessible. It has to be clear who has access to the data; that is a legal issue perhaps, or an ethical issue, but it could also be a technical issue that has to be solved. If the data have been archived, and they are found, and they can be accessed, it doesn't mean that they will be usable if you have not explained what a variable means, or if it is not clear what definition you have used in your research for high blood pressure. Then the data will not be compatible with data from other sources. That's the word here: interoperability. Even if you have taken care to make them understandable by providing all that information, they might still not be usable if you have worked with obsolete, outdated, unavailable software, for instance. So there is a lot to think about, but the upshot of this is that you should provide the context with the data. That was already illustrated in the talk this morning by Professor Margins. It's also about the code and so on. Let's see what it means. This is a lot of text. The blue words are the most important. So we talk about data management, but data is only part of the package. We need metadata: of course things like title, who created it, and since when it has been available. Ideally you use a metadata standard that is common in your line of work, and if there's nothing available in your line of work, well, that might be a good reason to start talking about standards. It is a slow process, as we heard this morning, if you want to do it in an open, inviting way, but it can be done and it is being done.
So for the time being, you could rely on existing generic metadata schemes, and a couple of them are mentioned here. Furthermore, you are expected to provide information and documentation, depending on your domain of course: code books explaining variables, lab journals, preferably electronic lab journals rather than paper ones. If you're dealing with respondents or interviewees, for instance, you might have informed consent forms. Now, the sensitive information on a consent form need not necessarily be shared with everyone. But the fact that there are consent forms where people have stated, okay, it's fine that you use my data or my interview for this or that type of research, that is very useful information for the next user. And everything you've used in the range of instruments and tools, the queries, the syntax in your statistics program, the configuration of the machine that you have used: it is all relevant. Now here, sometimes confusion starts, because this is relevant for replicating studies. That's why sometimes a package like this is called a replication package. If a replication package is okay, it includes everything that an intelligent colleague can use to trace back what you have done. Actually, we are of course hoping for more than just replication or replicating. We are aiming towards the more creative kinds of reuse: use for other questions, use in other areas and so on. So replication package might be a term that is too limiting. If you have better ideas about the term, I'd like to hear them from the breakout sessions this afternoon, because we are looking for terms that sell well. This question could also arise about the notion of data: this is all fine and good, but in my domain it's somewhat different. There are always some that say, yes, but in our domain things are different, and they are right, because in your domain, and in your domain, and in your domain, things will be different.
And that is why it's so important that initiatives start within disciplines. There is, of course, a top-down initiative from the funders, for instance, or from your board of directors, but please also take initiative from the grassroots and participate. Develop your own standards. Another possible bone of contention is: that's quite a lot that we should deposit at the end of the project. Yes, that's true. But it doesn't mean that you have to deposit everything that has gone through your hands and through your computers over a couple of years. There are two good reasons for making sensible cuts between what should be deposited and what doesn't have to be deposited. One of them was already raised in connection with the grant agreement. There can be good reasons for leaving some information, some data, out of the public domain. That doesn't mean you have to opt out with the whole project. You can stay in the pilot by just writing in your data management plan: I'm going to make this section of the data available under a restricted access licence, for such and such reasons. At least that is explicit, and people know they can ask you for it, and they have good reason to think, okay, maybe I can still get access. The other kind of selection concerns data that may not need to be deposited because things can easily be reproduced. We hear examples from physics, for instance, where storing a huge amount of data is very costly. Storing it in a sensible way, with all the documentation that you might have, makes it even more expensive, but the work can be done again and again and again, so you can just repeat the experiment. Of course, this is again domain dependent, but please think about it and motivate in your plan why you select those things and why you make such choices. Archives, repositories, databases, storage and so on. Pedro told you a bit about Zenodo a little while ago. We are into archiving and what have you because it's also our background, and we like to talk about it.
Basically, it is a place where you store things safely. And we like to promote the idea of trustworthy digital repositories; not every repository is trustworthy. Now, I'm not intending to bash colleagues who manage repositories. That is not what we mean. What we mean is: if a repository does not have the mission and, of course, the budget to invest in keeping things available in the long term, it will probably not apply for certification as a trustworthy repository. I leave the rest to Amy. We were still in the process of writing the plan. We were still in those early stages. This slide you've seen. This is the recap. If you have collected all the information from your stakeholders, have made things explicit, have made some choices and have phrased them in the plan, you are now ready to export your plan. You can do it in several formats, but probably the European Commission will be happy to receive it as a PDF. And that's basically it. And remember that it's a living document. So this was the preparation stage. This was a large number of slides. During the project, which will take more time, it will still be your researchers, your customers I might perhaps say, who carry out the project, and you will be involved again, perhaps in another role. During the research, questions might pop up like: okay, I know I have to anonymise my sensitive data. What is the state-of-the-art tool to do that? Or is there a state-of-the-art trusted party that can do it for me? Another question might be: how can I share my data with colleagues outside the consortium? Is that okay? Do we have a tool for that? Do we need to have them sign a contract? Another question could be: okay, I'm using a new instrument, which is fine, but the data format is different. How should I go about making the data format as sustainable as possible? Or: my data turned out to be bigger than I thought. Now they don't fit the repository. Where can I find another repository?
So a lot of questions can pop up during the project, and the stakeholders and you will be asked to help answer them. During the project, it makes sense to do what Pedro has already introduced: think about linking the data and the publications. I'd like to take a data-centric approach and just claim, not originally, that the publication is part of the context information for the data. I fully agree with the idea that the life of the data will probably be longer than that of the publication. But there is no need for you to include the publication in this package. No, please put it in another good repository and make sure that the repositories talk to each other, either because the repository can accommodate data and publication at the same time, or because you use smart persistent identifiers that help to trace the relations and keep them sustainable. I started with the movie that was brought out, so to speak, for Horizon 2020. And there are also some incentives, of course, that are broader than the project and the project funding. At least one paper was published in the astrophysics domain. It's one of the areas where people have looked into the question: okay, what's in it for me if I go and take this extra trouble of properly archiving my data and linking to them? Well, the good news, although it is hardly above anecdotal information so far, is that papers that link to data are cited more frequently than papers that do not link to data. If that's not an incentive, I don't know what is an incentive for researchers. Another incentive might be: I am so pleased to see my old data get a new life. In 1977, there was an expedition to Spitsbergen where the scientists collected a lot of data on biomass and vegetation, and the data were worked up into the map that you see on the left-hand side.
It is too detailed, of course, for the large screen, but when a new expedition went there last summer, they had the map, and they found that the underlying detailed data were still available and usable. So they could make a new map, of course with new techniques and technology, and they were able to compare the changes throughout these four decades. So that's interesting; that is a long-term thing. And of course the funders involved in this second expedition were very pleased to be part of this expedition, and their website is lovely, so please take a look there, even if you're not interested in the data. And the ultimate incentive, of course, if nothing else helps: I don't know if data management does not prevent theft.
So, for another few minutes, we're talking about data services: how to find a repository. And I want to talk about trustworthy repositories. Because, as Marjan already said, you can store the data in a repository, but it's also important that it's a trustworthy digital repository. It's important that the data you want to keep are still readable in 20 years, et cetera. Think, for example, of DVDs: in a few years' time, no one can read them anymore, so just think about it. And that's taken care of in trustworthy digital repositories. These repositories can have a seal. In Europe, the Data Seal of Approval is an example. It's a very basic seal for a trusted digital repository with, I think, 16 guidelines you have to follow. It is a self-assessment, and there are reviewers that will assess what you've done. The nestor seal has more guidelines in it. It's also a self-assessment. And of course the ISO standard is different again, with an external assessment. In the United States you have the World Data System, and at this moment the boards of the DSA and the World Data System are trying to make it one seal. So where do you find a repository?
Well, you can find an external repository, like, for example, EASY at DANS. It is quite well known, especially for the social sciences and humanities, and it's used around the world. You can also use an institutional digital repository. And of course Zenodo; we've already seen something about it, and I will show it to you again. And then, of course, you can find another digital data repository at www.re3data.org. But it's important that you search with your needs in mind, and again, the point of the trustworthy digital repository; this morning Professor Martin also said that all kinds of data have disappeared. So it's very important. You'll have to see if the repository matches your data needs. For example, which formats can it take: text, of course, but also XML or other things? What does it offer: open access, restricted access? It's important that there will be a permanent digital identifier so that the data can be cited. And it's also important that the repository helps with citing the data. Then the Registry of Research Data Repositories: a repository can register itself there. The repositories are from all kinds of academic disciplines; it is not specific to one. It's for permanent storage, of course, and it's funded by the German Research Foundation. At this moment there are around 168 repositories you can find in it. You can search, for example, by discipline or by country. This is the map. You can see the green ones are the countries where there is a repository in the Registry of Research Data Repositories. Of course, if a country is coloured blue there may still be digital repositories, because it is possible that they are not registered in this registry. Then Zenodo, of course; Pedro showed some slides. I think it's very nice that they also have communities, so all the data of one community can come together. You can see here the one of the European Commission.
There are a lot of publications and data sets there already: 16,000 publications and 1600 data sets, with all kinds of information. Of course, Zenodo has all kinds of features: it will give you a permanent digital identifier for the data set. It's nice that, if you share the data, others can search for the data, and not only the data, of course, but also other information, et cetera. As I said, you can create your own community, and it's safer than keeping the information on your own computer. To make the link with OpenAIRE: there are already a thousand data sets in OpenAIRE, and here you can see the information about them. For example, you can find data sets by data provider, or by year, or by access mode, et cetera. And this slide you must know; I'm glad you didn't show it this morning. It's very nice that you can see the relation between the publications and the data sets: 52 publications from 20 different OpenAIRE data providers, and 392 data sets from a data repository in the earth and environmental sciences. The project is HIPOC, an FP7 project; it's about ecosystems. You can find the description of the project, but you can also find its data sets and publications, and here is one publication again and the data set that belongs to it. I think this is a very nice idea; you can use it for sharing information with other researchers.
So, a very short summary. We talked about research projects in certain Horizon 2020 areas being automatically part of the pilot. We heard about the possibilities for a data management plan. Of course, the data has to be in a repository and you have to make it open, for example under a Creative Commons CC0 or CC BY license. There are now 11,000 data sets in OpenAIRE. And this is the motto of the European Commission: as open as possible, as closed as needed. And this is from the RDA plenary last year in Amsterdam. It started a few years ago that it is seen as very important to also take care of the data and even make them openly available. So that was the last presentation about the Open Research Data Pilot. Just one minute on the afternoon session. DANS is responsible for the task on research data management and training support, and we revisited the support kit for this. We've made a briefing paper about research data management, which you have received, and it will be the focus of this afternoon. Our program is that we talk about what kind of challenges you think there are in your country for research data management. We want breakout sessions about the briefing paper we've made and about the program, and to see if there are more questions.
Any questions at this point? You can of course talk in the breakout groups this afternoon for more details, but is there a question you would like to ask right now which is relevant for the whole group?
It's a call for action. There is a petition launched by LERU, the League of European Research Universities, called Christmas is Over. It deals with the practice of double dipping, and it calls for action. LERU wants to put this on the agenda of the European Commission under the Dutch presidency, and I really urge you all to sign this, because they would like to have at least 10,000 signatures. It's now at 7,000 something, and it should not just be the LERU universities signing. It is a matter that all of us are confronted with: the libraries need to cut budgets
whether they want to or not, and the researchers should not be paying money for publications. Research money should go to research. So please look up the tag Christmas is Over and sign the petition.
Yeah, okay, that's absolutely important. So let's have quick questions now and the discussions this afternoon, and let me say a word about the discussion this afternoon. This afternoon we want you to work, not only the OpenAIRE people. You all have coloured dots on your badges, and we divided the group because we want a mix of people in the groups. That's why we mixed you up up front. We want to have people who know something about the OpenAIRE services and the mandate, and people not knowing much about it, mixing with people that are more involved in project coordination and others, because we really want interaction this afternoon. We want to hear from you how we can help you even further, which support is needed, and how we can improve the services that we offer. So this is what will happen this afternoon. There are four groups. Before coffee and after coffee, each time we have two groups talking about the same topic, and after coffee we switch to the other topic. So the green ones, for example, will talk first in the library, which is down here just beneath us, about the data pilot, while later in the afternoon they will talk about the mandate on open access in practice in the press conference room. That's my room, so I will just gather you and take you with me, because it's in another part of the building. The blank out room is up here. So we have the library here under my feet, which is first for the green group and then for the red group. The blank out room is just one floor up, and it will first be occupied by the yellow group and afterwards by the blue group. And then you have the press conference room; for the press conference room, that means following
me. So the first group for me is the red dots: follow me to that room. And then after the coffee break (there will be a coffee break in between, of course) the people from the green group can follow me to the press conference room. Then we get back here, because we will report on what happened in these groups, what questions came up, and round up. Afterwards we have a reception at five o'clock, so I hope you can stay for that too. Now, after lunch, which is in the same room we had before, we will start at two sharp. So please, just before two o'clock, go to the room where you have to be. It's also indicated at the registration desk. So thank you very much.