It's not too intrusive. So today in this webinar we will talk about open research data, particularly for Horizon 2020 projects. As you might know, Horizon 2020 was the start of an open research data pilot, in which the management of research data and open access to research data was promoted. So where are we now? Research mostly goes like this. You start with a brilliant idea, then you do some experiments and gather some data. You write up your conclusion, and then finally there's some time for pizza, as I imagine, because you have to wait until your publication has made it through peer review. At the end of your research you have the gold standard of research today, which is your peer-reviewed publication. But as you can see on the slide, a lot of your work has actually disappeared. Your experiments, the research data, are not included in this publication. In the best cases they are linked to the publication. And that's a bit of a shame, because your research data is really valuable for other researchers. So the challenge is to go from this linear process, where the publication is the final product of your research, to something more meaningful, where research data has a place in the process. So instead of just creating data, processing data, analyzing data and then forgetting about your data, the aim is to also preserve data and give access to data, so that other researchers can reuse your data and work with your data. New research can be conducted and new findings can be produced based on data that already exists. This is the research data lifecycle, which is the way to go for the future if we want to preserve research data and do something with it. So then what is data management? Data management is a process in which you try to explain your data, make it available and store it safely.
It deals with the contextualization of your material, so that you can describe what your data is about and what you did with it, by providing information about data sets in a structured and clear way. It also deals with storing your data safely, both during your research process and afterwards: you should make sure that you know who has access to the files, that you have a backup policy, and that you decide what data to keep. And, if possible, make your data available to the wider public and to other researchers. You can gain more impact, and your research will be more visible. There are dedicated research data repositories that can help you save and store your data and open up your data if possible. Now, there is a little bit of misconception about open data. What does open data mean? Open data is data that is free to access, reuse, repurpose and redistribute, by everyone, without any limitation. Most people think about data in terms of sharing data: data can be made available if you send me an email, for example. This is not open data, this is data sharing; it is restricted access for a limited number of people on certain conditions. So basically the difference between open data and shared data is the difference between free hugs for everybody and more restricted hugs on certain conditions. This idea of open data and data management has gained momentum of late, because there are some very good reasons to manage your data well and provide access to your data. First of all, for researchers themselves: 80% of data is not available within 10 years after the research, and that's a shame, a lot of good work is being lost. Then, maximizing the usefulness of research data: your data might be valuable to other projects and other researchers. Data papers are on the rise, so it's not only about publications anymore, it's also about the data you gather, and you can actually get credit for your data and for good data management.
Citizen science and policies: if data is made available and understandable, policymakers and citizens can work with it and adapt their policies to the latest findings. It promotes research integrity and transparency; to be honest, good data management should just be part of good research, because it allows other researchers to build on your work and to check results. And last but not least, open data has a much longer shelf life than closed data, and well-managed data can also earn credit, for example in data journals. Now, I'm not the only one who is convinced of data management and open data. The European Commission started an open research data pilot to foster open science and avoid duplication of research and waste of resources. This started in 2014 as a flexible pilot for some areas, and by 2017 it became the default option for all projects. The open research data pilot has two major pillars. The first one is data management planning, and the second one is open access to research data where possible. So who participates in this data pilot? Starting from 2014 there was a limited open research data pilot. It was limited to some areas; you can check your grant agreement. There was a requirement to provide a data management plan, or DMP, for every data set, and there was a possibility for projects that were not part of the pilot to opt in, but also for projects that were part of the pilot to opt out. Starting from 2017 the open research data pilot was extended, so participating became the default option for all projects. There has been a change in the requirements: only one data management plan is now required per project, but the possibility to opt out of this pilot still exists. Costs for data management, for example if you have large data sets, are eligible. So it's a flexible pilot. This means that you can opt out, but you have to give a good reason for opting out. There can be several reasons.
Privacy is one of the most stated reasons, and not generating data can be another good reason to opt out of the pilot. You can opt out at any stage: you can opt out completely for your whole project, or opt out partly. This means that you, for example, do not make some of your data sets openly accessible, and then you describe these issues, and why you will not make certain data sets openly available, in the data management plan of your project. The mantra of the EC is that you should aim to make your data as open as possible, but as closed as necessary. So there is no need to jeopardize any of your results if it is not safe to open your data. The EC has written some guidelines on how to comply with this open research data pilot. You can find them as the FAIR data management guidelines; look them up online. They give information on who is part of the pilot, they clarify the concept of FAIR data (I will talk about FAIR data later), they explain what the data management plan is and when it should be updated, and they explain which costs are eligible. And, very important, they provide a template for the data management plan. This template is not obligatory, but I would highly recommend using it, and if you don't, make sure that all the topics in the example template of the EC are covered by your own template. So what are these requirements? What do you need to do if you are part of this research data pilot? Well, first of all, you should write a data management plan. You should deposit your data in a data repository, together with all the information and the tools necessary to validate the results that appear in your publications. And you should open up your data and data sets to the public. These are the main requirements of the data pilot. So let's start with the first one, the data management plan. The data management plan does not have to be delivered at proposal stage, but you already have to give some information at proposal stage.
Give information about what standards you will use, whether you are going to make your data available and, if not, what the reason is, how your data will be curated and preserved, and what the consortium agreement says about data management. The idea is that you already think about what you are going to generate and what you can do with this data. You should also plan your budget at this stage: if you will need extra funding for data management, be sure to include it. Since the DMP is not a requirement at proposal stage, it is also not part of the evaluation. But the DMP is a deliverable, so you should definitely keep in mind that you have to deliver a DMP later in the project. As a timeline: after the proposal stage, the first version of your DMP should be ready at six months. This is soon, but it doesn't need to be a full-blown DMP; that is often not possible, because of course you do not have all the information yet. It should be a basic outline that proves that you have already thought about your data, and starting this early with thinking about data will make it easier to do good data management. Then you should provide updates in case your data changes, the policies change, or your consortium changes, and at each periodic evaluation. So whenever something major happens to your data, make sure that you update your DMP. And at the final review you should have a final version of your data management plan with all the necessary information. I have already talked a lot about this data management plan, but what is it exactly? It's a document which outlines what you're going to do with your data, how you're going to handle it, and how you're going to gather it. It starts with the gathering of data, then how you handle it during your project, and what you will do with the data afterwards: where you will store it, and how people can access it. It's a living document. It should be updated, because your data is not static and your project is not static.
Some of the information you will only have at later stages of your project. It should reflect on creation, preservation, sustainability and security, and it should outline which parts of the data will be open and how you're going to make your data open, how you will handle this. The content of a Horizon 2020 DMP is outlined in the EC guidelines on FAIR data management, and it contains five major topics. First, the data summary: what is your data about, and how are you going to collect it? Then the FAIR data principles. Then resources: who will be responsible for data management, and are there any costs involved? Then data security, both during the project and after the project: how are you going to manage access to files? And finally ethical aspects. So what is it about these FAIR data principles? What do they mean? Well, the FAIR data principles are a standard that data management should, in the best case, comply with. It's a kind of best practice in data management. FAIR stands for findable, accessible, interoperable and reusable. It deals with questions, for example, for findability: how can people discover my data, how will search engines discover my data, and how can people understand my data? Findability deals with topics such as metadata, persistent identifiers and naming conventions, but also keywords and the versioning of your data. Accessibility deals with questions like where to find your data sets and how people can access them. It deals with software and documentation, and with data repositories, the archives where you store your data. Interoperability deals with how other people can use your data, how you will make your data available, and how it will match other data sets. It deals with standards, vocabularies and methodology. And reusability: how can people make sure that they can use your data, and how will they know that they can use your data? This mostly deals with the licensing of data. So this is a lot of information.
So how do you start with writing a data management plan? I would recommend using DMPonline, an online tool which can help you write your data management plan and keep it up to date. It has a specific template based on the European Commission guidelines and the example template that they give you. So you can select European Commission under Funders, and you will get the right guidance. You can also select extra guidance here, which I would recommend. Once you've selected your funder, you can create a plan. You can fill in your basic information, your grant ID number, for example, and a description. It then asks you the questions of the DMP template of Horizon 2020. There are various versions, depending on the stage you are at in your research: it has an initial DMP for the six-month deliverable and a final DMP for the review. And you can share your DMP with your partners, so you're not the only one doing all the hard work of filling out a data management plan. It has the guidance based on the guidelines of the EC, so you have all the information directly in one place. It also has extra guidance and links from the DCC to explain certain concepts and what good practice is, which can be really helpful and useful if you're just starting out with data management. You can then export your DMP and send it to the EC. I'm going to make some recommendations based on the EC guidelines and on the topics that they address in the example data management plan. These will be general recommendations. A lot of the information that you have to provide will be discipline-specific, but since this is a general webinar, the recommendations will be general as well. For specific information, you can always try to find more information online or contact your colleagues. So what data does this open research data pilot apply to? Do you have to describe all your data sets at every stage? No.
The data pilot mostly concentrates on the data and the metadata needed to validate the results presented in the scientific publications linked to your project. You can also include other data, as you specify in your DMP. For example, if your data is very interesting and you think it will be useful for other researchers, be sure to also include it in your DMP. So it does not apply to all data; you have a certain freedom to choose which data sets to incorporate. But the minimum requirement is that it should incorporate the data that is necessary to validate your results. And again, you do not have to share data if that is inappropriate for some reason, so exemptions apply. Let's start with the first question. Data collection is part of the data summary question. You should state the origin: do you generate data or are you reusing data? And if you reuse data, who is the owner and what are the rights to the data? Can you use the data? Be sure to provide a source and check whether there are any intellectual property rights attached to the data. You should also give some information on the type of data you're collecting: is it quantitative data, is it a survey, how are you going to collect it? That's basic information that should not be hard to fill out. Another question in the data summary topic of the DMP deals with data file formats. Here the idea is to make it as easy as possible for other people to access and open your files. That means that you should use easily reusable file formats; if possible, use open standards for file formats. PDF, for example, is not a very reusable file format for data; instead, you can use ODT, for example, which is an open standard. And use file formats that are commonly used by your research community, so that if you're not able to find an open standard, at least people in your field can open and access your data. Use a consistent naming convention.
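As a small, hedged illustration of what such a convention can look like in practice (the pattern, project name and field order here are my own invention, not part of the EC guidelines), a short script can both build and check file names of the form project_experiment_date_version:

```python
import re
from datetime import date

# Hypothetical convention: <project>_<experiment>_<YYYY-MM-DD>_v<NN>.<ext>
PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9-]+_\d{4}-\d{2}-\d{2}_v\d{2}\.[a-z0-9]+$")

def make_name(project: str, experiment: str, day: date, version: int, ext: str) -> str:
    """Build a file name that follows the convention."""
    return f"{project}_{experiment}_{day.isoformat()}_v{version:02d}.{ext}"

def follows_convention(name: str) -> bool:
    """Check whether an existing file name matches the convention."""
    return PATTERN.match(name) is not None

name = make_name("h2020x", "survey-a", date(2018, 3, 1), 2, "csv")
# -> "h2020x_survey-a_2018-03-01_v02.csv"
```

The point is not this particular pattern but that the convention is written down once and applied mechanically, so every partner produces names that sort and parse the same way.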
This is especially important if you work together with a lot of partners and have a lot of different people gathering data. Also structure and organize your files, which is a little bit self-explanatory. Then, ethical aspects. I'm briefly going to talk specifically about personal data, because for personal data there are very strict rules. The GDPR, which you might have heard of, the General Data Protection Regulation, will come into force on 25 May 2018. In general, for personal data, the consensus is to be as strict as possible. The EU has a broad interpretation of personal data: basically, personal data is all data that can lead back to a person. This can be a name, but also an IP address or a cookie. So if you work with personal data, be sure to only collect what is absolutely necessary and only use your data for the purpose intended. You need consent, and this consent is also broad consent: you will need consent for collecting data, but also for processing data, for saving data, and for archiving data. So if you ask your participants for consent to collect data, make sure that it's broad enough. People who give you personal data have the right to be forgotten and the right to data correction. And you, as the researcher, should make sure that there is a procedure in place through which people can ask for these things: ask to have their data deleted, ask to have their data corrected, and see what information you have on them. You should also have a procedure in place to send people a notification if there is any breach in your data security. Privacy should be the default option. If you diverge from the original statement, you have to ask for additional agreement and additional consent from your participants. So this is if you work with personal data. There are other ethical aspects to keep in mind, such as consent forms.
If you can anonymize your data, do so, and be very thorough about it: make sure that if you combine data sets, you cannot trace any information back to one person. In your DMP you can include a link to the ethics chapter of the Description of the Action, because if you have any ethical issues to report, most of the information concerning data will probably be there. Data security deals mostly with how you're going to handle your data during your project. It deals with file sharing and storage. You should make sure that you have a strategy for file sharing. If you work with partners in different institutions or different countries, make sure that you know what the file sharing procedure is, whether it is secure, who can access the files, and whether some encryption is needed. Backup procedures can also vary per institution: be sure that you know what the backup procedure is, and be sure to have one. This is a key issue, because you do not want to lose any of your work. Consulting your ICT department is a good tip for all these file sharing and storage issues. File sharing and storage are especially important if you deal with large data files, because of storage capacity, or with personal data. The GDPR also applies to file sharing and storage, so be sure that only the persons who absolutely need to access personal data can access those data files. Now we get to the FAIR principles. I am starting with findability, and the idea behind it is that people outside of your project should understand your data based on the information that comes with your data. You can make your data understandable on a project level, in which case you provide context, but also on a data level. And there are various tools for that. Some software includes tracking features that can help you document what you're doing with your data and make your data understandable.
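One concrete backup check the text hints at can be sketched in a few lines (this is my own illustration, not something prescribed by the EC: the directory layout and helper names are invented): record checksums of your data files and verify that a backup copy is complete and intact.

```python
import hashlib
from pathlib import Path

def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_backup(original_dir: Path, backup_dir: Path) -> list[str]:
    """List relative paths whose backup copy is missing or differs."""
    problems = []
    for original in original_dir.rglob("*"):
        if not original.is_file():
            continue
        relative = original.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file() or file_checksum(copy) != file_checksum(original):
            problems.append(str(relative))
    return problems
```

Running such a check periodically catches silent corruption or incomplete copies, which is exactly the kind of loss a backup policy is meant to prevent.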
Codebooks and protocols can also help make your data understandable. Version control is an important part of documenting your data. Again, especially if you have lots of people working on the same documents, decide on a system, be clear and consistent with naming conventions, and make sure that you know who has access and editing privileges. Then, creating searchable data: I'm going to talk about metadata. Your data should not only be accessible and understandable for other researchers; in a digital era it should also be understandable at the machine level. Metadata is data about your data. It can describe a single piece of data, but also a whole collection, and it does not only apply to digital data; physical data can have metadata too. It helps you organize your research data and give information about your research data to others; it helps other people understand what your data is, how it was collected, and how it is structured. Metadata consists of a set of attributes that describe your data. There is basic information: subject, title, contributors (who is involved in the project), related publications. Then, importantly, access rights: clear information on who can access the data and what people can do with it. And technicalities and preservation: where is your data saved, and for what period? Metadata can help prevent inappropriate use, because it includes access rights and information on what the data is about. Different domains have different standards for metadata, which makes it a little bit complicated, so it is recommended to use the metadata standard of your domain. This slide shows an example of Dublin Core. As you can see, it is machine readable. So don't try this at home: don't try to come up with your own metadata standard. It has these different attributes that identify the data.
So try to find the standard of your domain. I'm referring here to the Digital Curation Centre, which has a list of metadata standards for specific domains. If your domain does not have a metadata standard, or you're working on a cross-disciplinary project, there are also general metadata standards which you can use, which will provide metadata for every type of data. Dublin Core is a well-known one, and there's an online tool that can help you write metadata for your data based on Dublin Core. And these are some other metadata standards; as you can see, there are various different ones. Let's go to storage. Where do you store your data? USB sticks are probably not the best option. But what about your project page, isn't that a good place to store your data? Well, one of the problems with project pages is that they are often not very sustainable. There are no services included, there are no technical standards, is there metadata attached, and how do you find the data? If all data sets were on project pages, it would be very hard to find, for example, a collection of ecological data. In the example I posted here, you can see the last update was in 2010, so it's not very sustainable. And they also provide the basic data sets, but not the code; you can email them if you want the code, which is not actually very open or very reusable. So a better option is a research data repository, which is an archive especially equipped to handle and preserve data in a sustainable way. You should look for a data repository that matches your data needs. Do you have large files? Does the data need to be stored for a very long time? Disciplinary data repositories are recommended, or institutional data repositories, if your institution has one. And if you do not know where to start looking, re3data is a good place. Or you can use Zenodo. Zenodo is a cost-free data repository open to all types of data.
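As a hedged sketch of what a machine-readable Dublin Core record can look like (the element values are invented for illustration, and a real record would follow your repository's exact schema), here is a minimal record built with Python's standard library:

```python
import xml.etree.ElementTree as ET

# Official Dublin Core element set namespace
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialize a flat dict of Dublin Core elements to an XML string."""
    root = ET.Element("metadata")
    for element, value in fields.items():
        child = ET.SubElement(root, f"{{{DC}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")

record = dublin_core_record({
    "title": "Survey responses, work package 2",  # invented example values
    "creator": "Example Consortium",
    "date": "2018-03-01",
    "rights": "CC BY 4.0",
})
```

Because the element names and namespace come from a shared standard rather than a home-made scheme, harvesters and search engines can interpret the record without any project-specific knowledge.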
It's a catch-all repository where you can also save and upload publications together with your data. So, re3data gives an overview. You can search by subject, content type, or country, and it will give you specific information on what a data repository provides: information on the access level, on licensing, and on persistent identifiers, which make your data identifiable and searchable (a persistent identifier is a unique code for your data), and on whether it adheres to certain standards. When people start writing a data management plan, I sometimes recommend that they first look at repositories and decide which one they will use, because repositories can help you greatly with writing a DMP: you can already see what kinds of licenses you can use and whether there will be a persistent identifier, which makes your data findable. It will also help with the question of how you're going to preserve your data. So what do you deposit? Do not just drop all your data files in a repository, but look for the data that is useful. The bottom line is: everything needed to validate the results presented in your scientific publications. So what do you deposit? First of all, the data, so make a selection, together with the metadata; this can be a metadata file. And then you should also include documentation. Again, the baseline is that people outside of your project should understand what the data set is about. So there is documentation about your project, but also about the tools used to process your data and what happened to the data. It is often recommended to also add a README file to the upload of your data sets. Then, to the question of open data. Open data, keep it simple: the EC recommends making your data as open as possible, but as closed as necessary. So if you do have some private or personal data, make sure to anonymize it or keep it closed. Apply an open license.
This is the easiest way to make sure that your data is reusable and open in a way that is appropriate for your data. Creative Commons licenses, for example, are recommended, such as CC BY or CC0. With CC BY, if somebody reuses your data, they can, but they should always credit you. Data repositories can apply or provide licenses, so that it's just included in the information. OK, CC0 is technically speaking not a license. Sorry, Gwen. Thanks for the update. So make sure to choose an open license. This is a lot of information. So what does a data set look like that is uploaded according to all these recommendations? This is an example from zenodo.org, the catch-all repository. Here you can see that it includes a README file, which explains which data sets are included in the zip file that contains all the information, so that it's understandable for humans, for other researchers, what the data is about and what's in the file. It's also understandable for machines, because it includes metadata about the data; this is in a standard, I think, for biological data, I should look it up. It also includes the scripts, so the tools that were used to process the data, in various formats. And it has a license for open data and a DOI, so the data is searchable and uniquely identified. It also has keywords and a related publication. So this is roughly what the perfect upload of a data set looks like, to make it understandable, findable, reusable and accessible. The repository this was uploaded to is Zenodo. I talked about it yesterday as well. It is a catch-all repository. You can create a community. All kinds of content can be uploaded to Zenodo, not only data sets but also, for example, software or publications. It's free to use, it's safe, and it has a GitHub integration. When you upload your data, you can choose which type of file you use and the license or access rights. It also gives you a digital object identifier.
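Zenodo also exposes a REST API, so a deposit like the one just described can be prepared programmatically. Below is a hedged sketch that only builds the deposit metadata as a plain dict; the field names follow Zenodo's deposit metadata as I recall it, the title, creator and keywords are invented, and the actual authenticated HTTP call is indicated only in a comment:

```python
import json

def deposit_metadata(title: str, creators: list[str], keywords: list[str]) -> dict:
    """Build a Zenodo-style deposit payload for an open data set."""
    return {
        "metadata": {
            "upload_type": "dataset",
            "title": title,
            "creators": [{"name": name} for name in creators],
            "description": "Data and scripts needed to validate the published results.",
            "access_right": "open",
            "license": "cc-by",  # an open license, as recommended above
            "keywords": keywords,
        }
    }

payload = deposit_metadata(
    "Survey responses, work package 2",  # invented example values
    ["Doe, Jane"],
    ["open data", "survey"],
)
body = json.dumps(payload)
# The deposit itself would then be created with an authenticated POST to
# https://zenodo.org/api/deposit/depositions (access token required), after
# which the data files are uploaded and the record is published to get a DOI.
```

Even if you deposit through the web interface instead, the same fields (type, title, creators, access rights, license, keywords) are what make the record findable and reusable.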
So as you can see, a data repository can already give you a lot of information and a lot of tools that help you with good data management. To conclude, the steps you should take. Step one is to write a DMP. You can use the DMPonline tool or the guidelines of the EC with the example DMP template that they give you. You should update this DMP: the first version should be delivered at six months, then updated at each periodic evaluation and for the final review. Step two is to find a repository that matches your data needs. You can look for a discipline-specific repository or browse re3data to find one, and use Zenodo if you do not find a discipline-specific repository. Then deposit your data. Deposit your data sets and, if possible, make them open, together with the metadata, the data about your data, and all other tools necessary to understand and read your data. Make sure to use standard file formats and a standard metadata schema based on your discipline, and attach an open license. You can find support in the EC guidelines. I also listed the DCC; they have a lot of information, and so does openaire.eu. OpenAIRE, briefly, is a European project that supports the EC policy. We have an infrastructure that collects information on projects, publications and data and combines them in a structured, easy-to-consult way. But we also provide support for researchers, for research administrators, and also for funders. We have a lot of training and support material on open data and on the open research data pilot. We have fact sheets with information on the open research data pilot and on how to deal with personal data, and we have web pages on creating a data management plan and selecting a data repository. In OpenAIRE you can also link your data sets to your publications. If you look up your project page (every European project has a project page on OpenAIRE), you can see the publications attached to your project, which we harvest from data providers.
And if your data sets are not there yet but they are already in a repository, you can link them to your publications. We provide a lot of help on open science. We have a lot of information, so if you're looking for more information on data management, open data, or other open science topics, we have guides and fact sheets. There are workshops held in every country. We do webinars like this one, and the one from yesterday. There's a help desk and a frequently asked questions section. And all this is carried out by regional experts: we have a team of regional experts in every European country and beyond, for example also in Turkey, Switzerland and Norway. They can support you on a national level if you have specific questions, and you can also find more information, for example on local data policy, on the country pages. That's it for me, that was a lot of information. If you have any questions, feel free to post them in the chat. Let me open it. "Is it an ongoing project?" Yes, it's an ongoing project. We just started a new project phase this January. As Irina said, it will continue for the next three and a half years. We also aim to be more sustainable in the future and become a legal entity, so that OpenAIRE will no longer be a project but will be sustained as a legal entity. Let me scroll back. Were there any questions? Feel free to type them in the chat. Irina also gave a lot of good information and posted some links in the chat. The owner of the DMPonline tool and server is the Digital Curation Centre in the UK. It is open software, so you can also deploy it at your own institution; for example, in Belgium we have our own DMPonline tool which is based on the scripts and software of the DCC. I hope this answers your question. "If you would organize a festival, would you rent a conference center, or would it be open air?" Maybe we should consider it. We had a fair in Athens; the weather was brilliant, but it was still inside a building.
If you have any more questions, you can always send me an email or send an email to info at OpenAIRE. The recordings and presentations will become available. Okay, thank you very much, Emily. This was very interesting, and I'm very happy that we managed to get through this webinar without any technical issues. I've posted the link to the dedicated OpenAIRE webinars page, where you will already find a link to the slides, and I will add the recordings there as well. I will also send a mail to everybody who participated in this webinar, with the slides, the recordings, and the evaluation form. It would be much appreciated if you could fill that in, so that for future webinars we can tailor our offer to what you need. Yes, there is a remark about the webinar from yesterday. We noticed that quite a lot of people actually had issues with the connection, so there is a recording available; actually, it's already on the page. If you click through to the OpenAIRE webinar page, you will see the recording. And if you have an issue starting the recording (we have some issues ourselves sometimes), just move the bar a little bit and normally it should start and you should be able to watch it entirely. Okay, so thank you very much, Emily, for this, and we hope to see you soon for another webinar. Thank you.