Thank you everyone for joining us today. We hope you will find the event useful for your own research, and we hope it's also a way of letting you know that you can always contact us. I'm Kristina Magder, I'm the Collections Development Manager at UK Data Service, so I lead on all the acquisitions that come into our collections, but I also wear a research data management hat and I lead the training portfolio when it comes to research data management. And I'm joined today by my colleague, Hina.
Thank you Kristina. Hi everyone, thank you for joining us today. My name is Hina and I work in the research data management team with Kristina, and I mainly oversee the guidance and training related to ethical and legal issues in data sharing.
Thank you so much Hina. We have quite a lot of content to cover today, so let's see what we're going to discuss. We're going to be looking at data sharing and its benefits, what a data management plan is and why it is required, and what is usually included in a data management plan. We're going to be looking at a practical example, the ESRC template, and we're also going to include a number of resources, tools and templates that you can use in your own time.
So when we're talking about data, data does not exist in a vacuum; we're talking about the data lifespan in the research data life cycle. Usually when you run a project that creates data, that data can be used for so many different purposes besides the project that created it. And as a repository, as the UK Data Archive, we see this happening with large governmental surveys, academic studies, research center studies and so on. So when we're thinking of the research data life cycle, I've included a diagram there. It starts with the planning of the research, then we're collecting the data, we're processing and analyzing, and we can finally share our results and share the data.
By sharing the data, we ensure that the data is preserved in the long run, and it also helps others to reuse the data, and then the cycle continues.
Now, when it comes to the benefits of data sharing, we like to present this example to clearly show everyone that we're not here talking just about researchers benefiting from sharing their data. Of course it does increase the visibility of your work: you get digital object identifiers, or persistent identifiers, with your data, and your data is going to be stored for the long term. And it also helps your publications, because we've seen more and more journals asking about reproducibility and transparency. But it also helps the funding body, because that way different funding bodies can demonstrate they're making optimal use of publicly funded money. Also, it avoids duplication of data collection, and we're going to see that in the ESRC data management plan, and it's very similar with others as well: they are actually asking researchers to make an assessment of the data that's already available. But the public benefits as well, because they see with hard evidence how science benefits society. It helps with the adoption of emerging norms, such as open access publishing, and of course it helps with compliance with laws and regulations. Last but not least, it also benefits the research participants, because it allows other researchers to use the data that was already gathered. It minimizes data collection on hard-to-reach populations. And my example here is usually one we've seen at a workshop done by UK Data Service. There was a big call around the migration crisis, and a lot of different researchers went and gathered migration data. When we did the workshop, we actually invited a couple of migrants, and the very first thing they said was, please share my data, I want everyone to know my story.
Now, when it comes to data management planning, a data management plan helps you describe five different things: how the data is going to be collected, organized, analyzed, preserved, and shared. Why might we need data management planning? Well, it genuinely helps you plan ahead with your research project. It helps if research team members leave; we know people might find different jobs, or they might go on sick leave. It helps you identify the different support that you can get, including, for example, support from UK Data Service. It helps you plan storage both in the long and the short term, and ethical and legal considerations as well. It shows accountability to your funder, to your institution, and to the different partners in the research project. It also helps make data FAIR, and we're going to cover FAIR in just a little bit. And of course, it fulfills funders' requirements, and we're going to see there are more and more research funders in the UK that are asking for data management plans, sometimes under different names, and this applies to the European community as well.
So the FAIR principles for publishing data mean making data findable, accessible, interoperable, and reusable. You can read more about them on FORCE11, but I've tried to give a brief overview of what making data FAIR means. Findable is all about using community-endorsed metadata and documentation standards and actually having persistent identifiers, so others can find your data much more easily. Accessible means actually including an availability statement: who is the data made available to? Is there an embargo period? Are there any restrictions applicable to the data, such as non-commercial use, for example?
It also covers the methods and tools to access the data, making it clear, for example, that you are using CSV because it's an open format, or that you are using a proprietary format because open software was not feasible to use at the time, and that the metadata is preserved indefinitely. So even if, and it happens very rarely, a data collection must be taken down, must be removed from a catalog or from a repository, the metadata stays within the catalog forever, so other people can discover that this collection existed 100 or 200 years ago and can actually see why it is no longer made available. When it comes to interoperability, we're talking about standard vocabularies and standard metadata schemas. So with UK Data Service, we use the Data Documentation Initiative metadata schema. We have HASSET, an electronic thesaurus, that helps us index all the different collections and allows researchers to find data much more easily. Reusable: again, we're going back to making sure that the data is licensed, so other people know how they can use it, that you have used established data quality assurance processes, and we're going to see there are actually a couple of tools that you can use for your data quality assessment, and that the data is preserved for the long term.
When it comes to DMP structures, they do differ from one funder and one institution to another. So always, always make sure that you check with your funder and your institution what their requirements are for data management planning. However, overall, they cover these broad sections. Description of the data, both existing data and new data. Management and curation of the data throughout the research lifecycle. How are you going to store and back up your data? We've probably all heard stories where, for example, the data was only on a laptop and the laptop was forgotten on a train, so this is why it's very important to keep backup in mind.
Legal and ethical considerations: it is very important, especially with the legislation around personal data, to bear these aspects in mind, and my colleague Hina is going to cover them in more detail. Data sharing: how are you going to ensure that the data is shared but also usable for others? Because just throwing a CSV file out there in the wild might not be the best way of making sure that the data can be used by others. And also responsibilities and resources. This section is all about thinking who is going to do what, and what type of resources you need to make sure that your research project is successful.
As mentioned, different research funders and different institutions do have different data management planning requirements. We can see, for example, that the Arts and Humanities Research Council actually call it the technical plan, but they do include sections about the standards used, the preservation of the data, continued access, and data sharing and reuse as well. So not just throwing that CSV file out there. Cancer Research UK call it the data sharing plan. Again, they're talking about the volume of the data, the format of the data, metadata and documentation. So we can see all of them are very similar, and the end goal of creating a data management plan is to ensure that your research project is successful. And I would say, even if it is not a requirement from your funder or your institution to do a data management plan, following the structure I've just shown very briefly on two pages and covering the different aspects helps ensure that your research project will be successful.
So, for example, the ESRC DMP, the Economic and Social Research Council data management plan, includes 10 sections: assessment of existing data; information on new data; quality assurance of data; backup and security; management and curation; and a section in which we need to discuss all the difficulties that we can foresee in data sharing and what measures we have in place to overcome them.
How are you going to ensure consent and anonymization to enable reuse? How about copyright and intellectual property rights? And we're going to see this is very important not only for primary research, but for secondary research as well. Responsibilities, who is going to do what, and also preparation of data for sharing and reuse.
When it comes to the assessment of existing data, I would compare this section with the literature review, but call it the data review. What existing data sources are there on the topic that you want to research? And what gaps have you identified, if any? Again, data management plans can be done for secondary analysis: if there are no gaps, the secondary data can be used. It's always useful to have a look at the UKRI Gateway to Research webpage. They provide a list of all the different past, present and future research grants and the different outputs they have. And again, more and more research councils in the UK are actually asking for data to be shared. They are mandating data sharing. ESRC was the very first council in the UK to mandate data sharing as a requirement in their grant proposals.
A couple of examples of safe data sources. So rather than sourcing data just from Google, as in, I'm going to Google my topic and find some data sources that I'm not quite sure I can use, you can always use the UK Data Service data catalog. We have over 8,500 collections of economic and social science data. The CESSDA data catalog can also come in very handy. CESSDA is the Consortium of European Social Science Data Archives. So if you're doing social science research and you have not checked the CESSDA data catalog, please do have a look at it. It only contains the metadata. This catalog brings together the metadata provided by the national data services, such as UK Data Service and the services in Germany, France and so on, so you can discover much more data that you could use in your research projects.
Information on new data.
So say we've identified some gaps and we really need to create our own data. We're talking about primary data here. What do we need to include in this section? What is going to be the volume of the data? So am I going to do a survey, two surveys, interviews, focus groups? How many people? What is the data type? It can always be mixed methods, quant and qual, but you need to specify exactly what data you are collecting. What is going to be the data quality, and which formats are you going to use? And again, especially in the open science landscape and in the FAIR data landscape, it's about ensuring that the formats are as open as possible. So again, maybe making sure we have a CSV file. We can of course use proprietary software as well, and all repositories welcome different formats being uploaded, but having an open version is always best. What kind of standards, documentation and metadata are you going to create for your project? And if you've never done research before, getting in touch with UK Data Service or your institutional repository to discuss what type of documentation you might need, or what type of metadata schemas the repositories are using, is always advisable. What are going to be the methodologies for the data collection and data processing? And also, is the data trustworthy? And this is where I'm going back to making sure that we use data from a FAIR repository, a trusted repository. We have a variety of resources on information on new data available online, so please do make sure you consult our web pages. But again, I can't stress enough, if you have any questions, please do get in touch. Our email addresses are right at the end of the presentation, and we welcome all queries about data sharing and data management planning.
Quality assurance of data. What should one include when we're talking about quality assurance of data? Calibration of instruments, if applicable. Again, some of it can be very specific to the type of research.
Taking duplicate samples. Are you going to have a pilot, for example, if you run a survey or focus groups and so on? Are you going to use standardized data capture solutions? Data entry validation. And I've put here our QAMyData tool. It was developed as free, open source software, so everyone can make use of it. The code is on GitHub, but on our web page we have more user-friendly, I would say, documentation about how to make use of it. And it gives a health check on your data. So once everything has been entered into the database, you could use QAMyData to see whether it is well labeled, whether you have duplicate IDs, and so on. Describe whether you're going to use standardized methods of transcription. And again, this is something that we make available on the web: we have a template that can be used in your own project. And also, is your data going to be peer reviewed? It's by no means compulsory, but peer reviewing of data is getting more popular, I would say, in the social sciences as well. So if it's something that will happen, definitely mention it under the data quality assurance section.
When it comes to security and backup of data, again, people might forget their laptop on the train. So we need to ensure that the data is stored and backed up continuously so we don't lose it. I would say the best advice here is to always consult with your own institution. Of course, there are so many cloud environments and solutions now to ensure that the data is stored correctly, and that it's safe and backed up, but this does differ from one institution to another. So get in touch with the IT people at your institution and ask: I am conducting this research project, I really need to ensure my data is safe and backed up, could you please let me know how this would happen on our systems? Most likely they're going to have a policy they can refer you to, and you can always use sections from the policy within the data management plan.
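To make the idea of a data health check a little more concrete, here is a minimal sketch in Python of two of the checks mentioned above, duplicate IDs and unlabeled variables. This is not how QAMyData itself works; the function name, the column names and the sample records are all hypothetical and are only there for illustration.

```python
import csv
import io
from collections import Counter


def health_check(rows, id_field, labels):
    """Return simple data-quality findings for a list of records.

    rows:     list of dicts, one per record (for example from csv.DictReader)
    id_field: column that should uniquely identify each record
    labels:   dict mapping variable name -> human-readable label
    """
    # Count how often each ID occurs and keep those seen more than once.
    id_counts = Counter(row[id_field] for row in rows)
    duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)

    # A variable counts as unlabeled if it has no label or an empty one.
    variables = list(rows[0].keys()) if rows else []
    unlabelled = [v for v in variables if not labels.get(v, "").strip()]

    return {"duplicate_ids": duplicate_ids, "unlabelled_variables": unlabelled}


# Hypothetical survey extract, invented for this example only.
SAMPLE = """resp_id,age_band,q1
001,25-34,4
002,35-44,5
002,35-44,5
003,45-54,2
"""
LABELS = {"resp_id": "Respondent identifier", "age_band": "Age band"}

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = health_check(rows, "resp_id", LABELS)
print(report)
```

In this invented extract the check would flag respondent 002 as a duplicate and q1 as a variable with no label, which is the kind of thing you would want to fix before depositing the data.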
The second thing you always need to consider is whether your data is sensitive. Do you have personal data? And again, the security measures here will differ from one institution to another, but you must pay a lot of attention to who is going to be able to access the data. It has a domino effect on the ethical protocols you're going to use, the data management, the participant information sheet and the consent form. So always bear in mind whether your data contains personal data, and always mention it when talking about storage and long-term storage of data throughout the project. Also bear in mind that data security arrangements need to be proportionate to the nature of the data and the risk involved. So for example, if you're doing a quick Qualtrics poll, that's the first thing that came to my mind, and it doesn't contain any sort of demographics or any sort of personal data whatsoever, then you wouldn't need so many security measures. But backup, definitely, because we don't want to lose that data.
When it comes to management and curation of data, it's really important to think from these three perspectives: how am I going to prepare my data, organize my data, and also document my data? I can't stress documenting the data enough, because sometimes preparing and organizing goes so well, and everything is so nicely labeled in the data itself, but then there's no documentation. And that means more resources will have to be spent later on in the research project to prepare the documentation. If we start documenting right at the beginning of the research project, then we're making our data so much easier to work with. So think in terms of the types of data: are you using primary or secondary data? Is it anonymized? Is it pseudonymized? Because that determines how you're going to manage it throughout the research lifecycle and how you are going to curate it.
So if we're thinking about secondary data, for example, curation would be very minimal. If we take quant, for example, you might want to create some new derived variables, so saving the syntax showing how you've derived the variables is very important. If we're talking about primary data, curation takes a bigger part in the process, because we're talking about making sure that our data is put in context very nicely. Again, transcription, if it's qualitative, should be done very well. How am I going to anonymize the transcriptions? Metadata for data description as well. I know this differs from one research project to another, but ideally when it comes to metadata, you would want someone who has been heavily involved in the project to create this metadata, data about data, as they say. Because someone who is involved in the project will know it so well that it's going to be so much easier for them to provide, for example, what the sample of the study was and what the data collection method was, rather than trying to get someone else to do that for you.
Key documentation. And here, again, it differs from one project to another and from one data type to another. But whether we're talking about quant or qual, have a short user guide. I'm not talking 50, 60, 70 pages; even five pages describing key things. Have you done a pilot? What was the sample? How did you collect the data? Have you used weightings, and so on. For quantitative data, having data dictionaries is very important, especially if the data must be made available under access controls, because that ensures everyone can make an informed decision about whether or not they need to use the data. Also interview schedules, focus group schedules, and so on. It's important to bear in mind naming conventions for data files as well, and we give a couple of examples on the web page.
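As a small illustration of what a file naming convention could look like in practice, here is a sketch in Python. The pattern itself (project, content, wave, collection date, version) is only an assumption for this example and is not a UK Data Service standard; the project name and the helper function are hypothetical.

```python
import re
from datetime import date


def build_file_name(project, content, wave, collected, version, ext="csv"):
    """Build a consistent, sortable data file name from its parts."""

    def slug(text):
        # Lower-case and replace anything that is not a letter or digit
        # with a single hyphen, so names are safe on all file systems.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (
        f"{slug(project)}_{slug(content)}_wave{wave:02d}"
        f"_{collected.isoformat()}_v{version:02d}.{ext}"
    )


# Hypothetical project and dates, invented for this example only.
name = build_file_name("Migration Study", "Survey", 1, date(2021, 3, 15), 2)
print(name)  # migration-study_survey_wave01_2021-03-15_v02.csv
```

Generating names from one function like this, rather than typing them by hand, keeps them consistent across the whole team and makes the files sort naturally by wave and date.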
So rather than just naming files A, B, C, D, and then not necessarily knowing which one is which, naming them, for example, wave one survey, wave two survey, and so on ensures that we know what data we're working on. And finally, covering the long-term preservation of data, how are you going to ensure that the data is not lost? And again, this is by making it available via a trusted repository.
So a couple of examples of documentation. When we refer to documentation we usually mean user guides, but we can see here technical reports as well, or disclosure control reports. Study-level documentation describes the entire study. These documents are usually in PDF format, but they can be Excel, Word, or open formats as well. Data-level documentation: here we're talking about giving variable and value labels and, for qualitative data, giving a data list that nicely describes who the participants were, what the interview number is, and the name of the file. For secondary research it is also very important to have a variable log. And all of these templates are available on our web pages, so you don't have to start from scratch. You can always have a look at them. For secondary data analysis, when it comes to the variable log, it's important to know what variables have been used and from where, because there might be consequences depending on the license the original data was under. So even if it's made available under a Creative Commons license, it might be a Creative Commons non-commercial share-alike license, so the data that you're creating must be shared, because of that share-alike, under a similar license.
And finally, we have catalog metadata. I keep going on about metadata, metadata. This is what catalog metadata looks like. Most of the fields are actually what we call controlled vocabularies. So you drop down the list and you select, for example, what the spatial units are: we have government office regions and standard statistical regions. What is the observation unit?
So who was the data collected from: individuals, but also families and households as well. But we also have free text fields, such as the population, where you would just have to name the population of the study, or the abstract of the study. Those are free text fields. There's a lot of documentation that can be made available for data. We do have resources online about what type of documentation different types of research projects should have. But again, I'll keep insisting: if unsure, always get in touch with us. We are here to help as much as possible. And now I will hand over to my colleague Hina for the second part of the presentation.
Thank you, Kristina. So, much research data, even sensitive data, can be shared ethically and legally. So it is extremely important that you address this section, as well as the following sections, very carefully in your data management plan. You need to consider any potential obstacles to sharing your data and explain the measures you can apply to overcome them. For example, if you are collecting personal information, how are you going to handle it? You need to think and plan ahead about which data could potentially be difficult to share and why. You also need to explain the reason for not sharing it. For example, if there are ethical issues such as potential harm, or any other concern that could cause difficulties in data sharing, you need to mention it here in this section. And you also need to explain your strategies for dealing with these issues. As I said earlier, most data can be deposited for future use, and that is the ESRC's position as well.
However, this can be achieved only if researchers pay attention, right from the planning stages of the research, to certain aspects: including consent for data sharing when gaining informed consent, protecting participants' identities by anonymizing data if required, and addressing access restrictions to the data in the data management and sharing plan before even commencing the research. So, for instance, suppose you have collected data which contains personal information, you are unable to obtain consent, and anonymization is also not possible because valuable information could be lost, or for any other reason. Then you need to state this explicitly in this section of your plan. So consent, anonymization and access control are the three important issues that facilitate data sharing, and I'll go through these briefly.
In this section of your data management plan, you need to explicitly state the planned procedures: how you are going to handle consent for data sharing, whether you are planning to anonymize any personal information in the data and, if so, how you are going to anonymize it. You also need to make sure that your data can be made available and accessible for future scientific research. At UK Data Service, we advise researchers to employ a three-tier strategy, which is consent, anonymization and access control. I'll go through each of these now.
So I'm sure you are all familiar with what informed consent is. However, when it comes to data sharing, consent is used for two purposes. We are all familiar with the consent that is used for research participation. It is considered one of the founding principles of research ethics, where it is sought before participation in any research activity and from all participants. It usually involves providing information regarding the study purpose, risks, benefits, voluntary participation and so on. However, consent can also be used as one of the legal bases for processing personal data under the UK GDPR.
If a researcher collects, manages and shares personal data, then the consent of the data subject can be used as a legal basis to process this personal information. So the consent form plays a vital role in data sharing, and if you plan to share your data it is very important that you design the consent form keeping in mind these three important sections. The first section should be about taking part in the study. That includes some basics, such as that participants have read and understood the information about the project, they have been given the opportunity to ask questions, and they understand that they can withdraw at any time without giving reasons and without any penalty. The second section is all about how the information that is being collected will be used. For example, how the data will be stored and for how long, and how confidentiality will be maintained. And the final section should provide information about future uses of the data, such as publications, archiving the data and so on. This final section in the consent form is really important if you are to share the data for future reuse by other researchers.
So this is a screenshot from the UK Data Service model consent form template. It addresses all three key areas that I have mentioned, breaking them down into three different sections on the template for the user's ease. And it applies to all types of data, including interviews, focus groups, surveys and so on. You can adapt it according to your requirements. At the bottom of the slide I have added a link to the UKDS consent form that you can have a look at later.
So some of the best practice for this section is to state clearly in your DMP how you are going to handle consent for data obtained from human participants. You must ensure that your consent procedures inform participants correctly about data sharing intentions and do not preclude or unnecessarily limit sharing of research data, either as open data or on a restricted basis if necessary. Avoid statements such as "data will not be shared outside the research team", or clearly explain which data that applies to: just the personal data, or all data. You need to be very specific in explaining this in this section of your DMP and in your consent form. You can refer to the UKDS model consent form, which addresses all these issues. Another important point is never to set a time limit on the retention of the data collected from participants, or state that all data will be destroyed at the end of the project. Such undertakings are not required by data protection law or research ethics policy, and they will prevent you from making your research data accessible to others even if they have been anonymized. I have added guidance on these issues at the bottom of the slide.
In the UK there is a duty of confidentiality that is based in common law. It arises where confidential information comes to the knowledge of a person in circumstances where it would be unfair if it were then disclosed to others. There are some exceptions when you can disclose information. For example, if a participant consents to onward sharing of their personal data, then sharing does not breach the duty of confidentiality. Sometimes the public interest can override the duty of confidentiality as well, and occasionally there are instances when you may need to give up data, such as under a court order. So the best practice is to avoid very specific promises in the consent form. As researchers we must adhere to data protection requirements when managing or sharing personal data.
So if personal information about people is collected or used in research, then data protection regulation applies. The data protection legislation most widely applicable to research data is the Data Protection Act and the EU GDPR, which is now called the UK GDPR. The EU GDPR is the EU-wide data protection regulation that was introduced in 2018, and it replaced the UK Data Protection Act 1998 that was used until that time. However, since last year, when the Brexit transition period ended, it is now called the UK GDPR. Currently the UK GDPR and the EU GDPR are aligned and they place the same legal obligations on researchers, but in future the two pieces of legislation may diverge, as the UK has now left the EU. So it will be important for researchers to ensure that they get local support from their university data protection officer when their research project spans across the EU. If a researcher based in the UK collects personal data about people anywhere in the world, or a researcher outside the UK collects personal data on UK citizens, then the Data Protection Act and the UK GDPR both apply. However, if researchers are undertaking research projects which span across the EU, then the EU GDPR will also need to be considered.
So if you are planning to collect personal data, then GDPR applies, and you need to explicitly mention in this section how you are going to protect the confidentiality of your participants and whether you are going to employ anonymization. Anonymization is a valuable tool that allows data to be shared while preserving privacy. The process of anonymizing data requires that identifiers are changed in some way. For example, you remove identifiers such as name, address and date of birth, or you substitute, distort, generalize or aggregate this identifying information in the data to make it shareable.
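As a rough illustration of two of those techniques, removing direct identifiers and generalizing a date of birth into an age band, here is a minimal sketch in Python. The field names, the ten-year band width and the sample record are all assumptions made for this example, and removing obvious identifiers like this does not by itself guarantee that data are anonymous, so it should be read as a starting point rather than a complete procedure.

```python
from datetime import date

# Assumed names of columns that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "address"}


def age_band(dob, on, width=10):
    """Generalize a date of birth to an age band such as '30-39'."""
    # Age in whole years on the reference date 'on'.
    age = on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(record, on):
    """Return a copy of the record with identifiers removed or generalized."""
    # Removal: drop direct identifiers entirely.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Generalization: replace the exact date of birth with an age band.
    dob = out.pop("date_of_birth")
    out["age_band"] = age_band(dob, on)
    return out


# Hypothetical participant record, invented for this example only.
record = {
    "name": "Jane Example",
    "address": "1 Sample Road",
    "date_of_birth": date(1985, 6, 30),
    "response": "Agree",
}
result = anonymize(record, on=date(2021, 1, 1))
print(result)  # {'response': 'Agree', 'age_band': '30-39'}
```

Keeping the rules in one small function like this also gives you something concrete to describe in the data management plan when you explain how the data will be anonymized.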
So in your data management plan you need to explicitly state how you are going to anonymize your data, making sure that the data will be available for future use by others. And if you have any further queries about this you can always get in touch and we are happy to help. I have added a link to the guidance on anonymization at the bottom of the slide.
Another strategy that enables you to share your data for future use is access control. For example, you can specify how you want your data to be reused by other researchers or students by licensing the data to match the intended uses. So in your DMP you need to address access restrictions to the data. You also need to consider whether you need to restrict use of the data by others for a certain period, as ESRC expects that data should be deposited within three months of the end of the grant period. So you can deposit your data and ask for an embargo for a certain time in case you have any publication plans.
Just to give a brief overview of our licensing and access framework, here at UK Data Service we facilitate three levels of access to data. Open access is for data that contain no personal information. Safeguarded access is for data that contain no personal information but where the data owner considers there is a risk of disclosure resulting from linkage to other data. It is available under an end user license, and users need to register to access the data. Users also need to agree to certain conditions, such as not disclosing any identifying information. The final one is controlled access, which is for data that may be disclosive. It is only available to users who have been trained and accredited, and whose data usage has been approved by the relevant data access committee.
Another important section in the DMP addresses copyright or IP rights ownership. If you are planning to use secondary data sources for your project, then you need to consider copyright or IP rights.
Copyright or IP rights are assigned automatically to the creator or the researcher who owns the data. When data are shared or archived, the original owner retains the rights, and a data archive cannot archive data unless all rights holders are identified and give permission for their data to be shared. So you need to address this issue by checking the terms and conditions associated with the source that you are planning to use. You need to check whether you are allowed to use it, whether you are allowed to modify it and, most importantly, whether you are allowed to share it for future use by others. If not, you might need to obtain permission to share it. Also, in your data management plan you need to explain who will be the copyright owner of the data that is being generated. Is it the original researcher? Is it you, or both? If you are unsure, you are always welcome to get in touch for advice.
So if you are using secondary sources, then best practice is to assess who the copyright holder of the data set is. Are you allowed to use the data, and in what way? Are you allowed to archive and publish them in a data repository? Most of the time we encounter problems when researchers are allowed to use data because the data is under an open license, or they registered to use it, or they are allowed to use it for their personal use. What they do not realize is that it is accessible to them for their own use, but they may need permission from the data owner for sharing it or archiving it. So, not always, but most of the time, you may need to seek further permission to share material you do not own. If permission is not granted, you may need to remove copyrighted variables or content before publishing or sharing.
So you can find further information on all the issues I have gone through at the following links in your own time. There is guidance on ethical issues, rights in data, data protection and other rights on our website.
So in this section, called responsibilities, you need to indicate in your DMP who within your research team will be responsible for data management, for metadata production and for dealing with quality issues, and who will be responsible for the final delivery or depositing of data for sharing or archiving. If several people will be responsible, state their roles and responsibilities in this section of your DMP, and for collaborative work you need to explain how these are shared across partners. At the bottom of this slide, these two links have detailed information on these issues.
The final section of your data management plan should address preparation of data for sharing and archiving. To do this you need to consider what your plans are for preparing and documenting data for sharing and archiving, and whether these are appropriate or not. Have you provided enough evidence that the data will be well documented during research to enable high quality secondary research?
So this is the link to the data management checklist on our website, which you could find very useful, as it addresses all the sections that Kristina and I have gone through and all the sections that you need to complete in your DMP. So here are some useful resources that you can have a look at in your own time. These are some of the guidance, requirements and templates by different funders, and then there is another very useful resource, the DMPonline tool, which you may find useful as well. Yeah, so thank you very much.