So my talk consists of two parts: first a shorter part, an overview of the NERC Environmental Data Service landscape and the context in which we operate, and then a meatier second part about our services, with a particular focus on best practice guidance and training.

I already mentioned that the Environmental Data Service consists of five data centres, so I've put a little diagram at the beginning of who we are: the British Oceanographic Data Centre; the UK Polar Data Centre; us at the NGDC, the National Geoscience Data Centre; the Environmental Information Data Centre; and finally the Centre for Environmental Data Analysis. You can find more about all of those on the website, and also keep an eye out: we have a new brochure website coming very soon.

So the EDS is a network of data centres maintained by NERC which provides access to the scientific information and data which NERC has funded. A key element of this context is the NERC data policy, which we've had in place for years now; we were one of the first organisations to have one. The key points in the policy which I would like to flag up are these: if you are a NERC-funded researcher you must offer the data you generate to the EDS for curation, to enable us to provide access for reuse over the long term. We in turn are responsible for curating these data and for making them freely available to all users for any purpose, including commercial benefit. NERC will allow researchers an embargo period of up to two years from the end of your data collection period, so that you can work exclusively on the data and publish the results before other people have access to it. To read more about the policy, there's a link to the website.
So, our work on curating the data: this covers all aspects of environmental science, so that would include digital data, indexes and lists, models, physical specimens, sample materials and some third-party materials as well, and our data assets include many unique, irreplaceable historical datasets which provide a valuable resource for new environmental research and innovation. To establish which data we would be interested in, we also have a data value checklist, which will help researchers determine the likely long-term value of the data produced by their research projects. It also helps researchers when preparing a full data management plan, so you know what you are offering to deposit with the EDS and what the requirements are.

The support that we provide for researchers: how do we fit into this? Once NERC and the research project have agreed their funding, we as a service receive a list of these funded grants and we contact the principal investigator of each grant to initiate a discussion on what is in scope for data management planning and what we both expect to happen over the project. We work throughout the research project together with researchers to help them document the datasets and to ensure that the data that comes into the Environmental Data Service is FAIR compliant, so findable, accessible, interoperable and reusable in the long term.

The other phrase in my talk title was research data management, so let's have a look at what I mean by that. There's a lot of literature out there to help you look into it.
So, research data management. The Kristin Briney book is an excellent resource for researchers in particular, and she talks about RDM being an accumulation of many small practices that you make a routine part of your research; they are one element which helps make your data FAIR, so easier to find and understand, and more likely to be usable in the future, both by yourself and other re-users. The Cox and Verbaan book also looks at our perspective, the EDS perspective: how the data is created, how it's made findable, organised, stored, shared and preserved, but that applies to any research process. A concept that's a bit newer than RDM is data stewardship, which consists of the people, organisation and processes needed to ensure that appropriately designated stewards are responsible for the governed data. That may refer to different types of data stewards: technical stewards looking after IT systems and applications, domain stewards who know their particular data domain, or operational stewards who support the projects.

Why is this important, why do we need to manage research data? First of all, there's the here and now: for researchers it helps you do better research and optimise the use of your data during the active research, and it also helps you collaborate with other researchers, whether they are working with you locally or internationally. It also helps in the long term, because it ensures that the data is preserved for the future and is then easier to discover, interpret and reuse, and it sustains the value of the data, because a lot of work goes into creating and collecting it. And finally, transparency and demonstrating value: this is important for the funding agencies and also for research journals, as they require the data to be tracked to demonstrate the impact and the return on the investment.

I find this RDM life cycle a very useful way of looking at both the researcher's input to the research data and the
repositories and catalogues. First of all, the EDS provides support from cradle to grave: from data creation, support with planning and designing your data management; then the researchers do the collection and capture, collaborating and analysing; then towards the end of the project we come in to help manage, store and preserve, and also with the publishing and data sharing in the long term; and then ensuring that your data goes into appropriate data catalogues and registers so that it can be reused and cited again, increasing your research impact. So we provide advice at the beginning of a programme in the kick-off meetings, training and guidance throughout, help with data management planning, and help you to get a copy of the data to the data centres. I'm not going to go through all of this, but it shows you the key activities that researchers themselves are involved in. There's an awful lot of work going on, and I'm going to mention the resources required a bit later in the talk. It doesn't just happen by itself; there's an awful lot to think about, and the earlier in the project you think about it the better it is, so you don't need to wing it. I'll leave it there for a couple of minutes so you can have a look at it; I will cover these in later slides in a bit more detail.

Also important: there are rules and regulations that we all need to comply with. You will all hopefully be familiar with these key pieces of legislation: the Freedom of Information Act (FOI); the EIR, which is the Environmental Information Regulations; and also the famous GDPR, or Data Protection Act. If you are not specialising in legislation, law and records management, these can be a bit daunting, but the good news is that at NERC EDS we have the expertise in this area and we can handle requests for the data that we hold. So that's another good reason to ensure that your data comes to the data centre: you don't need to worry
about these.

A little bit more about the EIR in particular, the reason being that most of the data we hold, if not all of it, is environmental data and comes in scope of the EIR. The key point to think about is that there is a presumption that all these data will be available to anyone who requests them, unless there is a clear reason, supported by the law, as to why they should be withheld. This piece of legislation is the most open of the three, so we may have to release data; I can talk about that later if you're interested.

Also of interest to the researcher who puts a lot of work into data generation is intellectual property rights: whose data is it at the end of the day? The Copyright, Designs and Patents Act covers research-related outputs, and most of these fall under literary work and are therefore protected by copyright. Databases, which are a common output, may be protected by both copyright and database right; however, facts or ideas as such cannot be copyrighted. The usual retention of copyright is 70 years after the author's death, for computer-generated work 50 years, and for Crown copyright even 125 years, so IPR is quite a long-standing piece of legislation.

How does that fit in with the NERC data policy? IPR in the data that the researcher generates depends on the researcher's contract of employment. Usually the employer of the researcher owns the copyright, the IPR: if you work for NERC it's owned by NERC, and the university owns university staff's data, but it should be in your contract of employment. The policy requirement of depositing your data with NERC does not affect the IPR, so you are not handing that over to us; however, you will be required to grant us a non-exclusive licence so we can manage and supply your data for reuse in the long term.

I mentioned database copyright in an earlier slide, and the TNA, the National Archives, explain this as follows: a database is a collection of independent works
arranged in a systematic or methodical way. So this is about the extra skill and labour that the author needs to organise the data in a database: it's not only gathering the information, it is the intellectual investment in presenting the data content in an original manner. Databases are protected for 15 years from the date of publication, so that's something that you may be interested in.

I also mentioned that we require a data licence so that we can share the data and make it available. Any data that we make available is accompanied by a licence which outlines any limitations on how the data may be used, how the source or the creator must be acknowledged, and also the limits of liability for the reuse of the data. At the moment we are using the Open Government Licence version 3.0; if you want to read more about it, there's information on the National Archives website. In brief, what people are free to do is copy, publish, distribute and transfer all the data we hold; they can adapt it and they can exploit it, whether commercially or non-commercially; but they must acknowledge the source of the data in their product.

Reuse rights, then: if the data is under copyright, you can reuse it but not republish it. If you have obtained your data under a contract, which you may sometimes state in the data management plan, there may be more restrictions, and those may supersede the copyright legislation. And if you use data which is not clearly licensed, you will need permission to republish the original data, so make sure you've got that right.

That was the first bit, about the NERC context. In the remainder of the talk we are going to look at research data management best practice. We provide training and information on the website, so this is a quick run-through of some of the things that we advise grant holders on.
First of all, the data policy states that you must agree a full data management plan with your dedicated data centre. This is not a one-off document: it's a living document throughout your project that must be kept up to date. You can call it a roadmap to your project plan and related data and outputs, or a record of how you manage your data, and it's also an agreement between you and the repository on how the data is managed and transferred at the end, to maximise access and digital continuity. I mentioned my digital preservation project, so that's one of the key aspects for us as well.

Your data management plan should include information on the following areas of your data management. Data collection: what types of data, what formats, what volumes; how you manage data quality control, for example your variables and units of measurement, any vocabularies and standards you use; your collection methodologies, folder structures, etc. (we will look into this in a later slide). It will also have something about your metadata and documentation: what information is required for future users to be able to interpret your data, and how you will capture this information in your project. Information about your storage and backup: during the active research project, where is the data kept, what is your backup regime, how do you control access to it, have you considered your security risks and data transfers? Preservation planning: selection of data for long-term storage, what effort is required, how long is the data kept, what do you get rid of, how do you dispose of data? Data sharing, again during and after the project: you need to think about your discovery metadata and licensing, data citation and DOIs. And finally, the responsibilities and resources required, which is often just an afterthought: who does all this, and who is the person for us to contact, for example? Very important.

Now we are having a look at some of these areas that you are going to think
about as part of your data management planning. As you collect and create your data, you need to start organising it. We don't have a blueprint of a system for you, but it's important to have a system and to use that system consistently. Key points about that: you always need to put your data in the right place, so decide what the master location is. You or your team may work across multiple machines, so have a system which mandates everybody to store the data in a central master location when they finish their work, and do not store it on your laptop hard drive, a USB stick or the like. Your best system, however, will depend on your research workflow, and there are as many of these workflows as there are research projects, so please talk to the EDS, because we can help you with any queries you may have. A good system is also simple and flexible, so it's easy to use for you and your team.

One thing about organising your research data is in the planning. Folder structure is something that, I'm told, new researchers may not be very familiar with; however, in STEM fields folder directories are still very important, and you can structure them depending on what your data is: for example by dates, sample numbers, instruments or data types. It's up to you and your research workflow.

'A rose by any other name would smell as sweet', but what is in a name? Organising your data also includes developing your naming strategies, and these apply equally to all files, folders and physical samples. You need to think about these in advance to make them work, using consistent names within the project, because this helps you avoid duplicate data, especially if there are many people working on the project, and it also makes it easier to sort and analyse the data at the end. If you have different data types, you may decide to have several naming conventions in your project, not just one. So, what information should go into your names? You need to think
about all your data variables and how to document them clearly, to make it easier to interpret your data. Examples of good file naming strategies: first, don't have overly long names, because if you start moving data across different platforms and systems they may cause trouble, so try to design a strategy with names of up to 32 characters which are also consistent and specific, in case the data file is moved from its current location to somewhere else. Best practice is to use either capital letters or underscores between the words, so don't use spaces or, especially, special characters. For dates, best practice is to use the ISO standard, so year, month, date. To help you include as much information as possible, if you have used acronyms, initials, etc., document them in a README text file, because those 32 characters will not fit everything; there's just not space. There are a couple of examples there of a file name; obviously I've used whole words, but you may have acronyms. If you haven't thought about this at the beginning, there are some renaming tools, but it's easier to do this from the start of your project rather than do the renaming at the end.

Also important, not just the data itself: when you offer us your data, we ask for a metadata record to accompany it; consider this a roadmap to your data. The key fields include the title of your dataset, which needs to be a very clear and concise indication of the content, so that people looking at the title understand what your dataset is about. It needs to be understandable, brief and simple; however, if you use acronyms or abbreviations, they need to be explained, and if there's no space in the title, then the data description can be used to capture that information. The description, which can also be called a data abstract, is a summary of the content of your dataset which allows the re-user to determine how relevant and useful the resource is to them. It is written in clear English, in complete
sentences, not fragments or bullet points; that is not good practice. A good description will tell the users some of these things (they are not always all relevant, but most of them are): what is the dataset about, what's been recorded, and what form does the data take; the geographic coverage, if it's relevant; the data collection period; the collection or creation methodology; why the data was collected and for what purpose; who was responsible for the data collection and interpretation; whether you actually took some data away from the dataset, and why you did that; and the lineage statement. So when you go to deposit the data, these are some of the questions that you need to answer beforehand.

You may also provide some additional data documentation that you have collected and generated throughout your research, and that may include various different types of documentation: research notes (there's Darwin's page of research notes as a little illustration); research methods, so how you acquired your dataset step by step; any laboratory notebooks; README text files, which can be used for anything that you find them useful for, and you can create templates of them so you can use them for the next project as well; and any data dictionaries and databases, so if these are necessary to understand and interpret your data, you can offer those to us as well.

I mentioned data protection earlier, in the legislation, and this may not always be relevant to environmental research, but if you are working on interdisciplinary research or doing surveys or interviews, it's something you may need to consider. Is your data sensitive? Do you need to collect it? If you don't, do not collect it at all; and if you need to keep it, make a clear plan for how you protect it and keep it secure, and don't just do it once but review it regularly. Think about who is going to access it, whether you need to encrypt part of the process, and when you can destroy the data, and do all the people involved understand this? So, is there a
training requirement as well? Data security is a slightly different aspect, because it's more about restricting access to the data, and there are some strategies. While we have a look at this, think about how many of these strategies you are already using in your own data management work, because failing to prepare is preparing to fail. Is your software up to date? Are your anti-malware and firewall up to date? Do you always practise safe computer use, such as strong passwords, and not sharing those passwords? Do you need to control and lock access to your data files, or use encryption, and in the end destroy unwanted data? Some examples of strong passwords are three random words, so 'purple knitted sweater' (don't use that, make your own), or a number of words with special characters, again something that makes sense to you. At NERC and BGS we use two-factor authentication, so if you have access to that, please use it. You may also have a password manager; there's an example of one, and this is not a recommendation, just an example.

Data storage and backup: this is about both the active research and the long-term storage after the project. In the active phase you may have your data on a network drive or a hard drive or in the cloud; the data is fluid, and it's therefore at risk because it may change. In long-term storage the data is preserved at a repository, which means it's the final, safeguarded version of the data. I mentioned backup earlier, and if you are not using 3-2-1 backup, here is what it means, and it's a very good thing to do: have three copies of all your important data, on at least two different storage media, with one copy in a different geographical location, because lots of copies keep your stuff safe. Even if you lose one copy, you will have other copies secure in your other locations. You also need to test your backups; do not just rely on them working. And if somebody else manages this for you, a storage provider, check
with them how often they back it up, and consider what is difficult to replace: the more valuable your data, the more important it is to back it up. Part of that is also your file versioning strategy. You're all familiar, I'm sure, with the method of having ordinal numbers for a major version and decimals for minor changes: so 1.0 is the original published version, 1.2 has had some revisions, and 2.0 is the second approved version. Never, ever use labels such as 'revision', 'final final' or 'definitive copy', because next year you will not remember what those meant. Also, you may work in different storage locations; remember to sync your versions, so you don't need to start looking at dates and trying to establish which is the final version and which you can discard. And do discard any obsolete versions that are no longer necessary.

Sharing a cake is easy; sharing data doesn't just happen with a knife and fork. You need to think about future reuse: can your colleagues find your data, can they understand what your data is, can they use your data now, or in 10 years' time? For that matter, can you use your own data in 10 years' time? What you can do to make this easier and more usable for your colleagues are the things that we covered earlier: naming conventions, metadata and documentation, providing a clear user licence and access to the data, protecting your data from unauthorised users, and storing your data in a readable format to maintain that long-term accessibility.

If you're working in a collaborative project, there's a little bit more to it than just managing your own data: you need to think about those other users, so you may need some data herding (not of cats, in this case). Again, in a research team you need to think about and agree on naming conventions and data versioning, so that everybody is doing the same thing, as well as any common file directory structures, file formats, and any conventions that apply to your data collection. So plan all of
those layouts before you start, instead of having to tweak them later on in the project: where the data will be stored, who owns that data if several organisations are involved, and whose responsibility it is to manage the data. I mentioned that resources are important, so you need to think about this early as well: all the time and effort that needs to go into this work, from the creation to the management and implementation of your DMP; the people and skills that you need for this data management work, from data analysis to quality; and then the collaboration, storage and technology, managing all the backups and security, and also software, models and code, as that's also in scope.

Then, back to the EDS. Before you come to us and offer the data: I mentioned the data value checklist, so you can use that to help you decide what data to offer us, and there's also an awful lot of best practice guidance on both the EDS and the individual data centres' websites, so please go and have a look there, because a lot of your questions may already be answered. Think about the additional documentation and metadata which I mentioned earlier: you can't just dump everything at the data centre at the end, you need to prepare it to make a good quality, valuable dataset. You may even prepare a data transfer action plan. This is an example, so your action plan may be different: you have your up-to-date data management plan (so check that it is up to date), so the data centre know what they're after; ensure you are entitled to deposit all the data that you have, or do you need permissions, and can it all be shared; have you named your data appropriately and used suitable long-term, resilient file formats; divide your data into meaningful units, because a large deposit may cause trouble during transfer, so think about how big your deposits are; and do you need an embargo period at the end of it, or is it open straight away? And once you have given the
data to us, NERC expects access to data underpinning research publications to be provided, and it's mandated by both publishers and funders, so there has to be a statement in your publication on how the supporting data and other research outputs and materials can be accessed by re-users. Proper citation references anything created by somebody else, including the data. The benefits of citing your data and making your datasets and data collections citable? Well, there are a lot of benefits, but I've got a few here: reuse acknowledges you as the author; it makes identifying the data easier; it promotes transparency and, more importantly, the reproduction of your research results; for NERC, it allows the impact of the data to be tracked; and it also provides a structure that recognises and rewards data creators. We help you by providing DOIs, which are an internationally recognised standard. We expect you to come to us to ask for a DOI and not have them minted somewhere else, so we can monitor the impact of the data that we hold, and we facilitate citation using these. In the latest statistics (and Kate may correct me if these are out of date by now), we had nearly 3,000 DOIs minted by the last time I gave a talk, and over 600 last year, so this, I think, is on the increase.

Also, where your data is found is the NERC Data Catalogue Service, which is a searchable, integrated web interface for all NERC EDS data holdings. That makes data easily discoverable, it gives worldwide exposure to a large audience, and there are links to larger data portals as well, so it's not the end of your data. There are almost 12,000 datasets available, so if you are looking for data to reuse, please go and investigate there.

That's all I have time for today, and thank you very much for listening. I'm going to stop sharing now.
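[Editor's aside, between the talk and the Q&A: the file-naming advice above (ISO 8601 dates, underscores instead of spaces or special characters, names capped at around 32 characters, acronyms documented in a README) can be sketched as a small helper. The function name and the example field values below are illustrative only, not part of any EDS tooling.]

```python
import re
from datetime import date

def make_filename(project, content, when, ext="csv", max_len=32):
    """Build a file name following the conventions from the talk:
    an ISO 8601 date stamp (YYYYMMDD, which sorts chronologically),
    underscores between parts, no spaces or special characters,
    and a length cap of 32 characters by default."""
    stamp = when.strftime("%Y%m%d")
    # Strip anything that is not a plain letter or digit from each part.
    parts = [re.sub(r"[^A-Za-z0-9]", "", p) for p in (project, content)]
    name = "_".join(parts + [stamp]) + "." + ext
    if len(name) > max_len:
        raise ValueError(
            f"{name!r} is longer than {max_len} characters; "
            "shorten the parts and document the acronyms in a README")
    return name

print(make_filename("soilsurv", "ph", date(2022, 3, 1)))
# soilsurv_ph_20220301.csv
```

Using short, documented acronyms keeps names under the cap; the ValueError is the nudge to shorten and document rather than let long names leak across platforms.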
Thank you so much, Yana, that was a really insightful and detailed talk. There's now an opportunity for those that are here and have any questions for Yana to post them in the Q&A. Whilst you're doing that, I've got a couple of questions actually, Yana, if that's okay. My first one is: how can environmental scientists using the NERC EDS help ensure they follow the FAIR data principles, so findable, accessible, interoperable and reusable?

We are enhancing our best practice all the time, we don't rest on our laurels, and one of the key areas is the communication between the data service and the grant holders and their teams. That's exactly why we want to engage with the grant from day one, so we can ensure that the FAIR data aspect is taken into consideration as early as possible and integrated into the project and the data management, so it's not coming as an afterthought.

So it's integrated throughout, and that's why it's important for grant holders to engage with the process. That's it, yes. Brilliant, thank you. Very quiet in the chat. Kate's updated the DOI figures, thank you, Kate. Do you want to take that?

We have over 3,000 now, and even this year over 500, so that's brilliant work across the EDS.

I have another question directed to me in the chat: can data in the EDS be accessed with an API, or does it have to be static file extracts? Good question.

That is a good question, and I'm sure there's a technical member of the EDS somewhere in the chat who might be working in that area; I unfortunately am not.

So I might be able to answer this a little bit. I think it depends on the data whether there's an API to access it; there are some data that are brought together through portals.
I think there's a soil portal, for example, and I think some of that data is shared via an API, but I don't believe all data is available as an API yet. I think that's a work in progress, and it's based on priorities. Kate, I don't know if you've got any thoughts on that, just being from a different data centre.

Yeah, certainly it's something that we're actively looking at. It's something that we want to do, but it varies across data centres. Each of the data centres holds quite different data, different formats, different sizes, so it might be a bit more practicable for some data centres than others currently, but it's definitely something, as an integration activity across the EDS, that we're looking at in future.

Thanks, Kate. So I've got a few more coming through the chat; I'm just trying to manage the Q&A a bit. If you can try and put the questions in the Q&A section rather than the chat, that helps me out, but I can see there's a couple in the chat. So Steve Lloyd is asking: is there any recommendation regarding documentation etc. associated with data being in a plain text format rather than something proprietary?

I can definitely take that, especially on behalf of the NGDC, because we maintain a list of preferred file formats on our website, and I'm sure other data centres are doing similar things. If you provide data in the formats on that list, then that's best practice. It doesn't mean we exclude others, but we may want additional metadata to ensure long-term accessibility. So, for example, plain text format: yes, that's definitely on the best practice list, along with CSV files, PDF/As, things like that. If you have no option but to provide a proprietary format, then please talk to your data centre and we'll discuss what additional information you may need to provide about the software it was generated with, the version, etc.

Thanks, Yana. So, another question.
I'm interested to hear a bit more about the approach for providing DOIs for large datasets, for example long-term monitoring datasets. What is the approach taken for issuing a DOI? Would the EDS typically provide one DOI for the whole dataset, or would multiple DOIs be issued for, say, different years or measurements, etc.?

I'm sure Kate will have something to say about this as well, but first I can say that the NGDC may provide you with both: an overarching DOI for the whole dataset and separate DOIs for any sub-collections of data. But, Kate?

Yep, what Yana said is correct. So we can offer DOIs at a collection level or at an individual dataset level, and really it's up to the depositor, in collaboration with the data centre, to decide on the granularity of what is a dataset. This is a question that comes up repeatedly, you know: what is a dataset, what makes up a dataset? And it's really up to the individual depositor, based on how they think users might want to access their data. What is a sensible chunk of data to be putting out there? Maybe it's based on the data that's been used in a publication, for example. So that can be drawn out when you're agreeing your deposit with the individual data centres, but certainly we can do both things.

Thanks, Kate. Thanks, Yana. So, another question in the chat: do you have views on other PIDs besides DOIs, e.g. IGSN or ARK IDs? With 16 million BGS physical geoscience specimens, that would be a lot of DOIs.

PIDs, regardless of what they are, are always a good thing to use, because we can always link to the use of those and provide them as additional metadata, but from the point of view of DOIs, Kate, have you got a comment on this?

Yep, so we use DOIs, and we use DataCite DOIs, mainly for datasets.
We can use them for other things as well, but maybe that's not sensible, so I know that some of the data centers are looking at things like IGSNs for specimens, because some of the data centers and their host organizations hold a lot of physical samples, so they might be looking at more appropriate PIDs for that sort of thing. We're not limited to just using DOIs, so we could use other PIDs, and again that is an area that we're keeping abreast of, seeing what the community wants and what we can help provide.

Thanks again. So another question I have is: how do the data centers or the EDS handle IP in the datasets that are provided? Are all the datasets open access, or are there some that are closed?

I'm not ruling out that sometimes there may be a requirement to restrict access to data, but we prefer not to do it, so the first thing is to try and make all the data openly accessible. It would be a case-by-case basis, looking at what requirements there might be to restrict that data. Is there anything under the Environmental Information Regulations that we can use to restrict the data, and if somebody challenged that restriction in court, would we still have to release that data? Another thing to note is that data products are not in scope for what we take in; it's mainly raw data, and restrictions on raw data are quite rare. If the data is something that is licensed by, say, the host organization, then obviously that is outside the scope of this.

Thanks Seana. And I think Steve's clarified what he means by PID in the chat, should anyone have a comment on that, so thanks for that. I think it might mean something different. Yeah, I was wondering the same. Yeah, I think there are two things there, but someone's cleared it up in the chat, so that's fine. It's persistent actionable identifiers, yes. I was going to say, I'm not sure there's personal data in the borehole records, for example, so persistent actionable identifiers is what we mean by PID there. Thank you, Mike.
So, here we go, a question from Chester: does NERC provide access to data management training, particularly for early career researchers? Database concepts, creation and management are skills many researchers are not familiar with.

At the NGDC we have an annual RDM training course for the PhD students hosted by BGS, and we have plans to expand this training. It's on our agenda to start sharing the training more widely, because obviously the data types across the EDS differ, so there will be some modules which I don't currently provide which would be required by other parts of the EDS. But yes, training is something that we are very interested in providing more widely.

I think that's a really good point, Chester, and I think actually hearing from someone like yourself what you would find useful would be really valuable to us. We want to try and tailor our services to those end users, so if you wanted to drop Janna an email with any more details about what you think would really benefit you as an early career researcher, presuming you are one, or on behalf of early career researchers, please do drop one of us a line. I'll put her email in the chat; please do drop her one and we can then consider it, because actually understanding what's required and needed by the end users is really important.

Definitely, and just to jump in as well, we're hoping to put things like that on our new brochure website, the NERC EDS brochure website. We're hoping to release that early next year, and that's hopefully going to be a place where we can put training resources so that not only NERC researchers but all researchers can access this kind of information.

Indeed, and we have also just sent the NGDC web team a long list of resources that we are making available on our individual data center websites. I don't think it's live yet, but it's on its way to being live, and this will also be linked to the main EDS website.
Thanks. I've got some comments in the chat rather than questions; I think that's a point noted rather than a question in there. Any other questions? I'll just wait for a few minutes to see if anything comes through. Do our panelists have any other questions, Brianna? You can speak out loud. Before anyone comes in, I should just say we've got a panel of people here from across the NERC EDS and the Constructing a Digital Environment programme, whom I forgot to introduce. So Kate Harrison, for example, is one of our data center... I'm going to get it wrong, Kate, you're going to have to say it. EIDC?

Yeah, EIDC. Operational manager for the EIDC, that's the Environmental Information Data Center, so we deal with terrestrial and freshwater data generally.

Thanks Kate. And then we've got Steve Hallett, so Steve's going to ask a question, I think.

Yeah, hi, thanks for that. I'm Steve Hallett, one of the digital champions. Yana, thanks for a fascinating talk, really interesting. We hear a lot of talk about data ontologies these days, and I just wondered to what extent thinking had evolved around forming data ontologies for some of the data themes that you've mentioned across the EDS. I just wondered where thinking was about that.

Again, we have a team of people working particularly in that space, both at the data centers and across the EDS, and there will be somebody who is much more involved in that work, so I can certainly find out for you.

Yeah, I'd be interested as well, thank you.