Welcome everyone around the globe to our short course today on efficient data handling throughout the data lifecycle. You can either enjoy the whole course or jump to a certain section, which should be linked in the description of the video down below. The title of our short course is "Handling your data efficiently from planning to reuse: tips and tools to save time and nerves". With this, we hope to make your life with data more efficient, enjoyable and successful, and additionally we aim to provide useful links to get you started. We have a very motivated team of conveners for you today; it consists of Jessica, Faye, Alice, Nikolai and me, and we will introduce each other step by step throughout the presentation. My name is Marcus Schmidt, I'm an early career researcher at the University of Göttingen in Germany, my field is soil ecology, and I'm also very interested in everything that has to do with data. We will go through a wide range of interconnected topics today, ranging from data management plans and version control to reproducible data manipulation in R, working on clusters, and data publication. Our first speaker will be Alice Fremont, a scientific data manager at the British Antarctic Survey and co-chair of the World Data System Early Career Researcher Network. So welcome, Alice.

Thank you, Marcus, for the introduction. I will now share my screen. In this presentation, I will introduce to you some principles of research data management and show you how it can help you not lose your data, because who has never lost data? You may have accidentally deleted a file by clicking on the wrong button, or suffered a hardware or software failure or a computer virus, but the result is always the same: you have lost your precious data or code, and it is a huge step back in your research. Research data management is the effective handling of information that is created in the course of your research, and I will show you some principles that will help you to preserve your data.

So why is research data management important? Research data management will not only help you not lose your data, but also help you as a researcher. On a day-to-day basis it will increase your research efficiency and avoid data loss, and in the long term you will save time and nerves. With the publication of your data, you will be able to increase your reputation by increasing the number of citations. It will strengthen your scientific integrity, as the publication of the data helps to verify research findings over time, and you will meet requirements from funders and legislation. Research data management will also help science, as other researchers will be able to reuse the data, enabling new scientific approaches, education and big data analysis. And it will finally promote innovation in your field.

On this diagram you see an example of a data lifecycle. It shows all the stages that a particular unit of data goes through, from planning, collection and preservation to publication and reuse. All of these stages are essential to ensure the maximum impact of your research data. In geosciences, data sets are often unique and very expensive to collect, so focusing on all five of these stages in the data lifecycle allows the most value to be gained from the data collected. In this webinar, for each stage of the data lifecycle, we will give you some tips and tricks to help you handle your data.
In the first stage, I will give you the keys to writing a data management plan and applying in-project data management best practices. Then Jessica will show you how to make your data reproducible using R, Marcus will explain how to use Git for versioning, Faye will explain how to work on clusters, and finally Nikolai will give you advice on how to publish your data. The ultimate goal is to make your data FAIR: findable, accessible, interoperable and reusable.

As part of best research data management practices, the first step is to create a data management plan, or DMP. A DMP is a document that outlines how data are handled both during and after a research project. It is usually created at the start of a new project, before any data have been collected, and is used throughout the project, helping researchers to manage data at every stage. Most DMPs cover the same information; they may just use different headings and terminology. The main sections and topics covered in a data management plan are the same and consist of the following. The first section is always a description of the project. Then roles and responsibilities are detailed, to determine who will own the data and who will have access to them. The data acquisition section describes the data collected and the amount expected, in terms of size for instance. Then you need to describe the in-project data management approach, such as naming conventions and formats. As part of the data description section, you will need to discuss the standards used to describe the data, such as the metadata, and the legal issues and ethics section describes the licenses to be used.

Data management plans are now a required document when applying for grants, and most research councils will ask for one, so it is very likely that your project does have one. If you are an early career researcher, I highly encourage you to ask your scientific team about it, so you can learn about the sharing policies and standards that are specific to your project. A data management plan is a live document, so it is possible to change it. If there is no data management plan for your project, it is not too late: you can still write one, and it is now part of the good practice of a researcher. There are lots of tools that can help you write one, such as DMPonline, DMPTool or DMP OPIDoR. On the right panel you have an example of one of these tools, DMPonline. The tool helps you design your DMP following the requirements of funders. For each section of your DMP that I described earlier, you answer a set of questions in the text boxes provided. Guidance notes are given on the side to help you answer the questions. A nice advantage is that you can share the document with the members of your team, and when you have answered all the questions, you can download the document either in Word or PDF.

As part of your DMP, you will need to describe your in-project data management approach and answer questions about backup and data structure. Data loss is not always linked to software or computer damage but can be linked to bad organization: for example, you don't remember where you put your file or how you named it, and you can spend hours trying to find the final version of your document. So first, I advise you to use folders and structure them logically. You have here an example of a folder structure. Group files within folders, so that information on a particular topic is located in one place.
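As a purely hypothetical illustration of such a layout, sketched as shell commands; the folder names are examples only, not a prescribed standard:

```bash
# One folder per broad topic, with more specific subfolders inside,
# and raw data kept separate from working copies (all names are examples only).
mkdir -p my_project/{documents/{proposals,reports},data/{raw,working},scripts,results/figures}
```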
Start with a limited number of folders for the broader topics and then create more specific folders within these. It is also important to adhere to existing procedures and standards. In order to remember the names of your files, use a naming convention and be consistent when you name your files. The name of a file may include the date of creation, the name of the creator, a short description and the format of the file, as shown in the example in green. The problem with the name in red is that you won't be able to determine whether it is the last version or who made the last change. Also, it is recommended not to use special characters and to avoid spaces or full stops in the name, as they might be interpreted differently depending on the system you use. Also, separate raw and working data. This is extremely important, as you don't want to make any changes to the original raw data file. To avoid changes, I recommend making your original data file read-only so it cannot be inadvertently altered. And finally, use backups. Follow the 3-2-1 rule for all your important documents and data: keep at least three copies of each important file, on at least two different devices or storage media, one of which should be off-site.

Now that your data management plan is ready, you have decided on your naming convention and file structure, and you have scheduled your backups, it is time to analyze your data and write your code. Jessica Clayton, PhD student in soil ecology at the University of Cologne in Germany and member of the BonaRes project, will now show you how R can be used to make your data reproducible.

Thank you, Alice, and thank you everyone for joining us today. As mentioned, I'm going to be talking about reproducible data manipulation in R, in particular using the tidyverse packages. I'm going to share my screen for you now. So why is reproducible data manipulation so important? To publish your results, you have to handle your data and perform statistics, which is a major step of your methods, and it should be documented just like your lab report or sampling protocol. Having a documented data manipulation method means your method is reproducible: you will get the same results with the same data. It is repeatable, so that you yourself can repeat the analysis you did, or your colleagues can if they want to develop your research further. And it gives your data integrity: you have evidence of how exactly you got to a result, which can be very useful if a publisher or reviewer wants to know exactly how you reached a conclusion. If you have followed all these steps and have a reproducible method, then your data should be of a very high standard, which makes it suitable for reuse in meta-analyses and gives your data a wider impact in the scientific community.

So what is R? It is a widely used and well renowned statistics program which uses the command line. It is open source, meaning that all the functions are transparent and customizable. It has been around since the 70s (in the form of its predecessor, S), so the functions are tried and tested, and it has a huge online community on GitHub and Stack Overflow. It also comes with a very user-friendly environment called RStudio, of which I have shown a screenshot here. In the top left we have a text editor; this is where you save your code in a script, and you can run lines of code directly in the console below.
In the top right is your global environment, where all of the data you are using is stored virtually within the program. And in the bottom right you can look at the plots you are producing, install and load packages, access the help pages, and have a look at your file structure.

So if R is a statistics program, why would you use it for your data manipulation? Well, there are many packages with useful functions which allow you to do all stages of your workflow within R: for example, importing your data, transforming it into the right shape, analyzing, modeling, and visualizing with the powerful ggplot2 package. You can program it to automate and loop, and finally you can publish, so all stages of your workflow are reproducible. You can comment throughout, giving a commentary on your work: what you did and, importantly, why. There is no need to touch your raw data; it is all imported virtually, so you don't make any changes there. So you avoid errors at every step of the way and save yourself time and nerves. I'm going to show you some of the packages which have very useful functions for data manipulation; they belong to the tidyverse packages and are the ones highlighted here.

If you want to perform statistics or use the ggplot2 package in R, your data needs to be in the right shape, and this follows the tidy data principle. The golden rule is that every column is one variable, each row is one observation, and each cell is one value. But when we create data or receive data from others, it is not always in this shape, so we have to reshuffle it somehow, and thankfully the tidyr package has functions which help us do that. We created some example data here of soil temperatures, which were taken at four different land use types on four different dates. You'll see that the data frame on the left-hand side doesn't fit the tidy data rule, because the date variable is spread over four columns, which means there are four temperature observations on every row. But we can use the gather function, which takes our dates and puts them into their own column, and similarly the values, the temperatures, go into their own column called temperature. This is a very nice, long, computer-friendly format. And if you want to get back into the human-friendly wide format, you can use the spread function, which does the exact opposite. You'll also notice that the site column doesn't fit the tidy data rule, because there are two pieces of information in the one column: a land use name and a number, which is the replicate number. So I'll show you how we can deal with that next. There is another function in the tidyr package called separate, which does exactly what it says on the tin: it separates your column into more columns. In this case we want to separate it into land use and replicate, and we tell it that they are separated by an underscore. Once we've done this, we have the data frame on the right-hand side, where our variables are neatly separated into the two columns. Now, if you want to go back and have your data the way it was previously, you can use the unite function, which does the exact opposite. So now you have your data in the right shape.
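To illustrate the reshaping steps just described, here is a minimal sketch using made-up soil temperature data; the object names, dates and values are illustrative only, not the example shown on the slides.

```r
# Made-up wide-format soil temperature data: one row per site,
# one column per measurement date (not tidy yet).
library(tidyr)

soil_wide <- data.frame(
  site = c("forest_1", "forest_2", "grassland_1", "grassland_2"),
  `2020-05-01` = c(10.2, 10.5, 12.1, 12.4),
  `2020-06-01` = c(14.8, 15.0, 17.3, 17.6),
  check.names = FALSE
)

# gather(): collect the date columns into one 'date' column and the values
# into one 'temperature' column (the long, computer-friendly format)
soil_long <- gather(soil_wide, key = "date", value = "temperature", -site)

# separate(): split 'site' into land use and replicate at the underscore
soil_tidy <- separate(soil_long, site, into = c("land_use", "replicate"), sep = "_")

# unite() and spread() do the exact opposite if you need the wide format back
soil_wide_again <- unite(soil_tidy, "site", land_use, replicate, sep = "_")
soil_wide_again <- spread(soil_wide_again, key = date, value = temperature)
```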
Now you want to start your analysis, and the dplyr package within the tidyverse has many functions that will help you with all your data manipulation needs: for example, subsetting your variables, filtering observations according to logical criteria, and creating or modifying variables. What I find really useful are the summarize and group_by functions. The summarize function takes a column of data and boils it down to one value; this could be a statistic like your mean or standard deviation. If you use it together with the group_by function, it will calculate the mean, or whatever statistic, over every single treatment that you have in your data, and this can save so much time. I don't know if you've ever used pivot tables in Excel, or even tried to do the same without a pivot table; it can get quite messy quite quickly. Here you can do it in a couple of lines of code, and it's lovely.

What I really like about the tidyverse packages is that the code has a kind of grammar which makes it really easy to read and understand, so your whole workflow reads like sentences from a paragraph. This is helped by the pipe. You'll notice at the end of each line there is a funny symbol: that is the pipe, and basically what it does is take the result of the operation to the left and pipe it into the operation to the right or underneath. So you can start by reading in your data, gathering it into the right shape, separating out your columns, subsetting your data, creating new variables, filtering, and then finally grouping and summarizing. From there you can pipe your results directly into a statistical model or into the ggplot function and make some nice graphs.

I also wanted to show you how you can use R to reuse your script and automate your analysis. In the dplyr package, many of the functions come with suffixes, like _at, _each and _all, which allow you to apply a function over many variables, in many columns of your data frame. Similarly, in the purrr package there is a function called map, which applies a function over a vector, much like a very elegant loop: you can apply functions to multiple data frames all at once. I use this in particular to make multiple graphs by using a ggplot function within the map function. And finally, a very useful piece of code is the map_df function, which allows you to import and merge many data files into one data frame in your environment. It saves you typing out hundreds of lines of code, it makes your code look really elegant and streamlined, and it saves you a lot of time.

To end my part, I just want to give you a few tips to save time and nerves. The ultimate goal of your work should be to publish and share your method, which means you should document every step that you make. Even though that sounds daunting, just think: somebody has probably tried to do it before, so there is probably already a function. So if you can Google, you can use R. Make sure to make lots of comments, so you know exactly what you did and why, and try to reuse and automate your code where possible. This can be really fun, so you can be creative.
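As a minimal sketch of such a piped workflow, putting several of these pieces together; the file paths, column names and summary statistics are placeholders and not the script shown in the talk:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(readr)

# map_df(): read every CSV in a folder and bind them into one data frame
# (the folder "data/raw" and the column layout are assumptions for this sketch)
all_data <- list.files("data/raw", pattern = "\\.csv$", full.names = TRUE) %>%
  map_df(read_csv)

# One readable pipeline: reshape, split the site column, filter,
# then group and summarize per land use and date
summary_table <- all_data %>%
  gather(key = "date", value = "temperature", -site) %>%
  separate(site, into = c("land_use", "replicate"), sep = "_") %>%
  filter(!is.na(temperature)) %>%
  group_by(land_use, date) %>%
  summarize(mean_temp = mean(temperature),
            sd_temp   = sd(temperature))
```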
Now I'm going to pass you back to Marcus, who's going to show you how you can keep track of all the changes you make to your code using git.

Well, thank you very much, Jessica. Git is a great tool that you can use throughout almost all of the data lifecycle, and it is actually perfect to use together with R. Now, if this were a course where everyone was in the room, I would ask everyone if they've heard of git, and probably everyone would raise their hand. But if I asked who is actually using git, probably not so many would raise their hands. The reason I want to share this with you is that I have been exactly through the first three of these four stages of data loss: denial, anger, and depression, with some sleepless nights over the safety of my code and my data, but luckily never acceptance. This is why I found git.

So what is git exactly, and why should you consider using it? It is a system for version control of your code and data; it tracks your changes and enables you to go back to previous versions or make trial runs in recoding without overwriting everything you did. It also facilitates working together on coding projects, so you don't have to spend time sending your files back and forth by email, which, believe me, also results in depression. It is also free to use, so there is no risk involved. And I truly believe that every minute you spend with git will save you ten minutes later on: no more version confusion, losing code that you had somewhere, or transferring it home on a USB stick.

A very often asked question is: how do you get git running on your computer? That depends a little bit on what operating system you are already using. In any case, you need to access the terminal, which you see on the right; it is your command line interface for talking to git. On Windows, first download and install Git Bash, which is a combination of git and the terminal you will use to talk to it. On a Mac, your terminal is already there and you can find it in your utilities; use git --version, which should lead you to an update or tell you that you already have git installed. Now, if you haven't already, I very strongly encourage you to switch your home computer or laptop to a Linux distribution, because these operating systems are free, and they are much faster, more reliable and way more straightforward than the usual operating systems. I promise you that this will make a big difference in the way you work; at least for me, it did. The commands for installing git on Linux are written here, and they differ slightly depending on the Linux distribution, so please check out the link below for more details on these installations.

You can use git just to track the code and data versions on your computer, but using GitHub additionally is much more convenient. Think of it as the housing facility of your project: an online repository where you can store your project, access it yourself, or have others cooperate on your project. You can set up a free account and let git know your username and email, and these need to be the same as in GitHub. After this, cloning a project is quite easy; you just need to do the following steps. In GitHub, you can go to the "Clone or download" link of the project and copy it, or you can first create your own online repository. Once you have that link, in RStudio, for example, you can start a new project: File, New Project, Version Control, Git. Then you add in the link, and there you go: you can work with git and GitHub on your R project. RStudio, for example, lets you communicate with git in a point-and-click kind of way, but sometimes this can also be confusing, and a better way is to use the terminal directly, which you also find in RStudio, where you can put in your commands manually.
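As a minimal sketch of the setup steps just described; the name, email address and repository URL are placeholders:

```bash
git --version                                        # check whether git is already installed

git config --global user.name  "Your Name"
git config --global user.email "you@example.org"     # should match your GitHub account

# copy the "Clone or download" link from your GitHub repository, then:
git clone https://github.com/your-username/your-project.git
cd your-project
```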
This gives you more control over what is actually happening. For this, I will introduce to you the six git commands you, or in fact I, cannot live without. To understand the basics of how git operates, please read a little bit first, for example in the link that we are also providing here.

The first command you cannot live without is git add -A; the A is for all. This stages all your changes and files in the current folder of your project; basically, it lets git know that this is what you want to track. Mostly that is all the files in the project, but you can make exceptions in the so-called .gitignore file. This can be useful if you are producing a lot of graphs, for example, that you don't want to track every time. A very important command in git is git commit with a commit message. This commit message will help you later to reconstruct what you did and which files you changed when. Now, even after committing, the changes in your project are not yet in your GitHub repository online; for this, you need to push them up with git push. Once in a while, you may not know whether your remote, your online repository, is up to date. When this happens, you can find out with git status, which lets you know what you changed recently and which branch you are on.

Branching is a very important aspect of git, and it is also part of the git logo. It is mainly used for experimental coding, solving a certain problem, or working together with others. Basically, you can think of it as leaving your main road to try something else, and if you like it and want to incorporate it, you go back to the main road and pull it over from there. The command for this is git checkout -b (b for branch), which lets you switch to a new branch. Note that your main branch, usually called master, stays the same until you are sure of what you did and want to incorporate your branch into the master. This is called merging and can be done with git merge develop, where develop is just the name of the branch; it could be any other name of a branch that you set up before.

There are several ways to trace your changes in git, and that is the nice thing. One convenient way, if you are working on Linux, is gitg, part of which I am showing here. Every commit is listed, and you can go through them and actually see what was done. In this case, the red part was taken out and the green part was added since the last commit; here I changed the ggplot code in R to adjust the colors. So I hope that this information encourages you to try git, GitHub or even Linux. This is a whole new world to explore and a real game changer. More info can be found, for example, in the free Pro Git book, and other people have often had the same questions you have, so your trusted search engine can help you. And finally, it is invaluable to know someone who is using git, GitHub or Linux.
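Pulling the six commands together, a minimal sketch of a typical cycle might look like this; the commit message and branch name are only placeholders:

```bash
git add -A                               # stage all changes in the current project folder
git commit -m "Describe what changed and why"
git push                                 # send the commits to your GitHub repository
git status                               # check which branch you are on and what changed

git checkout -b develop                  # create and switch to an experimental branch
# ...edit, add and commit on the branch...
git checkout master                      # go back to the main branch
git merge develop                        # incorporate the branch once you are happy with it
```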
One field where git is often used is the development of climate models, which produces large amounts of data and code. For this topic, I would like you to meet our next speaker, my good friend Faye, who is a PhD researcher on climate and extreme events at the Free University of Amsterdam. Faye is also a visiting scientist at the Royal Netherlands Meteorological Institute.

Okay. Hey, everyone. Thanks for joining us today. My name is Faye, and thanks, Marcus, for the nice introduction. I will now share my screen. I am a climate modeler working with climate and Earth system models, and today I'll be talking about how to work on clusters, how to deal with large climate or meteorological data sets, and, last but not least, how to bookkeep your data sets.

To start off: working on clusters. What is a cluster? A high performance computing cluster is a collection of several or many computers or servers, which we call nodes, that are interconnected and can perform parallel tasks. As a user, you can log in remotely from your own device via SSH (secure shell), which I show at the bottom left as an example. When working on clusters, you have to keep in mind two important considerations: one is the temporal requirement and the other is size restrictions, because you want to have designated folders for specific purposes, and every data set has its own properties. For example, you want your home or permanent folders for files that are small in size but that you want to keep permanently or for a long time, whereas in the scratch folder you can perform analysis on large data sets and also run computer simulations, such as climate simulations. But do keep in mind that scratch gets erased frequently, so you have to babysit your scratch folder a little bit. So basically it boils down to two questions: how long do you want to keep the data, and how much quota and space do you have for keeping those data sets?

After you know where to put what on a cluster, as a climate modeler, or any other kind of modeler, it is often important to do pre-processing work before you actually start your analysis or visualization. The raw output from the models is normally not in an ideal format, and the data sets can be huge, from gigabytes to terabytes, which makes it impossible to work on your local computer. And often you will have joint projects with other collaborators, so you want to have standard names for your files and data. That is why you have to take care of those parts before you actually start doing analysis or visualization of your results. As someone working in the climate and meteorological field, I would love to introduce you to a tool that I often work with, called CDO. CDO stands for Climate Data Operators and was developed by the DKRZ and the Max Planck Institute for Meteorology in Hamburg. It is widely used in the climate and meteorological community because it has more than 600 operators that help you to pre-process, process and manipulate your data efficiently and in an elegant way. It is so popular because the command lines are, in my opinion, very straightforward, and the CDO commands are well documented with examples and a supportive community ready to help answer your questions. How do you get CDO started on your local computer or cluster? If you already work on a cluster or terminal, you can simply use module load cdo; normally it is already among your modules. As a Mac user, you can use MacPorts or Homebrew in your terminal to get CDO, and it is available for other platforms too. For a quick introduction to CDO, you can follow the link I provided here to get more information about how it works.
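As a minimal sketch of getting CDO available; the exact module and package names can differ between systems, so treat these as assumptions:

```bash
module load cdo        # on most clusters CDO is provided as an environment module
brew install cdo       # on a Mac with Homebrew
sudo port install cdo  # or with MacPorts

cdo -V                 # print the version to check that it works
```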
Here I also show you some quick examples of how I normally work with CDO on my data sets, on the left in the terminal. Whenever you have any kind of data, for example output fresh from your model or data you received from someone, you first want an overview of the information in the data set. That is quickly and simply done with cdo sinfo followed by the input file. With this command you can already see the file format, which here is GRIB, the grid coordinates, which here are reduced Gaussian grids, and the vertical coordinates, which can be in pressure or height. It also gives you the different levels and even the time step, which in my case is six-hourly output data. This is all done in about three seconds. Then, when I want to compute the daily mean from my six-hourly data, I can simply run cdo daymean with my input and output files. As you can see, it is processed in just a third of a second, and then you already have daily mean data with the information on your variables. On the right side I give some examples of CDO commands you can use to select your data, for instance if you have a specific study domain and want to select certain grid boxes. More information on how to use CDO can be found in the CDO reference card in the link below.

After you have pre-processed and analyzed your data, you will want to share it with your collaborators if you are working on a joint project, and you also want to archive and back up your data along the way, because sometimes a run goes wrong and you have to rerun it, and sometimes there are many different collaborators you have to share your data with. So it is important to bookkeep your data. I will give you an example from my own case: on the left is a screenshot of the notes I made for my simulations. There are three columns: the experiment name, whether it is a control or an ensemble run and the time frame of the data, and the destination folder for those data. For example, ECFS is the place where I store my output permanently, a temporary folder stores my raw simulations for about three months, and there I regularly go back to the data and check that everything is okay, and a third location is where I share and transfer my data to. I tick a box when data have been archived, transferred or backed up, and the empty boxes mean work is still left to be done. This way you have a good overview of what you have achieved and what still needs to be done, and it will save you a lot of energy when the work of running models gets overwhelming.

Last but not least, I want to share some tips and tricks from my own experience, as do's and don'ts. Do regularly touch your files on scratch: as I said, scratch can be erased regularly, and the touch command will help you a lot in refreshing the modification date of all your files. Do check your quota frequently; you don't want your folders to blow past your allocation. And do keep detailed documentation on your simulations, such as which years, what kind of simulation you ran, and which parameters were used. Then the don'ts: do not store your scripts on scratch, as your scripts take a lot of time to write, and if they get erased by the scratch system it is just a nightmare. And do not give data confusing names; in this case it is not "simpler is better", but rather the more detailed and descriptive the name, the better for understanding, especially later when you want to retrieve your data.
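A minimal sketch of the first two do's above; the scratch path and the quota command are placeholders, since they differ from cluster to cluster:

```bash
# Refresh the modification time of everything under your scratch space,
# so files are not swept away by automatic clean-up (path is an example).
find /scratch/$USER -type f -exec touch {} +

# Check how much of your allocation is used; the exact command varies by system.
quota -s
```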
So that's all from my part on working on clusters and how to bookkeep your data. Next up is Nikolai, who will talk about how to publish your data. Nikolai is the data curator of the BonaRes repository, which is a platform for publishing soil and agricultural data. Okay, welcome, Nikolai.

Hello everyone, and thank you, Faye. My topic is data publication, or how to make your data become FAIR. Just a moment while I share my screen; there it comes. I hope everyone can now see the presentation. So, everyone should agree that data should become FAIR, which means findable, accessible, interoperable and reusable, and you can imagine that a very easy way to achieve this is by handing your data over to a proper repository and publishing it.

I would now like to put you into the position of a scientist who wants to carry out a modeling study. She doesn't have her own budget for field research, but still has a good idea for the project. Therefore, she is looking for soil and environmental data. This data should be of high quality and high spatial and temporal resolution, and, among other things, it should be reusable in a legally sound way. Such data can be found in the BonaRes repository, in which the scientific community provides its data for free reuse. With this data she can parameterize her model, generate values and, last but not least, write a paper. The generated data is then submitted to the BonaRes data repository for publication, after which her data is reproducible, citable and reusable. All in all, her data becomes FAIR.

Now I would like to introduce you a little bit more to the theory behind data publication. The principles of the BonaRes repository regarding data publication are reproducibility, citability, the free reuse of data, especially data generated with the support of public funds, and user-friendliness. With this in mind, let's have a look at data preparation. If you think about publishing your data, please take your time and prepare the data. Use a high level of aggregation to foster the reuse of your data: combine meaningful data sets, combine values, choose a proper aggregation level, include all necessary attributes and take a closer look at related data. In the example listed below, the researcher has different tables, such as yield data from the field, nutrient data from the lab and some information about the location. All these tables are combined into one highly aggregated data set in which everything that is necessary for reuse is included.

So far so good, but I can guarantee that all this fascinating data is worthless for reuse if there is no proper metadata available. Metadata, in contrast to the data, is the descriptive information about your values, and without proper metadata your data stays unFAIR. So, as Alice and Jess explained before, keep in mind to follow your data management plan and collect metadata during all stages of the data lifecycle. Take your time to describe your data, and use tools to save your nerves. And last but not least, use keywords. Keywords are the key to all your data and in this way are an essential part of the metadata. Keywords are necessary to find and explore data, they foster reusability and they ensure interoperability. If you use keywords, please do not invent new, individual keywords; instead, use controlled vocabularies, for example AgroVoc.
With a controlled vocabulary it is possible to make a clear assignment and avoid misspellings, for example of a term like colluvium. And keywords from AgroVoc, for example, are multilingual, which means you can use your native language and it will be translated automatically into the other available languages. In this example, I want to show you how it works in the BonaRes repository: you can search for and click the keyword from AgroVoc, and it is automatically transferred and filled into the metadata editor; the XML snippet is listed below. It includes not only the information about the word itself but also codes indicating where it comes from and which thesaurus was used.

Next to the keywords, a proper license is very important, because licenses define the terms of use for research data. They allow the reuse of research data under flexible and legally sound conditions, and licensing of research data reduces uncertainties in reuse. We recommend using the Creative Commons licenses because, first of all, they are an international standard, and we like international standards; they are translated into several languages and ported to the national legal systems, and they are human and machine readable. Last but not least, they are finely granulated, so every one of you can find a proper license for your research data.

All data published in the BonaRes repository will be attached to a DOI, because, as you can imagine, this is an international standard. The DOI is described by metadata, which is automatically generated from the BonaRes metadata, and thanks to the DOI your data becomes trackable. That is good for your reputation: data citations are trackable, and this is one important element of the FAIR data principles. So, what to do with the DOI? How is it best utilized? It is best utilized if you can link your data to a paper and vice versa. That means that when you publish your paper, include the DOI of your data published in a proper repository. Then, when another researcher reads the paper, they have direct access to the data for reproducibility, and when they find the data in the repository, they have direct access to the corresponding journal article for more information.

Now, I have talked a lot about how important it is to publish your data and describe it with metadata. You have probably got the impression that it is time consuming, and I know that your resources are limited, because you want to do the research, not the metadata collection and data description. So save your nerves and use tools that can support you. In the BonaRes repository, we have developed several tools to support the publication of your data and to make this workflow as user-friendly as possible. As you can see here, the starting point is the submission of your data; later on you can edit your metadata, and finally you can find your data and other published data in our repository. So let me talk about the submission. If you want to submit data to the BonaRes repository, go to our website. There you can enter some very general information, metadata like the title and a short description. In this example, a researcher prepares data about plant growth during the vegetative phase in France. Currently, you are limited to 500 megabytes; if your files are larger, please call me. After submission, we contact you with the prepared metadata. When you open the pre-filled and prepared metadata, you will find information about the title.
You may remember this, and the description you entered before during submission is included too, along with a lot of other important things, like the disclaimer, a license suggestion and the complete data model that we extracted from your data. You then only have to check this pre-filled information and add some additional metadata. The last step to save your nerves is to publish your data. If you are looking for a proper repository, you can have a look at re3data.org, which is the registry of research data repositories, or, if you are dealing with soil and agricultural data, you can submit to the BonaRes repository to make your data become FAIR. Thanks for your attention. Now I will hand the screen over to Marcus again for the last words.

Thank you very much, Nikolai. By now, everyone should have a much better overview of what the data lifecycle is actually about. Finally, we would like to give you a short note on meta-analyses, because they are very much connected to everything we have been talking about. A meta-analysis is basically the golden crown of data handling. It is often high impact, so it is good to write them, which needs some experience, but it is also good to be cited in them. However, this only works if your data is openly available and clearly understandable, in the many ways that we have shown today. Now, thanks so much everyone for being here. It was a lot of fun, and we hope there are some ideas here that will assist you in your future handling of data. Thanks everyone, and also to the organizations that have been supporting us. We hope to see you all next year in person. It was a lot of fun doing this. Thanks very much.