Okay. Thank you very much for hosting; it's a fantastic experience. I just gave a package demo two days ago, and it's great to have a real audience in the lecture room while I'm presenting virtually. Today I will present a package called ReUseData, an open-source, open-development tool for reusable and reproducible genomic data management. Note that ReUseData is now available on GitHub, and we are in the final stages of wrapping it up to submit to Bioconductor in the next one or two months. A typical data analysis flow usually starts with the data, and there are different data types. The first, and most important, is experiment data from a specific experimental design, for example RNA-seq data for a specific disease. The second is the reusable genomic data sets that we use across many different research projects, such as the reference genome, or gene and variant annotation files that add annotations for gene functions and so on. Today we will focus on the second data type and talk about how to effectively manage these data locally and how to effectively reuse them in your data analysis. Here I show all of the steps together. Traditionally, if you want to use one of these data sets (here I'm showing the reference genome from the 1000 Genomes Project), the first step is to download the data and do some basic curation, such as indexing the reference genome. The second step is to manage the data locally; a common lab practice is to use a designated folder for a specific project, or a shared folder within a research lab. The third step is to use the data in your data analysis workflows, or in exploratory data analysis using R. But there can be many challenges in data management without a standardized local data management system.
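As a concrete illustration of that traditional first step, the manual download-and-index routine in R might look like the sketch below. The URL is a placeholder, not a real 1000 Genomes path, and the choice of `Rsamtools::indexFa()` for indexing is my assumption, not something stated in the talk:

```r
## Manual, ad hoc approach: download a reference FASTA and index it.
## The URL is illustrative only; substitute the actual 1000 Genomes file.
library(Rsamtools)  # provides indexFa()

fasta_url  <- "https://example.org/GRCh38_reference.fa"   # placeholder URL
fasta_file <- file.path("~/data/project1", basename(fasta_url))

dir.create(dirname(fasta_file), recursive = TRUE, showWarnings = FALSE)
download.file(fasta_url, fasta_file)

## Build the .fai index so downstream tools can use the genome
indexFa(fasta_file)
```

Note that nothing in this routine records where the file came from or how it was processed, which is exactly the tracking gap the talk describes next.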
The data files can be disordered, hard to find, or even redundant from repeated downloading, due to a lack of tracking of meta-information such as the data source. And usually the same data sets are used in many different research projects, so repeatedly processing the same data set in a similar way across projects can waste a lot of storage and computing resources. To address these challenges we have developed a package called ReUseData, with the aim of improving data reusability and reproducibility for reusable genomic data resources. We also aim to create shareable data sets that are ready to pass into genomic analysis workflows, especially data shared within the same research lab or a shared facility, so that everyone can make use of the same data sets without inefficient repeated downloading. The main feature of ReUseData is that we use workflow-based data recipes to manage the data downloading and data processing scripts. These are based on CWL workflows and are used to standardize genomic data management. Around the data recipes we have functions that create a local cache of the recipes, so that you can search for a recipe and use it to generate the data locally with full annotations. So the first step in ReUseData is to use the pre-built data recipes, which include the scripts for downloading and curation. All the data processing tools, such as the indexing tool, run inside Docker, so the recipes are more self-contained and reproducible. The recipes also have built-in code for adding meta-information: when you generate data from a recipe, keywords and meta-information are added automatically. The second step is to generate the data locally.
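The recipe-to-data-to-search flow described in this talk might look like the following sketch. The package was still pre-Bioconductor at the time of the talk, so the function names (`recipeUpdate`, `recipeSearch`, `recipeLoad`, `getData`, `dataUpdate`, `dataSearch`, `toList`), the recipe name, and the argument names are assumptions based on the style of its GitHub documentation; treat this as a sketch rather than the definitive API:

```r
library(ReUseData)

## Step 1: sync the local cache of pre-built data recipes, then find one.
recipeUpdate()
recipeSearch("liftover")                 # keyword search over cached recipes
rcp <- recipeLoad("ensembl_liftover")    # hypothetical recipe name

## Step 2: generate the data locally. The `notes` argument attaches
## keywords that make the data set searchable later.
getData(rcp,
        outdir = "~/reusedata",
        notes  = c("ensembl", "liftover", "GRCh38"))

## Step 3: cache and search the generated data. A '#tool' tag returns
## all data sets ready to be passed to that software tool.
dataUpdate(dir = "~/reusedata")
dataSearch(c("liftover"))
dh <- dataSearch("#liftover")

## Convert to a workflow-ready format such as JSON for pipeline input.
toList(dh, format = "json", file = "~/reusedata/inputs.json")
```

The point of the design is that the recipe, not the analyst's memory, carries the provenance: every generated file comes with its keywords, tags, and processing script attached.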
This creates a local cache for all the data, which makes the data more trackable and searchable. Here, with the `notes` argument, you can add keywords that can be used for later data search. The third step is to use the data. We can search the data simply by adding keywords here, and a specific feature of our package is that we use a hashtag to denote a specific tool, which returns all the data sets that can be passed to that tool. So when you generate the data, you can add a tag for the downstream software, and later search by that tag. Also, when using the data, we have convenient functions to convert it into a specific format such as JSON, so that it is ready to be passed into workflows.

Okay, the take-home message. ReUseData uses recipes to standardize data access and preprocessing for reusable genomic data sets. The data recipes are CWL-based, for reusability and reproducibility. It is not only for public data sets; it can also be used to manage your own experiment data from your lab, and it creates ready-to-share curated data sets. ReUseData provides streamlined functions to pass the data into downstream analysis workflows, and we have a special feature of software-based data recipes, with which you can search for data that can be passed to a specific software tool. I hope I was not too fast. Thank you.

[Host] That was good, thank you. Do you have any questions for Qian? Any questions in the chat? If you have a question, or if you think of one... yeah.

[Audience] Yeah, I just wanted to ask about the Rcwl infrastructure. How does it handle jobs that fail? Does it reschedule them automatically? Is that easy to do?

[Speaker] Yes, I think at some point we added this functionality. I will have to check how it works now.
[Speaker] I think that can definitely be added, since CWL, the workflow language, supports that function, and we can add it to Rcwl.

[Audience] Oh, great. Thank you.

[Host] Are you gonna give Natesh's talk? Yeah. Excellent. So yeah, Vince is gonna give a talk for Natesh. Can we get this guy to...