Hello everybody. My name is Saskia Hiltemann. I work at the Erasmus Medical Center in the Netherlands, and today I will talk a little bit about storage management, and in particular data libraries.

Data libraries provide a convenient way to share datasets with your users. This is great for commonly used datasets such as reference data or data needed to follow GTN tutorials. Users can access this data from the top menu bar under Shared Data and then Data Libraries, where they can browse the different data libraries available to them and import datasets directly into their history to start using them. Another nice way that users can access the data in data libraries is directly from within the tool form. So instead of first importing to their history, they can open a tool, browse datasets directly, and choose one of the data library datasets to use as input for the tool.

One of the main advantages of data libraries is that they avoid duplication of data: there is only ever one copy of the file on disk, no matter how many users import it into their history. It is also nice for the users because library data doesn't count towards their quota, so they can use large datasets without eating up their quota. Libraries can either be publicly available, so shared with all users, or be restricted to certain users or groups. Permissions can be managed at the library or the dataset level by using roles and groups. Only admins can create libraries, but they can then delegate permission management to ordinary users, so those users can add datasets themselves.

There are multiple ways of getting data into a data library. The first one is from a history: you can just upload data to your history and then import it into a data library. This is very simple, especially if you have small datasets. Another way is from a directory on the server that belongs to the user; admins can import from any directory.
And if your data is available somewhere by URL, you can import directly from the URL as well. There are a couple of configuration settings that you will have to set in the galaxy.yml file for this. There is the user_library_import_dir option, which specifies the directory that users can upload data from. This directory should contain subdirectories named with each user's email address, very similar to the way the FTP upload directory is structured. You can also point both at the same location, so that users can upload data by FTP and then, if they have been granted permission to manage a data library, import that data from their FTP directory into the library. Then there is the allow_path_paste option, which allows admin users to import data into a data library from anywhere on the server that the user running the Galaxy process has access to.

So that is the main overview, and now I will give a demo of how this looks in action. First I will quickly demo how this looks for the users. If users go to their Galaxy server and go up to Shared Data and then Data Libraries, they can browse the different data libraries that have been made available here, click on one of those to see its contents, and then select any files they want to import. They select the ones they are interested in like this and import them directly into their history. They have a choice of whether to import the files as individual datasets or together as a collection, and they can send them either to their current history (the default) or to a new history. Then they just hit import. Now you see that these files have appeared in my history and I can start using them.

The other way that users can access this nicely is directly from the tool form. So let's say we want to do some text reformatting, and we don't have the file in our history yet. There is a button here that says "Browse datasets".
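The two settings just mentioned might look like this in galaxy.yml; this is a minimal sketch and the path is a placeholder for wherever you keep user-importable data on your server:

```yaml
# galaxy.yml (sketch; the path is a placeholder)
galaxy:
  # Base directory users may import library data from. It must
  # contain one subdirectory per user, named after the user's
  # email address (same layout as the FTP upload directory).
  user_library_import_dir: /data/user_library_import

  # Allow admin users to paste arbitrary server paths when
  # adding datasets to a data library.
  allow_path_paste: true
```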
There it shows us the files in our history, but also, at the top, the data libraries. So again we can browse around here and choose which file to use as input for this tool. You see that this file is now selected as a tool input, even though it was not uploaded to my history first, and then we can run the tool. So that is very nice for the users.

Next I will show you how this looks for you as an administrator. I am now logged in as an administrator, and if I go to the Shared Data and Data Libraries menu, you see a list of all the different data libraries that exist on this server, but now with some extra buttons. We can edit the name and description of a data library, and we can manage permissions: we can define which users and roles have permission to manage the permissions, to add items to the library, or to modify the library. So here we can select the users we allow to manage this data library for us. And if we go into one of the data libraries, you see there are options for adding more datasets to it. At the top there is a plus "Datasets" button, so you can either select files from a history or, since it is enabled here for admins, specify a path on the server to import from. If you specify a directory, it will keep the whole directory structure within that directory.

This is a very nice resource for sharing large datasets with your users, or datasets that are used by a lot of your users. In the hands-on portion of this tutorial we will show you exactly how to configure this setup, so stick around for that.

So now that we've got things set up, we will show you how to add data to your libraries. We'll show three different ways: the first is uploading data that is in your history, the second is uploading data from the user directory we just configured, and the third is from the import directory, which is only available to admin users.
If you switch to your Galaxy, you can access the data libraries from the top menu under Shared Data, Data Libraries. Here would be our list of data libraries; we don't have any yet, so we'll create one now with this plus icon. Just give it a name, "Example library". You can give it a description too, for your users, and save that. It has created a new library for us, and we can click on it to add data to it.

If we want to organize things into folders first, we can create those folders from the menu here. Let's say we create one with the name "history files", save, and in this folder we will upload some datasets from our history. To add datasets you click the plus "Datasets" button, and here you see the three options: from your history, from a user directory, or from the import directory. Let's start with the history. Click there, and you see that it shows you all the different files in your current history; you can also switch between different histories here. You can just select whichever datasets you would like to import, and when you're done, you hit add. You see that these files are now in the data library.

At the top you can browse up and down through the different folders. If we go back up one level, we see the folder we just made. So let's make another one and repeat this process, except uploading datasets from the user folder. So "user files", save, and click on that. Let's add some datasets, this time not from the history but from the user directory. Here you will see all the options. If you remember, when we looked at the library folder on our server, we had this example file. Here we can select the files we want, and add some metadata if we want, about the reference genome for example, or the file type, just like the regular upload button, and we hit import. This is all pretty easy, and we can repeat the same thing for the admin import location. So let's say we want to add a dataset here from the import directory.
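The per-user layout that the user directory import expects can be sketched on the server like this; the base directory name and the email address are placeholders for your configured user_library_import_dir and a real registered user:

```shell
# Sketch: layout expected under user_library_import_dir.
# Each subdirectory is named after a user's registered email
# address, mirroring the FTP upload directory structure.
BASE=user_library_import                       # placeholder base dir
mkdir -p "$BASE/alice@example.org"             # placeholder user
printf '>seq1\nACGT\n' > "$BASE/alice@example.org/example.fasta"
ls "$BASE/alice@example.org"
```

With this layout in place (and the user granted library management permission), example.fasta would show up for that user in the "from user directory" import dialog.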
It looks the same as importing from the user directory, except here you see only the files that are accessible in the admin import folder. So we select the admin wild-type file here and import. Those are the three main ways to get data into your data library, which users can now use. In the next section we will show you how you can script and automate this: if the files are available somewhere on the web by URL, you can automate this process as well.

Okay, so next we will show you how to automatically populate data libraries from datasets available on the web. For this you create a little YAML file that describes the datasets you want to import. We have included an example YAML file in the example repo we cloned before, so let's look at its contents. In that file we see some metadata: we can give the library a name and a description, like we did before manually. Then we list the items that we want to include, so we can give the URL where each file is located online, its file format, and any other information we might want to associate with it. We can create directories in the same way and populate those with several files. Here you see we define three files, organized into directories.

To be able to use this you will need your Galaxy admin API key. If you don't already know it, you can read here how to get one. I will show you: if you go to your Galaxy instance under user preferences, you see a lot of user settings, including "Manage API key". It will show you the API key, and if it's not set yet, you can click this button to create a new key. Copy this value, because you will need it to create the data library in the next part.

Okay, let's do it. This is going to use the Ephemeris tool. If you've already done the tutorial before, you can reactivate that virtual environment.
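Such a library description file has roughly the following shape. This is a sketch only: the URLs, library name, and file names here are placeholders, and you should check the Ephemeris documentation for the full set of supported keys.

```yaml
# Sketch of an Ephemeris setup-data-libraries input file.
# URLs are placeholders, not real data locations.
destination:
  type: library
  name: Mouse sequencing project
  description: Data for our sequencing experiment
  synopsis: Reads and reference data

items:
  - url: https://example.org/data/sample_1.fastq.gz   # placeholder URL
    src: url
    ext: fastqsanger.gz
    info: Sample 1 reads
  - name: references          # a folder inside the library
    items:
      - url: https://example.org/data/genome.fa       # placeholder URL
        src: url
        ext: fasta
```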
If you haven't done that yet, or the virtual environment is missing, this will show you how to create it again. So let's go ahead and activate it. Next comes our command to populate the data libraries. This is the Ephemeris command setup-data-libraries. You provide the link to your Galaxy instance, your API key, and the YAML file. We also give it a couple of extra parameters: --training sets some defaults that are mostly only useful for GTN materials, relating to how descriptions are made and things like that, and the --legacy flag allows newer Galaxies to still use the older API. In this case it is a little bit nicer to use the legacy API because it replaces libraries in place: if we update the YAML file and run the command again, it will not create a second data library with the same name but will change the existing one in place, which is kind of nice.

So let's run this command and see what happens. Remember to get your API key and the location of your Galaxy server. For me that looks like this: setup-data-libraries, -g with the link to my VM, -a with my key, and the path to the example library YAML file. (I mistyped -k at first; it should be -a for the API key.) There we go: you see it is creating the library named "Mouse sequencing project" and it will import the URLs next.

This command is also safe to rerun. If we run it again, it will just detect that this library already exists and it won't do anything. And if we would like to add, say, a single file to this library, we can update the YAML file, add that file, and run the command again; it will detect the new file and add only that. So let's see if this shows up in our Galaxy. Go to Shared Data, Data Libraries, and we see that in addition to the one we created manually earlier, we now have this "Mouse sequencing project" library which we defined in the YAML file.
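The whole sequence of steps above can be sketched as follows. This is not runnable as-is: the Galaxy URL and API key are placeholders for your own instance and admin key, and the virtual environment path is just an example.

```shell
# Reactivate (or recreate) the virtual environment with Ephemeris.
python3 -m venv ~/ephemeris_venv
. ~/ephemeris_venv/bin/activate
pip install ephemeris

# Populate the libraries described in the YAML file.
# -g: your Galaxy URL (placeholder), -a: your admin API key,
# -i: the library description file.
setup-data-libraries -g https://galaxy.example.org \
    -a "$ADMIN_API_KEY" -i example-library.yaml \
    --training --legacy
```

Because the command is idempotent with --legacy, rerunning it after editing the YAML file updates the existing library in place rather than creating a duplicate.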
And it contains the one file and the directory with the two other files in it that we described in our YAML file. So this is a really nice way to automate the population of shared data libraries.

This automated population of data libraries is very useful. It is used by Galaxy Europe, for instance: they want to support all the tutorials on the GTN, so they want to make all the data needed for those tutorials available as shared data libraries. This means that users don't have to upload these datasets every time, which avoids a lot of data duplication and is also very useful for users who may have a slow internet connection. So they created a shared data repository that defines this YAML file for all the GTN data, so that it can be easily updated whenever the GTN updates. This has also been expanding to other usegalaxy.* servers, so you will see the same libraries across multiple servers. And if you would also like to support the GTN trainings on your instance and offer the datasets on your server, you can join this initiative: you just have to provide them with a non-admin API key and the data library to upload data into, and then they can sync it automatically for you.

I hope this has shown that data libraries are a great way to share data with your users as an admin, or to allow users to share data with each other; they avoid a lot of data duplication and make things easier on the users as well. Thanks for joining, and don't forget to fill in the feedback form if you have any comments about this tutorial. Bye.