Hello, everybody. My name is Saskia Hiltemann. I work at the Erasmus Medical Center in the Netherlands, and today I will talk a little bit about storage management, and in particular about data libraries. Data libraries provide a convenient way to share datasets with your users. This is great for commonly used datasets, such as reference data or the data needed to follow GTN tutorials. Users can access this data from the top menu bar under Shared Data, then Data Libraries, where they can browse the data libraries available to them and import datasets directly into their history to start using them. Another nice way for users to access data in data libraries is directly from within the tool form: instead of first importing the data into their history, they can open a tool, browse datasets directly, and choose a data library dataset as input for the tool. One of the main advantages of data libraries is that they avoid duplication of data: there is only ever one copy of the file on disk, no matter how many users import it into their histories. This is also nice for users, because library data does not count towards their quota, so they can use large datasets without eating up their quota. Libraries can either be public, that is, shared with all users, or restricted to certain users or groups; permissions can be managed at the library or dataset level using roles and groups. Only admins can create libraries, but they can delegate permission management to ordinary users, who can then add datasets themselves. There are multiple ways of getting data into a data library. The first is from a history: you simply upload data to your history and then import it into a data library. This is very simple, especially if you have small datasets. Another way is from a directory on the server that belongs to the user.
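To make the deduplication and quota points above concrete, here is a deliberately simplified toy model in Python. It is not Galaxy's actual data model: the class names (`StoredFile`, `HistoryItem`, `User`), the paths, and the sizes are hypothetical. It only illustrates the idea that importing a library dataset adds a reference rather than a copy, and that such references are left out of the user's quota.

```python
from dataclasses import dataclass, field


@dataclass
class StoredFile:
    """One physical file on disk (hypothetical, simplified model)."""
    path: str
    size_bytes: int


@dataclass
class HistoryItem:
    """A history entry that references a stored file rather than copying it."""
    file: StoredFile
    from_library: bool


@dataclass
class User:
    email: str
    history: list[HistoryItem] = field(default_factory=list)

    def import_from_library(self, library_file: StoredFile) -> None:
        # Importing from a library only adds a reference; no bytes are copied.
        self.history.append(HistoryItem(library_file, from_library=True))

    def upload(self, path: str, size_bytes: int) -> None:
        # A regular upload creates a new file that belongs to this user.
        self.history.append(HistoryItem(StoredFile(path, size_bytes), from_library=False))

    def quota_usage(self) -> int:
        # In this model, library datasets are excluded from the user's quota.
        return sum(i.file.size_bytes for i in self.history if not i.from_library)


def files_on_disk(users: list[User]) -> int:
    """Count distinct physical files referenced across all histories."""
    return len({i.file.path for u in users for i in u.history})


if __name__ == "__main__":
    genome = StoredFile("/libraries/admin/genome.fa", 3_000_000_000)  # hypothetical
    users = [User(f"user{n}@example.org") for n in range(3)]
    for u in users:
        u.import_from_library(genome)
    users[0].upload("/data/user0/reads.fastq", 500)
    print(files_on_disk(users))    # 2: one shared genome plus one private upload
    print(users[0].quota_usage())  # 500: only the private upload counts
    print(users[1].quota_usage())  # 0
```

Three users import the same genome, yet only one genome file exists, and none of them is charged for it.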
Admins can import from any directory, and if your data is available somewhere by URL, you can import it directly from the URL as well. There are a couple of configuration settings you will have to set in the galaxy.yml file for this. There is the `user_library_import_dir` option, which specifies a directory of files from which users can import data. This directory should contain subdirectories named after each user's email address, very similar to the way the FTP upload directory is structured. In fact, you can point it at the same location as the FTP upload directory, so that users can upload data by FTP and then, if they have been granted permission to manage a data library, import that data from their FTP directory into the library. There is also the `allow_path_paste` option, which allows admin users to import into a data library any file on the server that the user running the Galaxy process has access to. That's the main overview; now I will give a demo of how this looks in action, starting with the user's view. If users go to Shared Data and then Data Libraries on their Galaxy server, they can browse the data libraries that have been made available, click on one to see its contents, select the files they are interested in, like this, and export them directly to their history. They can choose whether to import them as individual datasets or together as a collection, and whether to send them to their current history, which is the default, or to a new history. Then they just hit Import. Now you see that these files have appeared in my history, and I can start using them. The other nice way for users to access this data is directly from the tool form. Let's say we want to do some text reformatting. The file is not yet in our history, but here there is a button that says Browse datasets.
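As a rough sketch of the directory layout described above, the hypothetical helpers below resolve a user's subdirectory under `user_library_import_dir` (one subdirectory per email address, as with FTP uploads), list the files found there, and approximate the `allow_path_paste` rule as "any absolute path the Galaxy process can read". The function names are illustrative, not Galaxy's own code, and Galaxy's real checks may differ.

```python
import os
import tempfile


def user_import_dir(user_library_import_dir: str, email: str) -> str:
    """Return <user_library_import_dir>/<email>, refusing paths that escape the root."""
    root = os.path.realpath(user_library_import_dir)
    candidate = os.path.realpath(os.path.join(root, email))
    # The per-user directory must be a direct child of the configured root.
    if os.path.dirname(candidate) != root:
        raise ValueError(f"unsafe email for a directory name: {email!r}")
    return candidate


def list_importable_files(user_library_import_dir: str, email: str) -> list[str]:
    """List files (relative to the user's directory) that the user could import."""
    base = user_import_dir(user_library_import_dir, email)
    found = []
    for dirpath, _dirnames, filenames in os.walk(base):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), base))
    return sorted(found)


def admin_path_importable(path: str, allow_path_paste: bool) -> bool:
    """With allow_path_paste on, an admin may import any absolute path that this
    process can read; os.access checks the permissions of the current process."""
    return allow_path_paste and os.path.isabs(path) and os.access(path, os.R_OK)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        run_dir = os.path.join(root, "alice@example.org", "run1")
        os.makedirs(run_dir)
        open(os.path.join(run_dir, "reads.fastq"), "w").close()
        print(list_importable_files(root, "alice@example.org"))
```

Resolving with `os.path.realpath` and requiring a direct child of the root is one conservative way to keep an odd email address from pointing outside the import directory.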
It shows us the files in our history, but also, at the top, the data libraries. So again, we can browse around and choose which file to use as input for this tool. You see that this file is now selected as the tool input even though it was never uploaded to my history, and we can run the tool. So that is very nice for the users. Next, I will show you how this looks for you as an administrator. Now I am logged in as an administrator, and if I go to Shared Data and then Data Libraries, I see a list of all the data libraries that exist on this server. But now there are also some extra buttons. We can edit the name and description of a data library, and we can manage permissions: here we can define which users and roles are allowed to manage permissions, to add items to this library, or to modify the library. So here we can select the users we allow to manage this data library for us. And if we go into one of the data libraries, you see there are options for adding more datasets to it. At the top there is the "+ Datasets" button, so you can either select files from a history or, since it's enabled here for admins, specify a path on the server to import from. If you specify a directory, all of the directory structure within it is kept. This is a very nice resource for sharing large datasets with your users, or datasets that are used by many of your users. In the hands-on portion of this tutorial, we will show you exactly how to configure this setup, so stick around for that.

Welcome back, everybody. In this hands-on tutorial, we will show you how to set up data libraries on your Galaxy instance and how to add some data to them. First, we are going to add a pre-task to our playbook that clones an example repository containing some example datasets.
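When an admin imports a whole directory from the server, the directory structure is kept. The hypothetical function below only plans such an import: it walks a directory and reports which library folder each file would end up in. It does not talk to Galaxy, and the `/`-style folder paths are an assumption made for readability.

```python
import os
import tempfile


def plan_library_import(server_dir: str) -> dict[str, list[str]]:
    """Map each future library folder to the files it would contain.

    Every subdirectory of server_dir becomes a library folder (path relative
    to server_dir, "/" for the top level), and each file stays in its folder.
    """
    plan: dict[str, list[str]] = {}
    for dirpath, dirnames, filenames in os.walk(server_dir):
        dirnames.sort()  # walk subdirectories in a deterministic order
        rel = os.path.relpath(dirpath, server_dir)
        folder = "/" if rel == "." else "/" + rel.replace(os.sep, "/")
        plan[folder] = sorted(filenames)
    return plan


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        os.makedirs(os.path.join(d, "fastq", "run1"))
        open(os.path.join(d, "genome.fa"), "w").close()
        open(os.path.join(d, "fastq", "run1", "R1.fastq"), "w").close()
        for folder, files in plan_library_import(d).items():
            print(folder, files)
```

Empty intermediate directories (here `/fastq`) still show up as folders, which is what "keeping the structure" implies.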
We will edit the galaxy.yml playbook and tell it to download this example repository from GitHub. This task just sets up some directories with some data in them. You can copy these lines and add them to the playbook, after the existing pre-task. The second change we'll make is in the group_vars/galaxyservers.yml file, where we configure the directories to use: `library_import_dir` for admin users and `user_library_import_dir` for regular users. Add these two lines to group_vars/galaxyservers.yml, just under `galaxy_config:` and `galaxy:`. Now we just run the playbook. Now that it's done, we can look at the new directory we created, /libraries, and see that the data is present. Great. Now that things are set up, we will show you how to add data to your libraries. We'll show three different ways: first, uploading data from your history; second, uploading data from the user directory we just configured; and third, from the import directory, which is only available to admin users. If you switch to your Galaxy, you can access the data libraries from the top menu via Shared Data and then Data Libraries. This is our list of data libraries; we don't have any yet, so we'll create one now with this plus icon. Just give it a name, such as "example library", and optionally a description for your users, and save it. A new library has been created, and we can click on it to add data. If we want to organize things into folders first, we can create those folders from the menu here: one named "history files". Save. Into this folder we will upload some datasets from our history. To add datasets, you click the "+ Datasets" button, and here you see the three options: from your history, from a user directory, or from the import directory. Let's start with the history.
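The two settings added to `group_vars/galaxyservers.yml` sit under `galaxy_config:` and `galaxy:`. The sketch below builds that nesting as a Python dictionary so the structure is easy to see; the paths `/libraries/admin` and `/libraries/user` are assumed examples under the `/libraries` directory, and `library_config` is a hypothetical helper, not part of the Ansible role.

```python
import json
import os


def library_config(library_import_dir: str, user_library_import_dir: str,
                   allow_path_paste: bool = False) -> dict:
    """Build the galaxy_config -> galaxy mapping for the data library options."""
    for path in (library_import_dir, user_library_import_dir):
        # This sketch requires absolute paths to keep the example unambiguous.
        if not os.path.isabs(path):
            raise ValueError(f"expected an absolute path, got {path!r}")
    galaxy = {
        "library_import_dir": library_import_dir,            # admin-only imports
        "user_library_import_dir": user_library_import_dir,  # <dir>/<user email>/...
    }
    if allow_path_paste:
        galaxy["allow_path_paste"] = True  # admins may import arbitrary server paths
    return {"galaxy_config": {"galaxy": galaxy}}


if __name__ == "__main__":
    # Hypothetical paths under the /libraries directory created by the pre-task.
    cfg = library_config("/libraries/admin", "/libraries/user")
    # Printed as JSON only to inspect the nesting without a YAML dependency.
    print(json.dumps(cfg, indent=2))
```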
If you click there, it shows you all the files in your current history, and you can also switch between different histories here. Select whichever datasets you would like to import, and when you're done, click Add. You see that these files are now in the data library. At the top, you can navigate up and down through the folders; if we go back up one level, we see the folder we just made. Let's make another one and repeat the process, but this time importing datasets from the user folder. Give it a name, save, and click on it. Now let's add some datasets, not from the history this time, but from the user directory. Here you see all the options. If you remember, when we looked at the libraries folder on our server, there was this example file. We can select the files we want and, if we like, add metadata such as the reference genome or the file type, just like with the regular upload button. Then we hit Import. This is all pretty easy. We can repeat the same thing for the admin import location: let's say we want to add a dataset here from the import directory. It looks the same as importing from the user directory, except that you only see the files in the admin import directory, so here we pick the wild-type file from the admin folder and click Import. Those are the three main ways to get data into your data library, which users can now use. In the next section, we will show you how to script and automate this when the files are available on the web somewhere by URL. So next, we will show you how to automatically populate data libraries from datasets available on the web. For this, you create a small YAML file that describes the datasets you want to import. We have included an example YAML file in the example repository we cloned before, so let's look at its contents.
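Which of the three import options a person sees depends on the configuration and on whether they are an admin. The toy function below models that rule under stated assumptions: the labels are paraphrases rather than Galaxy's exact button text, the user is assumed to already have permission to add items to the library, and the path option reflects the `allow_path_paste` setting mentioned earlier. `GalaxyLibrarySettings` and `import_sources` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GalaxyLibrarySettings:
    """Hypothetical subset of galaxy.yml options relevant to library imports."""
    library_import_dir: Optional[str] = None
    user_library_import_dir: Optional[str] = None
    allow_path_paste: bool = False


def import_sources(settings: GalaxyLibrarySettings, is_admin: bool) -> list[str]:
    """Return the import options offered (assumes add permission on the library)."""
    sources = ["from history"]  # always available
    if settings.user_library_import_dir:
        sources.append("from user directory")
    if is_admin and settings.library_import_dir:
        sources.append("from import directory")  # admin-only
    if is_admin and settings.allow_path_paste:
        sources.append("from path")  # admin-only, any readable server path
    return sources


if __name__ == "__main__":
    s = GalaxyLibrarySettings("/libraries/admin", "/libraries/user")
    print(import_sources(s, is_admin=False))
    print(import_sources(s, is_admin=True))
```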
If we look in that file, we see that it has some metadata: we can give the library a name and a description, like we did manually before. Then we list the items we want to include. For each one, we give the URL where the file is located online, its file format, and any other information we want to associate with it. We can also create folders in the same way and populate them with several files. Here we define three files: one at the top level and two inside a folder. To use this, you will need your Galaxy admin API key. If you don't know it yet, the tutorial explains how to get one: in your Galaxy instance, go to User and then Preferences, where you see many user settings, including Manage API Key. It shows you your API key, and if none is set yet, you can click this button to create a new one. Copy this value, because you will need it to create the data library in the next part. Okay, let's do it. This uses the Ephemeris tool. If you've already done the earlier tutorial that uses Ephemeris, you can reactivate that virtual environment; if you haven't, or the virtual environment is missing, the instructions here show how to create it again. Then comes our command to populate the data libraries. This is the Ephemeris command `setup-data-libraries`: you provide the URL of your Galaxy instance, your API key, and the YAML file. We also give it a couple of extra parameters. `--training` sets some defaults that are mostly useful only for GTN materials, for example for how descriptions are generated. `--legacy` lets newer Galaxy versions still use the older API; in this case, the legacy API is a little nicer because it replaces libraries in place.
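Here is a sketch of a library description with the shape just described (library metadata, then a list of items, where an item with its own `items` list is a folder), written as a Python dictionary so it can be walked without a YAML parser. The key names (`destination`, `items`, `url`, `src`, `ext`) are assumed to follow the Ephemeris format and should be checked against the example file in the cloned repository; all URLs and names are placeholders, and `iter_files` is a hypothetical helper.

```python
from typing import Iterator

# Hypothetical example: one file at the top level and a folder holding two files.
library_spec = {
    "destination": {
        "type": "library",
        "name": "Example sequencing project",
        "description": "Example data",
    },
    "items": [
        {"url": "https://example.org/data/reference.fasta", "src": "url", "ext": "fasta"},
        {
            "name": "Sequencing reads",  # an item with "items" is a folder
            "items": [
                {"url": "https://example.org/data/sample_R1.fastq", "src": "url", "ext": "fastqsanger"},
                {"url": "https://example.org/data/sample_R2.fastq", "src": "url", "ext": "fastqsanger"},
            ],
        },
    ],
}


def iter_files(items: list, folder: str = "/") -> Iterator[tuple[str, dict]]:
    """Yield (folder path, file item) pairs, descending into nested folders."""
    for item in items:
        if "items" in item:
            child = folder.rstrip("/") + "/" + item["name"]
            yield from iter_files(item["items"], child)
        else:
            yield folder, item


if __name__ == "__main__":
    for folder, item in iter_files(library_spec["items"]):
        print(folder, item["ext"], item["url"])
```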
So if we update the YAML file and run the command again, it will not create a second data library with the same name but will update the existing one in place, which is kind of nice. Let's run this command and see what happens. Remember to get your API key and the URL of your Galaxy server. For me, it looks like this: `setup-data-libraries`, `-g` with the URL of my VM, `-a` with my key, and the example library YAML file. (I mistyped at first: `-k` should be `-a`, for API key.) Here we go. You see it is creating the sequencing-project library, and it will import the URLs next. This command is also safe to rerun: if we run it again, it detects that the library already exists and does nothing. If we want to add, say, a single file to this library, we can add that file to the YAML file and run the command again; it detects the new file and adds only that one. Let's see whether this shows up in our Galaxy. Go to Shared Data and then Data Libraries, and we see that, in addition to the library we created manually earlier, we now have the sequencing-project library we defined in the YAML file. Here is the one file and the folder with the two other files that we described in our YAML file. So this is a really nice way to automate the population of shared data libraries. It is used by Galaxy Europe (usegalaxy.eu), for instance, which supports all of the GTN tutorials and wants to make all the data needed for those tutorials available as shared data libraries. This means that users don't have to upload the data every time, which avoids a lot of data duplication and is also very useful for users who have a slow internet connection. They created a shared-data repository that defines the data for all GTN tutorials in this YAML format, so that it can easily be updated whenever the GTN is updated.
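To avoid the `-k`/`-a` mix-up shown above, the command can be assembled programmatically. This sketch only builds and prints the argument list. `-g`, `-a`, `--training`, and `--legacy` come from the walkthrough; the `-i` flag for the YAML file, the `GALAXY_API_KEY` environment variable name, and the helper name `setup_data_libraries_argv` are assumptions, so confirm the flags with `setup-data-libraries --help`.

```python
import os
import shlex


def setup_data_libraries_argv(galaxy_url: str, api_key: str, yaml_path: str,
                              training: bool = False, legacy: bool = False) -> list[str]:
    """Assemble the Ephemeris setup-data-libraries command line (not executed)."""
    argv = ["setup-data-libraries",
            "-g", galaxy_url,  # Galaxy URL
            "-a", api_key,     # admin API key (-a, not -k)
            "-i", yaml_path]   # library description YAML (flag assumed)
    if training:
        argv.append("--training")  # GTN-oriented defaults
    if legacy:
        argv.append("--legacy")    # older library API, replaces libraries in place
    return argv


if __name__ == "__main__":
    # GALAXY_API_KEY is a hypothetical environment variable name.
    key = os.environ.get("GALAXY_API_KEY", "<admin-api-key>")
    argv = setup_data_libraries_argv("https://galaxy.example.org", key,
                                     "example-library.yaml", legacy=True)
    # Mask the key before printing; passing argv to subprocess.run would execute it.
    shown = ["***" if arg == key else arg for arg in argv]
    print(shlex.join(shown))
```

Reading the key from the environment keeps it out of shell history, and building a list (rather than one string) avoids quoting problems with URLs or paths.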
This has also been expanding to other usegalaxy.* servers, so you will see the same libraries across multiple servers. If you would also like to support the GTN trainings on your instance and offer these datasets on your server, you can join this initiative: you just have to provide them with a non-admin API key and a data library to upload the data into, and they can then sync it automatically for you. I hope this has shown that data libraries are a great way for you as an admin to share data with your users, or to let users share data with each other, while avoiding a lot of data duplication and making things easier for users as well. Thanks for joining, and don't forget to fill in the feedback form if you have any comments about this tutorial.