Welcome to this tutorial on automating Galaxy workflows using the command line. The aim of this tutorial is to demonstrate the functionality which Galaxy provides for running workflows, not just in the graphical interface, which you should already be familiar with, but also from the command line. So if you've ever had to deal with a kind of analysis which requires you to execute not just one or two workflows, but tens or hundreds or even thousands, then this tutorial should be interesting for you. The tutorial has two different parts. The first is a hands-on session, which we'll demonstrate in this video. We'll show you how to use the Planemo `run` subcommand for running workflows from the command line. The workflow which we'll use for this part is not a real scientific workflow; it just contains a couple of text manipulation tools, so it's a handy way to get to know this Planemo functionality without needing to use a large amount of compute resources. You can run the workflow on whatever Galaxy server you prefer, so either a local server, like I'll do in this video, or alternatively one of the public Galaxy servers which are available. The second part of the tutorial is a self-study section. It will not be demonstrated as part of this video; the idea is that you take the tools that you learn in the first part and then transfer them and apply them to solving a problem by yourself. We'll provide you with some data and a real scientific workflow using the Pangolin tool, which is used to classify VCF files associated with different samples of the SARS-CoV-2 virus and to assign each of the samples to a particular viral lineage. So you should spend some time in this part thinking about how you can automate this workflow using the tools that you already learned in the first part, in the end effectively creating a small bot which can run the workflows for you.
Okay, so let's get started with a brief demonstration of some of the features which Planemo has to offer for running workflows. The first step will be to download the workflows and the data which we'll use during this tutorial. We provide them in a GitHub repository; you can see the link here. To get them you can just run the `git clone` command which is provided here. So I'll copy this link into my terminal and just paste it, and then I can just `cd` into the repository. You see here that there are two subdirectories: the `example` subdirectory, which I'll show you just now, and the `pangolin` subdirectory, which contains the scientific workflow that you'll work with later on. So, first of all, the `example` subdirectory. At the moment it just contains a single file, this `tutorial.ga` file, which is the Galaxy workflow which is pre-downloaded and which we provide for this tutorial. The next thing that we need to make sure of is that we have a recent version of Planemo installed. It's recommended that you install Planemo into a virtual environment. I've activated my virtual environment just now; here it is, `planemo`. If you haven't installed Planemo yet, there are instructions in the Planemo documentation, which is linked here. I'll just show you what it looks like: here, under this pip section, you have instructions for how to do a traditional Python installation of Planemo using pip. The next thing that we need is access to a Galaxy server which you can use to run the workflow. In this case I've simply got a local Galaxy server, which is running in a different terminal, here, and I've logged in. But in principle you could also run this tutorial on a public server like usegalaxy.eu or usegalaxy.org. Then the final thing that you need, to be able to access your Galaxy account programmatically like we're about to do with Planemo, is the Galaxy API key. And this you get by going to User.
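The setup steps just described can be sketched as a short shell session. This is a hedged sketch: the repository link is the one shown in the training material, so a placeholder is used here, and the virtual environment name is just illustrative.

```shell
# Clone the tutorial repository (substitute the link from the training material)
git clone <repository-url>
cd <repository-directory>

# Two subdirectories: example/ (toy workflow) and pangolin/ (scientific workflow)
ls

# Install a recent Planemo into a virtual environment (recommended)
python3 -m venv planemo-venv
. planemo-venv/bin/activate
pip install planemo
```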
Then you click on Preferences and on Manage API Key. If you're accessing the API key for the first time, it says the API key is not available. So I just create a new API key, copy it, and then use it as part of the command which I'm about to run. Okay, so maybe I'll start by showing you the workflow which we're about to run. Now we're back in the terminal; I'll show you the contents of the `tutorial.ga` file using Vim, but you could use any text editor you want. It's a JSON file which provides a full definition of the workflow, of all the steps and inputs which are contained within it. So it provides definitions of all the tools and all the parameters which you need to run the workflow. If I now show you the representation of the workflow in the graphical interface, it's this one here; you can see it looks like this. It's a really simple workflow with just two dataset inputs and a single workflow parameter. What this workflow does is concatenate the two input files, which are just text files, and then select a certain number of lines randomly. The number of lines which it selects is defined by this "Number of lines" parameter here. So it's not a scientific workflow, it doesn't do anything useful, but it's a small workflow to demonstrate some of the Planemo functionality. Okay, so returning now to the terminal. What we have in the `tutorial.ga` file is a definition of all the workflow steps. But in order to run the workflow, we also need to define all the inputs and all the parameters that we want to use for a particular workflow invocation. For this, we need a second file, which is referred to as a job file. This defines all the inputs and the parameters which are needed for running particular jobs; in this case, a job would be a workflow invocation.
So what Planemo provides is a helpful subcommand, `workflow_job_init`, for creating a kind of template job file, which you can then fill out with the inputs and parameters that you require. I'll just type this into the terminal: `planemo workflow_job_init tutorial.ga`, and as the output we want `tutorial-job.yml`. Then it completes, and I'll show you the contents of the file now. It's very simple: we have our two data inputs, Dataset 1 and Dataset 2, which we saw in the workflow, and this "Number of lines" parameter is also defined. In order to run this workflow, we need to replace these paths with paths which point to real datasets which we have saved locally on our computer, and we also need to replace the "Number of lines" placeholder with the particular number of lines that we want to use for our invocation. We can replace the "Number of lines" parameter already; we can set it to three, for example. For the datasets, we need to first create the files that we want to use. Okay, so to create our datasets, we can run some commands like these. You can create these files however you want to; you don't have to do it this way. You can use a text editor or any other way you prefer. The key thing is that we now have these two text files, dataset 1 and dataset 2, which have some contents; you can put whatever you want into these files. The next step is to update our template job file to point to the two files that we've just created. So let's reopen the job file and update these two paths, which are just shown as placeholders at the moment. At this point, we have all of the placeholders filled in, so we should be ready to invoke the workflow with Planemo. So let's go right ahead and run the workflow with the `planemo run` subcommand, just as it's described here in the training material. We need to use a command like this, and I'll show you how that works now. We start by typing `planemo run`.
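The dataset-creation and job-file steps above can be sketched as a short shell session. Note the assumptions: the input names (`Dataset 1`, `Dataset 2`, `Number of lines`) and the file names are taken to match the template that `workflow_job_init` generates for this workflow, so check them against your own generated template; the file contents are arbitrary.

```shell
# Create the two small input files (any contents will do)
printf 'hello\nworld\n' > dataset1.txt
printf 'foo\nbar\nbaz\n' > dataset2.txt

# Fill in the job file template produced by `planemo workflow_job_init`:
# real local paths replace the path placeholders, and the
# "Number of lines" parameter is set to 3
cat > tutorial-job.yml << 'EOF'
Dataset 1:
  class: File
  path: dataset1.txt
Dataset 2:
  class: File
  path: dataset2.txt
Number of lines: 3
EOF

cat tutorial-job.yml
```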
Then we type the name of the workflow, `tutorial.ga`, then the name of the job file, `tutorial-job.yml`. Now we need some information about the server on which the workflow is going to run. First of all, we need the URL of the server; this I can just copy. Then we need the API key, which you can get from the Galaxy user preferences here. So again, I'll just copy that. This command is enough to run the workflow, but it's helpful to provide some extra information. We can specify, for example, the name of the history which will be created to execute the workflow in; we'll call it "Planemo test workflow". And then I can also specify some history tags which should be added to the history. This is just a comma-separated list of tag names; in this case, I just want to add this single `planemo-tutorial` tag. So now I press Enter. If I navigate back to the web browser, to the graphical interface, I see that a new history has appeared, and you can see it has the right name, the name that we specified before: "Planemo test workflow". If we navigate now to the workflow invocations page here, we see that the invocation hasn't yet started, because the two input files haven't yet finished uploading. But once they have, we refresh the page and we can see that a new invocation is visible; it was invoked a few seconds ago. Navigating back now to the terminal, we can see that the command that we entered has not yet completed. In fact, the default behavior of Planemo is that the command doesn't complete until the Galaxy invocation which it started has also completed. And now that's the case, as we can see on the right-hand side in our history panel, and our command has also completed. This is fine for a really small workflow like the one that we're dealing with, but if we're dealing with a really large workflow which might take a few hours to complete, then this could be a bit of a disadvantage.
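Put together, the command built up in this step might look like the sketch below. The server URL and API key are placeholders you must substitute with your own values; the flag names are those from the Planemo documentation, so double-check them against `planemo run --help` on your version.

```shell
planemo run tutorial.ga tutorial-job.yml \
    --galaxy_url <server-url> \
    --galaxy_user_key <api-key> \
    --history_name "Planemo test workflow" \
    --tags planemo-tutorial
```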
You have to wait for the entire workflow to complete before the Planemo command completes. So that leads us on to the next part of the tutorial, which is to run the `planemo run` subcommand with the `--no_wait` flag. It's exactly the same as the previous command, but we just add `--no_wait` at the end and press Enter. Once again, we see that a new history has been created with the same name; we could have given it a different name, actually, and it might have been a good idea to have done that in the commands we created. We have to wait for the two datasets to upload, and what we see is that as soon as the datasets are uploaded, the invocation can start, and the Planemo command terminates immediately, as soon as the invocation has successfully been scheduled. Just as you can see here: it waits for the invocation to start, then you get this message, "Run successfully executed", and Planemo tells you that it's simply going to exit without waiting for the results of the execution. Then you see that the history continues to update until we have our output file, which has been created here. Returning now to the training material, the next stage is to run workflows using Galaxy workflow and dataset IDs. We've now executed the same workflow twice, and what you can see, if you go to the workflow page here, is that for each execution that we make of the workflow with Planemo, the workflow gets re-uploaded. So we now have four identical workflows which have been uploaded four separate times, because we've run the same command four times. Likewise, if we view all of our histories, we can see that this workflow has now been run four different times, and these two datasets, dataset 1 and dataset 2, have been newly uploaded each time. In this case, it doesn't really matter.
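As a sketch, the only change from the previous command is the extra flag; the URL and API key are again placeholders for your own values.

```shell
planemo run tutorial.ga tutorial-job.yml \
    --galaxy_url <server-url> \
    --galaxy_user_key <api-key> \
    --history_name "Planemo test workflow" \
    --tags planemo-tutorial \
    --no_wait
```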
This is just a small test workflow, but it's not really ideal that this upload has to be repeated each time for datasets and workflows which are identical. So what we can do with Galaxy is, in our job file, point Planemo directly towards the Galaxy IDs of datasets and workflows which already exist on the Galaxy server. I'll try and demonstrate how to do that. Let's go back to the main page of our server. In our history, if I expand these two datasets, dataset 1 and dataset 2, and click on the view details icon, then you have various information. In the "Dataset Information" section, you have this row, "History Content API ID". This is an ID which Galaxy has created and which is associated with this particular dataset. So what I can do is copy this, return to the terminal, and open up the job file. What I do now is replace the path with `galaxy_id` and paste the dataset ID which I just copied. Then we do the same for the second dataset: select the dataset ID, paste it in, and we can close the file. What we've done here is change the job file so that the next time we run the workflow with it, Planemo will know that, instead of uploading datasets from a local source, it can just take these datasets which already exist on the Galaxy server and run the workflow straight away; we can skip the dataset upload stage. The other thing we can do is to use the workflow ID directly. To do this, we go back to the workflow page in the graphical interface and click on Edit. Then in the URL we have this part, after `id=`; it looks very similar to the IDs that we've used so far. We just copy it and return to the terminal. So what I'm going to do now is run the same command, but instead of using the name of our workflow file, `tutorial.ga`, I just replace it directly with the workflow ID which I just copied.
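The edited job file might then look something like this sketch; the IDs shown are made-up placeholders, and yours come from the "History Content API ID" row on each dataset's details page.

```yaml
Dataset 1:
  class: File
  galaxy_id: "<id-of-dataset-1>"
Dataset 2:
  class: File
  galaxy_id: "<id-of-dataset-2>"
Number of lines: 3
```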
What this will do is run the workflow again, but without uploading it. Planemo will know that instead of needing to upload this `tutorial.ga` file for a fifth time, it can simply use the workflow which already exists on the Galaxy server, specified with this ID. So let's go ahead and try it. Now you see that these files have appeared, or rather, have been made available, much, much faster; you can see that the upload step is skipped, so these datasets are available immediately. It looks like I might have got the datasets the wrong way around, actually, dataset 1 and dataset 2, but that doesn't matter. The invocation starts immediately, and because we've got the `--no_wait` flag specified, the command terminates as soon as the invocation has been scheduled. So the final thing that I'd like to show you is the concept of Planemo profiles. I return now to the training material; we're onto this section here, "Using Planemo profiles". What a profile is, is an idea to combine multiple flags together into a single profile, which you can then append to a command and thereby use all of the flags which are associated with it. This helps you to simplify long commands that you would run multiple times, which is kind of the aim of this tutorial: once you start to run workflows a large number of times automatically, for example using a bash script, it might be handy to be able to combine command flags into a single profile. The best way to explain this, I guess, is to demonstrate. The command that we use is the `profile_create` subcommand; then we type the name of the profile, in this case `planemo-tutorial`, and then we can add the various flags that we want to be associated with this profile. In this case I'll add the URL and the API key, so I just copy these and press Enter. That causes an error, because I should have used an underscore instead of a hyphen, but now it's been created.
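These two steps might look like the sketch below. The workflow ID, server URL, and API key are placeholders for your own values, and the flag names (note the underscores) are those from the Planemo documentation.

```shell
# Run again, replacing the workflow file with the workflow ID copied from the URL
planemo run <workflow-id> tutorial-job.yml \
    --galaxy_url <server-url> \
    --galaxy_user_key <api-key> \
    --no_wait

# Combine the server details into a reusable profile
planemo profile_create planemo-tutorial \
    --galaxy_url <server-url> \
    --galaxy_user_key <api-key>
```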
What we can do now is run the `run` subcommand again, but we can remove the API key and the URL and instead append `--profile planemo-tutorial` to the command. By doing this, Planemo will use the information which is saved in this profile, so the server URL and the API key, to run the workflow. Let's try it, and we can see once again that a new workflow has been invoked in this history. Okay, so that brings the first part of this tutorial to an end, the part in which we showed you how to use the `planemo run` subcommand to run a Galaxy workflow from the command line. The second part is a kind of self-study section: we will provide you with a scientific workflow, and the idea is that you should set up some scripts which run it automatically several times. All the instructions are provided in the tutorial; you can just go through them and check the solution if you need to. I'll just tell you a little bit about the workflow first, before you start. It's this VCF-to-lineage workflow. The idea is that it takes as input various VCF files which describe the genetic variation in various samples, in this case samples of the SARS-CoV-2 virus. The variants are first filtered by this parameter here, which provides a minimum allele frequency for the filter step. Then we construct a consensus sequence, in which we apply the variant calls which we've taken as inputs to our reference genome. And finally we use the Pangolin tool, which takes the FASTA files generated and uses them to classify the samples and assign each of them to a particular viral lineage. So that's it for the introduction to the workflow; there's more information provided in the training material if you're interested in the scientific background. With that, I think the tutorial is over, and good luck implementing your own bot for the VCF-to-lineage workflow!
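A minimal sketch of the "bot" idea, assuming the `planemo-tutorial` profile from the previous step exists and that you have prepared one job file per sample; the workflow ID, the `jobs/` directory, and the file-naming scheme are all illustrative assumptions, not part of the provided material.

```shell
# Run using the stored profile instead of repeating the URL and key
planemo run tutorial.ga tutorial-job.yml --profile planemo-tutorial

# A tiny "bot": invoke the workflow once per prepared job file,
# without waiting for each invocation to finish
for job in jobs/*.yml; do
    planemo run <workflow-id> "$job" \
        --profile planemo-tutorial \
        --history_name "Run of $job" \
        --no_wait
done
```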