Mapping jobs to destinations. Now that you've got Galaxy set up with a distributed resource manager, the next question becomes: how do I decide where all of these jobs go and what sort of parameters they get? How much memory, how many CPU cores, things like that? So we'll learn how to configure that for Slurm, along with some dynamic job destination things we can do, and how to give more control to your users when they run jobs. This tutorial builds heavily on "Connecting Galaxy to a Compute Cluster"; if you have not completed that first, you need to.

Let's get started with statically mapping a job to a Slurm destination. We're going to create a testing tool to start with. This will just be a very simple tool that tells us how many cores a job is running with. We'll create the directory files/galaxy/tools, and within that directory we'll create our testing tool as files/galaxy/tools/testing.xml and paste in some tool content. The tool ID is set to testing, and the name is "Testing Tool"; that's how it will appear in our toolbox. Inside the command block, it just echoes "Running with $GALAXY_SLOTS threads" and writes that out to an output file that Galaxy will collect. It has an input file as well, which is not currently used but will be in the future. We'll expand this tool over the course of this lesson.

Now that we've got the tool, we need to deploy it to our Galaxy. So I'm going to come into group_vars/galaxyservers.yml and add a new variable, galaxy_local_tools, anywhere alongside the other Galaxy variables. The Ansible role knows about the files/galaxy/tools directory: it can look in there for tools, and it knows where they should go in the server configuration. The role takes care of a lot of magic here that you would otherwise be missing. It also writes a local tool configuration file, because Galaxy needs an extra configuration file saying "testing.xml has been placed at this location on the server", and the role adds that transparently in the background to our configuration. It makes it really easy for administrators to add tools like this.

Okay, let's run the playbook. This will run through everything and copy the tool over to the Galaxy configuration directory. Note that this is not the primary way you should install tools into your Galaxy. This is for the case where I'm an admin and I need a testing tool just to check that the cluster works. As an admin I often want to know: does this cluster work? So I send a quick test job that just echoes "yes, I'm okay, hi", and then I can use that as a test to track over time whether my cluster is up, or whether there's something I need to investigate before users report it. If you want to install real tools, read the Ephemeris and tool management tutorial, which I believe was covered Tuesday. It covers how to install tools in a much more reproducible way.

Okay, Galaxy will be restarted shortly. And then over here in Galaxy (I'm being a little impatient) you can see that Galaxy gets restarted and our testing tool shows up, down here at the bottom of the tool panel. Okay, not very exciting. Go ahead and click Execute to run the tool. It exits pretty quickly and reports that it's running with one thread, so we know that it's working. Looks good, right?
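If you want to see the file rather than just hear it described, the tool XML looks roughly like this. Treat it as a sketch pieced together from the description above; the exact file is in the written tutorial, and the backslash before GALAXY_SLOTS keeps Cheetah from trying to interpolate the shell variable.

```xml
<tool id="testing" name="Testing Tool">
    <!-- The command just reports how many cores (GALAXY_SLOTS) the job was started with -->
    <command>
        <![CDATA[echo "Running with '\${GALAXY_SLOTS:-1}' threads" > "$output1"]]>
    </command>
    <inputs>
        <!-- Not used by the command yet; we will make use of it later in the lesson -->
        <param name="input1" type="data" format="txt" label="Input Dataset"/>
    </inputs>
    <outputs>
        <data name="output1" format="txt"/>
    </outputs>
</tool>
```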
It just ran this command: echo "running with however many threads GALAXY_SLOTS has been set to", and it's written that out to this file for us. Fantastic. Now, this tool doesn't actually use the cores, but other tools will, and they'll probably have some special parameter, oftentimes -p or -n or something like that, for the number of tasks or threads. The individual tool command will be a little bit different each time, but it will always have this GALAXY_SLOTS variable available if it can make use of those additional resources.

So let's add a new destination to our job configuration, in templates/galaxy/config/job_conf.xml. This will be a slurm-2c (two-core) destination. It looks a lot like our normal Slurm destination, except it has a new nativeSpecification which says one node, one task, but two CPUs per task. That just means that the testing tool, or whichever tools get sent to the slurm-2c destination, will get two CPUs per task. Then we configure the tool with ID testing, the same ID that we wrote in the tool XML file, to go to the slurm-2c destination, and we'll save that.

When we run the playbook, Galaxy will get updated and we'll see a new output in our Galaxy: now it's going to be using two cores instead of one, right? The expected output is "running with 2 threads". If we see that, we know we've successfully set up a second destination with different resource parameters. This is a very basic destination, but you can definitely expand on it: you can set memory parameters, you can set cores to a higher number, you can start defining destinations that look like what your tools need in order to run successfully and efficiently. Galaxy is restarting again. Fantastic, we'll refresh, and when it reloads we will rerun this tool. Hopefully it'll say running with two threads. Look at that, perfect.

So now we've got this new destination and we're ready to do more exciting things. You can already imagine some things you could do with this, like defining big-memory destinations, but what's more exciting for us now is dynamic destinations. The static destinations are great, and we use them a lot, but dynamic destinations are the next step when the static definitions aren't sufficient for your use case.

So we're going to create a new directory, files/galaxy/dynamic_job_rules, and in that I'm going to create a file called my_rules.py and paste in some content. This is a bit of Python code that becomes part of the Galaxy configuration; Galaxy will load it at runtime. It imports some things from Galaxy itself, like the JobMappingException, and it defines a function. This function takes in the app, the Galaxy application itself, as well as the user's email. We use this to get the list of admin users from the app's configuration (the Galaxy configuration itself) and split it up, and if the user who is submitting this job is not in the list of admin users, then we raise a JobMappingException saying this user is unauthorized to run the job. Otherwise, the job just goes to the slurm destination.

We also need to configure the Ansible role to be aware of this, so I'm going to edit group_vars/galaxyservers.yml again and add the galaxy_dynamic_job_rules variable. Ansible is again taking care of a lot for us here.
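For reference, the new destination and tool mapping described here would look something like this in job_conf.xml. This is a sketch; it sits alongside the slurm destination and runner plugin set up in the previous tutorial, and your IDs should match whatever your existing job configuration uses.

```xml
<destination id="slurm-2c" runner="slurm">
    <!-- One node, one task, but two CPUs per task -->
    <param id="nativeSpecification">--nodes=1 --ntasks=1 --cpus-per-task=2</param>
</destination>

<tools>
    <!-- Send the testing tool to the two-core destination -->
    <tool id="testing" destination="slurm-2c"/>
</tools>
```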
The role is configuring not just that the file is copied over from the files/galaxy/dynamic_job_rules directory we've created to a specific place in the Galaxy code base; it's also taking care of adding it to the Galaxy configuration for us.

We also need to define a new destination in the job configuration. This will be a new destination that our jobs can go to, where they're processed not by one of the existing runner plugins but instead by the dynamic runner, which is always available. The other destinations we've written were sent to specific runners like Slurm or the local plugin; this time we're sending jobs to the dynamic runner. We've given it its own destination ID, dynamic_admin_only here, and it has a type of "python", with the function name set to admin_only, the function inside the Python file. We're also going to map this tool to that destination, and I'm going to remove the old mapping.

Okay, with that, we're ready to run the playbook. This will run again, configure this new destination, and deploy the Python file with the code that we've written to handle admin users versus normal users. And it should run perfectly. By default, admin users will be able to run this tool, but normal non-admin users will not, including anonymous users.

So what we're going to do is open Galaxy in a new private tab that doesn't have access to the same cookies, so it's not logged in. You can also just log out; that's another easy way to test this. And we'll run the testing tool again. Once Galaxy has been restarted with the new configuration, it should tell us we're not authorized to run this tool. Galaxy should be restarted here, so we'll go ahead and submit that testing tool job as a... oh, we need to upload a dataset first. I'm going to paste in some nonsense, it doesn't really matter here, and we'll run the testing tool. If everything's configured correctly, this should fail with the message "Unauthorized". Again, note that we're not logged in, so we're not an administrative user, and that's exactly what we expect. If I run this tool again as the admin user, you'll note that it succeeds with the default output we expect to see from it.

You can use this sort of thing to implement a kind of job authorization control. If you have tools that are restricted for some reason or another, this lets you prevent users from running them. Galaxy usually doesn't restrict users from running tools, so the tool will still appear in the tool panel and users can still try to execute it, but it won't actually finish executing. If you have needs for this sort of authorization control, you can use a dynamic job destination to do it. Hopefully you don't have those sorts of needs.

But you start to see the things that you can do with dynamic job destinations: you can make decisions about what resources a job gets based on who's running it, and you can make decisions about where it should go. If you look at the dynamic job rule that we wrote, we returned the destination "slurm", so when this job was executed, it was sent to the slurm destination. You could also use this, for example, to control where jobs go based on things like whether a job needs a specific reference database. Maybe only one cluster has that database available, so if the tool uses that specific database, I should send the job to that cluster.
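As a sketch of the rule being described, the file might look something like the following. The exact import path and the way admin users are read from the configuration should be checked against the written tutorial; treat this as illustrative.

```python
# files/galaxy/dynamic_job_rules/my_rules.py
from galaxy.jobs.mapper import JobMappingException


def admin_only(app, user_email):
    # Read the configured admin users from Galaxy's own configuration
    admin_users = app.config.get("admin_users", "").split(",")
    if user_email not in admin_users:
        # Anyone who is not an administrator is refused outright
        raise JobMappingException("Unauthorized.")
    # Administrators' jobs are sent on to the regular Slurm destination
    return "slurm"
```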
You can use this very freely to build very complex systems. Most of the time, though, you won't need something that complex, and we can use the much simpler dynamic tool destination (DTD) system. This is a fantastically easy system: all we have to do is create a YAML file describing how we want tools to be mapped to destinations.

So here we've got a list of tools. The testing tool has some specific rules attached to it: it has a default destination of slurm, and we apply a single rule to it, with rule type file_size. If the input file size is between 16 bytes and infinity, then we send the job to the slurm-2c destination; otherwise, it will go to the slurm destination. Failing that, we'll just send it to the global default destination. This should match what is in your job configuration; it currently doesn't match the training material, and we'll fix that in the training. Actually, I'm going to make a change right now and have the fallback run on the slurm destination. We've told you throughout the training that things shouldn't run on the local destination in any case; the best practice is to always run jobs on something like Slurm, always on a distributed resource manager.

So now that we've got this tool destinations file, with all of these rules specifically for the testing tool and also failover rules if nothing matches, we need to specify how Galaxy should handle it. Let's open up group_vars/galaxyservers.yml again. We're going to come up to the galaxy_config section and paste in the tool_destinations_config_file setting; it should point into the config directory, to tool_destinations.yml. And then over here, we need to deploy the file itself with the galaxy_config_files setting, so I'm going to paste that in. Here you can see the same thing we did earlier: the file's deployment location is determined by the actual configuration itself, which saves a lot of errors. If something changes in one place, we don't need to update it in multiple places.

Okay. And lastly, we'll need to update our job configuration to take advantage of this new dynamic tool destination. So I'm going to come down to the end of the destinations and add the new DTD-type destination. It's very similar to the other dynamic destination, except instead of a Python function it uses the built-in type "dtd". And then I'm going to change the testing tool's mapping from dynamic_admin_only to the dtd destination. So what this should do is: given a job input, if it's less than 16 bytes, the job will go to one destination, and if it's more than 16 bytes, it'll go to the slurm-2c destination.

So let's run the playbook and then we'll test it quickly. Our playbook is done, we're ready to go. Let's test the dynamic tool destination. We can again run the testing tool. We've got the pasted entry from before; let's look at how big that is. I can't tell, so let's upload two files: one that's going to be very short, just "A", and another file that's definitely more than 16 bytes. Okay, I'll just rename these to "short file" and "long file"; I guess "small" and "big" would have been better terms. We're going to run this with both datasets seven and eight. This is a nice feature for your users: you can run a single tool over multiple files at once if you want. So let's run our new testing tool with both the short and the long file.
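As a rough sketch, the tool_destinations.yml described here could look like the following. The destination IDs are the ones used earlier in this tutorial; the exact keys and the bound syntax should be checked against the dynamic tool destinations documentation and the written tutorial.

```yaml
# files/galaxy/config/tool_destinations.yml (path is illustrative)
tools:
  testing:
    rules:
      - rule_type: file_size
        lower_bound: 16
        upper_bound: Infinity
        destination: slurm-2c
    default_destination: slurm
# Fallback for tools without their own rules
default_destination: slurm
verbose: True
```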
And we should see that the jobs end up in different destinations depending on whether the input is a small or a big file. Okay: the small file ran with one thread, and, fingers crossed... okay, the big file ran with two threads. That's exactly what we wanted to see. We've written this dynamic tool destination with rules for file sizes, and based on the file size it can send jobs to different destinations. This is a very easy way to write this sort of thing: the dynamic tool destination makes it very easy to write simple rules and send things to different locations. If you need more freedom, if you need more control over how the mapping is done, then you need a dynamic job destination instead. But for simple cases, like "within these size ranges, send it to this location" and so on, the dynamic tool destination is so easy that we can strongly recommend it. There's a lot of documentation on it on the Galaxy documentation site; just check that out if you have questions about which filters and rules are available to you.

And lastly, let's talk about job resource selectors. For example, maybe you as an admin don't have a rule for how a job should behave. Does it need one core, does it need four cores? Or your users want more control over how the job runs. Especially if you have a cluster with accounting, a user can maybe choose to pay for more cores out of their budget, and they may be willing to do that to get their results faster. As a result, you need some mechanism to provide the user with an interface for selecting between these different resources, and you can use Galaxy's job resource selectors for that. These let you define a little bit of extra Galaxy tool interface that will let users control how a job behaves.

So first, let's start off by creating a new template file for the job resource parameters. This is going to have a little bit of XML that I will walk through in a second. It has two different parameters, a cores parameter and a time parameter. The cores parameter lets the user select between one, the default number of cores, or two, an optionally larger number of cores. You can define any number of different parameters you like here; we've just provided cores and time as examples of really commonly used ones. Additionally, there's the time parameter. This is defined as an integer with a minimum of 1 and a maximum of 24, and it will be treated just like a wall time. You can imagine saying to the user: okay, if you know your job is going to run for an extra large amount of time, maybe you want to give it more wall time.

Basically, this gives more control to your users. If you're running a private Galaxy, especially one within your institute or something like this, then giving your users more control can be a great thing, and it can help convince them that Galaxy is the way to go. If you're managing a public Galaxy, it can sometimes be undesirable to give your users more control, because you don't know them and you maybe don't trust them.

Make sure the indentation is correct and all of the dashes are lined up. And that looks good. So now, in our job configuration, we need to inform Galaxy that we would like to expose some of the resource parameters defined in that configuration file. So we'll go over to our templated job configuration file.
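The resource parameter file described here might look roughly like this. It's a sketch; the parameter names cores and time are the ones the mapping function will look for later, and the exact attributes are in the written tutorial.

```xml
<parameters>
    <!-- Let the user pick between the default core count and a larger one -->
    <param label="Cores" name="cores" type="select" help="Number of cores to run the job on.">
        <option value="1">1 (default)</option>
        <option value="2">2</option>
    </param>
    <!-- Wall time in hours, treated as the job's time limit -->
    <param label="Time" name="time" type="integer" min="1" max="24" value="1"
           help="Maximum runtime in hours ('walltime'), between 1 and 24."/>
</parameters>
```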
And in there, between the tools and destinations sections, we'll add in our resources group. Here we'll say there is a resource group named testing, and any tools which are assigned to this resource group should have the cores and time parameters available to them. If you wanted to define different subsets of parameters for different tools, maybe sometimes you only want the cores or only the time to be accessible, you can do that as well by defining different groups.

And we're going to update our testing tool here to no longer use the DTD. Instead, it will use this dynamic_cores_time destination, which we've not written yet, in case you're wondering what that is. Additionally, we'll pass resources="testing" to it. So that tells the tool and Galaxy: hey, this tool will need the testing group of resources, the cores and time, exposed to it.

So we've assigned the testing tool to a new destination which doesn't exist yet; let's define that now. The destination ID will be dynamic_cores_time. We'll come back up here next to our admin-only destination and add it, and you'll note that it looks exactly the same as the admin-only one, except dynamic_cores_time is the function name instead of admin_only. So let's write that out.

We've got most of this set up; it's a lot of things to configure. We've set up the job resource parameters, the cores and time, how they're defined, what the interface will look like, and what the minimum and maximum are, so the user can select that. We've configured Galaxy to deploy that file, we've configured a new destination to use it, and we've configured the new resource group that will be available. So now we need to actually write the function.

Let's go ahead and create a new file in files/galaxy/dynamic_job_rules. We're going to call this one something different: map_resources.py. Oh, that looks better. Okay. This is a much, much bigger, more representative dynamic rule; they're usually quite complicated. So here we've got the JobMappingException that we imported like last time. We have two destination IDs that we might want to send jobs to, either slurm or slurm-2c. We've also defined a failure message. When jobs are submitted to this destination, we will get access to the app, the Galaxy application itself, as well as the tool, the job, and the user's email. By default, we'll send jobs to the slurm destination ID, and we've set the destination object to None for now.

First, we'll get the parameters from the job. This collects all of the tool form information. Then we check whether the job resource selector is inside this parameter dictionary. If it's not available, then by default we will just return our default destination ID, the slurm one we defined up at the top. So any tools that aren't using the resource selector will still be able to execute and run successfully; they'll just go to the default destination. Right now we've only assigned the testing tool to this destination, but you could imagine assigning multiple tools to it without exposing the resource parameters to them.

Next, we start handling the job resource parameters. If the cores and time values are set and valid, then we'll look up the destination ID by the number of cores. We defined up at the top that slurm means one CPU and anything requesting two CPUs should go to slurm-2c. So we can see that here: we've selected the core count, pulled out the corresponding destination ID, and used a convenience function to get the destination object itself from that destination ID.
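Pieced together from the description above, the job_conf.xml additions might look something like this sketch. The group name, tool ID, and destination ID are the ones used in this walkthrough.

```xml
<resources>
    <!-- Tools in the "testing" group get the cores and time parameters exposed -->
    <group id="testing">cores,time</group>
</resources>

<tools>
    <!-- Route the testing tool through the dynamic rule and attach the resource group -->
    <tool id="testing" destination="dynamic_cores_time" resources="testing"/>
</tools>

<destination id="dynamic_cores_time" runner="dynamic">
    <!-- The function name must match the function defined in map_resources.py -->
    <param id="type">python</param>
    <param id="function">dynamic_cores_time</param>
</destination>
```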
And next we're going to set part of the native specification; that's where extra information like the time limit gets supplied to Slurm. With the cores, we've already selected between slurm and slurm-2c, so we know that will be correct, but we haven't specified the time anywhere, and we need to do that here. So we pull out the time value and add it to the destination's native specification, templating in --time with the requested wall time that will be submitted. If there are any errors, we raise a JobMappingException; otherwise we return the destination we've built from the job resource parameters, or the default destination when that's not possible.

So I hope that's all clear. If you have questions, just ask us. It's important to note that you are responsible for parameter validation here: if you let invalid states through, the job will fail, and you need to be aware of that.

Lastly, we need to add map_resources.py to our group variables. We have an existing dynamic rule entry, and we'll just add map_resources.py to it. You'll note that we didn't actually have to reference the name "map_resources" anywhere else; it's just handled for us, and the file can be called anything you like. The dynamic resource mapping in use at UseGalaxy.eu, for example, is called the Sorting Hat, because it gets to sort the tools into their different destinations. So you can call it anything and it'll work. And then we run our playbook.

We should now be able to rerun the testing tool, and this time we should have a resource parameter selection interface, which we'll see shortly. Galaxy should be restarting. Okay. And when it's restarted, we should have this new form as part of the testing tool. Go to the testing tool and you'll see the job resource parameters. If you look at any of the other tools, you won't see this form; if I look at a mapping tool, I don't see any job resource parameter form. But under the testing tool, you'll have this, and you can use the defaults or you can specify your own. Here we have the option to set one core, leave it as the default, or set two if we want more cores, and additionally we can set the wall time. So I'm going to do that: I'm going to set two cores and 11 hours of wall time.

I'm going to go ahead and view the details here. We can see the job's API ID is 3, but the Slurm job ID is 14. So I'll be able to come over here and check with squeue, or rather, scontrol show job 14. And mine is magically 14 as well; you can always find the job ID there in the destination parameters in the job information view. You'll see it ran with two threads, which is what we told it to do. And over here under the job information, we can see the time limit is 11 hours, which is the number we specified. So this job has been submitted with the additional resource configuration, and it all works as we expect.

If you have clusters that have these additional options, you might want to expose them, if you trust your users to have control over this and not just blindly set a 24-hour wall time for every job. It's especially useful if you have accounting for users, and they can understand how this is going to affect other jobs and their budget, that sort of thing. UseGalaxy.org, I believe, uses this for some of their tools. They let the user select between sending a job to a couple of different clusters.
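Pieced together from this walkthrough, the rule file might look roughly like the following sketch. The '__job_resource' parameter key, the get_param_values call, and the get_destination helper reflect how Galaxy exposes resource parameters to dynamic rules as far as I recall; check the written tutorial for the exact code.

```python
# files/galaxy/dynamic_job_rules/map_resources.py
import logging

from galaxy.jobs.mapper import JobMappingException

log = logging.getLogger(__name__)

# Requested core count -> destination defined in job_conf.xml
DESTINATION_IDS = {
    1: 'slurm',
    2: 'slurm-2c',
}
FAILURE_MESSAGE = ('This tool could not be run because of a misconfiguration in the '
                   'Galaxy job running system, please report this error')


def dynamic_cores_time(app, tool, job, user_email):
    destination = None
    destination_id = 'slurm'

    # Collect the submitted tool form values, including any resource parameters
    param_dict = job.get_param_values(app)

    if param_dict.get('__job_resource', {}).get('__job_resource__select') != 'yes':
        # The user left the resource selector at its default: use the default destination
        log.info("Job resource parameters not selected, returning default destination")
        return destination_id

    # The user filled in the resource parameters: we are responsible for validating them
    try:
        cores = int(param_dict['__job_resource']['cores'])
        time = int(param_dict['__job_resource']['time'])
        destination_id = DESTINATION_IDS[cores]
        destination = app.job_config.get_destination(destination_id)
        # The core count is handled by the destination choice; the wall time still
        # has to be appended to the Slurm native specification here.
        destination.params['nativeSpecification'] += ' --time=%s:00:00' % time
    except Exception:
        log.exception('Error processing job resource parameters')
        raise JobMappingException(FAILURE_MESSAGE)

    log.info('returning destination: %s', destination_id)
    return destination or destination_id
```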
Some of the clusters have higher memory and a longer queue, and the user can say: well, I know this tool needs a lot of memory, so I'll just submit it there and accept that I'm going to have to wait a little bit. Or, if they have a very small file, they can easily choose to submit it to the short cluster with fewer resources and a shorter queue.

So there's a lot of further reading again. A lot of our job configuration files are publicly available if you want to see how we're doing things, and there's a lot of information on dynamic destinations. If you have any questions about this, please let us know. Job resource parameters are pretty cool if you have users who are knowledgeable about the tools and compute resources. As always, please let us know if this tutorial is useful, and if you have any questions or had any issues following along with the tutorial. If you have any issues with a specific video, though, please let us know in Slack instead. Thank you so much.