Hello, everyone. My name is Franziska Bonath, I'm today's host, and with us is Chris Hakkaart, who is talking about how to implement custom scripts in your Nextflow pipeline. Over to you.

Awesome, thank you, and thank you for the introduction. So, yeah, today I'll be talking about custom scripts and how you can add them to your Nextflow pipeline. I think this talk was inspired by questions that we see on Slack occasionally, where people are having trouble implementing a Python script, an R script, a Perl script, or some other script as part of their pipeline. Quite often there aren't necessarily mistakes, but there are things that can be done to make pipelines a little bit better and more readable. So today I'm going to try to outline some of these things, and hopefully everyone will walk away with a little more understanding of how to do this with Nextflow. Next slide. There we go.

So today I'll start with a little bit of background, outlining the problems and how we can solve them. I'll introduce my first pipeline, which is a Nextflow script I've quickly written to demonstrate how to use the bin and templates directories to store your custom scripts. I'll then quickly talk about managing dependencies and some of the things you might need to consider when packaging those together, and finish with a really quick summary.

So, background. I don't think it's a secret that in real-world pipelines you often need to use custom scripts, which can be written in different languages: Bash, R, Python, Perl, as well as others. With Nextflow, you can integrate any scripting language into your workflow by adding the corresponding shebang to the script block, and I'll demonstrate this really quickly in the next couple of slides. You can also avoid keeping large code blocks in your main workflow by executing them as custom scripts.
While some scripts can be really short, others can be really long, and to improve readability it's recommended that you store these elsewhere and then execute them from Nextflow, rather than having a big, unwieldy code block, which can be quite difficult to get through if you are trying to read someone else's code.

So this is my first pipeline, a Nextflow pipeline that I've written. As you can see, it contains a single process, called my_script, which takes string values as input and produces standard output. In the script block you'll see that all it's doing is taking the string, defined with the name str in my input, and turning all the lowercase letters into uppercase letters. Nothing too complicated. This is a really simple, single line of code; in reality, your script could be much, much larger and also written in a different language. In the workflow block down the bottom, we have "this", "that", and "other" as the three string inputs coming from the channel. I'm piping that into my process and then using the view operator to show the output in my terminal window.

As I mentioned earlier, you can easily add a shebang at the top of the script block to change the language of that script. In this example, I've changed the script from Bash to an R script. What I've done here is just rewrite what I was trying to do, turning lowercase letters into uppercase letters, using the toupper function, which is a base function in R, and then using cat to print it to the screen. Nothing else has changed in this pipeline apart from the script block. I just wanted to point out that shebang again, because in this case it's an R script, but you could also include Python, Perl, or any other scripting language.
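The pipeline described above might look something like this, assuming Nextflow DSL2; the process and variable names are illustrative rather than copied from the slides:

```nextflow
// main.nf -- a single process that uppercases each input string.
// The shebang inside the script block switches the interpreter to R;
// without it, the block would run as Bash by default.
process MY_SCRIPT {
    input:
    val str

    output:
    stdout

    script:
    """
    #!/usr/bin/env Rscript
    cat(toupper("${str}"))
    """
}

workflow {
    Channel.of('this', 'that', 'other') | MY_SCRIPT | view
}
```

The Bash version of the same script block would simply be something like `echo ${str} | tr '[a-z]' '[A-Z]'`, with no shebang needed.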
You can use any shebang to decide which type of scripting language you would like to use.

So this is just a really short animation of what this pipeline is actually doing. As you can see, it's running the pipeline and printing "this", "that", and "other" all in capital letters; those are just the three strings I included being printed out to the command window.

While this is a short, single line of code, if it was much larger you might immediately think, okay, I want to remove this and have it as an executable R script somewhere else in the pipeline, or somewhere else on my system. And this is what I've done here. I've decided to call my script my_first_script.R, because it's an R script. I have changed it slightly: I'm now using commandArgs so that an input can be passed in when the script is executed from my script block. So that's what's happening up here: my_first_script.R calls commandArgs with trailingOnly set to TRUE, so it takes the trailing arguments and uses them as part of the script. Down here in the pipeline, what's changed is that we've got the full path to my_first_script.R, followed by str, which is the named input of this process.

What I've done here will work on my system, assuming that full path to my_first_script.R is a real file path. But it's not very portable, and I think one of Nextflow's greatest strengths is the portability of its pipelines. So there are other ways to do this so that, if you were to share this pipeline with someone else, you wouldn't need to hard-code this file path; it would just work automatically, which I'm about to talk about.
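The external-script version of the process might look like this; the path and file names are illustrative, and the R script now reads its input from the command line:

```nextflow
// my_first_script.R (stored somewhere on the system, marked executable):
//   #!/usr/bin/env Rscript
//   args <- commandArgs(trailingOnly = TRUE)
//   cat(toupper(args[1]))

process MY_SCRIPT {
    input:
    val str

    output:
    stdout

    script:
    """
    /full/path/to/my_first_script.R ${str}
    """
}
```

This runs, but the hard-coded absolute path is exactly the portability problem that storing the script in bin or templates avoids.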
One thing I did want to point out here is that if you are making a script from your code block, like I have done here, you do need to make sure it's executable. So you need to run chmod on your script to make sure it has the right permissions.

The first way you can store your scripts is using a bin directory. Instead of including my_first_script.R in the same directory as your pipeline, what you can do is create a folder called bin and store your scripts in there. Whenever you execute a pipeline, Nextflow will look for a bin folder within the pipeline directory, and if it's there, it will add the scripts in that folder to your PATH, so they'll be automatically executable in your pipeline. So here you've got my_first_script.R in the bin directory, which sits in the same directory as your pipeline. It's the same script as before; the only difference is that you don't have to specify the whole file path. In your script block, the only thing that's changed is that instead of the whole file path, you've just got my_first_script.R. Nextflow will automatically put this script on the PATH, and it'll be executable automatically, which is a really powerful way of storing scripts. You can have lots of different scripts in a bin directory and they'll all be executable automatically.

There's also another way to do this, which is using a templates directory. This is very similar to a bin directory, in that you can have a folder called templates next to your pipeline file with your scripts stored in there. There are a couple of differences that you might notice, besides it being in a different folder.
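As a concrete sketch of the on-disk layout for both approaches, with the chmod step included; the file names and the tiny script body are the illustrative ones used above, not from the slides:

```shell
# bin/ and templates/ both sit next to main.nf in the pipeline directory.
# bin/ scripts go on PATH and read their inputs as command-line arguments;
# templates/ scripts are interpolated like in-line script blocks.
mkdir -p bin templates

# A tiny stand-in R script (hypothetical content).
printf '#!/usr/bin/env Rscript\ncat(toupper(commandArgs(trailingOnly = TRUE)))\n' > bin/my_first_script.R

# Without this, the task would fail with "permission denied".
chmod +x bin/my_first_script.R
```

After this, a process's script block can call `my_first_script.R ${str}` with no path at all.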
Here I don't need to use commandArgs like you saw previously, because Nextflow treats this exactly like an in-line script block: using the template keyword, it will look in the templates folder and execute the file as if it were a code block included here between the quotation marks. What you'll also notice is that I've used the named input directly, and it works straight away; you don't need to parse arguments like you did with the bin directory.

So the next thing is dependencies. The scripts I have been executing worked because I have R installed locally on my system. But if you are running on a different system, or you want the pipeline to be 100% reproducible on a different system, you will need to consider dependencies and how they're managed. Dependencies for a custom script are managed in the same way as for other tools, and I'll show this very shortly with some examples. Of course, with a custom script like this you might expect to have one or more different tool packages, which can add complexity to how these are integrated and stored. As with other tools, if you are using multiple tools in the same module or the same process, you can store them in a combined mulled container. While I won't go into it today, there are helper tools and documentation available; these slides will be available, and both of these are clickable links where you can read a little bit more about how nf-core's modules tooling can help you find a mulled container with the dependencies you're looking for, as well as how to package multiple tools in one mulled container.

So this is an example of how to use dependencies with a custom R script. It's taken directly from the nf-core/rnaseq pipeline; the process is called SALMON_SUMMARIZEDEXPERIMENT.
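The templates variant can be sketched like this, under the same illustrative names; note the template file can use the Nextflow input variable directly, with no commandArgs:

```nextflow
// templates/my_first_script.R:
//   #!/usr/bin/env Rscript
//   cat(toupper("${str}"))

process MY_SCRIPT {
    input:
    val str

    output:
    stdout

    script:
    template 'my_first_script.R'
}
```

Nextflow resolves `template 'my_first_script.R'` against the templates folder and interpolates the input variable before running the file, just as it would for an in-line script block.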
What you will notice is that down here in the code block we have the salmon_summarizedexperiment script, an executable R script with a couple of argument inputs. And at the top here we have the conda and container declarations. In this situation it is just a single R package and tool, and R itself comes as part of that package, so you only need to specify it once; Galaxy project and biocontainers images are also already available for it. So this is a relatively simple example with just the one tool.

But you can also have an absolute monster. Also from the rnaseq pipeline, here we have the DESEQ2_QC process, and what you can see is that a very large number of different tools are used as part of it. In this case, a mulled container has been created which contains all of these. This is probably not a great example, because the versions of these tools haven't been pinned in the conda declaration; that's because there were conflicts when it was being created, but normally you would expect to see version numbers after each of these. Again, this is a bit more of a monster of a script. I haven't shown it here, but you can find it by looking in the rnaseq repository, and it's an executable script again, with a number of different inputs that can be taken as arguments.

So, just to summarize what I've covered today: Nextflow can use custom scripts written in many different languages. Scripts can be stored in either the bin or the templates directory, and both of these will be available to the Nextflow script, meaning you don't need to specify an absolute or a relative path when executing a script. It's really fantastic to do this because it makes your scripts much more portable and usable by others. Dependencies can be managed using Conda and containers, as in the examples I've shown.
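The pattern looks roughly like this; the process name, package name, and image tags below are illustrative placeholders, not the exact strings from nf-core/rnaseq:

```nextflow
process SUMMARISE_EXPERIMENT {
    // Conda declaration: a single R package that already pulls in R itself.
    conda "bioconda::bioconductor-summarizedexperiment"

    // Container declaration: pick a Singularity or Docker image of the
    // same package depending on the engine in use ("<tag>" is a placeholder).
    container "${ workflow.containerEngine == 'singularity'
        ? 'https://depot.galaxyproject.org/singularity/bioconductor-summarizedexperiment:<tag>'
        : 'quay.io/biocontainers/bioconductor-summarizedexperiment:<tag>' }"

    input:
    path counts

    output:
    path "*.rds"

    script:
    """
    summarise_experiment.r --counts ${counts}
    """
}
```

For a multi-tool process like DESEQ2_QC, only the conda line grows (one package per tool), while the container line points at a single mulled image that bundles them all.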
You can see that it can be quite simple, or much more complex with the use of mulled containers to help you store all those dependencies together in the Singularity and Conda images. And with that I will finish. I think we're about where I thought we would be for time, so I'll finish on this, and if you have any questions, I'm happy to do my best to answer them. Thanks very much.

Thank you very much. Anyone can now unmute themselves; if there are any questions, you can just ask them or put them in the chat and I will read them out. There is one question in the chat, about what's available if you're on a newer Nextflow version, with a link to module binaries.

Yes, exactly. I've included some simple examples here using the bin and templates directories. You can also store these scripts another way, alongside modules, but I haven't gone into that as much; there is documentation on this on the Nextflow website. Today I just wanted to focus on what I think are the easier, simpler examples, where you don't need the more complicated techniques.

Okay, John is potentially asking a question. Yes, we can hear you.

Great talk. I was just wondering, if there are small scripts or routines that one uses often in many different pipelines, is it possible to put them in a git repo and pull them in, or to have the scripts stored in some common area?

Good question. I think at the moment most custom scripts are stored locally and executed locally, compared with nf-core modules, which come from a central place where you can download them directly. Most examples of custom scripts, I think, are stored locally. But, as I mentioned very briefly just before, you can store scripts locally alongside a module with templates, and those could also be downloaded at the same time as the module. I don't think that's really being done yet.
But there's nothing stopping you from copying and pasting these scripts into your own Nextflow pipeline and executing them directly from the bin directory or the templates directory. So I guess the bottom line is: not that I know of, but it could be done, and there probably are examples out there where people have done this; I just don't know what they are.

There's another question on the issue of portability when using custom scripts: do you mean if the custom script is not within the directory tree, or the baseDir? So with a custom script, Nextflow stages all of your inputs for execution, and if the script isn't defined or isn't included, it won't necessarily be found. If you were just to put a script alongside your main Nextflow script, it wouldn't be staged, because it wouldn't be found when the pipeline is being executed. So if it's just in the pipeline directory, no, it won't be found. But if it's within the bin folder or the templates folder, then it will be included and automatically staged, because it's on the path of the task. Which answers the same sort of question again: yes, if it's not in baseDir, it won't automatically be found, although you could probably use baseDir to specify a relative path. I wouldn't necessarily recommend that when you've already got the bin and templates folders available to you.

Okay, it seems there are currently no more questions. If you have more questions, you can always go to the bytesize channel on Slack, or you can contact Chris directly, I guess. I'd like to thank Chris and, as usual, also the Chan Zuckerberg Initiative for funding these talks.