Hello and welcome back to the Nextflow and nf-core online community training event. My name is Chris, I'm a developer advocate at Seqera Labs, and I'll be the one taking you through the training material again today. We'll first start off with a recap of what we did in sessions one and two, talk about what we'll do today as part of session three, and then look forward to what we're doing tomorrow as part of session four. In session one we started with a welcome and an introduction to the Nextflow ecosystem. We started to get to know the Nextflow language by examining the hello.nf script, and then we expanded on this by developing our own proof-of-concept RNA-seq pipeline. In session two we were introduced to nf-core: we looked at nf-core for both users and developers, and looked at some of the documentation and tooling that's available. We finished off the session by looking at modules and subworkflows, and how these can be used and shared between different pipelines. Today, as part of session three, we will be expanding on some of the concepts you were introduced to in session one: how you can manage your dependencies and containers; channels, processes and operators; an introduction to the Groovy language; and we'll expand on modularization. In session four we'll continue to expand on these concepts: the configuration of pipelines, different deployment scenarios, cache and resume, some ways you can troubleshoot, and we'll finish off by getting you started with Nextflow Tower. If you have any questions during this event, please direct them to the dedicated Slack channels; we have a number of community volunteers who will be there to help you during the event. Okay, so let's get started. What I would like everyone to do is head back to the training material that we used as part of session one, at the link shown on the screen. Okay, so here I am back on the Nextflow training website. If you're still looking for this, it is training.nextflow.io, so if you type that into your browser you should be able to find the site. For anyone that was using Gitpod, I'd like you to click on the button here that says "Open in Gitpod", and this will open up a new Gitpod environment that we can use as part of this training. If you're using your local system, please just move back to the Nextflow material that you had opened previously, the Git repository that you cloned. What you're seeing here on my screen is a list of the running workspaces: Gitpod actually saves the environments that we used previously. In this case, though, I would like everyone to just create a new workspace. I have already done this, which is why we have a couple of these listed here already, and mine is already open here. It's quite nice to open a brand-new environment, because any changes that you might have made in previous environments won't be included here. This is brand new; this is the base image that we started from last time. So, like I said, if there's anything that's happened in the background, or if you've changed anything that you weren't supposed to, it's totally okay: this is a new environment and we're all going to start from square one again. But because we are starting from square one, we also need to add back some things that we set up in session one that won't be here now. The first thing is adding a version of Nextflow, so we can do this here.
You might remember that as part of the environment setup we exported the version of Nextflow that we want to use. I'm just going to go back to this environment and do this again. So I've just added in that version, and then we can do something like nextflow -v, for the Nextflow version, just to check that it has been installed correctly. Cool, that looks like it's worked properly: you can see down the bottom there that it is 22.04.5. What I will also do, as you might remember from session one, is add an extra line to our config here, which is docker.enabled = true, and I'm just going to save that as well. This is something that we did so that whenever we execute a script from this folder, Nextflow will look at this nextflow.config file and see that we want Docker to be enabled every time. That just stops us from having to use -with-docker whenever we execute a command. Okay, so that's everything you need to do in a new environment to get it back to the place it was at the end of session one. What we'll do now is jump back to the training material, and go down here to managing dependencies and containers. This bit at the top is quite a nice explanation of why you should use something to manage your dependencies and containers; I think it's quite a nice summary. Computational workflows are rarely composed of a single script or tool, and we saw this as part of the introduction in session one: there were lots of different tools and scripts included in that DAG diagram that we showed you. As such, installing and maintaining such dependencies is challenging, quite time-consuming, and a massive source of irreproducibility in science. To overcome this, you can use containers and container-management software to manage the tools and libraries for your data-analysis applications, because all of the tools can be encapsulated in one or more self-contained, ready-to-run, immutable Linux containers, which can be easily deployed using things like Nextflow. Okay, so that's all the introduction I want to do. We're going to jump straight into some coding, and the first thing we're going to do is look at using Docker. Docker is a tool built to build, run and share container images, and there's lots of information about this online. What I want to do here is show you how you can work through some of these ideas of pulling containers and running them in interactive mode, and build up to creating our own container and using it to run one of the scripts that we already used as part of session one. Going back here to the Gitpod environment, what we can do first is type in docker, see that Docker is installed, and get a bunch of information on different options and commands that we can use. Then we can do docker run hello-world. This is just an example of actually running a container from the online library which contains hello-world: it tried to find it locally, it couldn't, so it pulled it from library/hello-world. Here's the pull complete with the digest and the status, and then just the output of hello-world. So we know Docker is installed, we know that it can talk to the internet and pull down containers from public libraries, and that's a really great place to start.
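For reference, the setup and first Docker commands from this stretch look roughly like this; the pinned Nextflow version follows what was shown on screen:

```bash
# pin the Nextflow version used in the training, then verify it
export NXF_VER=22.04.5
nextflow -v

# confirm Docker works and can pull from a public registry
docker run hello-world
```

And the one line added back to the config:

```groovy
// nextflow.config (in the nf-training folder)
docker.enabled = true
```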
What we can also do is use Docker to pull down a base image of a Linux system, which we can then develop on top of to add all the tools and software that we need. In this example, in the training material under 4.1.2, "Pull a container", we have this line of code: docker pull debian:stretch-slim. Debian is a distribution of Linux, and stretch-slim is a slimmed-down version of it. So here it's pulling it, bringing it down and loading it into our system, making it available to us locally, and this output is telling us what it's doing: the pull is complete, here are some checks that are done to make sure everything is working, as well as where it's stored. If you were to run something like docker images, you can see that we now have this downloaded, or pulled, to our system. We have the repository debian, with a tag, an image ID and the size here, which is quite useful to track. We also see hello-world, which was pulled automatically when we ran it just previously — so that's very, very small — as well as this rnaseq-nf image, which we've used previously, for example in the nextflow.config. Okay, so we now have this image, and we can use docker images to view the images available to us locally. What I can show you next is how you can run these containers in an interactive mode. Before I do that, note that here we are sitting in our nf-training repository folder, and these are all the scripts and files that are available to us. We can run docker run -it — so that means interactive mode — then specify the repository image that we're using, and we're going to run it with bash. What this has done is essentially allow us to go into this container: we're now operating inside this Docker container in an interactive way. So if you list the contents, you'll see a completely different file directory; this is completely separate from where we were. All the files that we had in our Gitpod environment don't exist in here; this is a completely different operating system. If you want to exit, you just type exit and it will get you out of the container. Okay, so that's a very quick example showing that you can enter these containers in an interactive way. What I want to do next is build my own container. Currently, all we've done is use containers that are available online; I want to create my own. To do that, I'm going to use code Dockerfile, and this will create a Dockerfile at the top here for me. This is the same as using something like Vim or Nano; I've just used code to automatically open it up in my browser. What I'm going to do next is copy everything inside this code block here and put it into my Dockerfile. What this says is FROM debian:stretch-slim: this is the base image that we're using, and we're going to build on top of it in this container. We have some label information here — this could be your name, for example; it doesn't really matter for now, but you can add some labels here. We are installing an additional piece of software, in this case called cowsay, which is just a fun tool you can use to make a cow say something on your screen. And down here we're just exporting, making sure cowsay is available on our PATH.
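The Dockerfile being pasted in looks roughly like this; the label values are placeholders, so check the training material for the exact contents:

```dockerfile
FROM debian:stretch-slim

LABEL image.author.name "Your Name Here"
LABEL image.author.email "your@email.here"

# install curl and the cowsay tool on top of the base image
RUN apt-get update && apt-get install -y curl cowsay

# Debian installs cowsay under /usr/games, so add it to the PATH
ENV PATH=$PATH:/usr/games/
```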
So I'm just going to save that — and make sure you do save it; you'll see it's popped up over here in my Explorer. What we can do now is build the image. This will be down in 4.1.5: we have this build command, docker build -t my-image, using -t to give it a name, my-image. And don't forget the little dot at the end, because that just means the build is happening in the current directory. As you can see here, we've got steps one through five: a step has been created for every line in this Dockerfile, and it's downloading everything it needs and putting it inside this container. You can see "successfully built", and it's been successfully tagged my-image:latest. So now if we were to run docker images again, you can see that this new my-image has been created. It's been given the tag latest, it was created 20 seconds ago, and it's 130 megabytes. Okay, so that has now been created and we can use this image. We can run it directly from our current directory using cowsay — cowsay is a command — so we can do "Hello Nextflow", and there we go. What's happening here is we're using Docker to run the image, the image is being run with the cowsay command, and cowsay is saying "Hello Nextflow". We can also see that cowsay isn't actually installed on my system: if I try to run it outside the container, it just fails. cowsay only exists within this container image; we have to use the container for it to work. So that's really nice. But as you'll be aware, cowsay isn't exactly useful for a bioinformatics pipeline, so what we can do is rebuild this image with a tool that might be of more use. Under 4.1.6, "Add a software package to the image", I'm going to copy this and add it into my Dockerfile. This will be on lines 10 to 12 — if yours is aligned slightly differently, it doesn't matter too much. All this is doing is using curl to download the salmon tool and install it into this image, into this container. So I'm just going to save that again, then we can do docker build -t my-image . again and just rebuild the image. What you can see here is that we've still got steps one through five, but we now have an extra step — step six of six — which is running this new code that we've introduced up here to download and install salmon within this image. So now we can do docker run my-image salmon. salmon is available — it's the tool we used in scripts one through seven as part of session one — and we can just check the version: we see salmon 1.5.2. So there's the version of salmon that was installed; you can see it included here as well, and also up here, that 1.5.2 is the version that was installed. As well as this, we can run the image in interactive mode, so we can actually enter this container and use salmon within it: docker run -it my-image bash, with bash at the end there. So again, we've entered this; this is a different file system, and you can see that we now have salmon included here, so we can just run salmon — excuse the typos there — and we get the version of salmon printed again, which is cool. So images are a way that you can package one or more pieces of software together on top of a Linux base image.
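The lines added for salmon, and the rebuild and run, are along these lines; the download URL and version are as described above, but double-check them against the training material:

```dockerfile
# download a pre-built salmon release and put it on the PATH
RUN curl -sSL https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz | tar xz \
 && mv /salmon-*/bin/* /usr/bin/ \
 && mv /salmon-*/lib/* /usr/lib/
```

```bash
docker build -t my-image .
docker run my-image salmon --version   # prints salmon 1.5.2
```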
And then you have all of that packaged together, which is really cool. What we will do next is look a little bit at file system mounts. If you were to run this line of code here on your system, you'd see that it actually fails, and that this is because the transcriptome file provided with --transcriptome does not appear to exist. As already alluded to and mentioned a couple of times, the file system that exists within the container is separate from the file system that I'm currently sitting in as part of this Gitpod environment. When we're listing the files that are available, the Docker container is separate from what we're actually working in right now. Because of that, we need to mount a volume — give Docker a way to talk to the current file system. There are a couple of different ways described here. The first is that you can specify the exact file that you want to include: we can copy this line of code, paste it in again there, and what you'll see is a big long output saying that this has worked and that this file was available. This is because what we're doing here is mounting the current working directory, plus a relative path to this file, and providing that to the container when it is run. That is if you're mounting one file. Alternatively, you can mount the parent directory as an identical path in the container by doing $PWD:$PWD — that's saying this working directory on the host will be mounted at the same path within the container — and that works as well. That's an easier way, in that you can include everything that's available locally in the container too. You can also set a folder name — in this case it's called data — and specify it in the same way. Unfortunately, I don't have a lot of time to dig into this in more detail and show you how things break and work together, but I just want to highlight that you do need to mount the data that you want to use in your container, because it is a different file system. There's a little bit extra here about how you can upload your container to Docker Hub. If you have a Docker account — you can create these for free — you can log in, provide a few details and actually share this, so others can use your images, your containers, as well. Okay, so what we were actually working up to here was running the image that we've created — an image that contains salmon — with one of our scripts. You might remember script2.nf from session one. It's quite a short script; it only really has the one process, which is indexing the transcriptome.fa file. This is the file up here that we've included as a parameter: we specified it in here and it's just running salmon index on it. As you can see down the bottom, it works quite nicely when we specify our new image. One thing I do want to point out here is that we're using -with-docker and the my-image container on the command line, rather than in the config, because we already have this nextflow/rnaseq-nf container here. By doing this on the command line, we're effectively overriding what's happening here in the config.
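The mount variants and the final run are roughly as follows; the exact paths are adapted from the training material and may differ on your system:

```bash
# mount a single file into the container
docker run --volume $PWD/data/ggal/transcriptome.fa:/transcriptome.fa my-image \
  salmon index -t /transcriptome.fa -i transcript-index

# or mount the host working directory at the same path inside the container
docker run --volume $PWD:$PWD --workdir $PWD my-image \
  salmon index -t $PWD/data/ggal/transcriptome.fa -i transcript-index

# run script2.nf with our image, overriding the container set in nextflow.config
nextflow run script2.nf -with-docker my-image
```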
As you might remember again from session two, that's where we talked about the different places that you can configure a pipeline. Okay, so that's really Docker. We're going to skip Singularity, because it's not installed on this system, and jump down here to Conda packages. Conda is another popular package and environment manager. All of the nf-core pipelines will most likely have Conda environments that you can also use when you are executing the pipeline — so you can say -with-docker, -with-singularity or -with-conda. Conda is another way to use these tools, but it's slightly less reproducible, because you're not building on top of these fixed Linux images, so you can have different things happening and versions changing in the background. The first thing we need to do is run this code here, which is conda init bash. This is just because we need to add a line for shell interaction, and this wasn't done previously. So we do conda init bash — it has modified our Gitpod .bashrc file — and then we just need to run bash again to actually apply this. Okay, so again we'll work through this quite quickly, and I do apologize about that. What I want to show you is this env.yml file, which is sitting in here; you can open it as well in your Gitpod environment. As you can see, it's got a name, some channels and dependencies: these are all the tools that we used in the script7.nf pipeline that we developed as part of session one. What we are going to do is create an environment using Conda: conda env create, using the file env.yml. At the moment, if you run conda env list, you can see what we've got there is base; we're going to create another one from the file env.yml, which will contain all of these dependencies. This will take a little moment to run, so what I will do, just to keep things moving, is point out the next couple of little bits of code that we're about to look at, so that when we actually come to doing this on the command line we can move through it quite quickly. As already shown, you can use conda env list, and this will list all the different Conda environments that are available locally on your system. Originally we had base — you can tell that this is the one that's active by the little asterisk here. We're creating nf-tutorial, which is the name given in this environment file, and this will be the location where it is created; we can run this after the environment is created to look at it. We will also see that it is not yet activated, so we can run conda activate and then look at the environment list again: we'll see the little asterisk has moved down here, meaning this is the one that has been activated. Then we can actually use nextflow run script7.nf with -with-conda — this is much like -with-docker, but we're using -with-conda instead, followed by the path to the environment we've created here. What this is doing, just to recap and bring it back to the big picture, is that you can use what is effectively a recipe to create an environment, and then you can execute your pipeline with that environment using Nextflow's -with-conda flag. So, this is still taking a little bit of time to run.
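While that runs, here is a sketch of the whole Conda sequence just described; the environment name comes from env.yml and the path may differ locally:

```bash
conda init bash    # add shell integration to ~/.bashrc
bash               # reload the shell so the change takes effect

conda env create --file env.yml
conda env list     # base plus the new nf-tutorial environment
conda activate nf-tutorial

# execute the pipeline with the environment (path may differ on your system)
nextflow run script7.nf -with-conda /opt/conda/envs/nf-tutorial
```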
So what I'll do is pause the video, and once it's finished I will come back and show you the rest. Okay, again my apologies about that; just to keep things moving, I decided to pause it there for a second. What you can see on my screen is that we've run this command, conda env create, using the env.yml file, and then down the bottom there are a couple of warnings, but nothing to worry about. We've now got this environment, which we can look at using conda env list: we can see that we now have this nf-tutorial, which is the environment this recipe was used for, and we can activate it here just by using conda activate. Cool, so that's how you can create an environment using Conda, but probably what you're wondering is how you can use this in Nextflow. This can be done quite simply and easily by running nextflow run with your script name, then just changing from -with-docker, or whatever you're using to manage your software, to -with-conda, and specifying the path to the environment that's been created. If you're doing this locally it might be a little bit different, but if you're using Gitpod like me, you should be able to type that in and it should work pretty quickly. And as you can see here, it started running and everything seems to be working quite nicely. One other thing that you can do is create a Conda-like environment using Micromamba. Again, you use a recipe like this, and then you modify your Docker container a little so that you actually build the environment within the container using Micromamba — so a different base image there as well. Unfortunately that's quite time-consuming and we don't have time for it today, but if this is something that does interest you, there's some really nice documentation here about it. One other thing I can show you is that you can also pull containers directly from an online community initiative such as BioContainers. Most, if not all, of the tools that you'll probably come into contact with have a biocontainer; a lot of them already exist on BioContainers, and you can pull directly from the BioContainers repository instead of having to create your own image. Just to show this, you can pull — this is docker pull, much like we did with the debian:stretch-slim image — downloading straight from the repository. And then down here there's quite a nice exercise where you can use this image that we've just pulled to execute script2.nf again. In the earlier case we were using salmon, but up here this is FastQC, so it's a little bit different, and down here you can actually do this exercise, which is quite a nice one. Below that there's a slightly more complicated exercise where you specify the container as part of your process. This is what we do as part of nf-core: you can specify a container as part of every process — every module can have a separate container — so that everything is a substitutable unit and everything moves around together. You don't have to worry about creating one big container for all your software; you just have lots of small ones, which can be quite nice.
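A sketch of the BioContainers pull and the per-process container pattern just described; the image tag follows the training material, and newer tags exist:

```bash
# pull a FastQC image straight from the BioContainers repository
docker pull biocontainers/fastqc:v0.11.5
```

```nextflow
// per-process container, nf-core style: each module carries its own image
process FASTQC {
    container 'biocontainers/fastqc:v0.11.5'

    script:
    """
    fastqc --version
    """
}

workflow {
    FASTQC()
}
```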
Okay, so I think that's finished downloading. We can just run docker images, and you can see that it now exists down there as well: it's been pulled from the biocontainers/fastqc repository, with a version, and of course when it was made and the size. There are probably much more recent versions of this, considering it was created four years ago. Okay, so we will move on to the next bit of the training material, which is channels, processes and operators. For the next 60 minutes or so we will go back and talk about channels, processes and operators in more detail. These are all things that I've mentioned and given one or two examples of as we were developing other scripts and workflows as part of this training, but now is really the opportunity to go back and dig into them in more detail, and hopefully I'll give you some more explanations and examples that will help you understand how these things work and what they mean. Starting with channels. As a reminder, channels are the key data structures that allow for reactive, functionally oriented computational workflows, based on the dataflow programming paradigm. What that essentially means is that they're used to pass data from one task to another: here we just have tasks alpha and beta, and this channel is passing files z, y and x between them. Something we haven't spoken about is that there are different types of channels, the first being queues and the second being values. Queue channels are asynchronous, unidirectional FIFO queues that connect processes or operators. Asynchronous means that they are non-blocking; unidirectional means that they flow from a producer to a consumer; and FIFO means that the data is guaranteed to be delivered in the same order as it was produced — first in, first out. A queue channel is implicitly created by process output definitions, or by using a channel factory such as Channel.of or Channel.fromPath — so by default you can probably expect a lot of your channels to be queue channels. Consequently, if you were to do something like this in its simplest form — I'll paste this into snippet.nf and save it — then we can just run it with nextflow run snippet.nf; again, this is just the name of the file Nextflow is executing, and we can see the output here is one, two and three. The second type of channel is values. A value channel, also known as a singleton channel, is by definition bound to a single value, and it can be read an unlimited number of times — this is quite important, that it can be consumed without limit. A value channel is created using the value factory method, or by operators returning a single value: first, last, collect, count, min, max, reduce and sum. The main thing to remember is that it's really just a single value; you will never be able to have a value channel of more than one value. To really show you how this works, and how it relates to queue channels, we will play around with this script here. I'm just going to paste this on top of snippet.nf, and you're very welcome to do the same. We have Channel.of(1, 2, 3) and Channel.of(1), making up ch1 and ch2 — sorry, quite a few numbers there — and we have this process block, which is taking both of these channels, ch1 and ch2, emitting standard output, and inside the script block we are adding the two together, so you expect these numbers to be summed.
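The snippet being pasted in is roughly this; names follow the training material:

```nextflow
ch1 = Channel.of(1, 2, 3)
ch2 = Channel.of(1)

process SUM {
    input:
    val x
    val y

    output:
    stdout

    script:
    """
    # the backslash keeps the outer \$ for Bash arithmetic; $x and $y are Nextflow variables
    echo \$(($x+$y))
    """
}

workflow {
    SUM(ch1, ch2).view()
}
```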
Down here in the workflow block we call the process SUM with the two channels, and we view the resulting output. So again, we're just going to run this. What you might remember from what I said earlier is that the elements of a queue channel can only be consumed once. In this case we have one, two and three in ch1, and only one element in ch2. Because we have more elements in ch1 than in ch2, the sum will only be calculated while there are elements from both channels to be combined. That's why we only get two: this one is added to this one, and then it has to stop, because there's nothing for the two and three to be added to. To show this in a slightly different way, I'm just going to add in one extra number, then we execute that again, and what we'll hopefully see is two and four, because the one is added to the one and the two is added to the two — the order of these is maintained. And we can see two and four. If you want to add a three you can do that; if you add a four, this will work the same way — one plus one, two plus two, three plus three — and this four is going to be left out, because it doesn't have a pair. Yep, so the process was only executed three times, because we only had the three pairs of elements to be passed through. Okay, so that's how it goes when it's a queue channel: an element will only be used if there's, in this case, an element from the other channel for it to be paired with. But what happens if we have a value channel? We can get rid of these extra numbers and change this from Channel.of to Channel.value — remembering that value channels can only contain one element and, as part of that, can be consumed multiple times. We can run this again, and you see here two, three and four: the one is being added to the one, then to the two, then to the three, producing these three outputs. You might be wondering how this works. Essentially what's happening under the hood, behind the scenes, is that Nextflow queue channels contain what's known as a poison pill: once the channel is fully consumed, this poison pill is hit and it stops. This is different from a value channel, which doesn't have one. What you can do here is run these little bits of code with the -dsl1 flag. DSL1 is an old version of the Nextflow syntax, but it has the capability of printing this to the screen, which we can't do very easily with DSL2. So you can run this again, just with Channel.of(1) and then Channel.value(1), and print these lines, and you'll be able to see that you do get these differences here. This is more just for those that are interested: there's this kind of poison pill happening under the hood, which is why the processes stop. Okay, here's just another example, where we've used the first operator. Like I said, you can use these operators to produce a single value, and those single values are treated as value channels rather than queue channels. Actually, I think it's probably nice just to show this. Up here, ch2 is a Channel.of, so it should be a queue channel, but by appending .first() and running it like this, what you'll see is that this queue channel has been turned into a value channel because of the operator, meaning that it can be used multiple times. Great, so those are the differences between queue channels and value channels.
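And the .first() variant is along these lines, reusing the SUM process from the previous snippet:

```nextflow
ch1 = Channel.of(1, 2, 3)
// .first() returns a value channel, so its single element can be consumed repeatedly
ch2 = Channel.of(1).first()

workflow {
    SUM(ch1, ch2).view()   // now prints 2, 3 and 4
}
```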
What we can do now is work through this list of channel factories. There are lots of different types of channel factories; some of these produce queues, while the value factory, for example, produces the value type of channel. What we have here is ch1, ch2 and ch3. Probably what's most interesting about this is that you'll notice that this last one is actually a list: you can include a list as a single value, so that if you were to view it — I'm just going to append .view() on the end there and run it — you'll see that the list is printed as one item. But if we were to add another element, another thing outside of this list, you'll see that this will fail, because it'll be an invalid data type, I assume. Yep: "invalid method invocation" — so not a data type; it fails because of the invalid method invocation. You could always just include it as part of your list, depending on what you're trying to do — there we go, and then it works fine. So I've talked about value; I've talked about using of — we did Channel.of in the previous script. One thing different here is that we've added this "value:" text at the start, and what that will do is print some text at the start of each of your outputs. So here one, three, five and seven are each being used as an item, inserted dynamically because of the dollar sign, and you can see that printed out to the screen there. You can also use ranges to generate slightly more interesting channels: here, for example, we've got Channel.of with 1..23, X and Y. If you execute that, you'll see that it becomes a big long list; rather than having to specify every number individually, ranges generate those for you. There's also Channel.fromList, and this is quite a nice way if you have a list of things that you want to include — 'hello', 'world', again. We can take this from a list — you know it's a list because it's got the square brackets — and we can view it. You can run that, and this gives us the channel output, which is each of these as separate lines: yep, hello, world. We've already used fromPath quite a lot: this is how we specify a path, or in this case a file. For example, we can include this little bit of code here, Channel.fromPath('./data/meta/*.csv'). We know there are a couple of CSVs in data/meta, and it's picked up both of those because we've got this glob wildcard here. There's a little bit more information here about the different ways that you can find files using fromPath; these are all quite dynamic and can be used in really useful ways if you are trying to specify files and subfolders in different directories. Okay, we've also looked at using fromFilePairs: this is the same as what we were doing in session one when we were pulling in multiple files using the _{1,2} pattern for the paired files — we've already looked at this in some detail during the development of our RNA-seq pipeline. Yep, so we've still got the gut, liver and lung samples with the paired fastq files, picked up using this pattern. There's also fromSRA, which can pull straight from the SRA repository online. I don't have this set up today, just because it does take a little bit longer to set up, so we'll move past that for now.
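A few of the factories just covered, as one runnable sketch; the paths follow the training data folders:

```nextflow
Channel.of(1..23, 'X', 'Y').view()                       // ranges are expanded into individual items
Channel.fromList(['hello', 'world']).view()              // one channel item per list element
Channel.fromPath('./data/meta/*.csv').view()             // the glob picks up every matching file
Channel.fromFilePairs('./data/ggal/*_{1,2}.fq').view()   // groups read pairs under a common key
```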
But what I do want to show you is that you can also bring in data from text files, for example. Here we can have Channel.fromPath pointing at a text file, and then we can use splitText, and this will split the text up into usable chunks. So here, for example, we can go back to this path — let's copy that — so we've got fromPath over data/meta/random.txt, and we can see that this file has seven lines with some text in there. What we can then do is look at the output of this channel, and you can see it has been split into multiple lines. This can become more dynamic, and you can keep developing it in different ways. Here, for example, we're going to split the text by 10 and change it to uppercase. You can imagine that if you did have something like a sample sheet, you could bring it in, split it up, and use different parts of that file in different ways, depending on how the file was originally structured. So again, we're going to bring in this random text and split it into chunks of 10 lines: one, two, three, four, five, six, seven — I guess it's all come through as one chunk, and I wasn't sure why it did that at first; it's probably a good example of going back and reading the documentation. Here we go: "split files into chunks of 10 lines and transform them into capital letters" — because the file doesn't have 10 lines, it hasn't been split. So that's probably a bad example on my part, but if we had more lines in the file, it would be split into proper chunks. You could also do something like this, which is just going to add a count to the start of every line: the count starts as zero, and as you're iterating through, you're adding onto that count. Here, for example, we can run that again, and that produces counts at the start of those lines: yep, we got zero through six, so it started at zero for the first one and became higher with every iteration. This is just a very simple example with a text file, but you can also use CSV files. Here, for example, we've got some patient data, which has been brought in, split with splitCsv, and then manipulated row by row and indexed, so that different parts of that file can be shown. So, for example, we can go and dig into this detail: for patient one we've got the patient ID, GERA ID, S3 directory, number of samples, manually failed regions — all of this separated by commas, and Nextflow will automatically pick this up. But what we want to do is only look at certain columns, so we've run that again and used the index to pull out just the patient ID and the number of samples — just those two columns there, again with commas as separators. There are other examples here, so you can do different things to specify different columns or add column headers as you go, and there's some more information there about using tabs as separators, but just because of time we won't dig into that. Similarly, there are more complex file formats, and there are ways that you can import functionality to handle these for you. These are all quite nice examples, but that's really all we have time for with channels.
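The splitting examples look roughly like this; the CSV file name and column positions are illustrative, so check the training data for the real ones:

```nextflow
// split a text file into one item per line
Channel
    .fromPath('data/meta/random.txt')
    .splitText()
    .view()

// chunks of 10 lines, uppercased (a seven-line file comes through as a single chunk)
Channel
    .fromPath('data/meta/random.txt')
    .splitText(by: 10) { it.toUpperCase() }
    .view()

// parse a CSV and pull out individual columns by index
Channel
    .fromPath('data/meta/patients_1.csv')   // hypothetical file name
    .splitCsv()
    .view { row -> "${row[0]}, ${row[3]}" } // e.g. patient ID and number of samples
```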
Just to recap, there are the two main types, queues and values, and the biggest difference is that queue channels are asynchronous, unidirectional and FIFO, whereas with value channels one value can be used multiple times — in contrast to queue channels, where each element can be consumed only once. Okay, so: processes. In Nextflow, a process is the basic computing primitive used to execute the functions that we're trying to include as part of the pipeline — these could be custom scripts or tools, as already noted and shown. A process starts with the keyword process, followed by the process name, and finally the process body delimited by curly brackets. By convention, the process name is commonly written in uppercase letters, as can be seen here with this process SAYHELLO, and this is really just for making it easy to identify processes as part of your larger pipeline. A basic process only has to have a script definition block, and might look something like this, but in reality they can be much more complicated and include up to five definition blocks, which are directives, inputs, outputs, when statements and the script. Down here we have an outline of how this is all shaped: by convention you'll have the directives at the top, then inputs, outputs, your when statement, and the shell script. I think largely you do need to keep to this format; if you play around with it too much, you might find that things start to fail. Looking at a real-world example, we can look at the salmon index module, and you can see the same format again: we have a process named SALMON_INDEX, we have some directives at the top, we've got input, output, a when statement, and a few variable definitions with the script itself wrapped up in these double quote marks. So let's start off by examining the script block and how it works. Here we have this example, process EXAMPLE: we have the script block with the double quotes and a series of different lines. You can include multiple lines as part of your script block; it doesn't have to be one big long line. Down here in the workflow, we're executing this, and we can show this as an example over here. While this won't print anything to the screen, what we can do is look in the work directory where this process has been executed, and as you see, these files have been created. EXAMPLE has been executed, we have this hash number, so we can go and look in the work directory to see what's there, and we can see that we have the chunk_1 file and the chunk archive file, which were created here as part of the script. So we know it's worked, which is great. What we can also do as part of the script block is specify which language we want it to be interpreted as, and we can do this using a shebang declaration. Here, at the top of the script block in this example, we have a Python shebang, and we have some Python code in here; down here we have the process being executed as part of the workflow. So again, I'm just going to copy and paste this in, save it, and execute it — obviously, to keep executing, this is nextflow run snippet.nf; I'll just keep writing on top of this file, to make it a little bit easier to track and for me to execute quickly. What I want to do this time is check how this was executed, because there aren't any files being created, so I can look into the work directory at the .command.sh file.
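The shebang example is roughly this:

```nextflow
process PYSTUFF {
    script:
    """
    #!/usr/bin/env python

    # the shebang makes Nextflow hand this block to Python instead of Bash
    x = 'Hello'
    y = 'world!'
    print(x, y)
    """
}

workflow {
    PYSTUFF()
}
```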
This is the code that was actually executed for this process, and you can see here that the shebang was carried over, so that when it was executed, it was interpreted as Python. Okay, moving on: 6.1.1 is about script parameters. This is just noting that they can be defined dynamically as part of your script. Here, for example, we have params.data, and in this case it is 'World', and this has been included as a Nextflow variable here using params.data with the dollar sign, to show that it is a variable. I'm going to copy and paste it in there and click save. Again, this won't actually print anything to the screen, so what I will do is include this debug statement. debug is used to debug your code — here we have an echo statement — and it can be quite nice to actually show what is being generated without having to go into the work directory and look at the .command.sh. So I'm just going to run that again: we can see that FOO was executed successfully last time, but this time we have debug = true, and we can see that "Hello World" has been printed to the screen. So the 'World' has been interpreted as a variable by the script block. One thing very worth noting is that, because Nextflow uses the same Bash syntax for variable substitution in strings, Bash environment variables need to be escaped using the backslash character. Just to show this, I'm going to copy and paste this in again and run this snippet — I'll actually add debug true in there, just so I don't have to go and look in the work directory. You can see that it executed successfully, and with debug this should print to the screen. Yep: the current directory, with the escaped variable, is in this case the work directory — where the task is actually being executed. But if we don't escape it, we can run this again, and this will instead be the directory where I've executed the script: you can see that it's /workspace/gitpod/nf-training, and I can run that again to show that this is where I am. That can get quite complicated, though. If you do have lots of different Bash variables that you want to use in the same script, you might want to consider flipping everything around and specifying your Nextflow variables using a different syntax. Here's an example of this, where we're using a shell block. This is one of the differences: to do this, you need to change script to shell, and instead of double quotes you have single quotes. We can copy this again, put it in here, hit save and run it. Because of the exclamation mark and curly brackets, this will be interpreted as a Nextflow variable, while everything else in the script is interpreted as Bash. Oh, I do want to add in debug true — I want to see this in the command line; like I said, it's a nice way to print it to the screen. And there, this has been interpreted as a variable. If we remove this and try to do it as we normally would with a Nextflow variable, with the dollar sign in front, we will find that this fails. Yep, so here we have an error message that params is an unbound variable — it wasn't identified. Okay, that is something to consider whenever you're writing your scripts: if you are trying to use something to address your current directory, or something else about a relative path, you do need to be a little careful about what is a Bash variable and what is a Nextflow variable.
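The shell-block example is roughly this shape:

```nextflow
params.data = 'le monde'

process BAZ {
    debug true

    shell:
    '''
    # with a shell block, $X stays a plain Bash variable...
    X='Bonjour'
    # ...and !{...} marks the Nextflow variable instead
    echo $X !{params.data}
    '''
}

workflow {
    BAZ()
}
```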
Moving on to 6.1.2, which is conditional scripts. In this example we have some if/else statements — this will be your typical "if this, else do that". Here we are using the parameter params.compress: if it is gzip, it will execute this block; if it is bzip2, it will execute this block; and if it's neither of those, it will throw this error. Again, we're just going to copy and paste it in here so we can show this. We have the gzip at the top. I want to show you the results here, so I'm going to wrap these in some echo statements and use debug, adding in my debug = true. I'm just going to run this again: because gzip has been defined as the parameter, we find that we get this gzip command echoed, which is what we expect. But if we want to change this to compress with bzip2, we can quickly and easily change the execution, and we'll find that the second command is executed, which is the bzip2 one. Cool, so that's something you might consider if you do have these divergent points: if you have one file type you might want to do one thing, and if you have another you might want to do something else. That's quite a neat way to do that. Okay, so moving on to inputs. As a reminder, when we have our processes, everything that is needed for a process needs to be moved into the same directory — it needs to be put there by channels — because everything is happening as isolated units: each process task runs in a separate work folder, and if the file or information isn't there, then it won't be available to be included as part of that process. Because of that, we need to declare our inputs. Generally, an input will have an input qualifier as well as an arbitrary name. We have this example here: we have num, which is a channel of one, two and three — we've already got debug in here — and we have a val qualifier. There are different types of inputs that you can specify: one of them is val, the other is path, and those are probably the two main types of inputs that you'll be using. Here we have echo process job with the x, and this will just print to the screen because we have the debug, and we can run this down here. So I'm just going to copy this and show you this as an example — and we can get rid of the compress gzip parameter, although it probably wouldn't have failed if I did include it. Again, this is just taking a channel of one, two and three, and it's going to echo "process job x"; the x is defined here as the variable name, which is a val. So here we have process job one, two and three printed out, as you'd expect based on the script. We can also do this with paths. The last one was a value, but if we want to use a path, this would actually be specifying a file, not just a piece of information. Here we're taking the path data/ggal with the glob pattern and .fq — so everything in this folder ending in .fq — and we are including it as a path input. Each file is staged as sample.fastq — that's the name it's given — and then here we're using ls to list this file. This is a bit of a weird example, actually, because it will only ever print sample.fastq to the screen, so I might just skip over that and focus on this, which is how you can use variable names when you are, in this case, printing to the screen. This will ls everything with the Nextflow variable sample, which is specified here as a path, and which is going to take reads from the Channel.fromPath over the reads parameter.
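The val and path input examples just discussed look roughly like this:

```nextflow
// value input: one task per channel element
num = Channel.of(1, 2, 3)

process BASICEXAMPLE {
    debug true

    input:
    val x

    script:
    """
    echo process job $x
    """
}

// path input: the file is staged into the task's work directory
reads = Channel.fromPath('data/ggal/*.fq')

process FOO {
    debug true

    input:
    path sample

    script:
    """
    ls $sample
    """
}

workflow {
    BASICEXAMPLE(num)
    FOO(reads)
}
```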
To look at this and explain it in a slightly clearer way, I'm just going to run this again, and what we would expect here is everything in data/ggal ending in .fq printed to the screen. The sample variable is being used dynamically, based on the input name; in this case it was a path to the sample, so all of those files are included there. You can also get a slightly more interesting example by using a slightly different command, but again this is being used dynamically from the input; the only difference there is that everything is getting collected into one channel for the output. Okay, moving on to combining your input channels. This is something that we've shown, and it was already outlined: you can have multiple channels as inputs, and each of these is essentially positional. So here you have ch1 and ch2, and you have x and y here — these are two different channels that are being used as the input to the process, and you just specify them on separate lines in your input block. Again, we can copy this, paste it in here, and run it: the x is used for channel one, the y is used for channel two, and both of these are val inputs. So here we have one with a, two with b and three with c — each of these channels is being paired up. We've addressed this a little bit already when we were talking about value channels and queue channels. Down here, for example, when you look at this, you'd expect one to be paired with a and two to be paired with b — so one-a and two-b — but because we don't have anything to pair with c and d, because of that poison pill in the queue channels, you wouldn't expect those to be printed to your screen. However, here in this example, because we have a value channel, this value one can be used multiple times, and you'd expect to see one with a, one with b and one with c. Okay, so moving down to input repeaters. I said previously that there are two main input qualifiers, which are path and val, but you can also have each: this is a qualifier that allows you to repeat the execution of the process for each item in a collection, every time new data is received. In this example we have the sequences, which are coming from this data/prots folder — going back over here, we can see the data/prots folder, and we have six different files in here; all of these will be included as part of this path channel — and here we have three different methods. For each of these files, the process will be executed once per method, so we'd expect 18 different outputs from this. To actually show this, we can paste this in here. This is a good pattern if, for example, you wanted different parameters or a slightly different execution of a process on the same file: you can bring in the files and then say "do this", using each with each of these parameters. So here you can see that we've got each of these files, and each one is shown three times — the same file with regular, with psicoffee, and with espresso. Okay, so that's really input repeaters; there's an exercise here which you can play around with, where you have different methods to actually run this command. What I will do, though, is move on to outputs.
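Before that, the each example we just ran is roughly this shape; the glob and method names follow the training material:

```nextflow
sequences = Channel.fromPath('data/prots/*.tfa')
methods = ['regular', 'espresso', 'psicoffee']

process ALIGNSEQUENCES {
    debug true

    input:
    path seq
    each mode   // repeat the task once per method, for every file received

    script:
    """
    echo t_coffee -in $seq -mode $mode
    """
}

workflow {
    ALIGNSEQUENCES(sequences, methods)
}
```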
Outputs are what we specify as the output channel: inputs are what we want to come in, outputs are what we want to come out, and these can be defined here. Again, you have an output qualifier, which you'd expect to be probably val or path, and you can give it an output name — this can be arbitrary, anything you want it to be, and it can be dynamically named. We also have this emit here. We don't really talk about emit too much as part of this training, but it is used to name the output outside of the process: for example, if you were to look at this emit: index, you would be able to refer to it as SALMON_INDEX.out.index, and this refers to this particular channel, which in this case is the path to the salmon index. That's how that output would be named — I think we'll come back to this a little later on, when we talk about modules. Okay, so here is an example: we have these three methods, which are a list, and if you were to execute this, you'd find that it echoes this list — a bit of a strange example, but I guess we can show it. Again, this list is in a channel; it's going to be taken as a value, and it's going to be given out as a val output. So this will just be the values which are included here as part of this list: we see "received", and then we have the list in there. However, in reality you might want this to be a little bit more dynamic, a little bit more flexible, so you can do something like this, which takes a path qualifier: it's going to use result.txt as an output, and this will be taking a random number, which is put inside this text file. What we're actually doing down here is just printing it, so this will be printed to the screen anyway, but let's show it. I'm just going to copy and paste this in, save it, and we can run it. I'm not sure if this will work, actually; I think you need to escape this... yep, so we've got an error there, which is a good excuse to troubleshoot. I think this needs to be used with the backslash, like we talked about earlier with Nextflow variables versus Bash variables — you have to be a little bit careful about that sometimes. Yep: "received", and now we have a number in there. So if you're trying to run this straight out of the example, it'll fail without that. As it explains here, in the above example the process RANDOMNUM creates a file named result.txt containing a random number, and that's sent over the receiver channel when the task is complete, so it can be passed on to a downstream process. Here is an example of when you have multiple output files. We touched on this a little bit during the hello.nf example in session one: when you have multiple outputs, you can do something like add a glob pattern, and then use an operator, flatMap, to separate the grouped items out. Here, for example, it's going to replace all of that again — I'm just copying and pasting this on top and using it as an example. When we run this — it's just taking me a moment — all of this is printed to the screen, and because we've used flatMap we can see it in this nice format, one item per line. If we were to remove it from here, we will see that the output is a little bit different. We'll expand on operators more very shortly; this is just an example showing that you can use operators to really define your outputs dynamically.
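The result.txt example, with the escape fix applied, is roughly:

```nextflow
process RANDOMNUM {
    output:
    path 'result.txt'

    script:
    """
    # the backslash keeps \$RANDOM as a Bash variable rather than a Nextflow one
    echo \$RANDOM > result.txt
    """
}

workflow {
    receiver_ch = RANDOMNUM()
    receiver_ch.view()
}
```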
Yep, so you see here that without the flatMap, the chunk files — chunk_aa, ab, ac and ad — are all grouped together in one list. Okay, I've touched on this a little bit already, but you can name your outputs dynamically. Here, for example, in this block we have this val x, which is also specified here — it is being used as a variable to name the output. So this value x, which is taken as an input, is used to name the output here, and we can also see it down here in the script block, where it gives the file a name based on the input, the value input, and it's also used here in the output declaration. This is a good example of something you might do if you're trying to name a sample using something like a sample ID or a piece of metadata: you might mine that from a value, or include it as a value input, and then you can name the output dynamically. Yep, so you can see here dog, cat and sloth — these are the species that are included here; each is included as the value here, and then it is used here as part of the output. Okay, so the next section, 6.3.5, is composite inputs and outputs, and this is where we start talking about tuples a little bit more. Tuples are when you have multiple elements, multiple parts, of your inputs — and outputs, in this case — and each of these can have its own qualifier. In this case we have a tuple: this comes from Channel.fromFilePairs — if you remember from session one, we had that base name at the start and then two files included in a list at the end. So we have a val, which is going to be that base name or sample ID, and here we have the path to the actual files that were grouped by fromFilePairs. And we have the same as an output tuple: we have the sample ID again, but with a path to sample.bam, which is being generated here as part of this process. Down here we just have FOO, which is going to execute this process, and I'm just going to view the channel outputs. Again, I can copy that, paste it in here, run it, and we can see here the tuple that we've actually got: the sample ID value at the start, and at the end here the actual BAM file that has been generated. Okay, so we will continue on down to the when statement. when declarations define a condition that must be verified in order to actually execute the process. Here, for example, we have this fasta.name, and we're requiring that it matches this pattern to be executed. This is a relatively complex example, I guess, but you can see this pattern here is BB11, and without that match, the process won't be executed and the files won't be used. So here we have this BB11, and if we were to hypothetically change this to add a few more ones in there and try it again, the when statement won't be satisfied, so it won't run: the channel won't be filled, the process will probably appear, but it won't be executed. Okay, moving on to directives. Directive declarations allow the definition of optional settings that affect the execution of the current process, without affecting the semantics of the task itself. Here, for example, we have cpus, memory and a container — this example won't actually run as-is, but we have seen these in the past, when we used task.cpus in one of the scripts we looked at in session one. And there are a number of different options here that are defined, which you might be interested in using to help really tune the execution of your process.
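Going back a step, the composite (tuple) input/output example is roughly this; the echo line is a stand-in command, as in the training material:

```nextflow
reads_ch = Channel.fromFilePairs('data/ggal/*_{1,2}.fq')

process FOO {
    input:
    tuple val(sample_id), path(sample_files)   // [base name, [read1, read2]]

    output:
    tuple val(sample_id), path('sample.bam')   // keep the ID attached to the result

    script:
    """
    echo your_command_here --reads $sample_id > sample.bam
    """
}

workflow {
    bam_ch = FOO(reads_ch)
    bam_ch.view()
}
```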
Okay, we'll continue down to the when statement. A when declaration defines a condition that must be satisfied for the process to actually execute. Here, for example, we're requiring that fasta.name matches this pattern before the process will run. It's a relatively complex example, I guess, but you can see the pattern here is BB11, and files that don't match won't be used. If we were to hypothetically change this to add a few more ones and try again, the when condition won't be satisfied, so the process won't run: it may still appear in the output, but it won't be executed. Okay, moving on to directives. Directive declarations allow the definition of optional settings that affect the execution of the current process without affecting the semantics of the task itself. Here, for example, we have cpus, memory and a container. This particular example won't actually run as-is, but we have seen directives in the past, when we used task.cpus in hello.nf or one of the other scripts we looked at in session one, and there are a number of different directives defined in the documentation that you might be interested in using to really tune the execution of your processes. Using a real-world example again, this is the salmon index module: you can see a tag, a label, and conda and container declarations (directives, rather), plus the information needed to actually execute it. Okay, moving on to organizing outputs. We've seen this a little already in developing our RNA-seq pipeline, where we used publishDir to specify where files should be stored. I'm just going to quickly tidy up and show you what we have: I'll get rid of that transcriptome index, which was generated by me by mistake, and get rid of my work directory, then clear the terminal so we know what's there. What I want to do now is copy this example in. What's happening here is that we have this publishDir directive, which says where the output files from this process should be published. We have params.outdir, which we've also seen previously when developing the RNA-seq pipeline, defined up here as my-results, and down here we have the bam_files folder where the files will go, with the option mode: 'copy'. Going across to the Nextflow documentation, there are a lot of options for how outputs can be published and stored as part of your results rather than left in the work directory. So let's go back and look at what we're expecting. Running nextflow run snippet.nf should give us a folder called my-results with a bam_files folder inside it, and with ls my-results you can see bam_files and, inside it, all of the files that were created. In a slightly more complicated example, we can also create semantic subdirectories using the same publishDir directive. I'm just going to copy this in. As you can see, we've got these patterns, which are used to sort the outputs into different folders. The code here is a little more complicated as well, but what's happening is that the files are created, and then the publishDir directives up here organize them into separate output folders. I'll quickly run this; it's been run five times, and now if we look in our my-results folder you'll see there's much more in here. Different folders have been created and files stored in different places because of what we've defined up here: the counts outputs, for example, we can see those here.
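A minimal sketch of that pattern-based publishing, with illustrative folder and file names (the touch/echo lines stand in for real commands):

```groovy
// Sketch: pattern-based publishDir directives sort the outputs of a
// single process into semantic subdirectories of params.outdir.
params.outdir = 'my-results'

process foo {
    publishDir "$params.outdir/bam", pattern: '*.bam', mode: 'copy'
    publishDir "$params.outdir/counts", pattern: '*.counts', mode: 'copy'

    input:
    val sample_id

    output:
    path '*.bam'
    path '*.counts'

    script:
    """
    touch ${sample_id}.bam
    echo 42 > ${sample_id}.counts
    """
}

workflow {
    Channel.of('alpha', 'beta', 'gamma', 'delta', 'epsilon') | foo
}
```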
Okay, I think that's the end of processes, so what we'll do now is move on to operators. As a reminder, operators are the methods we use to manipulate channels, transforming the values in some way to help them match up between two different processes, or applying some user-provided rule. There are seven main groups of operators. We've been exposed to some of these already, but if you want to look at them in more detail there's a lot of really great Nextflow reference documentation; the groups are filtering, transforming, splitting, combining, forking, maths and other. What we'll do now is look at some basic operators: ones that have already been used or shown in some way but haven't been properly explained, as well as some you might expect to use quite frequently when developing your own pipelines with Nextflow. So this is a basic example. We have a channel factory, Channel.of(1, 2, 3, 4), and we've called the channel nums. Then we have nums.map, map being an operator, and what it is doing is taking every item (it doesn't have to be called "it"; that's just the default name) and multiplying it by itself to make a square. We've called the resulting channel square, and we view it, so instead of 1, 2, 3 and 4 you should see 1, 4, 9 and 16 as the output. To show this, I'll copy and paste this over here again, click save, and run the snippet. So again, we're taking the channel of 1, 2, 3 and 4 and squaring every item. What you'll notice is that we've actually copied this second example here, because you don't need to assign each step to a new channel every time: you can chain the operators together, which creates what I find to be a more simplified, more readable way of applying operators to a channel. Going back here, we do see the squares, 1, 4, 9 and 16, produced from 1, 2, 3 and 4. We've also used view a lot, so to explain it in a little more detail: the view operator prints the items emitted by a channel to the console standard output, appending a newline character to each item. For example, when we have a channel of foo, bar and baz and we view it, you'd expect to see foo, bar and baz on separate lines in the terminal. An optional closure parameter can be specified to customize how items are printed. For example, here we've just added a hyphen in the view closure, so that when each item is printed you'd expect to see the hyphen before it. We've also used map a lot. The map operator applies a function of your choosing to every item emitted by a channel and returns the items so obtained as a new channel. The function applied is called the mapping function, and, as I've referred to extensively, it is expressed with a closure, as shown in the example below. Here we have a channel of hello and world, we take each item, and we apply the reverse method, so hello and world will be printed in reverse. Again, I'll copy and paste that over and execute it, and you can see they are printed in reverse. If you remove the .reverse, every item is just printed as-is, which shows that the .reverse was the part making it go backwards; there wasn't any special magic there. A mapping function can also associate a generic tuple with each element, and the tuple can contain any data. Here, for example, we take hello and world, and in the map we've called each item word and turned it into a small list of word and word.size(). word.size() gives a number based on the length of the word, and in the view closure we have a tuple structure of word and len, so each line reads as the word followed by how many letters it contains.
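That one is worth seeing in full. Roughly:

```groovy
// Sketch: map each word to a [word, length] tuple, then unpack the
// tuple in view's closure to format the output.
workflow {
    Channel.of('hello', 'world')
        .map { word -> [word, word.size()] }
        .view { word, len -> "$word contains $len letters" }
}
```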
This is quite a cool example, actually, so I think we'll put it in here and look at the output. One thing to remember is that because word and len are Nextflow variables inside the double quotes, you do need the dollar signs here. 'hello contains 5 letters' and 'world contains 5 letters'. Cool. Next, and I think we did this as part of script seven, or maybe script six, we have the mix operator, which combines the items emitted by two or more channels into a single channel. So here we have Channel.of(1, 2, 3), a channel of 'a' and 'b', and a channel of 'z'; we take c1 and mix in c2 and c3, those being the three named channels, and then view the output, which gives us something like this. Again, I think the output will be what we've just seen, but it is quite nice to see it live, and then I'll show you that you can change this and it won't throw anything off too much. Just for consistency, I'm actually going to get rid of c2. As you can see down the bottom there, we have channel one mixed with what was previously c2 and c3. We'll do this once more, just to show you that you can quite quickly and flexibly edit these in your pipelines. Yep: one, one, two, two, three. One thing to remember here is that the items can come back out in any order; mix doesn't preserve a set structure, which is something to be mindful of if you're trying to match or pair things, and there's a warning in the material explaining that in more detail. We also have the flatten operator. The flatten operator transforms a channel in such a way that every tuple is flattened, so every entry is emitted as a sole element by the resulting channel. Here, for example, we have the channel of foo and bar, which are these two lists up here, and we apply flatten: the lists are treated as tuples and flattened, so their contents are emitted as sole elements. Again, we can just show this over here. I'll copy and paste that in without changing anything; it takes foo and bar, these two sets of numbers, and flattens them. Yep, so we have 1, 2, 3, 4, 5 and 6 showing in the terminal. There is also collect. The collect operator collects all the items emitted by a channel into a list and returns the resulting object as a sole emission. Here, for example, we have a channel of 1, 2, 3 and 4, and it's being collected into one item. We used this with MultiQC in script six or seven as part of session one: it's how everything was collected into one channel so it could all be put into the process at once. Again, we can copy and paste this here, save, and click run, and while that's running we can look back over here and see how we used it. There we had mix and collect, these two operators, first mixing the quant channel with the fastqc channel and then collecting everything at the end, which is the same thing we're doing here, just written a different way, so that everything is, for lack of a better word, collected into one channel. groupTuple is also quite a useful operator. The groupTuple operator collects the tuples, or lists of values, emitted by the source channel, grouping together the elements that share the same key, and emits a new tuple object for each distinct key.
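The example in the material looks roughly like this:

```groovy
// Sketch: groupTuple gathers items that share the same key into a
// single tuple per distinct key.
workflow {
    Channel.of([1, 'A'], [1, 'B'], [2, 'C'], [3, 'B'], [1, 'C'], [2, 'A'], [3, 'D'])
        .groupTuple()
        .view()
    // Expected output:
    // [1, [A, B, C]]
    // [2, [C, A]]
    // [3, [B, D]]
}
```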
So here we've got a channel of [1, A], [1, B], [2, C], [3, B], [1, C], [2, A] and [3, D], and when this runs you'll see that all of the items sharing a key have been grouped in the output. There's also quite a nice example in the material that groups files by base name. Okay, moving down to join. The join operator creates a channel that joins together the items emitted by two channels that have a matching key; the key is defined, by default, as the first element in each item emitted. Here, for example, we have X, Y and Z in one channel and X, Y and Z in another, and the output joins these together, in this case each left value with its right value, because we had 'left' and 'right' here, and we view the result. What you'll notice is that one pair is missing, because it wasn't able to join with anything. Finally, we have branch, which I think is quite a cool operator. It allows you to forward the items emitted by a source channel to one or more output channels based on some sort of test, a selection criterion. Here we have a channel of 1, 2, 3, 40 and 50, and with the branch operator we can classify each item as either small or large by testing whether it's smaller or larger than 10, and then look at the results. This is very similar to what you might expect using emit for an output: we can use result.small to refer to the results that are small, versus result.large for the ones that are large, view each of those, and print that the item is small or large. So what we can do is pop that over here, paste it in, and run it in the terminal, and you can see these have been printed: we've got result.small.view and result.large.view. If you were to remove one of these, hypothetically, you wouldn't get all of them printed out, but you can see the branch has put each item into small or large. Here, just showing the small ones, it's '1 is small', '2 is small', '3 is small', rather than having them mixed and blended in together. This is the kind of situation where you might push one set of data down one path and another set down a different path, based on whatever criteria you've selected.
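As a compact sketch (the exact input numbers are illustrative):

```groovy
// Sketch: branch routes each item into one of several named output
// channels according to the first condition it satisfies.
workflow {
    result = Channel.of(1, 2, 3, 40, 50)
        .branch {
            small: it < 10
            large: it >= 10
        }

    result.small.view { "$it is small" }
    result.large.view { "$it is large" }
}
```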
Okay, that's the relevant documentation. As I said earlier, there are many, many more operators that you can use; a huge number are listed here, many of which I haven't even touched on. Filtering, reducing, parsing text, combining channels, forking channels, maths and other: these are the groups I mentioned before, and there are lots of really good examples and explanations of all of them in the documentation. It's quite small on screen, but as you can see there's a huge amount of documentation and lots of different ways you can use operators to modify and manipulate your channels, to test them, and to share them between different processes. Okay, with that we'll move on to an introduction to Groovy. As already mentioned in session one, Nextflow is a domain-specific language implemented on top of Groovy. You do not need to be an expert in Groovy to write your pipelines with Nextflow, although having a little understanding definitely helps. What we'll do now is work through some of the basic structures and idioms of Groovy, just to give you a little exposure, and hopefully it will make some of the things we've written previously, which might have looked a little strange, make a bit more sense. Starting off with printing values: we did this already as part of, I think, script one, using println; in that case we were just printing params.reads. Printing is a really nice way of showing something on the screen, and it is different from viewing a channel. Viewing a channel shows you how your channel is structured, the files and values that are being passed around, whereas println just shows you a string printed to your window. So here, nextflow run snippet.nf (I'm still using snippet.nf for all of this) is going to print this string. You can quickly modify the string and show that it prints straight to your screen; there isn't any magic involved. We've also talked a little about comments. You can use single-line comments with two slashes, or multi-line comments using the slash and asterisk form; you can keep adding lines there and it's still the same comment, and none of it is included when the script is executed. It's just like normal commenting, so it isn't picked up by the program that's running it. Cool, variables. Obviously we've used variables quite a lot already; in Groovy you define one by simply assigning a value to it. Here, for example, we have x = 1, x = a date, x = what I think is pi, x = the boolean false, and x = the string 'Hi'. Again, we can just dump all of this in here, and each value is printed to the screen because we've got println lines in there as well: 1, the date, pi, false and 'Hi'. We can also use the def keyword to define local variables, as in def x = 'foo'. You can see def used quite extensively in script blocks, for example to define some arguments, and as a rule def should always be used when defining variables local to a function or a closure. You can also create lists. A list object is defined by placing the list items in square brackets, so here we have a list of 10, 20, 30 and 40. We do iterate over this a little, so I'm going to put it up here at the top. You can use indexes to access different parts of the list: here, for example, we have list[0] and list.get(2), so you can use square brackets or the .get() method with round brackets, and we can put both of those in here. Indexing starts at zero, so 10 is at index 0, 20 at index 1, 30 at index 2 and 40 at index 3. Just as an example, I'll add one more and then execute this, and what we should see printed to the screen is 10 and 30. You can also inspect things: here, for example, we ask for the size of the list and print it to the screen, so we'll put this in here and run nextflow run snippet.nf again, and it gives us the size of the list. Cool. We can also use the assert keyword to test whether a condition is true, similar to a function, and Groovy prints nothing if the assertion holds. So if we assert list[0] == 10, Nextflow will run but nothing is printed, because the assertion is true. However, if we change this to another word, in this case nextflow1, and run it again, we get an error message: 'No such variable: nextflow1'. It was actually looking for a variable that doesn't exist.
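Pulling those list pieces together, a snippet.nf along these lines shows the behaviour:

```groovy
// Sketch: Groovy lists and assertions in a snippet.nf-style script.
list = [10, 20, 30, 40]

println list[0]       // 10: square-bracket indexing starts at zero
println list.get(2)   // 30: the method form of indexing
println list.size()   // 4

assert list[0] == 10  // passes silently: nothing is printed
// assert list[0] == 20  // would halt the run with an assertion error
```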
Of course, if you instead test it with another number, or treat it as a string with some numbers in it, you get varying error messages depending on what was tested and why it failed. Great. You can also test in slightly more interesting, more dynamic ways. Here, for example, we have a list of 0, 1 and 2, and we're looking at index -1, which is the first element counting from the other end, in this case 2, so assert list[-1] == 2. Here we have list[-1..0], which walks the list in the opposite direction, and we test whether that equals list.reverse(). This just shows that you can make quite complex assertions and check whether they hold. So this should be a new list; we can get rid of the old one, and this should respond without any issues. And if we were to change this to ask for the second element from the other end, that should also pass, because list[-2] == 1, the second item counting backwards. Cool. Here's a big list of different assertions you can make and different ways you can test them; a full list of the extension methods provided by Groovy can be found by clicking on this link here. Maps are also an important part of the Groovy language and are used extensively in Nextflow. Maps are like lists, but they have an arbitrary key instead of an integer, so the syntax is very much aligned. Here, for example, we have a map with a: 0, b: 1 and c: 2, and you can access it with the conventional square-bracket syntax, as shown here: with map['a'] we can test that a equals 0. You can also use map.b, which is a little like the .out dot-name access I mentioned previously, to check that b equals 1, and map.get('c') to check that c equals 2. If we run this (I think we need to copy it in first, since we've created a new map here), you'll see it runs fine: not getting any error messages shows that the assertions are correct. Down here is a demonstration that you can also modify the map, with a syntax similar to adding values to a list: map['a'] = 'x' replaces the value of a with x, here we replace the value of b with y, and here the value of c with z, and then we can check it worked by asserting that the map equals [a: 'x', b: 'y', c: 'z']. Again, we'll quickly run that, just to show that it is indeed successful. So there are lots of different ways to replace a value in a map; these are just some of them, three different ways at least. What's happening there, an error at snippet line one? Okay, this is because I deleted the list line earlier and nothing was there to replace it, so that should work this time.
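Collected into one runnable snippet, that map section looks roughly like this:

```groovy
// Sketch: Groovy maps, with three ways to read values and three ways
// to modify them.
map = [a: 0, b: 1, c: 2]

assert map['a'] == 0      // square-bracket access
assert map.b == 1         // dot access
assert map.get('c') == 2  // method access

map['a'] = 'x'
map.b = 'y'
map.put('c', 'z')
assert map == [a: 'x', b: 'y', c: 'z']
```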
Okay, 8.6: here we have a little bit about string interpolation. String literals can be defined by enclosing them in either single or double quotes. Here we have foxtype equals 'quick' and foxcolor, a list of the letters b, r, o, w and n, and we print them together. I'm just going to copy that out and show you here. These are just ways you can stick your strings together, and again you can run it with nextflow run snippet.nf. We've used join here to join all of the letters in foxcolor, so these individual strings are combined and we get 'the quick brown fox' printed quite nicely. Something else worth noting is that you get some interesting effects from forward and back slashes. Something to be mindful of: a backslash before a t creates a tab in Groovy, so when you look at these two strings, the one where the backslash hasn't been properly escaped is interpreted as tabs. That's why you get this strange elongated effect: tabs have been introduced between the letters, and the t's have been omitted. Okay, moving down to 8.7, we have multi-line strings. You can specify multi-line strings much like we did over here as part of, I think, script two, where we introduced the log: we've got these multi-line strings, and they were included in, in that case, log.info. That's one application of why you might want multi-line strings for a series of lines of text. Okay, if statements. We looked at these a little already when we covered the conditional script blocks; that was down here, under conditional scripts, with if and else, or else if. Going back to the introduction to Groovy, you can use these statements to test something, and if it's true do this, and if it's not do that. Here, for example, we have x = 1, and if x is less than 10 it prints 'Hello'; there's no else statement in that one. Down here, for example, you can use it in a slightly more advanced way: you have a list of 1, 2 and 3, and if the list is not null and its size is greater than zero you print the list, else you print that the list is empty. So in this case it would print the list, but if it was empty it would say so. This is quite a good pattern; there are other ways to do it, of course, but you could test whether a channel is empty, and if it is, not bother executing a process or even trying to, and if it's full, go ahead and do something with it. Down here, 8.9, we have the for statement, so this is really your for loop. This is just going to iterate, in this case printing hello world a number of times, once for each iteration of i. I'll go back here, close those two files, and replace this here: for int i = 0, as long as i is less than 3, increasing each time, it iterates through 0, 1 and 2, saying hello world with the i that's been defined up here as the index.
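Those two control-flow examples, roughly as they appear in the material:

```groovy
// Sketch: an if/else guard on a list, then a classic for loop.
list = [1, 2, 3]
if (list != null && list.size() > 0) {
    println list
} else {
    println 'The list is empty'
}

for (int i = 0; i < 3; i++) {
    println "Hello World $i"
}
```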
Okay, functions. It is also possible to define custom functions in a script. You might need this occasionally, say if you want to test something regularly or apply some custom logic to your data. Here, for example, we have this function, which we can test. It looks quite complicated, but really it's just manipulating numbers, and we can assert that the function we've defined returns the integer we're expecting, which in this case is 89. If we were to change that to 489, we'd expect it to fail. It's a slightly more complicated example, but the function first tests whether the number is less than 2, in which case it just returns 1, and otherwise applies the recursive equation. So, for example, if we replace the argument with 0, we can check that the result equals 1, which is still correct, because n is less than 2 and so 1 is returned, rather than the recursive branch. Okay, closures. Closures are something we've addressed a little, on and off. Closures are the Swiss Army knife of Groovy and Nextflow programming: simply, a closure is a block of code that can be passed as an argument to a function, and a closure can also be used to define an anonymous function. Here, for example, we have square = { it * it }, and then assert square.call(5) == 25: we're calling square and checking that 5 times 5 equals 25. Here's the same without .call, and we can still test that square(9) equals 81. Just to show this, we can put it in here; this is the closure we've created, the anonymous function as I said, and we can see that square.call(5) equals 25 and square(9) equals 81. Because both are true, everything passes and we don't get any error messages. Of course, this can also be applied in a slightly more involved way: we collect over the list 1, 2, 3 and 4, squaring them all, and print the result, and what you'd expect is 1, 4, 9 and 16, with all of them squared. Expanding on this again: by default, closures take a single parameter called it, and to give it a different name you use the little arrow syntax. So we have square with num -> num * num; it's taking, in this case, a different name (it could have been it, but we've called it num), and multiplying it by itself again. You can also create more elaborate versions. For example, when the each method is applied to a map, it can take a closure with two arguments, to which it passes each key-value pair in the map. Here, for example, with parameters a and b, we print the variable a with value b. We have a bunch of entries here in a map, Yue Wu, Mark Williams and Sudha Kumari, and when we print those, each comes out on its own line. It's getting used a little like a tuple: a and b are separated out and named separately, and when you print them you can use them like variables.
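Here's the gist of those closure examples gathered into one snippet:

```groovy
// Sketch: closures as values. The implicit `it` parameter, a named
// parameter, collect, and a two-argument closure applied to a map.
square = { it * it }
assert square.call(5) == 25   // explicit call
assert square(9) == 81        // shorthand call

println([1, 2, 3, 4].collect(square))   // [1, 4, 9, 16]

squareNamed = { num -> num * num }      // named parameter instead of `it`
assert squareNamed(4) == 16

printMap = { a, b -> println "$a with value $b" }
[Yue: 'Wu', Mark: 'Williams', Sudha: 'Kumari'].each(printMap)
```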
It's worth noting that a closure has two other important features. First, it can access and modify variables in the scope where it is defined; second, a closure can be defined in an anonymous manner, meaning it isn't given a name and is defined right in the place where it needs to be used. Here's an example showing both features: result starts at zero, and we have the values China: 1, India: 2 and USA: 3. We've got this keySet, over which each is applied, and inside the closure result is updated with the value for each key. So when we copy and paste this in and run it, remembering that result was defined as starting at zero, the values 1, 2 and 3 are added together into result, because we've used keySet and each. That is slightly more advanced, so please don't worry about it too much if it doesn't make any sense at all. Okay, I'll finish this section by saying that there are many, many more resources. You can follow either of these links, and there's a lot of information there for digging into Groovy in more detail, as well as this book. If you're like me, you'll probably just start off with the website, but if this is something you want to deep-dive into, the book is quite a nice resource. Okay, so now we'll move on to modularization. To finish off today, we'll look at modularization in more detail. We've been exposed to it a little already as part of sessions one and two; particularly in session two, we talked about modules and subworkflows and how they can be integrated into your scripts using the nf-core tooling. What we'll do now is take a little step back and look at the hello.nf example from earlier, converting the processes that were included as part of the main hello.nf script into modules stored in a separate file. So let's jump back over to Gitpod and reopen the script. Again, this is the hello.nf script we've used previously: we've got the parameter at the top, the greeting, and we've got these two processes, splitLetters and convertToUpper. The first thing I'll do is cut them out and remove them from the script. Make sure you do cut them rather than delete them, because we're about to paste them into a different file. I'm going to create a modules.nf file, which has popped up in my editor, and paste them in. No special formatting is needed; just make sure your processes have both ends of their curly brackets. I'll save that, and you can see it has appeared in my explorer. What we can do now is include these in our main script using include statements. Going back over to the modularization section in the training material, we can use these statements here: include splitLetters from the modules file, and include convertToUpper from the modules file. This modules file is the one we've just created, and it is in the same directory as our hello.nf script; the path does need to be relative to this main script. I have now saved this, and we can run nextflow run hello.nf. You'll find that it runs with the same results as we had initially in session one, where we split the letters into chunks of six and converted them to uppercase in the second process, convertToUpper. Okay, that's really cool. We have now removed our processes from our main script, which has really improved its readability, and we have both of them stored as two different processes in the one modules file. But because they are in the same file, we can also make this a little easier on ourselves and have both of them in the same include statement: this just says include splitLetters and convertToUpper, separated by a semicolon, from this modules file. We can run that again, and you'll see that both of them are still brought in and run successfully.
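A sketch of what hello.nf looks like after the split (the exact process names and channel wiring may differ slightly in your copy):

```groovy
// Sketch: hello.nf with its processes moved out to ./modules.nf and
// imported with a single include statement.
include { splitLetters; convertToUpper } from './modules.nf'

params.greeting = 'Hello world!'

workflow {
    greeting_ch = Channel.of(params.greeting)
    letters_ch = splitLetters(greeting_ch)
    results_ch = convertToUpper(letters_ch.flatten())
    results_ch.view { it }
}
```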
Great, so that's what I've just done here in 9.1.2, multiple imports. Here in 9.1.3 is an example of using multiple aliases. In some situations you might want to run a process twice; you might want to use the same tool on two separate files, for example. As a simple example, we'll just run splitLetters twice. We'll use the same channel as an input, but for this scenario imagine you want it as a separate invocation for some reason. So here I'm going to ask the splitLetters process to be run twice as part of the same workflow, and we'll see an error: a process can only be used once in a workflow context, and if you need to use the same component you should include it with a different name, or include it in a different workflow context. This is considered to be the workflow context and, as we'll find out, you can have processes included again as part of a separate workflow and treated as a subworkflow; we'll explore that very soon. But what I can show you is this example here, where we use aliases. In this situation we're including splitLetters as splitLetters_one and splitLetters_two, and convertToUpper as convertToUpper_one and convertToUpper_two, and we're using both of them via the aliases in the workflow. Again, I'll just copy and paste this in, and what you can see is that everything up here in the workflow runs: each alias runs once, but because the aliases bring in the splitLetters and convertToUpper modules twice, we get all of the outputs twice, as everything now runs in duplicate. Okay, we also have output definitions. We've touched on this a little, but not in great detail. Here is the workflow as it stands, where we have basically given each channel a name: the channel of params.greeting goes into greeting_ch, the output of splitLetters goes into letters_ch, and so on. But what we can actually do is explicitly connect the output of one process to the next using .out, removing the channel definitions completely: this version has been turned into this one, where we keep using .out rather than putting each output into a new named channel. By using .out we can substitute directly. I'm going to reverse my earlier change and replace this here; I'm still bringing in the two processes from the modules.nf file, but here I'm just using .out rather than creating those channels, as shown here. Okay, I'll save that and run it. We only see hello world once, because I removed that second usage with the aliases, and everything still runs. There's also a note here: if a process defines two or more output channels, each channel can be accessed by indexing the out attribute, so out[0] versus out[1], using square brackets. In our example we only have the zero output, so we can copy that in and look at just the first, the zero, output; in this case we'd expect to see only one of the two outputs. Okay, we're seeing both of them, because it's been flattened, I guess.
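A sketch of that .out wiring (again assuming the hello.nf process names):

```groovy
// Sketch: connecting processes directly with .out instead of
// creating intermediate named channels.
include { splitLetters; convertToUpper } from './modules.nf'

params.greeting = 'Hello world!'

workflow {
    splitLetters(Channel.of(params.greeting))
    convertToUpper(splitLetters.out.flatten())
    convertToUpper.out.view { it }
    // With several unnamed outputs you would index instead, e.g.
    // convertToUpper.out[0].view { it }
}
```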
Here's another little note as well: alternatively, the output definition allows us to use the emit statement to define a name that can be used to reference the channel from the external scope. For example, try adding the emit statement to convertToUpper in your file; here we're adding emit: upper, which names the output. We've talked about this very briefly already. This goes back in the modules.nf file, so we can pop that in there; a couple of typos there, now fixed. Now, when we use the workflow block, you see we've got .out.upper: we're using the name upper, which we gave the output using emit. We can copy that out and, oops, paste it in here. Again, the big difference here is that we're actually using upper to select that channel. Yep, so we have hello world. I'm still using this one channel, which is the standard output; if we had a separate channel in there, you might want to name it separately with something else. There are examples of this here: this is again the salmon index nf-core module, with the index and the versions being the two channels that are emitted, named index and versions respectively. So if you were to refer to them in your pipeline, it would be something like SALMON_INDEX.out.index or .versions. Okay, 9.2.1. As an example of how you can join up your processes, feeding the output of one into another, you can pipe them. If we just replace this .out version and substitute in this workflow, you'll see the output of one process being piped into the next. This works here because we don't have too many output channels; if you did, you would need to make sure they're named, and this becomes a little more complicated, but for now it works, because it is a relatively simple pipeline. Okay, so, as alluded to earlier, and as we touched on a little in session two when we talked about subworkflows, you can actually contain one entire workflow inside another. Here is an example of a workflow which has been called my_pipeline and is then executed down here as part of a separate workflow scope. I'm just going to copy this and pop it back in here, and I'll show you this again: this workflow contains everything we've shown previously, the channel being created, then splitLetters and convertToUpper, and it still uses .out.upper, so if you didn't add the emit to your module you'll probably get an error. We can clear the terminal and run this again: the entire pipeline is executed here as my_pipeline, included as a workflow inside the second workflow. And this is a way that, if you were trying to include these modules more than once, you could do it: define a my_pipeline_two, then invoke both. So now this is an example where you could have multiple named workflows included and executed as part of the same workflow, and as you can see, everything has been executed twice, and I didn't have to set any aliases for my processes.
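A sketch of that sub-workflow structure (this assumes the emit: upper addition to convertToUpper in modules.nf):

```groovy
// Sketch: an entire named workflow invoked from the main workflow.
include { splitLetters; convertToUpper } from './modules.nf'

params.greeting = 'Hello world!'

workflow my_pipeline {
    splitLetters(Channel.of(params.greeting))
    convertToUpper(splitLetters.out.flatten())
    convertToUpper.out.upper.view { it }
}

workflow {
    my_pipeline()
}
```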
Okay, so workflow inputs can be defined using the take statement: instead of building the channel from params.greeting inside the workflow, you can declare that the workflow takes a greeting, which is then used here. Let me check what exactly I've copied; okay, the entire workflow again. This isn't massively different. I've still got the .out.upper, so your modules.nf still needs the emit; all that's really different here is that I'm taking greeting, and that greeting is then used in the pipeline. One thing that is actually a little different here as well: this is the named workflow, workflow my_pipeline, so at the call site we now need to pass in the channel of params.greeting. take is just a declaration, and the name is arbitrary: it doesn't have to be greeting, it could be greet, or ing, or any other word you want to put in there. I'm just going to save that and execute it again, nextflow run hello.nf, and this workflow named my_pipeline runs with the channel of params.greeting, which is defined up here. That should execute successfully. Okay, much like take, you can also call your output something different in the workflow; here you use emit. take and emit are new keywords, along with main, main being where you put the actual code that you're executing: take is your inputs, and emit is like your outputs. Here we have the named output, convertToUpper.out.upper, and this is what will be emitted by this workflow. We can copy all of this across and replace it here, so I am replacing just the workflow my_pipeline. Because we now have this emit of convertToUpper's output, you can see that the main section has shrunk a little: we no longer have the view as part of the main script block, or main block as we've come to know it. It copied in twice, so we don't need the second one; that one can go. Again, we can just run this, and the output is emitted by the pipeline. We can see that the one thing being emitted is convertToUpper.out.upper, and it is viewed here by just using my_pipeline.out, because this emission isn't named.
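Putting take and emit together, the shape is roughly this:

```groovy
// Sketch: a named workflow with explicit inputs (take) and outputs
// (emit), invoked from the entry workflow.
include { splitLetters; convertToUpper } from './modules.nf'

params.greeting = 'Hello world!'

workflow my_pipeline {
    take:
    greeting

    main:
    splitLetters(greeting)
    convertToUpper(splitLetters.out.flatten())

    emit:
    convertToUpper.out.upper
}

workflow {
    my_pipeline(Channel.of(params.greeting))
    my_pipeline.out.view { it }
}
```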
Yep, and that's just explained in the note there as well. Here's an example, too, of calling named workflows, which we've touched on a little as well: my_pipeline_one and my_pipeline_two are defined, and both are run, just as was shown previously, but down here you can use the -entry flag to say: start at my_pipeline_one only. This would be an example of where you don't want to run, say, preprocessing or some sort of quality checks, and you just want to jump in halfway down the pipeline for some reason. Here we have a little bit more about parameter scopes. We've covered this in some detail already; the main thing to remember is that you need the dollar sign for Nextflow to identify something as a Nextflow variable. We've also got this here: how you can include sayHello from a modules file, with the parameters defined up at the top being picked up by the included sayHello. And here's an example expanding on this as well: how parameters can be used as defaults and then overridden. If you were to run this (I'm just going to put it in hello.nf), you've got foo and bar being included from this modules file, which we haven't actually added yet, so I need to go back and do that. So that has been added to modules.nf, and it's included here with the addParams option. This is a bit of an interesting example, but what I wanted to show is that you can run this, the module is included from the modules file, it picks up on these parameters, and in this case the addParams option has actually overwritten the parameter, which is cool. This is probably more advanced usage; if you're a basic user just getting started, it's something you might consider down the line. Cool, and that's the end of it. So, like I said earlier, we do have the DSL2 migration notes here: previously Nextflow migrated from DSL1 to DSL2, so the language changed a little. Generally, everything moving forward is DSL2, so you don't really need to worry about DSL1 anymore unless you're trying to execute a particularly old pipeline; everything we've done today has been focused on DSL2, so please don't worry about DSL1 too much. With that, I think that is the end of today's session. As a recap, we dug into managing dependencies and containers; we talked about channels, processes and operators in more detail and gave you more examples; we had a brief introduction to Groovy, because Nextflow is a domain-specific language written on top of Groovy; and we finished off by talking about modularization and how modules can be written outside of your main script and imported using include statements. So that will finish it off today. Tomorrow we will talk about configuration, deployment scenarios and Nextflow Tower in greater detail. Great, thanks very much, and we'll see you all again tomorrow.