Okay, maybe I'll just get started. My name is Rieke and I'm a PhD student at QBiC, and I got exposed to nf-core last year-ish, so I'm excited that I now get to give a talk here. For the past nine months or so I've been running Sarek a lot, so I'm more involved in running the pipelines than in developing them. Hopefully today I can share some pointers and answer the "how" questions about running them, either from my own experience or from things I've learned from other people at the hackathons.

First things first. I guess Phil has already covered this, but you don't have to worry about installing any of the software that the pipeline processes themselves need. What you do need to install is Nextflow, plus one of Singularity, Docker, or Conda, and with that you are set to go. If you joined the Nextflow tutorials this morning, you've already learned how to install Nextflow.

The next question I would ask myself is: how do I get the pipelines, how can I fetch them? The good news is you don't need to clone a repository or download anything at all; Nextflow will do all of this for you. There are two commands, nextflow pull and nextflow run, and they recognize the GitHub repository name, so nf-core slash the pipeline name, for example nf-core/sarek or nf-core/rnaseq, and clone it into the ~/.nextflow folder in your home directory. You can specify which release or which branch you want with -r, for example -r 1.3 or so, so that you know exactly which release version you used, which is good for reproducing results later on. We usually run nextflow pull first, just to be extra sure that the cached copy is up to date and you don't accidentally run an older or different version of the pipeline than you intended (see the example commands below).

In case your cluster doesn't have access to the internet, you can also pre-download your pipelines. nf-core/tools helps you here: you just type nf-core download and then your pipeline, and it will download the entire pipeline for you. It also downloads the shared configs; you can specify, for example, whether you want a Singularity or Docker container, and it will bundle everything into a tar or zip file, so you can easily transfer it over to wherever you actually want to run your pipelines.
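To make the fetching concrete, here's a minimal sketch of the pull-and-pin workflow; the release tag is only an example, so check the pipeline's releases page for the version you actually want:

    nextflow pull nf-core/sarek -r 2.7.1    # cache exactly this release under ~/.nextflow
    nextflow run  nf-core/sarek -r 2.7.1    # run the same pinned release (plus your parameters)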
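And for the offline case, a sketch of the download route; the exact flags have changed between nf-core/tools versions, so check the help output of your own installation:

    nf-core download sarek     # recent versions prompt for release, container system and compression
    nf-core download --help    # lists the flags if you'd rather set everything non-interactively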
Next: how can you configure your pipelines to run? There's a large number of configs. Three of them come with the pipeline: the default or base config, the core profiles, and the institutional profiles. On top of those, you can have local config files to specify project-specific parameters. I will go over all of these now.

The default base config is stored in a folder called conf in each individual pipeline repository. It is automatically loaded when you run the pipeline, and it contains sensible default resource allocations: how much memory, how many cores and so on each specific process needs to run. It doesn't specify any software packaging, and it doesn't handle job submission to a scheduler either.

To get the software packaging, you can specify core profiles. These are, for example, singularity, docker, or conda, and you set them by typing -profile and then the software packaging you'd like. Each pipeline also comes with a test profile, which runs a teeny tiny test data set to check whether the pipeline runs at all on your system. So you can type, for example, -profile test,singularity to execute the test run with Singularity.

Next come the institutional profiles. If you're lucky, somebody at your institute has already run nf-core pipelines and has written an institutional config, which you can find on GitHub in the nf-core/configs repository. It specifies the job submission, and it can also specify the software packaging, if you want everyone on your system to use Singularity, for example. These profiles work for all the pipelines and all the users on your system, and there's only a single point to update. So once they are set up, you can run all the pipelines without having to worry about this anymore. There's one more thing: you can also have institutional pipeline-specific profiles. Some pipelines need special settings that everyone who uses them on your system will need, so you can specify those centrally too, and not have to worry about them every time you want to run Sarek or something.

Then there are local config files. Sometimes you may want to run a pipeline and, for whatever reason, your data is so big that it doesn't work with the pre-allocated resources. In this case you can have a local config where you specify the resource requirements for your specific project, and you add it to your run with -c (lowercase) and then your config. The lowercase is important: there's also an uppercase -C, which overrides all the previous configs, while the lowercase -c adds your config on top of the existing ones, so you still get the core profiles, for example. In case you don't have an institutional profile yet, you can also specify your scheduler and so on here. And after you've figured out all the right settings, be sure to add them to nf-core/configs, so you can reuse them anytime and everyone else can too, without having to go through the same process. (There's a small example of such a local config below.)

Then there's something like a personal config. In your home directory there's a folder called .nextflow with a config file, where you can, for example, set your email address or your Nextflow Tower access token. This is also loaded every time you execute a pipeline, so you can, for example, get an email on success or failure of the pipeline.

And last but not least, there's the parameters file. You can specify all your parameters on the command line, by typing --input, --outdir, or some pipeline-specific parameter, but you can also save them all into one JSON or YAML file. This is great for rerunning your pipeline later on, because you have this one file that you can store with your results, and you don't have to go back and work out which parameters you used. The command line is also possible, but more work in my opinion. There's also a helper tool in nf-core/tools called nf-core launch, and I guess Phil will talk a little bit about it on Thursday in his tools talk.
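As a concrete example of the local config mentioned above, here's a minimal sketch; the process name and the numbers are made up for illustration, so substitute whatever actually runs out of resources for you:

    cat > my_resources.config <<'EOF'
    // give one over-hungry process more resources (name and values are examples)
    process {
        withName: 'MARKDUPLICATES' {
            cpus   = 16
            memory = 64.GB
            time   = 48.h
        }
    }
    EOF

    nextflow run nf-core/sarek -profile singularity -c my_resources.config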
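And a sketch of the parameters file, which you pass with Nextflow's -params-file option; input and outdir are the usual nf-core parameter names, but check your pipeline's docs for the full list:

    cat > params.yml <<'EOF'
    # instead of --input and --outdir on the command line
    input: '/path/to/samplesheet.csv'
    outdir: './results'
    EOF

    nextflow run nf-core/sarek -profile singularity -params-file params.yml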
Okay, if you've now got too many configs in your mind, don't worry: for most of them you don't have to worry at all, or only once. The default and core configs are already set up, and you just use them. The institutional and personal configs you may need to set up once, and then you probably won't touch them again, except to update them if something changes. The local config, the one that contains your project-specific resources, you hopefully won't need at all, but in case your data is somewhat special you may have to set it up. And the parameters file: of course you always need to provide your input parameters if you use this option.

Okay, I already mentioned this briefly, but I want to reiterate it: each pipeline comes with a test. You can use it to figure out whether the pipeline works at all on your system: is it me, is it the data, or is it the pipeline that doesn't work? These test data sets are really, really tiny, so you should be able to run them even on your laptop; they also run in GitHub Actions under very restrictive resource limits. You use them with -profile test, combined with, for example, your institutional config, or Conda, Docker, or Singularity, whichever you prefer.

Okay, now how to actually run the pipeline. You've got your configs, you've tested whether it runs. You type nextflow run and your pipeline, you specify the profile just as before, and you add either the parameters file or your specific parameters. You can run all of this in a normal terminal session, but the problem is that once you close the terminal, the Nextflow run quits, which is not useful for runs that take days. So instead, you can start Nextflow with the -bg flag, or open a screen session; some HPC clusters also allow you to submit it from an interactive session. Just choose something that you can close and come back to later, otherwise your progress may be gone. (There's a sketch of this below.)

Each pipeline comes with extensive documentation, which will help you choose your input parameters, and not just the input parameters are documented but also all the outputs. In this case, Sarek for example, there is also information about specific tools, for ASCAT and Sentieon, for example. So definitely take a look at the documentation; it's usually very useful and very detailed. Otherwise you can also type nextflow run and then --help, and all the parameters will be printed for you to look at.

Okay, so I guess now you've started your pipeline. The next question is: what is your pipeline doing while you walked away? Is it running? Did it fail? What happened? One way to figure this out is to log back into your screen session, where you see that, I don't know, two out of two of one process have completed, and 117 out of 133 of another, so you're still waiting for 16 of them. Another option is to set up email notification on success or failure; this is documented in the Nextflow docs. Whenever your pipeline exits, you get an email telling you what happened, and it also includes the error message, which is useful. Or you can set up Nextflow Tower, which gives you a nice visualization of what's happening in your pipeline, and where you can also see all your previous runs, which can sometimes be useful. Okay.
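Going back to the launch itself, here's a sketch of two ways to start a run that survives closing your terminal; the release tag is again just an example:

    # variant 1: a screen session you can detach from (Ctrl-a d) and re-attach to (screen -r sarek)
    screen -S sarek
    nextflow run nf-core/sarek -r 2.7.1 -profile test,singularity

    # variant 2: let Nextflow put itself in the background and capture the console output
    nextflow run nf-core/sarek -r 2.7.1 -profile test,singularity -bg > sarek.log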
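For the email notification, one simple route is Nextflow's -N option, assuming your system can send mail; the address is a placeholder:

    # send a notification mail, including the error message on failure, when the run exits
    nextflow run nf-core/sarek -r 2.7.1 -profile test,singularity -N you@example.com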
So now your pipeline ran, but it unfortunately failed and something went wrong. What do you do? You get an error message, either via email or just in the terminal, that gives you a hint. Sometimes these messages are not that useful, and you may need to navigate into the work directory: the error shows the work directory in which the process failed, you navigate into it, and there are a bunch of .command.* files in there. Make sure you type ls -a to also show the hidden files. There you find, for example, .command.err and .command.log, which are useful to look at to get a first idea of what went wrong. There's also the .nextflow.log, in the directory you launched from, which also contains information that could help you figure out your specific issue. If this didn't help, then maybe take a look at the documentation again, maybe you just overlooked something, and definitely always ask for help: there's a Slack channel for each individual pipeline, named after the pipeline. If you haven't joined the Slack yet, you can go to nf-co.re/join, where you can get an invite.

Okay. So you've now figured out the issue with your pipeline and you want to run it again, but you've already computed data for five days, and you really don't want to recompute everything that has already been computed. Nextflow saves you here with the -resume flag: just adding it to the run, in the same work directory, will resume the pipeline. But maybe you don't want to resume the last run you had, but the second-to-last. In that case you can type -resume and add the run name or the session ID, and you can get those with nextflow log, which displays all the runs you've had in this specific work directory. There's also a blog post going into a little more detail, which may be a useful resource if you want more information.

Okay. So your pipeline had errors, you resumed it, and it has finished. What next? There's this work directory I've been talking about, and it can get quite huge, so you should probably remove it. Don't worry, the result files are all copied over into the results directory you specified, so you won't lose any of your work; but only do this once you're sure you don't want to resume any runs. There's also nextflow clean; you can get help on how to use it by typing nextflow clean -h. With it you can delete the entire work directory or only parts of it, which can be useful when you use resume a lot and the work directory explodes, and you don't want to keep the intermediate files from all the runs you've done but only from the most recent.
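Digging into a failed task then looks roughly like this; the hash directory is a placeholder that you'd copy from your own error message:

    cd work/4e/9fa871de...         # the task directory printed in the error message
    ls -a                          # the .command.* files are hidden
    cat .command.err               # stderr of the failed task
    cat .command.log               # combined output of the task
    less ../../../.nextflow.log    # the main log lives in the directory you launched from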
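And the resume and clean-up steps, sketched out; run names like grave_galileo are auto-generated by Nextflow, so yours will differ:

    nextflow log                                                             # list previous runs with names and session IDs
    nextflow run nf-core/sarek -profile singularity -resume                  # resume the most recent run
    nextflow run nf-core/sarek -profile singularity -resume grave_galileo    # or a specific earlier one
    nextflow clean -n -but grave_galileo    # dry run: show what would be deleted, keeping that run
    nextflow clean -f -but grave_galileo    # actually delete the intermediates of all other runs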
I've brought a tiny example: running the test profile on Sarek. I create a clean work directory, open a screen session, then pull the new Sarek release and run it with the test profile and cfc, which is our cluster's institutional profile. This now runs for a while; the recording is heavily sped up, because I think the Sarek test takes around 15 minutes or so. It tests some of the standard processes, and in this particular case it creates the MultiQC report and completes successfully. So the test was successful, and I now have the .nextflow.log in there, the results folder, and the work directory. I can remove the work directory, as I don't need it anymore, and when I go into the results folder I can see the test results that Sarek generated. So Sarek apparently tests fine on our system.

Okay, to summarize: to run a pipeline, create a clean workspace, open a screen session or something similar, pull the pipeline you want to run, then run the pipeline with the right version, your institutional profile or any of the others, your local configs, and your parameter files; and once you're done, remove the work directory, look at the results, and be excited that it finished.

Some resources: these slides are mostly based on the usage tutorial from the nf-core website, so take a look there for more detailed information, also on some other aspects; and on the tips-and-tricks talk that Phil gave at the last hackathon, which has some more advanced tips and tricks in it. Yeah, thank you for your attention.