Hi everyone, welcome to session three of the Nextflow and nf-core foundational training. My name is Chris, I'm one of the other developer advocates at Seqera Labs, and I'll be the one taking you through the session today. We will start the session by digging into the nf-core website, where I'll point out some of the key places you can expect to find essential information for either deploying or developing your own pipelines. As well as that, we will dig into the nf-core tooling, and you'll learn how to use nf-core pipelines and how to customise your execution. We will then jump over to the nf-core tooling for developing pipelines, where I'll show you some of the essential commands you might expect to use if you are developing your pipeline with the nf-core tooling, as well as the modules, subworkflows and other things that are available to you as a developer. I do want to point out that we will cover a lot of information really quickly today, so a lot of what I cover will be a superficial view that we skim over quite quickly. I also might jump around a little bit between different ideas and concepts, and we might take a few tangents as we go through some of this material. What I would advise is to follow along as best you can through the session today, and if anything doesn't make sense, please do yell out in the Slack channel, where we can explain it. There's also a lot of documentation on the website, as I will point out soon, where you'll be able to find a lot more information to build on top of what we're going to show you today. So we're looking forward to it, and let's jump over to the nf-core website to get started. OK, so here I am on the nf-core website. As a reminder, this is nf-co.re. If you type "nf-core community" into your favorite search engine, you should be able to find this pretty easily as well.
What I would like to do, rather than showing you a slide deck, is actually just work through this web page and show you where you can find different pieces of information for yourself. A slide deck could have been quite nice for highlighting all of this, but when you come back to do this yourself, or you're looking for a specific piece of information, I think having some familiarity with the site now is more beneficial. You might notice that what you're seeing on my screen is a little bit different from what you're seeing on your screen. This is most likely because I have recorded this slightly before the actual training event, so that I can be around on Slack at the same time to help answer any questions. Great. OK, so what I'm going to do is scroll down this home page and show you a few different sections of it, and then work through the different sections of the website along the top here. So first, what you'll see here is the nf-core logo with a description of what nf-core started off as, which is a community effort to collect a curated set of analysis pipelines built using Nextflow. While I think this is where the nf-core community started, I also think it's much, much more than this now, with all the modules, subworkflows and tooling, as well as documentation. I think the community is so much more than just a repository for pipelines, and I hope that by the end of this you'll agree with me as well. Scrolling down a little bit further, there are some of the features of these pipelines, which I think are important to at least be familiar with. So all of these pipelines come with documentation. This is extensive and should cover installation, usage and descriptions of the outputs: all the files that you might need to use and also what you get out of the pipeline at the end.
Documentation is really important, and it is a standard for any nf-core pipeline to have this extensive documentation, because it is important that you know what you're doing and what to expect at the back end of the pipeline as well. All of the pipelines need to be portable and fully reproducible, so they all follow best practices to ensure maximum portability and reproducibility. This is helped by the fact that they are exceptionally well tested and easy to run. Each pipeline is released as a stable release. Releases have tags, meaning that you can always go back and find them on GitHub, no matter what version you might have run in the past. This feeds back into portability and reproducibility. So if you go away, run a pipeline now, then realize that you need to run it again, you'll be able to go back and find this code online. It will never go away. Each pipeline is extensively tested, so every time a new push or commit is made to an nf-core repository, it goes through CI testing. It goes through a series of tests using test data, as well as linting, to make sure that nothing is broken and that everything you would expect the pipeline to be able to do is still functional. It also makes sure that all of the code is in the right format and everything is where it is supposed to be. Again, this means that the pipelines are reproducible over time and that changes made by the community don't impact you or your pipeline in the future. All of these pipelines are cloud-ready, so they're all tested at major releases on AWS as well as some other platforms. This is just so that we are sure that they can work at scale. They aren't only being tested on smaller datasets. You can also go online to find what the results look like from these larger tests. All of the software that is used as a part of nf-core pipelines is packaged.
So by default, all these pipelines will be able to work with Docker, Singularity and Conda, as well as other software managers. You should not need to install any software yourself. If you have Nextflow and one of these software managers installed, then because Nextflow has these integrations, it should be able to go away and manage all of this for you. You don't need to worry about installing every single tool used by the pipeline yourself. All of that is already done. Something I touched on a little bit earlier is that this isn't just another repository or registry for pipelines. It's a little bit more than that. With all of these pipelines, there's an ethos that people follow. The first part of it is that you develop with the community. The idea is that you can join the Slack or engage on GitHub and work with the community to develop great pipelines. You don't have to do this in isolation. With Nextflow being able to work with things like Git and Docker, you can work on this remotely from anywhere in the world and contribute as a part of the community. You don't have to go away and worry about every little thing yourself. You can become a part of the community, work with the community and make something together that's really, really great. All of these pipelines start from a template. This is actually really important. You rely on the nf-core tooling to create the template. As a part of this, we do have template patches that come out occasionally, and these help keep all of the pipelines up to date and in line with best practices. So if there are new features or new structures that need to be introduced to the pipelines, starting from this template and being able to apply these template patches is really important. Moving away from the template, even if you are developing your own pipeline, will create a lot of headaches down the line.
So as a piece of advice, start from the template and apply the template patches, especially if you want to integrate fully or make use of all the nf-core modules, subworkflows and tooling. This will save you a lot of time and effort in the long run. nf-core also has this idea of "collaborate, don't duplicate". So what you'll find is that there is one main pipeline for any one purpose. The idea again is that you develop with the community, so that you don't charge off in your own direction and create a pipeline that already exists. If you try to submit your pipeline to nf-core and there's already a pipeline in the same area, on the same topic, or dealing with the same data in the same way, you're most likely to be told: hey, look, great, how about you add your features to this other pipeline that already exists, so that we can make one awesome pipeline rather than divide our efforts and the amount of time we have available to maintain and upkeep these pipelines together. Down the bottom of this web page, you get the main highlights of what's happening in the community. Over here on the left, we have a little bit of an introduction to nf-core. This again is another description of what the nf-core community is and how it's used. This video is probably getting a little bit out of date, but it's still a great resource if you want to learn more about the community and what it's up to. Here we have some information about upcoming events. Again, you can find more of these under the events tab, but you'll see here that we list the upcoming major events happening in the community. Every week, we have these bytesize seminars, which are 15-minute talks on a particular topic or a particular pipeline, and are a great resource if you're interested in learning more about a specific topic. We also list the major events.
So for example, here we have a hackathon coming up in Boston in November, and down the bottom here, of course, there's another bytesize talk. Over to the right, you'll see that we have the latest bytesize video, as well as a search bar so that you can go away and look for a particular bytesize video or topic. Down the bottom, we have some basic information about how you can install nf-core by yourself. Today we'll be using the Gitpod environment again, so this is already installed for you, but if you are trying to do this on your local device or your own system, here are some basic commands that you might need to install it for yourself. Again, there's more information down the bottom here about some of the places where nf-core is used, and some other links to introduce you to more elements of the nf-core community. Great, so what I'm going to do now is scroll back up to the top and move along to Pipelines. This is a list of all the pipelines that are available as a part of the nf-core community. You'll see that at the moment there are 86 different pipelines. Of these, 53 are released, 21 are under development and 12 have been archived. The reason a pipeline might be archived is that it's been superseded by another pipeline or integrated into another pipeline. This isn't done lightly, but as an example, a whole-exome sequencing pipeline was developed and then merged into the whole-genome sequencing pipeline, Sarek. Because they were both doing much the same thing, the idea is that we put them together so that we don't have to maintain two different pipelines. If you want to look for any particular pipeline, of course, there's a search bar where you can type in a keyword or phrase, and all the pipelines that match will come up.
So for example, if you type in RNA, you'll see all the different pipelines that have the capacity to deal with RNA data. Similarly, if you look for DNA or single cell, these things do come up when you search for them, which is really cool. If you click on any one of these pipelines, it will take you to a pipeline-specific page. Here, for example, I'm going to click on the nf-core/rnaseq pipeline. This has taken me to the most recent release of the RNA-seq pipeline. Each pipeline has its own set of information that you might be interested in looking at. What you'll see is that it has its own toolbar along the top here, starting off with the introduction. The introduction is really a high-level overview of what the pipeline can do, pointing out key features and how it all fits together. Most pipelines now have a subway map to visualize what the pipeline is doing and the different tracks that your data might take as it passes through. For example, here we have different methods shown as different tracks going all the way around the different stages of the RNA-seq pipeline. Underneath, there is a list of all the different parts of this pipeline and the tools that are used in each, as well as some information about the usage, how you might need to construct a sample sheet to be passed into the pipeline, some information about the outputs, and credits to people who have made contributions to this pipeline. If you are more interested in how to use this pipeline, there is also extensive information about the usage. Here, for example, you can see that all the different sections have been broken down, so you have the sample sheet input, information about adapter trimming, the different alignment options, quantification and reference genomes.
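To make the sample sheet mentioned above a little more concrete, here is a minimal Python sketch that writes one as CSV. It assumes the column layout described in the nf-core/rnaseq usage documentation (`sample`, `fastq_1`, `fastq_2`, `strandedness`); the sample names and file names are hypothetical placeholders, and the usage page for the release you run is the thing to check, since the expected columns can change between versions.

```python
import csv
import io

# Column names as described in the nf-core/rnaseq usage docs (assumed layout;
# verify against the documentation for the release you actually run).
COLUMNS = ["sample", "fastq_1", "fastq_2", "strandedness"]


def make_samplesheet(rows):
    """Return the sample sheet as CSV text, one row per sequencing library."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical samples: one paired-end, one single-end (empty fastq_2).
rows = [
    {"sample": "CONTROL_REP1", "fastq_1": "control_rep1_R1.fastq.gz",
     "fastq_2": "control_rep1_R2.fastq.gz", "strandedness": "auto"},
    {"sample": "TREATED_REP1", "fastq_1": "treated_rep1_R1.fastq.gz",
     "fastq_2": "", "strandedness": "auto"},
]

samplesheet = make_samplesheet(rows)
print(samplesheet)
```

The resulting text can be saved to a file and passed to the pipeline as its input; building it programmatically is only a convenience when there are many samples.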
All of these usage sections are explained in detail, and this is what I was referring to when I talked about the extensive documentation that is available as a part of all of these pipelines. While I won't go into it here, if there is any particular pipeline that you're interested in, you can open this up, have a browse and look for the information that might be relevant to you. All of this is checked as a part of the review process when a pipeline is updated. So it is viewed by the community and kept in check, so that it is relevant and important information for the execution of the pipeline. Moving along to parameters. Parameters, as described in the previous sessions, are important. These are the settings that you can modify on the command line as different flags. You'll see here that there's extensive information about each of the parameters. You can see what the parameter is and whether it's required or not, and you have options over here to open up help text that gives you more information about a particular parameter. Again, as a part of the nf-core documentation and best practices, all the parameters of a pipeline need to be described here. So there shouldn't be any hidden or surprise parameters in the pipeline. You should be able to find information about all of them as you scroll through. While I won't go through all of the parameters for this particular pipeline, you can see that there is extensive information, and it will be up to you how you want to tune these as a part of your run. The next tab along is of course the outputs. These are the outputs from the pipeline. This is what you would expect to receive in your local output directories when you execute this pipeline. It also gives you an idea of how your outputs might be structured.
The cool thing about this again is that you've got descriptions of what the outputs might be, and some information that links back to the usage documentation. All of this is designed so that you can understand what you're doing and what the outputs might be. It will be up to you which parameters you want to use, and this will dictate what you can expect to see at the back end here. I do really recommend viewing this as you are testing or trialing a pipeline for the first time. It gives you a good idea of what you're doing and what your outputs might look like, which of course is probably what you're most interested in at the end of the day. Moving along to results. These are the AWS results that I mentioned earlier. These are the outputs from the pipeline as it was executed using the full-size test data. All of this is really good. You can dig in here and download and look at these files for yourself if you're interested in seeing what they look like, or in running any benchmarks against what you might expect from these files. Finally, we have some release information. Of course, this gives you a changelog of what's happened over time. This is quite important if you want to check when a tool was introduced and whether anything has changed since you last viewed it. All of this is, again, a part of the best practices, so that you can always go back and find a pipeline and view the different changes that have happened. What you might have noticed is that we also have this marker over here, this tab that you can open up to scroll back to any previous pipeline release. So the idea here is, again, as I mentioned earlier, if you have run a pipeline previously and you think, "I now need to run another five samples", or whatever the situation might be, this code is always going to be available. You can use a revision flag to access this on the command line.
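As a rough illustration of the revision flag just mentioned, the Python sketch below assembles a `nextflow run` command that pins a tagged release with Nextflow's `-r` option. The release tag `3.12.0`, the `docker` profile and the `results` output directory are illustrative choices only, and other required parameters (such as the input sample sheet) are left out to keep the example short.

```python
import shlex


def build_run_command(pipeline, revision, profile, outdir):
    """Return the argument list for a `nextflow run` call pinned to one release."""
    return [
        "nextflow", "run", pipeline,
        "-r", revision,        # Nextflow option: run this tagged release
        "-profile", profile,   # Nextflow option: e.g. which container engine to use
        "--outdir", outdir,    # pipeline parameter (note the double dash)
    ]


# Illustrative values only; pick the tag you originally ran from the releases tab.
cmd = build_run_command("nf-core/rnaseq", "3.12.0", "docker", "results")
print(shlex.join(cmd))
# nextflow run nf-core/rnaseq -r 3.12.0 -profile docker --outdir results
```

Single-dash options such as `-r` and `-profile` belong to Nextflow itself, while double-dash options such as `--outdir` are pipeline parameters, which is why they are kept visually separate here.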
But most importantly, this code never goes away. The documentation doesn't go away. You'll be able to go back and find this and use it again at any point. So you don't need to worry about the pipeline evolving away in a different direction. You'll always be able to roll back and use the previous versions. So that is Pipelines. What I will do next is move along to these resources. Here we have Configs and Tools, and down here we have Components. We will start off by looking at the Configs and Tools. What we have here is a list of all of the different configs that are available as part of the nf-core configs repository. Something that won't have been described in great detail yet is that you can also submit your config files to nf-core, and this will make these configs available to everyone, everywhere, when they are executing a pipeline. These are loaded by default when you execute an nf-core pipeline. Of course, most of this might not matter to you, but if you're running on a shared system with others, especially on your own local infrastructure, it can be a nice idea to actually create one of these. If you dig into these in detail, you might find some really nice settings or some important information that is relevant to your local system. I find this is quite a nice place to go and look for information about how I should be structuring my own configuration files if I'm deploying on a different system that I'm less familiar with. So here, for example, this is a config file for the Crick. This can be included as a profile by default, so you don't need to load anything special. It will be loaded by nf-core and Nextflow as you execute the nf-core pipeline, and you can see here that it's got some information about the executor they are using, as well as some of the parameters for the memory usage when executing there.
There is of course more information about this under the documentation, which you'll be able to look through as we move along these different sections. At the same time, again under Resources, under Configs and Tools, we have these tools. This is where all of the tooling that is a part of nf-core is described. Something that hasn't been discussed in great detail yet is that nf-core also has its own set of tooling. So much like Nextflow, there is the nf-core tooling, which you need to install independently. This is something we will do very shortly, after we've moved through the website. We'll look at all of this, but if you are looking for specific information about the nf-core tools and their different functions, you can click on here to open up more information about them. Each part of the tooling is described in great detail. Most of them have these really nice screenshots, which you'll be able to see. All of this again is available and kept up to date by the community. It has some really nice details, so if you are unfamiliar with the tooling, this is a really great place to start. Down here under Components, moving further down this resources tab, we have Modules and Subworkflows. Modules are essentially wrappers around the different processes that might be interesting to you from the nf-core community. If you are developing a pipeline with the nf-core tooling, you have the option to install modules and subworkflows rather than rewriting them yourself. Again, this is something that we will do when we talk about the nf-core tools for developers. What I wanted to show you though is that we have 997 different modules that are currently available as part of nf-core. These are contributed by the community. As I said, these are essentially wrappers around different processes, but what you can do is jump in here and look at them. You can see how to install a module using the nf-core tooling.
It has a description of what this tool does, what you'd expect to use as inputs and what you'd expect to have as outputs, as well as a little information about the tool itself down the bottom there. If you are interested in this in more detail, you can jump in here and look at the actual code itself. You can see that this is just a typical process block. You have all of the Singularity, Docker and Conda information at the top there, then the inputs, the outputs, the when statement, and the script block down the bottom here as well. All of this of course is version controlled as well. So again, previous versions of all the different modules won't disappear. You can still rely on those in the future. It's not like a module will evolve away without you being able to access the older version. These are effectively pinned in your pipeline using the nf-core tooling as well. This of course is just one example. There are 997 modules now, and by the time you view this there could be, and most likely will be, more. But you can see that all of these are listed, and if you are looking for any particular tool you can search for it. Of course, if it isn't available and you want to create and contribute it, that is always greatly appreciated. What is quite nice as well, and this is a relatively new feature, is that you can see which pipelines a module has been included in. So you can see how it has been implemented in a pipeline, and this can be great inspiration if you are trying to implement it in your own pipeline. So these are modules. Modules are generally a single process or the execution of a single tool. Generally, I would suggest that these are all singular, very modular, single pieces. Occasionally you'll find that there are multiple tools included in one module. This is when it makes sense for something like memory usage, for example.
But what you will probably know, or very likely know, is that tools are often used in combination. So you expect to see two or three tools strung together and used as a combination very frequently. And this is where subworkflows come in. Subworkflows are effectively collections of multiple different modules that are frequently used in combination. These are all named using a standard naming convention. If you are looking for anything in particular, of course you can search for it. But what you will see, if you jump in here, for example to bam_markduplicates_samtools, is that you can open this up and see again what you'd expect as inputs and outputs, as well as a bit of a description of what this subworkflow does. I always find it useful to go and look at the code itself. You can see all of these tools being included, much like you would in a normal main.nf file or workflow file. You can see what it takes, the main workflow block, how all these tools are executed, as well as what's being emitted out the back. This again is essentially a wrapper, but instead of being around a single tool, it's around multiple modules that are used together in combination. So much like a module, which you would include using the install capabilities of the nf-core tooling, in this instance you can install multiple modules at the same time as a subworkflow. Again, these are all version controlled over time, so you don't need to worry about them evolving away in a different direction. Okay, great. I just want to highlight here as well that we will come back to this, and then we'll demonstrate how to use it as a part of developing your own pipeline.
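As a small preview of the install step described above, here is a Python sketch that composes the `nf-core modules install` and `nf-core subworkflows install` commands. It only builds and prints the command lines rather than running them; `fastqc` is used as an example module name, and the helper function itself is hypothetical, not part of the nf-core tooling.

```python
import shlex

# The two kinds of shared component the nf-core tooling can install.
COMPONENT_TYPES = ("modules", "subworkflows")


def install_command(component_type, name):
    """Return the nf-core CLI arguments that install one shared component."""
    if component_type not in COMPONENT_TYPES:
        raise ValueError(f"unknown component type: {component_type!r}")
    return ["nf-core", component_type, "install", name]


# Example component names; search the website for the one you actually need.
for kind, name in [("modules", "fastqc"),
                   ("subworkflows", "bam_markduplicates_samtools")]:
    print(shlex.join(install_command(kind, name)))
```

These commands are meant to be run from inside a pipeline repository created from the nf-core template, which is also where the installed version of each component gets recorded.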
What I'm doing here is just moving through this quite quickly to show you where you can find different pieces of information that might be relevant to you as you're developing your own pipeline. Great, moving along to documentation. Again, I think a real asset, a real highlight of the nf-core community, is that everything that is available here is highly documented. This is of course all community contributed. So some things might be worded a little bit differently, or you might spot the occasional thing that isn't standardized, but generally this is all very, very good, and the community keeps itself in check by reviewing material as it's submitted to the website. What you'll find, much like this training session, is that we've got some information here about usage as well as contributing. This maps onto what you might do as a user of nf-core, and as a developer of nf-core and Nextflow pipelines using the nf-core tooling. For usage, we have of course things like getting started, installation, data management, pipeline configuration, reference genomes, running offline and troubleshooting, all of the things that are relevant to a user. I do want to highlight here that we have information about installation. You would have seen this at the bottom of the home page that we've already shown. What I think is probably worth noting here is that we have iGenomes. Most of the nf-core pipelines, if not all of them, use iGenomes. Essentially, this is a registry from nf-core, based on the Illumina iGenomes, that stores all of the genomes for nf-core in a standardized way. If you use a genome flag as a part of executing an nf-core pipeline (again, we'll come back to this in more detail), you can specify a name for a genome and, by default, the pipeline will download it for you and store it for you locally.
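To sketch how the genome flag fits on the command line, the Python snippet below turns a dictionary of pipeline parameters into `--name value` flags and appends them to a `nextflow run` call. The helper is hypothetical, `GRCh38` is used as an example genome key, and `samplesheet.csv` and `results` are placeholder paths; the pipeline's own parameter documentation is the place to confirm which keys and parameters it accepts.

```python
import shlex


def params_to_flags(params):
    """Turn {"genome": "GRCh38"} into ["--genome", "GRCh38"], keeping dict order."""
    flags = []
    for name, value in params.items():
        flags.extend([f"--{name}", str(value)])
    return flags


# Nextflow options use a single dash and come first in this sketch.
base = ["nextflow", "run", "nf-core/rnaseq", "-profile", "docker"]

# Pipeline parameters use a double dash; all values here are placeholders.
params = {
    "input": "samplesheet.csv",
    "genome": "GRCh38",
    "outdir": "results",
}

cmd = base + params_to_flags(params)
print(shlex.join(cmd))
```

Keeping parameters in a dictionary like this also makes it easy to write the same settings to a parameters file later instead of repeating them on the command line.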
Of course, we have some advice here about downloading this once and having it stored on your local device, so you don't have to download it each time you execute a pipeline. What a lot of people don't realize is that you can automatically download and store this using the nf-core tooling, which is a great asset: you don't have to worry about downloading and installing these files and making sure that they are kept up to date locally, because nf-core can help you do this. This of course is especially important if you're running offline and don't have access to the internet to download them automatically. What I would suggest is using the tooling that is available as part of nf-core. With the AWS iGenomes that nf-core helps to maintain, you can download this by yourself, have it stored off to the side and then just point to it. If it's in the right structure, it will just work as if you had pulled it straight from online. Much like what I'm doing today, if you want more information about working through how to use nf-core pipelines, there are some additional tutorials down here under user tutorials. Then there is contributing to nf-core. Of course, contributing is not compulsory; you can primarily be a user, but you might also want to use the nf-core tooling to develop your own pipelines. So while this section is called contributing, a lot of the material in here also applies if you are developing your own pipeline. There's a lot of information here about what you might do if you're trying to add your own pipeline to nf-core, or trying to add modules or subworkflows. Of course, there are also more tutorials about how you can contribute as a developer.
There are also some really nice guidelines and ideas here for different tools that you might be interested in as you are developing your own pipeline, such as Gitpod, which we have been recommending for this training workshop, as well as guidelines about test data and accessing test data as part of developing your own pipeline. So this is generally everything that is available. We don't have time to dig into all of this today, but if you are interested in anything in particular, there are really great resources on here. I would recommend jumping on here and having a bit of a look around, depending on whether you want to be a user or contribute as a developer. Okay, so that is really the documentation. What I will do now, to keep things moving, is jump along to events. As already mentioned, we have these bytesize seminars. These come up weekly, apart from major breaks, so normally the European summer break as well as over Christmas. These are almost weekly, and they might describe particular ideas, concepts, pipelines or tooling, things that are happening in the community. There are lots of really great resources on here and again, with everything that we do in the nf-core community, we try to make it open and available. So you can always go back and find these on YouTube. For example, if you want to learn more about the nf-core/mag pipeline, you can jump back here and look at this video, which is on YouTube. And we also have the transcripts there if you need to look at those as well. What is probably less obvious is that we also have information about the hackathons, other talks, and training. So you'll see here that we have information about the training that is coming up: events like this, as well as the hands-on training and advanced training that is coming up in the future. We also have some information here about the hackathons.
This isn't loading properly, so there might be a little bit of a bug in the website at the moment. But what you'll find is that we have information about the upcoming hackathons. The hackathons happen twice a year, and it's when the community gets together, either hybrid, in person, or completely online. All of this depends on the specific event and changes over time. But even if you are sitting somewhere else in the world, you can access this quite easily on the Gather Town platform. If this is something you're interested in, please do keep an eye on this, or reach out if you have any questions. Okay, so this is working up here, but these links aren't working very well. So just to recap, if you want to look at any specific events, you can click on these here at the moment and find out more information, for example the hackathon here, which is happening in Barcelona prior to the Nextflow Summit, on October 16th to 18th. Finally, there's a little bit under About here about the nf-core community: what it is, where it comes from, how it's governed, all the different contributors, and the code of conduct. As well as some other initiatives, we have the mentorship program, of which we're just finishing the third round, plus publications and statistics. What I think is probably most important here is how you can join nf-core. nf-core is available on multiple different platforms. I hope that if you're attending this live, you've already joined the Slack and been able to ask some questions in the dedicated nf-core Slack channel for this event. If you are joining this later, we have a Slack workspace that you can join for free, and you can just jump on there and ask any questions about nf-core pipelines, installation or the community. All of this is here through the Slack. We also have the GitHub organization, Twitter, Mastodon and the YouTube channel. Again, the YouTube channel is a great resource.
What I will also highlight very quickly is that there is of course the nf-core Slack, which I've just mentioned, but there is also a Nextflow Slack, which you can join as well for questions that are not directly related to an nf-core pipeline. So an example might be if your question is not "How do I run Sarek?", "How do I run RNA-seq?" or "I'm getting this error because of this pipeline", and it's more about the development or execution of a pipeline that isn't necessarily nf-core. The Nextflow Slack might be a more appropriate place for you. If you post in the wrong channel and someone says, "Hey, look, try over there", that's everyone trying to help you. It's not a "you shouldn't have posted that here". It's just to say, "Hey, look, you'll get more attention if you post this in the right channel." So please don't be afraid to stick your hand up and say, "Hey, I don't know where this goes. I've got this question. Where can I ask this?" Someone will be sure to jump in and point you in the right direction.
Okay, so that's the website. I have moved through this quite quickly and I've probably referenced a couple of things that we haven't discussed in great detail yet, particularly things like the modules, sub-workflows and some of the tooling. Not to fear though, we are going to come back and look at these in more detail. I'm going to take a quick 30-second break to grab a glass of water. In the meantime, feel free to do the same. If you're watching this on YouTube, you're very welcome to pause it and come back when you're ready as well. Anyway, I'll be back in a second and we'll carry on.
Okay, so for the next wee part of this session, we'll be looking at how you can use nf-core pipelines.
We'll be using the RNA-seq pipeline as an example. We'll start by executing it from our system in a training environment and then look at some of the nf-core tooling that you can use to customize the execution of this pipeline for yourself. To do this, what I would like everyone to do is go over to the Nextflow training website, which is over here. If you haven't been attending the previous two sessions of this training workshop series, this is a website that has some basic training material that you can access at any time. As a part of this, we have this button here which will open the Gitpod environment. If you have a GitHub account, you can access this space as a virtual environment for up to 50 hours per month for free. The cool thing is that if you just click on this button here, it'll open up this dashboard and give you the option to create a new environment. If you haven't done this before, you will need a GitHub account to access it for the first time, and there will be a few steps to go through to create your account.
However, as a wee disclaimer, I have pre-recorded some of this content. As a part of that, I found a couple of bugs in the code caused by some new developments in the nf-core tooling and the website. Because of this, what I've actually done is create a dev environment using the dev version of the nf-core tooling, which fixes a couple of these bugs. By the time this training video goes out, hopefully I will have already installed this into the Git repository behind that Gitpod environment, so you won't need to worry about this slight difference. If you are watching this video at a later time, you might notice some differences in exactly what the code is saying at different points, but the main concepts should stay the same. Just for reference, the biggest difference is that I have loaded up this dev version of the nf-core tooling in this side browser over here.
So please don't worry about that difference. If you are opening up your environment, it might just take a little bit longer. In the meantime, I will clear up my environment and show you what I would like you to do as well.
So here we are in the training environment. You'll see that when you open this up, you land in this directory here with all the nf-training material. However, we don't actually want to be in this particular directory for the training, mostly because we already have this nextflow.config file here. As you might remember, whenever you execute a pipeline in Nextflow, it will look in your working directory for a nextflow.config file, and it will be loaded by default. We don't want this because there will be some parameters or configuration in here which might conflict with what we're trying to do today. This is a good reminder that you should always be aware of what else is in your directory when you're using Nextflow, because conflicts can be created by leftovers from previous executions, such as a nextflow.config file. To avoid this, I'm going to move back one directory and then make a new directory, which I'm going to call workshop. You don't need to call it workshop. You can call it anything you want. We're then going to move into that workshop directory as well. The main thing here is that we move into a new directory that doesn't have the other files in it.
At the same time, I will change my Explorer view by clicking on these three bars in the top left-hand corner of my browser. I will go to File, then Open Recent, and then go to this workshop folder. What that'll do is update my Explorer over to the side here, so you'll see exactly what I'm doing, and all the files that I create at different points of this material will show up over on the left there. Okay, great.
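As a rough illustration of the point about stray config files, here is a small Python sketch that creates a fresh working directory and checks whether a directory already holds a nextflow.config. The helper names has_local_config and make_clean_workdir are made up for this example, and it only looks at the working directory, which is just one of the places Nextflow can read configuration from.

```python
from pathlib import Path
import tempfile


def has_local_config(directory: Path) -> bool:
    """True if a nextflow.config file sits directly inside `directory`."""
    return (directory / "nextflow.config").is_file()


def make_clean_workdir(parent: Path, name: str = "workshop") -> Path:
    """Create `parent/name` and refuse to use it if it already holds a config."""
    workdir = parent / name
    workdir.mkdir(parents=True, exist_ok=True)
    if has_local_config(workdir):
        raise RuntimeError(f"{workdir} already contains a nextflow.config")
    return workdir


# Demonstration in a throwaway directory, so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    training = Path(tmp) / "nf-training"
    training.mkdir()
    (training / "nextflow.config").write_text("// leftover settings\n")

    workshop = make_clean_workdir(Path(tmp))
    print(has_local_config(training))  # the training directory has a config
    print(has_local_config(workshop))  # the fresh workshop directory does not
```

The same check is easy to do by eye with a directory listing; the sketch just makes the rule explicit: launch from a directory whose contents you know.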
The first thing I want to do is check that I have Nextflow and nf-core installed in this environment. I can do this by just typing in nextflow, and you'll see that it loads up with all the options and commands. I can check which version of Nextflow I'm using here with -v: 23.04.1. At the same time, I want to show you which version of the nf-core tools I'm using, which is this 2.10dev. Again, by the time you come to watch, this might be updated. Please don't worry; the idea is just that this version has a few bug fixes which were unavailable in the current release for this training environment. So that's great. At the same time, I'm going to show you that no Nextflow pipelines have been executed in this environment before, so I can go nextflow list. This is a function that'll print all of the workflows that have been pulled into the hidden Nextflow assets cache directory. I will come back to this in just a moment as well.
Okay, great. What I want to do here is show you some of the nf-core tooling to start with. The first function I want to show you is nf-core list. This is one of the pieces of nf-core tooling for users. What it will do is query the nf-core repository and list all the pipelines that exist there. Again, there's a little bit of a bug in here which hasn't been fixed quite yet, which is why it's showing dev as the release for all of these. But what you would expect to see is the pipeline name, the number of stars, the latest release (so you might see some master and dev all the way through here), when it was released, when it was last pulled by you, and whether you have the latest release. Again, some of this isn't quite right at the moment because there are a couple of bugs that still need to be ironed out, but we can still show the main functionality. Next, what I want to show you is what happens if you pull a Nextflow pipeline with nextflow pull.
I'm going to pull this from the nf-core repository and, as I said earlier, we will be using the RNA-seq pipeline as an example. So: nextflow pull nf-core/rnaseq. What that'll do is pull this pipeline down into a hidden Nextflow folder in my home directory. You'll see that this is getting checked and now it's being downloaded. This hasn't appeared over here in my workshop directory, which I've just created, so you'll see that nothing has been downloaded here. This is a little bit different from what you might expect from a git clone. However, if I go to nextflow list, you'll see that I now have this nf-core/rnaseq pipeline listed. This shows that it has been pulled into my hidden Nextflow folder. At the same time, if I do nf-core list, it also shows as pulled. It's saying here that I do not have the latest release, but that's actually a bit of an error, because this is the latest release, which has been pulled by default.
Sometimes you might also want to use a different version of the pipeline, and this is quite a common thing. You might find that the pipeline has since been developed or there's a new version released, but you want to rerun a previous pipeline version because you already used it on some previous samples. What you can do is check what versions of the pipeline are available. Hopefully you already know what pipeline you are after. I'm just going to clear the screen so I can work a little higher up. You can go nextflow info. This is a command that will query the Git repository for a particular pipeline and print out all the different versions, all the different tags, for that pipeline. You can see here that we have all these different branches, which are our development branches for the pipeline, as well as a number of different tags for all the different pipeline releases for RNA-seq. Arbitrarily, let's just say 1.0.
We will try to use this 1.0 version of the pipeline. To do that, to update the version that you have in your cache, so in that hidden folder again, you would go nextflow pull nf-core/rnaseq with a revision of 1.0. That uses the tag for the 1.0 pipeline, which we've just seen listed by the previous Nextflow command. Again, all that has happened is that we've gone away to GitHub and used a previous version, because all the versions on GitHub are version controlled, and it has been pulled down to your system with this revision number. Just to show you what that looks like, we can go to nf-core list and see that this has changed: we no longer have the latest release, and it's telling us that we have 1.0. I'm going to clear that again. Because we do want to be using the master version, the most recent version of the pipeline, today, I'm going to pull again with master. So I'm pulling that most recent version again, the master version, or the master branch, which is probably a better way to describe it. Again, one thing to point out here is that there does appear to be a little bit of a bug still in this command, but it has been updated to 3.12.0, which is the most recent version of the pipeline.
Okay, great. What I want to show you next is that you don't always need to pull a pipeline. If you just want to run it straight out of the box, it will be pulled by default. To demonstrate this, I will show you the nextflow run command. Instead of running a full nf-core pipeline like the RNA-seq pipeline, as an example I'm going to run a very simple hello pipeline. This is in the nextflow-io GitHub organization. It's called hello, and all it's going to do is print some hello world statements to my screen. The main thing I want to draw your attention to is that when you run this, it will be pulled by default. I hadn't pulled this to my system before.
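To keep the pull steps straight, here is a minimal Python sketch that assembles and prints the commands for inspecting the available revisions, pinning the 1.0 tag and then returning to master. It only prints the commands and does not run them. The helper name pull_command is made up for this example, and it assumes that nextflow info is the command used to list branches and tags and that -r is the short form of the revision option on nextflow pull.

```python
import shlex

PIPELINE = "nf-core/rnaseq"


def pull_command(pipeline, revision=None):
    """Build `nextflow pull`, optionally pinned to a branch or tag with -r."""
    command = ["nextflow", "pull", pipeline]
    if revision is not None:
        command += ["-r", revision]
    return command


# Steps from the walkthrough: inspect, pin an old tag, go back to master, list.
steps = [
    ["nextflow", "info", PIPELINE],
    pull_command(PIPELINE, "1.0"),
    pull_command(PIPELINE, "master"),
    ["nextflow", "list"],
]

for step in steps:
    print(shlex.join(step))
```

Keeping the revision as an explicit argument makes it obvious which version of the pipeline a later run will use.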
This is the first time I've run this. The same thing will happen if you try to run an nf-core pipeline out of the box without pulling it first. It will pull the most recent version, that is, the main or master branch, automatically behind the scenes, execute it for you, and again store it in that hidden Nextflow folder. If I go to nextflow list again, you'll now see that we have the RNA-seq pipeline as well as this hello pipeline. So this has been downloaded behind the scenes for me, which is great.
Okay, taking a bit of a jump forward, what I'm going to do now is look at what your first couple of commands might be if you're trying to execute one of the nf-core pipelines for the first time. Again, we'll be using the RNA-seq pipeline as an example. For this, we're going to use the nextflow run command. You will have seen this before in the previous two sessions. We will be using the nf-core/rnaseq pipeline, and what I want to do is use the help option from Nextflow and nf-core. What this will do is print out all of the different parameters that are available for this pipeline. Some of this might look very familiar if you've already looked at the RNA-seq pipeline on the nf-core website. It is a list of all the different parameters that have been included in this pipeline, with some descriptions of what they do.
While we can do this on the command line, you might also be familiar with this, or want to view this, in the web browser. Going back over here to the nf-core website, we can go to the pipelines page along the top here. We can search for rnaseq and I can click on this here. Again, we've seen a little bit of this already, but we have the introduction page and, along here, the parameters page.
By clicking on that, we get the same information about this pipeline and all the different parameters that can be used to execute it. What we can see is that we have a number of different input/output options. We have the UMI options, read filtering options and reference genome options. All of these are listed here, matching what was listed on the command line by using that help functionality. What I want to draw your attention to here are probably some of the most important parameters for this particular pipeline. We have the input, which is going to be a comma-separated file containing information about the samples we want to use in this experiment. We have an output directory, which is required. Both of these are strings.
A little bit further down here, we have some information about the reference genome. Most of this will be resolved by the pipeline. If you are using a genome here, the genome parameter is very cool. It relies on the AWS iGenomes repository: if you just include a named genome in your execution, it'll download and manage all the reference files for you. I will come back to this, but I just wanted to point out this parameter before we go past it. As well as this, we also have a number of different parameters for different reference files, so the FASTA, GTF, GFF and BED files. All of these are essentially paths, as strings, to the files that you need to use as references for a particular pipeline. Again, this will become clearer very shortly.
Jumping back to the Gitpod environment, I have looked at all of these and decided on a few that I might want to use. However, to first test the pipeline and make sure it's working, I'm going to use the test profiles that are available as a part of nf-core.
Every nf-core pipeline comes with prepackaged test data that you can include in an initial command, just to test that a pipeline runs on your system and everything works as you might expect. So before we go any further, I want to show you what this test command would look like and also what the outputs might look like. To do this, we are going to go with nextflow run. I'll make that a little bigger so hopefully you can see. We have nextflow run, and we have the nf-core/rnaseq pipeline. Now I'm going to use profiles. Profiles are something that we haven't really covered yet as a part of this training series. They might have been referenced a little bit in sessions one and two, I can't remember. But these are essentially prepackaged sets of configuration and parameters that can be applied as a part of the pipeline execution. All of this is built into the pipeline repository and will be the same for everyone everywhere. Like I said, we have this test profile. It's a profile that includes a number of parameters, which point to test data from the nf-core repository.
Jumping over to have a look at this, you can visualize it if you look at the nextflow.config file. As a part of this, we have a list of all the different parameters that are included in this pipeline. These are the default settings for the pipeline parameters. As you might notice, most of these are null or false, meaning that you'd largely have to turn them on if you wanted to use them in your pipeline. However, down here, we have a number of different profiles that are being included as well. We have profiles such as debug, and then a number of different profiles for software management, such as conda, mamba, docker and singularity as examples.
If you include a profile for one of these, it will essentially turn on everything you need to run the pipeline using it as your software management. For example, here, if we were to use docker, Docker is turned on: docker.enabled equals true, docker.userEmulation equals true. Conda, Singularity, Podman, Shifter, Charliecloud and Apptainer are all false, meaning that they're turned off. On the opposite side of this, if you were to use something like singularity, you'll see that this is enabled with autoMounts while everything else has been turned off. So these are all the switches to turn things on and off. By using a profile, you can switch all of these on and off in the right way without having to code all of this in yourself manually. It's really been packaged together as a set of configuration. Similarly, down here, we will be using this gitpod profile. This is just a small profile that is used to manage the resources in a Gitpod environment. And then right down the bottom here as well, we have some test profiles. These are all included from this conf folder, and inside this conf folder we have test.config.
So let's jump over here and have a look at that as well. Here, again in the rnaseq repository for nf-core, we have this conf folder, and inside of that we have these profiles. The first I want to look at is this test profile. This is how the test data is specified for this test profile, and this is how you can test the pipeline. As a part of this, we have some resource limits so that this can be run on GitHub Actions. As well as that, we have some input data. This is going to be the input parameter, which has been organized as a part of the params scope. It specifies some data from the nf-core test-datasets repository, and it starts off as a sample sheet.
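The idea of a software profile as a bundle of switches can be pictured with a small Python model. This is a toy sketch, not how Nextflow actually resolves configuration: the dictionary layout and the helper names engine_profile and apply_profiles are invented for illustration, and only the docker and singularity settings named in the walkthrough are included.

```python
ENGINES = ["conda", "docker", "singularity", "podman",
           "shifter", "charliecloud", "apptainer"]


def engine_profile(selected, extras=None):
    """Enable one engine, disable the rest, then add engine-specific extras."""
    settings = {f"{engine}.enabled": engine == selected for engine in ENGINES}
    settings.update(extras or {})
    return settings


# Toy stand-ins for two of the software profiles described above.
PROFILES = {
    "docker": engine_profile("docker", {"docker.userEmulation": True}),
    "singularity": engine_profile("singularity", {"singularity.autoMounts": True}),
}


def apply_profiles(defaults, names):
    """Layer the named profiles over the defaults, in the order given."""
    settings = dict(defaults)
    for name in names:
        settings.update(PROFILES[name])
    return settings


settings = apply_profiles({}, ["docker"])
for key in sorted(settings):
    print(key, "=", settings[key])
```

Selecting a profile by name flips every related switch together, which is the convenience being described: nobody has to remember to disable Singularity when enabling Docker.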
The sample sheet in turn points to other test data, which is also stored in the test-datasets repository. At the same time, you have some extra information here, such as a FASTA reference, a GTF, a GFF, a transcript FASTA and an additional FASTA. These are all reference files that are required for the pipeline to run with this test dataset. If you wanted to run this against a full-size genome, you could replace most of this by just using that genome parameter that I showed you earlier and specifying something like GRCh37 or GRCh38, or whatever reference you're trying to use. This is an example of that here. This is test_full.config, and as you can see, it is much more minimal. This is actually a much larger test dataset, which would take much, much longer to run, and all that has been specified here is this single genome. So if you are using a full-size dataset, you could just specify a genome here, name it, and then the nf-core pipeline will download and manage it for you.
There are some caveats here that you should probably be aware of. One is that every time you do this, each different user will download and store this locally as well. You would be better off downloading and storing this institutionally, or having a shared copy on your system, so that everyone who is using nf-core pipelines will be able to benefit from it, rather than everyone doing it themselves and creating multiple copies of this genome. As another little diversion, you can find out more information about this under the docs here, under reference genomes. There's a lot of really nice information here about how this is organized, how it works, and how you can use it.
Like I said earlier, you can use this genome parameter with a named genome, and there is also some extra information about how you could download this yourself and have it stored as a shared config, as well as some extra information here about custom genomes.
Taking a step back again and jumping back over to this conf folder here in the nf-core/rnaseq GitHub repository, I do just want to highlight another couple of files while we're here. The first is this base.config. These are the default resource allocations that are going to be used by the pipeline. What you might not be aware of is that in here we have all these different labels. Every module in nf-core has been given a label. These labels rely on what is specified here in the base config to allocate the resources for that particular processing task. For example, we have process_single, process_low, process_medium and process_high, as well as a number of others. You can see this if you dig into an nf-core module that has been included as part of this pipeline. For example, we're going to look at one of the RNA-seq modules that comes from nf-core. We'll look at Salmon, for example, and have a look at the index module. If you go to the main.nf file, you'll see at the top here that it's been given the label process_medium. Because of this, it'll rely on the process_medium label that's been included as a part of this base.config, meaning that it would be given six CPUs, 36 gigabytes of memory and eight hours of time to run that particular process. Of course, there's a little bit of logic in here so that, if this doesn't work, it might be changed based on what is actually available on your system and what is required to run it for your data. Largely, you don't need to worry about this too much. The idea is that these resource allocations are probably a little bit higher than what you would need for an average set of data, because we want this to work out of the box.
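The label-based allocation can be sketched in a few lines of Python. The process_medium baseline (6 CPUs, 36 GB, 8 hours) comes from the walkthrough; everything else is an assumption made for illustration: the retry rule (multiply the baseline by the attempt number), the ceiling values in LIMITS, and the function name allocate. The real base.config expresses this logic in Nextflow's own configuration syntax.

```python
# Baseline for the label quoted in the talk: 6 CPUs, 36 GB, 8 hours.
LABELS = {
    "process_medium": {"cpus": 6, "memory_gb": 36, "time_h": 8},
}

# Illustrative ceilings only; real limits depend on your system and config.
LIMITS = {"cpus": 16, "memory_gb": 128, "time_h": 240}


def allocate(label, attempt=1, limits=LIMITS):
    """Scale the label's baseline by the attempt number, capped at the limits."""
    base = LABELS[label]
    return {key: min(value * attempt, limits[key]) for key, value in base.items()}


print(allocate("process_medium"))
print(allocate("process_medium", attempt=3))
```

With these assumed ceilings, a third attempt asks for 18 CPUs but is capped at 16, while memory and time grow to 108 GB and 24 hours. That is the sense in which the request is adjusted to what is available on the system.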
We don't want people to have to modify this the first time, when they're still trying to get a feel for how this pipeline works. Down the line, after a successful run, you might decide to come back and modify this, reducing it to make it a little more fine-tuned for what you're trying to do and a little bit friendlier to your compute environment. Okay, so that was a little bit of a diversion, but while we're passing by, I think these are worth noting.
The final file here is this modules.config, which I do want to point out as well. Much like the base.config, it has process selectors, but instead of withLabel, this is withName. So for any process with this name, some extra parameters or configuration are being applied to that particular process. Most of these only specify where files should be saved in the publish directory, but a small number do have these ext.args, external arguments or extra arguments, whichever way you want to think of them. Here, some extra arguments are being applied to this particular process with this particular name. The idea behind nf-core modules is that they are kind of bare bones. This is done by design, because you want them to be reusable across different pipelines and different situations. Because of that, you wouldn't automatically include flags or parameters that are going to be specific to one pipeline. These are kept external to, or decoupled from, the module and then applied using a modules.config. This does feel like an extra bit of work, but it makes the modules much more reusable, and you can also modify and configure these in an additional decoupled way, which we'll come to later, so that you can modify them without having to touch the code base. On that topic, you should never need to actually modify this code base, except in extreme situations.
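The decoupling described for modules.config can be pictured with a toy Python model: the module holds only the bare command, and pipeline-specific arguments live in a separate mapping keyed by process name. The process names, tool names and flags below are all hypothetical and exist only to show the shape of the idea; they are not taken from the rnaseq pipeline.

```python
# Hypothetical per-process extra arguments, kept outside the module itself.
EXT_ARGS = {
    "ALIGN_READS": "--sensitive",
    "MAKE_REPORT": "--title my_report",
}


def render_command(process_name, tool, inputs, ext_args=EXT_ARGS):
    """Build a bare-bones module command and splice in any external args."""
    args = ext_args.get(process_name, "")
    parts = [tool, args, *inputs]
    return " ".join(part for part in parts if part)


# The same module code serves both cases; only the external mapping differs.
print(render_command("ALIGN_READS", "aligner", ["reads.fastq.gz"]))
print(render_command("OTHER_STEP", "aligner", ["reads.fastq.gz"]))
```

Changing the behaviour of one process then means editing the mapping, not the module, which is why the module stays reusable across pipelines.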
Most of what you're doing here should be manageable externally using configuration and parameters files. Again, that might seem a little bit of a strange idea, but I will show you how to do this towards the end of this part of the session.
Okay, again, all of that was an aside. We're going to jump back over here to our command and keep building it. We're going to use the test data. We're going to use the gitpod profile to help manage the resource allocation, because we are in Gitpod, and I'm going to use Docker to manage my software. Docker will manage all the software: it will pull those containers and run all the processes of this pipeline with the respective Docker containers in an isolated way. So I don't have to download and store all of this software myself. All of this is done and managed for me. I also looked at the parameters page, and I know that I need to specify an output directory, as that is required. I'm just going to call it results1.
Okay, so now we have a run command, which is nextflow run nf-core/rnaseq. I'm using test, for some test data to check that the pipeline is working; gitpod, to make sure that we're running within the limits of this environment; and docker, to manage all the software for this pipeline. All of this is going to be saved in results1. So I'm just going to hit enter. What this will do now is spin up the execution. It's going to check that everything makes sense and everything has been supplied. If everything is okay, it'll start to launch. There are a couple of warnings here, but you don't need to worry about those. Those are expected. What it is doing here is printing out some log information, which you might find relevant for the execution of this pipeline. As you can see, everything that I have specified for the pipeline has been listed here.
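Put together, the test command built up above can be written out as follows. This Python sketch only assembles and prints the command line and does not launch anything. The helper name run_command is made up, and results1 is a transcription of the spoken output directory name.

```python
import shlex


def run_command(pipeline, profiles, outdir):
    """Assemble `nextflow run` with comma-joined profiles and an output dir."""
    return [
        "nextflow", "run", pipeline,
        "-profile", ",".join(profiles),  # single dash: a Nextflow option
        "--outdir", outdir,              # double dash: a pipeline parameter
    ]


command = run_command("nf-core/rnaseq", ["test", "gitpod", "docker"], "results1")
print(shlex.join(command))
```

The single-dash versus double-dash distinction is worth noticing: options with one dash are read by Nextflow itself, while options with two dashes are passed to the pipeline as parameters.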
Great, so that is starting to spin up and run, which is really nice. This might just take a wee second to check. It will be pulling those Docker containers down to start the processing for this pipeline. The first couple might just take a wee second to download and run. Great, so that has now started. You can see that these processes have started up. We've got zero of ones happening here, and this will start to tick over quietly in the background. This pipeline using the test data will take about 10 to 20 minutes to run. It won't run particularly quickly, especially because it's the first time I've executed this on my system and it's having to pull all those Docker containers for me.
In the meantime, what I thought I would do is jump back over to the docs here. Again, jump back to the pipelines page. This is on the nf-core website again. I'm going to look at the RNA-seq page here and quickly start to talk about how you can manage all these parameters and configuration for executing a pipeline. As you can imagine, if you have lots of different parameters that you need to include in your execution, your execution command could be quite long and difficult to write. To help manage this, you can include parameters using a parameters file, as well as additional configuration using a configuration file. What you might remember from the documentation is that there are lots of different ways that you can configure your pipeline. Here, for example, this is on the Nextflow docs page. You can include extra configuration in the pipeline without having to touch the main.nf file or any of the files in the actual GitHub repository, the nf-core pipeline repository itself. For example, you can include things in your home directory, in this hidden Nextflow config file. You could also include things in your command execution, using a config file specified with the -c option, or a params file option.
Or of course, like I've already done in this previous execution here, you can add things such as the output directory using the command line. I want to show you how you can use params files and your own configuration. This will be a very simple example, but as we're doing this, try to think about how this could be quite useful if you were trying to specify lots of different parameters or supply lots of custom configuration at the time you launch your pipeline.
Jumping back here, that's all ticking over quite nicely. I'm going to open up a new bash window here and type in code my_params.json. This has been opened up the top here and you can see it here in my open editors. I haven't saved that yet, so I'm going to save it. So it's now being shown down here as well. All I've done is save this file using Ctrl-S on my keyboard, which you couldn't see.
Taking a little bit of a step back, let's say I wanted to customize the title of my MultiQC report. If I was building the execution command again, I could choose to include this in the command. What would that look like? It would be nextflow run nf-core/rnaseq. We're going to use these test profiles again: -profile test,gitpod,docker. I also need to include the outdir, which I'm going to call results2 in this example. It has wrapped over two lines, so I'm making it a little bit smaller. And then say I wanted to include this parameter, which is multiqc_title. We go --multiqc_title, and then I could just call this my_report. What this would do is apply this string, my_report, to the MultiQC title. This is very cool. So I could just keep adding these into the command line. Some of these would be structured in slightly different ways, depending on what you're trying to do.
For example, if you were trying to do something like skip QC, this is a Boolean, meaning that you don't have to specify true or false afterwards. You could just type --skip_qc into your execution command, and it would be applied. So, for example, you just type in --skip_qc, which has gone off the side there, but that is all you'd need to type. What I'm building up to here is that, as you can imagine, if you had lots of different parameters, you wouldn't necessarily want to write one big long execution command. It's going to introduce reproducibility problems, and it'll be a lot harder to keep track of. What you might want to do is include all of this in one single parameters file. So instead of including all of this, you can use -params-file and include a parameters file, which is a JSON file.
What you would do here is write your JSON file. As a part of this, you would type in the parameters that you want to include, in quotation marks. You would go outdir, and I'll call this results2. I actually need to put that in quotation marks as well. I would add a comma, and then say I also wanted to add another parameter that I wanted to modify: multiqc_title. I could call this something like Chris2, or whatever else you wanted to call it. You could then save this and include it using my_params.json. If you were to execute this again, it would run the pipeline and apply this parameters file, my_params.json, because we've included it here with -params-file. I do not need to specify all of the parameters one by one in the execution command now. I can just keep adding them to this file. I can keep this file. I can share this file. It will control all of the parameters that I'm using as a part of this pipeline.
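The params file described above is small enough to write by hand, but generating it shows the shape clearly. The Python sketch below writes the two parameters from the walkthrough to a JSON file and prints the matching command without running it. The file name my_params.json and the helper names write_params_file and run_with_params_file are assumptions for this example; a temporary directory is used so that nothing is left behind.

```python
import json
import tempfile
from pathlib import Path

# The two parameters set in the walkthrough.
params = {"outdir": "results2", "multiqc_title": "Chris2"}


def write_params_file(path, params):
    """Write the parameters as a JSON object, one key per pipeline parameter."""
    path.write_text(json.dumps(params, indent=2) + "\n")
    return path


def run_with_params_file(pipeline, profiles, params_file):
    """Assemble `nextflow run` with parameters taken from a JSON file."""
    return [
        "nextflow", "run", pipeline,
        "-profile", ",".join(profiles),
        "-params-file", str(params_file),
    ]


with tempfile.TemporaryDirectory() as tmp:
    params_file = write_params_file(Path(tmp) / "my_params.json", params)
    print(params_file.read_text())
    command = run_with_params_file(
        "nf-core/rnaseq", ["test", "gitpod", "docker"], params_file.name
    )
    print(" ".join(command))
```

Note that keys in the file carry no leading dashes: outdir in the JSON corresponds to --outdir on the command line.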
One important thing here is that if you want to set parameters like this, they have to go in the params file. You cannot include them as a part of a configuration file. This will make more sense very shortly. For now, I will try and execute this, but we might have some issues because the first run hasn't fully finished, although it isn't far away. I'm going to execute this again, but I'm going to use the -resume flag so that we don't have to re-execute everything, and hopefully we'll just jump in at the end, because I'm only trying to modify this MultiQC process down here. This is agonizingly close to finishing, so we might just give that a wee second so we can execute this again without having to worry about anything clashing. While it's doing that, you'll see that all of the processes here have run. There have been multiple tasks for some of these processes, so five of five, for example, but nothing has come back as failed, meaning that everything has run through properly. And this has run quite quickly, which is nice: eight minutes and 28 seconds. Of course, from here, we could go and look into the results folder and see everything that has been created. All of this has been organized for me. So for example, I'm just going to clear this and look at the results folder for FastQC, and you'll see all the different FastQC reports that have been produced and organized by the pipeline for me. Okay, like I said though, I'm just going to leave that there for a second and we're going to run this again. So I'm going to use the -resume flag to execute this pipeline again, but with -params-file and my_params.json, which I've created and saved in my working directory here. You could also include this using a full path or a relative path if you wanted, but for simplicity, I've just included it here from my working directory.
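The re-run described above might look something like the following. This is a sketch under the assumption that the params file is called my_params.json and sits in the current working directory; resume_cmd is just an illustrative variable name, and the command is printed rather than executed.

```bash
# -params-file supplies the pipeline parameters from a file.
# -resume reuses cached tasks from the earlier run where nothing relevant changed.
# Single-dash options belong to Nextflow itself; double-dash options are pipeline parameters.
resume_cmd="nextflow run nf-core/rnaseq -profile test,gitpod,docker -params-file my_params.json -resume"
echo "$resume_cmd"
```

A full or relative path can be given in place of the bare file name if the params file lives elsewhere.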
What I'm hoping to see is that the output directory for this pipeline is now going to be results2, and that we have this multiqc_title here, which is chris2. So these parameters, which I've included in my_params.json, have been applied to this pipeline. What we can also see is that a lot of this is being pulled from the cache because it has been run before. What I've modified doesn't have an impact on those particular processes or their tasks, so we've been able to rely on the cache pretty much all the way through the pipeline, except for right down the bottom here, where we've got CUSTOM_DUMPSOFTWAREVERSIONS and the MultiQC report, which will now be titled chris2. Great. So we can take this a step further. Let's say that you don't want to write all this manually. This can be a little bit cumbersome. It can be a little bit tricky, especially if you're new to writing JSON files. The nf-core tooling has a way to help you do this using the nf-core launch command. This is the part of the functionality that had a little bit of a bug, which is why I'm using this dev version of the tooling. Again, this should be fixed and be a part of the main version of the tooling very soon. So I'm going to use a different command this time. I'm just going to clear this. It is going to be nf-core launch. Then I'm going to hit enter. This is essentially some tooling from nf-core that will help you launch the pipeline and choose which parameters you want to use. The pipeline I want to launch is going to be remote in this situation. However, if you've downloaded the pipeline yourself and want to use this to launch a local version of the pipeline, you can do that as well. I'm going to use the rnaseq pipeline. Again, this is all happening with my keyboard. I'm not having to do anything special.
I'll be using the 3.12.0 release. It is checking that everything matches, and it's now giving me the option to do this in the command line or in a web-based GUI. I highly recommend using the web-based GUI. What is happening here, and I'm not sure if you can see this, is that it's asking me if Gitpod may open an external website, which I'm going to say Open to. And now this has opened up in my browser. What you should see now is a big page that says launch pipeline, configure workflow parameters for a pipeline run. You'll see that you have this big long list of all the parameters, which looks very similar to what we've already seen on the nf-core parameters page over here, with all the different parameters that are available, descriptions of what they do, and even some help text to help me choose how I want to launch this pipeline. The cool thing about this is that it will help you manage and also write that parameters file, so you don't have to worry about writing everything yourself when you're trying to launch the pipeline. So I'm going to show you how I would build this run command again, and how to include those same parameters that we've just used to launch the pipeline using that params file. So again, I want to use some profiles. I'm going to use test, gitpod and docker. I am going to use the resume functionality. The input file here would normally need to be specified, especially if you are using a local sample sheet with your own data. However, because it's already included as part of the test profile, I won't include that. For the output directory, however, I'm going to use results3 as a new name, and I'm also going to change multiqc_title to chris3.
While I could go through here and choose any number of different parameters to modify, to keep things really simple in this example let's just keep it at that. But you can see here how quick and easy it is to go through and modify all of these, because they've all been defined as a part of the pipeline. So you could set true or false as quickly and easily as clicking on one of these buttons, or provide strings or files, all the way through all these different parameters. So that is really cool. Right down here at the bottom you have the option to launch the workflow. At the top here you can hit this launch button as well. I'm just going to click Launch workflow. This will validate everything that I've typed in, and it's gone away to generate, first of all, this launch ID for me. You could copy and paste this back into your terminal and it would launch. This would basically just query this particular ID and launch it for you. However, what I wanted to show you is that this actually creates the params file for you by default. What it has done is create nf-params.json. If I go back over here, you'll see that this file has already been created for me. You can see results3 and chris3. This is different from the my_params.json which I created previously. This one has been created by the tool for me. It is giving me the option to run this command, and it's telling me what command it would be running. This is a very similar command to what we've run previously. We have nextflow run nf-core/rnaseq. It is including the revision of the pipeline that I would like to run. I should have been doing this as part of my executions already. However, I've been leaving it off just to keep things a little more condensed. We can also see the profile, so test, gitpod and docker are already included.
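To make the result of the launch step concrete, here is an approximate reconstruction of what the tool produced in this demo. The file contents and the flag order are assumptions based on what is described on screen, not a copy of the real output; generated_cmd is an illustrative variable name, and the command is printed rather than run.

```bash
# Roughly what the launch tool wrote for this demo: only the two changed parameters.
cat > nf-params.json <<'EOF'
{
    "outdir": "results3",
    "multiqc_title": "chris3"
}
EOF

# The kind of command the tool offers to run; -r pins the pipeline release.
generated_cmd="nextflow run nf-core/rnaseq -r 3.12.0 -profile test,gitpod,docker -resume -params-file nf-params.json"
echo "$generated_cmd"
```

Because both pieces are plain text, they can be copied to a machine without internet access and used there without the launch tool.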
We're using the -resume flag and -params-file. It is pointing at nf-params.json rather than the my_params.json which I created previously. If I was to click run here, it would just run it for me. Before I do that, we'll jump back here and I'll show you that this is the same command that has been generated on the web page. I could copy and paste this. I could also copy and paste this params file. The thing about this is that even if you are working on an offline system, you could just copy and paste this out. You don't have to use the launch functionality. I've done this through the command line, which I'm just going to jump back to very quickly and hit yes. So this will start launching the workflow for me, which is very cool. If I was to jump back here to the nf-core/rnaseq page, you can see this launch button at the top here. I think this has been fixed. This was one of the issues that we had previously when I tried to record this, that this wasn't working, but it is working again now. So you could just fill in everything here, generate the run command, and again copy and paste it out without having to input anything through the command line. Okay, so this is again running, and you can see here that the output directory and the MultiQC title are the same as what I included in that launch form, which is very, very cool. Okay, so that is all running and it's looking very good. I'm just going to jump back to my other terminal window. If this wasn't clear before, you can click this little plus sign here to create other terminals that you might want to play around with or use in parallel. What I wanted to show you next was another one of these nf-core commands for users. We've looked at list and launch, but I also wanted to show you download. Download is a really useful tool, especially if you're trying to work in an offline environment.
What you can do with nf-core download is download a pipeline along with all of the institutional configuration profiles that are hosted on the nf-core GitHub. I realize that I haven't actually explained what those institutional configuration profiles are. The nf-core GitHub has these institutional profiles, and they are essentially shared profiles that might be used across an institute. These are also loaded as optional profiles when nf-core pipelines are launched, and you can access them from basically anywhere in the world. There's also an option to download these as part of nf-core download. You might consider making an institutional configuration profile if, say, you had a shared HPC cluster with particular resources, or reference files that are stored in particular locations. You could set all of this in a configuration file that is shared by your institute. There's some really nice documentation online under the docs page about how you can do this, somewhere on here, under pipeline configuration. So there's a bit of information here about how you can add your institutional configurations as well. Anyway, jumping back here to the Gitpod window, the virtual environment, we have this nf-core download. I'm going to hit enter there. And I want to download a pipeline. This is going to download all of the code, much like we did with pull, but this is a more practical way to be able to see the files and folders, as well as the configuration files and the Singularity images, which is something I want to explain next. So I'm going to download the rnaseq pipeline. I'm going to download 3.12.0. This is giving me the option to download the institutional configuration files, which you might as well do. And I'm also going to download the Singularity images. This is a really important feature of the download function.
With the download command, you can also download all of the Singularity images that are used by the nf-core pipeline. The cool thing about this is that everything will be downloaded as the files you need, and you can move them across to an offline system. So if you're working in a clinic, for example, where you don't have access to the internet, you could download all of this and move it across into that environment without exposing anything to the internet. Here, for example, this is going to give me some options to define a Singularity cache directory. This is where the Singularity images will be downloaded and stored. You can set this to be a shared folder. However, if you just want to download everything and then move it into that folder manually yourself, you can do that as well. I'm going to click yes just to show you the functionality. Let's just set it to the local directory. Do I want to add this to my profile? This is like adding it to your .bashrc, for example, so that it will be set by default. In this environment I'm going to say yes, because it doesn't matter too much. And here it is asking whether I want to compress this. I'm just going to say no, don't worry about it for now. Singularity isn't actually installed in this Gitpod environment. What I'm really trying to show you here is that you can go away and download all of this, and the download command will manage all of it for you. What you can probably see over here is that we're getting all of these downloaded images. Of course, if I'd chosen to zip these or package them in a different way, they might get stored in a different folder. But the idea is that all of this gets downloaded and managed for you, and you have the option to move it to an offline system.
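A minimal sketch of the download step described above follows. It assumes a cache folder called singularity_cache in the current directory (a hypothetical name) and leaves the remaining choices, such as institutional configs, container images and compression, to the interactive prompts shown in the demo. Newer releases of the tooling group this command under nf-core pipelines download, so it is worth checking the help output of the installed version. The command is printed rather than executed.

```bash
# Directory where Singularity images are cached; Nextflow reads this variable as well.
export NXF_SINGULARITY_CACHEDIR="$PWD/singularity_cache"
mkdir -p "$NXF_SINGULARITY_CACHEDIR"

# Pipeline name and release as chosen in the demo.
download_cmd="nf-core download rnaseq -r 3.12.0"
echo "$download_cmd"
```

Pointing the cache at a shared folder means the images only need to be fetched once for everyone who uses that folder.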
At the same time, we've created this folder up here, which is rnaseq version 3.12.0, the version I chose in the download command. You'll see here that I have the pipeline with all the different code, which is the same as what you'd expect to find on the GitHub page. We also have all of these institutional configs in here. These are the institutional configs that can also be found on GitHub. Just as an example, we can look at the Crick, and they have some information in here that could be used to execute this on the Crick system. Again, this is all slightly more advanced usage. To start off with, you could just try running the test command and then build up to something like this if you're using Nextflow and nf-core pipelines more regularly, or if multiple people in your group or institute are using them as well. While all of that is downloading, which is great, I'm going to jump back over here and show you one final thing. Up until now, we've been talking about parameters that you might want to modify or customize, but there's also this extra layer of configuration, such as what's in the base.config or modules.config files which I showed earlier. The thing about processes in nf-core pipelines is that occasionally you might want to modify them in a particular way so that you can do something that wasn't necessarily anticipated by the pipeline developers. As an example of this, you might want to modify MultiQC in some way. Because we're using MultiQC already, I think it's nice to keep a consistent example. So here, I'm going to close that again and move this across. What I want to do is create a config file. Much like what we have already seen, we can jump in here and look at this pipeline. This is the pipeline repository that's already been downloaded.
And here, I'm going to look at this 3_12_0 folder. I'm going to look at the conf folder and at this modules.config file. This is slightly more advanced usage, but I think it's important to at least be aware of it. What I want to show you is something like MultiQC, and we are going to look at this process here. So this is the process block. What I'm going to do is copy this out, and we're going to modify it so that I reconfigure how these particular arguments are added. Again, this is slightly more advanced usage. This is something that you might want to build up to, especially if you are thinking about adding extra arguments or flags that aren't exposed as parameters. Okay, so I'm going to type code new.config. Obviously my naming isn't very exciting, and all I've done here is copy out this process block and put it into here. Today, I'm not worried about where the parameters are going or where this is going to be published. All I want to do is modify this argument. At the moment, this is probably a little bit more complicated than it might need to be, where ext.args is using some Groovy logic to essentially test whether params.multiqc_title has been set. If this parameter has been set, add the title here, otherwise just don't add anything. However, what I want to do isn't quite that exciting. What I want to do is remove all of this and hard code what I want to call the title. So in this case, I'm going to go for --title, which is the same flag that was included here. I can probably just keep that, and keep the double speech marks there. These backslashes here are used to escape the speech marks, otherwise Nextflow will get a little bit confused about how to manage this.
Otherwise this would be read as the end of the string rather than as an escaped special character. It doesn't matter too much. Instead of using params.multiqc_title, I am just going to overwrite this with something hard-coded, which is chris4. Now, I think that is all closed off properly, yes. So we're going to have this process block, and for any process with the name MULTIQC, these arguments will be applied. As you can imagine, if you had multiple processes with different names, you could start to stack these in here as well. This is a very simple example, but if you had processes named different things, Salmon, for example, you can check how they are named over here in modules.config, copy out those names and apply your own arguments in here. I won't do that today. But you could build out this file to be quite large and apply a number of different extensions to what you're already doing. Okay, so going back to how you actually include this in the run command, which I think is probably most important. Again, I'm going to clear that. I'll make sure I'm in the right folder. All of this has been downloaded into here, but it won't be included in the run. There's nothing in there that I think will cause me any issues. I'm going to go for nextflow run. So this is the same kind of run command that was built previously: nf-core/rnaseq. We're going to go for the test profiles again, -profile test,gitpod,docker. We are not going to include a params file, but we're going to include a config file with -c. Sorry, I'm just going to move this up on the screen. I keep forgetting to do this, sorry. I'll make it a little bit bigger as well. Hopefully you can see that.
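The custom config built up above could look roughly like this. It is a sketch, not the exact file from the session: the selector name MULTIQC is taken from how the process is referred to in the demo and should be checked against conf/modules.config in the pipeline version being used, and only the argument line is kept from the copied block.

```bash
# Write a small custom config that overrides the arguments passed to MultiQC.
cat > new.config <<'EOF'
process {
    withName: 'MULTIQC' {
        // Hard-coded title instead of the pipeline's params-based logic.
        // The inner speech marks are escaped so they stay inside the string.
        ext.args = "--title \"chris4\""
    }
}
EOF

cat new.config
```

Further withName blocks for other processes can be stacked inside the same process scope.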
So it's nextflow run nf-core/rnaseq -profile test,gitpod,docker, all separated by commas, then -c. Now I'm going to include this nextflow.config. This is the custom config file which I've created. Inside of that, we have the process block with a withName selector for MULTIQC. And I was applying these arguments, which is going to be a hard-coded title for MultiQC, chris4. If we jump back over here to the terminal again, I am going to execute this. I will need to include an output directory, which I'm going to call results4, just to keep stacking these up. And hopefully this will run. Okay, so I just need to include the revision, -r 3.12.0. This is because I've been using the revision flag intermittently. It doesn't like it: the specified configuration file does not exist, nextflow.config. Okay, so I'm going to move into a different directory here. I think there's something that's conflicting, so cd new. And we'll move the new.config into new. Just to explain what I've done here, I thought there was something that was causing me an issue, some sort of conflict that's probably been caused by downloading a lot of this stuff in the background. There are a few more files in there than I was anticipating. But I can still go back and use this command here, which is nextflow run nf-core/rnaseq -profile test,gitpod,docker. I'm going to include this. Ah, I've included the wrong file. This should be new.config, sorry about that, which is why I was getting that error. This will include new.config and execute the pipeline using it. Something worth noting here is that I have used the process scope. You could, in theory, include lots of different scopes in here. So for example, for docker, you could set enabled = true. You can include lots of different scopes for lots of different parts of this pipeline.
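Putting the pieces of this step together, a sketch of the config with a second scope added and the corrected run command might look like the following. The directory name new and the file name new.config follow the demo; the docker block is only there to illustrate that other scopes can sit next to process, and config_cmd is an illustrative variable name. The command is printed rather than executed.

```bash
mkdir -p new
cat > new/new.config <<'EOF'
process {
    withName: 'MULTIQC' {
        ext.args = '--title "chris4"'
    }
}

// Other scopes can sit alongside the process scope, for example:
docker {
    enabled = true
}
EOF

# -c (single dash) adds the custom config on top of the pipeline's own configuration.
config_cmd="nextflow run nf-core/rnaseq -r 3.12.0 -profile test,gitpod,docker -c new.config --outdir results4"
echo "cd new && $config_cmd"
```

The file passed to -c has to exist at the given path, which is what caused the error in the demo when the wrong file name was typed.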
What you can't do, and I did allude to this a little bit earlier, is use the params scope. Because of some of the nf-core magic behind the scenes, you can't include params in here as a part of a configuration file. They won't be loaded in the right order. Previously we've shown that there is an order of priority for configuration and parameters. nf-core pipelines operate slightly to the side of this, in that parameters set this way aren't loaded in the right order, meaning that they wouldn't be loaded from the configuration file before they are required by the execution. The exact mechanics of this don't matter too much, but the main thing is that you should never include parameters as a part of this configuration file. Always include them using a params file. I can't stress that enough. A few of the pipelines now do point this out, and I think there are even a few warnings about this under things like pipeline configuration here on the nf-core website. Again, I can't stress this enough. Parameters go in the params file, and everything else can go in the configuration file. Okay, so I think that is going to be running and loading. I should have looked at the top of this before. It got pushed off my screen, but you might have to trust me that this has worked as a config file with the title chris4 at the top there. I might just kill it very quickly so I can show you what it looks like at the top. So again, this is just going to spin up this execution. And we can see here at the top that this hasn't been included. Oh, it's because I haven't included it as a parameter. My apologies. The reason you won't see it here is that this isn't a parameter anymore. It has just been hard coded in. The parameter hasn't been set. So all we're going to see here is results4.
We'd have to wait until the whole pipeline runs to see it in the MultiQC report at the end of the pipeline. So please ignore everything I said for the last 30 seconds. I wasn't thinking about that properly. Okay. So just to recap this session for users of nf-core pipelines. We have the nf-core tooling, which has a number of commands for users. You should never need to go in and modify the code base. You should be able to download and use the pipelines and check which version you've been using. You can pull, update, or change the version that you have stored on your system. If you have lots of parameters, you could use the launch command to help you generate that params file. You could also use the launch command to generate the run command and use it directly on your system. If you're operating on an offline system, you could use the download functionality, which will download all of the Singularity images as well as the code, which you could then move across and execute on your system using a relative path. Again, a lot of what I've spoken about here has been a very quick, high-level view, trying to demonstrate some of the core functionality. I can't encourage you enough to go over here and look at the nf-core documentation, particularly here for usage: things like getting started, installation, data management, and particularly configuration as well. Something I haven't gone into in great detail, because we're using this training environment, which is much smaller, is that we have iGenomes. Again, nf-core manages this for you: you simply add the --genome parameter and name a genome which is available as a part of that particular pipeline. This will download and manage the reference files for you, so you don't need to worry about specifying a path for every single reference file.
By default, the pipeline will manage this for you, which is another great feature of these pipelines, which I don't think we talk about enough. Anyway, I'm going to take a quick break. Feel free to do the same. Because this is on YouTube, of course, you can pause it and come back to it when you feel like it. We will be back again soon. Okay, so welcome back to the final part of the session, where we will move from being an nf-core pipeline user to an nf-core pipeline developer. Of course, you don't actually have to create and submit your pipeline to the nf-core repository. You might just be doing this for yourself, but all of the same concepts apply, and the nf-core tooling can really help you make a best-practices pipeline. What you might notice here is that I have changed my screen just a little bit. I've closed a few windows and moved a few things around to make things a little easier for you to follow. What I will also do is make a new directory, which I'll call part3, and I'm going to move into that directory. I will also update my explorer over here on the left by clicking on File, Open Folder, and choosing part3 in the Gitpod workspace. The only reason I'm doing this is so that what I'm doing as I create this new pipeline is a little more obvious for you to follow, rather than the explorer being full of all those other things I've already created while showing you how to execute pipelines. So here I am now in this part3 folder. I'm going to show you where that is. Of course, you don't have to be in exactly the same path. You could create any empty directory, or even work in a directory that you've already created. Okay, one small thing I'll mention again as well. If you have started watching this video halfway through, I am using a dev version of nf-core tools in this workshop.
The reason I'm doing this is that there was a small bug, which we found as I started to record this. The dev version has fixed this already, but the fix hasn't been released in the main version of the tooling. So this is me just using a dev version for now. This will have been updated by the time this workshop goes out live, and you might find a newer version again if you're coming to watch this at a later date. Anyway, if you see any differences, that is most likely why. Okay, right, so moving back to the nf-core tooling. We've already talked a little bit about these commands for users, in particular list, launch and download. list shows the pipelines that are available in the repository, launch will help you launch these using a GUI, and download will help you download the pipeline, the associated configuration files, and the Singularity images if you choose to, to move to an offline system. What we are really doing here is moving into these commands for developers. In particular, we'll be creating our own pipeline and using some of the linting tests to help keep our pipeline up to date and following best practices. Then we'll look at how we can build a schema that can be used for things like the launch functionality, and then we can look at using the modules and subworkflows commands to add a few extra modules from the nf-core repository to our pipeline. So to start things off, what I will do is clear my screen again. I'll try and keep doing this so that the focus stays on what I'm doing. I'm going to use nf-core create. A few typos in there: nf-core create. This will bring up the tooling that I can use to create a pipeline using the nf-core structure. The first thing is a workflow name. I'm just going to call this workflow. I'm sorry, I'm not very creative with the names today. So my workflow name is workflow, and the description is a workflow for demo.
And I'm going to call myself the author. I'm now given the option to customize which parts of the template I want to use. At this point, you could choose to exclude different parts of the template. However, I'm going to include everything as a part of this example, so I'm just going to hit no. It is now creating nf-core-workflow, after the name of the workflow I gave it. Inside this, you will see all of the files and folders that we got to view a little bit of when we were looking at the nf-core/rnaseq pipeline. The cool thing about this is that all of these files are created and viewable in the same location. So for example, we have this nextflow.config. This should look very familiar, because this is what all the nf-core pipelines are based on. This is how every nf-core pipeline will have started or been adapted to. At the same time, you can see that we've got this conf folder up here with some test configuration in it. You can see that we have this test.config. This is being included using an includeConfig statement down here in the nextflow.config file. As a part of this, we already have an input path using some test data, which is hosted on the nf-core test-datasets repository. The reason I point this out is that this is actually a working pipeline already. This is really a foundation for you to start building on top of. Before we go any further, I just want to point out that it does recommend that you push this to GitHub. I won't do that as a part of this tutorial, but you have the commands here. If you've created a repository for yourself on GitHub, you could push this code to that repository. This will allow you to use Git and GitHub to manage your code as you are developing, which is especially useful if you are doing this collaboratively. Anyway, taking a step back again, you'll see that I'm in this working directory.
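For reference, the answers given to the prompts above can also be passed as options. The sketch below assumes the nf-core/tools 2.x command, where nf-core create accepts --name, --description and --author; newer releases move this under nf-core pipelines create, so the help output of the installed version is the thing to trust. The author value is a placeholder, create_cmd is an illustrative variable name, and the command is printed rather than executed.

```bash
# Non-interactive equivalent of the prompts answered in the demo.
create_cmd='nf-core create --name workflow --description "a workflow for demo" --author "Chris"'
echo "$create_cmd"

# The template is written to a folder named after the workflow, here nf-core-workflow.
```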
I have this nf-core-workflow pipeline, which I've already created. I'm going to use nextflow run to run this pipeline. Because this is local, I don't need to specify the nf-core repository or my own Git repository for this pipeline to be executed. I can just specify the folder by its relative path, and because it's in my current working directory, I can put it straight here. I'm going to add -profile. Then I'm going to use the test config as well as the docker config. The test config will bring in those test files. Docker will manage all the software for the modules that are included in this pipeline. And then I'm going to add the output directory parameter again and call it results1. I can now hit enter and this pipeline will execute. The reason I show this is that this is a working pipeline straight out of the box. This gives you the opportunity to really iterate on top of it. You can dig into the files that are already included in this repository. For example, we have the nf-core modules that are already included: FastQC, MultiQC, and CUSTOM_DUMPSOFTWAREVERSIONS, as well as this local module for checking the sample sheet. This is now launching. So Nextflow is going to be pulling those Docker containers to run this pipeline, and you can see that this is working straight out of the box. This will also be familiar: we have some of the options and parameters that have been selected as part of that test profile being printed to screen here, which is really useful. So this will just tick over quietly in the background. One thing to point out here is that we are using this INPUT_CHECK:SAMPLESHEET_CHECK process to check the input sample sheet. I think this will very soon be superseded by nf-validation, which is going to do a lot of this work for us.
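The local test run described above comes down to a single command. As a sketch, assuming the freshly created pipeline sits in a folder called nf-core-workflow inside the current directory (local_cmd is an illustrative variable name, and the command is printed rather than executed):

```bash
# A local pipeline is addressed by its path rather than by a repository name.
# test supplies the bundled test inputs; docker provides the software containers.
local_cmd="nextflow run nf-core-workflow -profile test,docker --outdir results1"
echo "$local_cmd"
```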
nf-validation is a plugin which has been recently developed to check the inputs for nf-core and Nextflow pipelines. The idea here is that this is a slightly nicer way of checking the inputs for a pipeline, rather than having to use these clunky modules, these clunky processes. While they have served a very good purpose, there are better ways to do it now. Okay, so this has completed successfully. What you see is that we have this work directory, which has been created. There are also results being stored here, results1 being the name of the output directory which I chose. So that's great. We've created a pipeline using the nf-core create command. It is now a working pipeline. I can start to iterate and build the pipeline I want on top of this. To start things off, we're going to break a few things, and I'll show you how you can get around these problems. But what I wanted to show you first of all is the nf-core lint function. So nf-core lint is a really useful tool that you can use to check that your pipeline is following best practices. So what I have done here is, I'm actually not in the right directory. I want to move into my nf-core pipeline directory and run this again. So I've just moved into my nf-core workflow directory, the workflow I've created, and I'm going to run nf-core lint. What this is doing is running a series of checks on how my pipeline is looking, to see if there are any warnings I should be aware of. It's running a number of different tests that are included as a part of the nf-core tooling to check that my pipeline is looking good and whether I've done anything that is branching away from best practices. So here, for example, we have a list of warnings. Most of these warnings here are to-dos. So to-dos are basically strings that have been added to different files in the pipeline.
Each one is telling me to go back and do something. At the same time, you have this nice plugin over here on the left. So this is a VS Code plugin that you can use to help you go through and identify where these are without having to dig through the code yourself. This is really just a nice way that you can tell yourself to come back and fix something, or change something, or do something: a reminder that there is something you need to go back and do in your pipeline. Again, you don't have to do this. This is just a nice way for you to manage the development of your pipeline, especially if people are doing this in collaboration. However, in this situation, I actually want to turn this off because there is a lot to look at there and I don't want to go through and do all of this now. To do this, you can go to your nextflow, excuse me, nf-core, where is it? Your .nf-core.yml file. So this is a hidden file here. And we can add in some code that will tell the linting tests to skip this. To do this, all we need to type in is lint, colon. We go down a line, tab in, and then we type pipeline_todos, colon, false. What this will do is turn off the linting test for pipeline to-dos so that we won't get these warnings anymore. I have now saved that with a keyboard shortcut, which you won't have been able to see, and I'm going to type in nf-core lint again. So this will go through the series of checks again. You can now see that most of these warnings have been turned off. We still see some warnings here for these modules. These are warnings to tell me that there's a new version of these modules available from the nf-core modules repository. So within your pipeline, if you're using the nf-core modules, you will be able to track whether there are newer versions of the tools available. So for example, here we have the modules.json.
As a part of this file, we have tracking information for all of these modules. And this will tell you if your version of that module is the most recent. Here, we can see that this module is not up to date and there's a newer version available. I will come back and show you how you can include, update and remove modules using the nf-core tooling a little bit later. For now, I'll just leave this here. I also wanted to show you how you might want to turn off some other tests, especially if you are modifying different parts of the configuration for an nf-core pipeline used, or excuse me, created using the nf-core create command. As an example, we have this code of conduct here. So this is just a file that is included, basically saying that if you're developing for or contributing to the nf-core community, you're doing it in a good way, following this code of conduct. If you want to change this for your pipeline, for whatever reason, say by adding a new line with an exclamation mark, editing this file will actually fail a lint test, which I can show you as well. So I've just added that line and saved it, and I'm now going to lint again. It is spitting back a failed test. And it's telling me that it's because my code of conduct does not match the template. Again, all we need to do is go to the .nf-core.yml file, and then I'm going to add files_unchanged, colon. I'm going to tab in from there and then add CODE_OF_CONDUCT.md. This is just specifying that this file will be ignored when it's going through the files_unchanged linting for this pipeline. Again, I've just saved that and I'm going to go nf-core lint again. And I'm hoping what we'll see is that I've typed it in correctly and this test no longer fails, because we have ignored it. Again, you can modify this based on what you want to do. There is some documentation online for how to do this; if you search for it on the web, it'll come up.
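Put together, the two lint overrides dictated above would sit in .nf-core.yml roughly as follows. This is a sketch that assumes the file does not already contain a lint key (if it does, merge the entries by hand instead of appending), and that it is run from the pipeline root. The command is written as spoken in the session; newer nf-core/tools releases may expose it as nf-core pipelines lint.

```shell
# Append a lint section to the hidden .nf-core.yml in the pipeline root.
# Assumption: there is no existing "lint:" key in the file.
cat >> .nf-core.yml <<'EOF'
lint:
  pipeline_todos: false        # skip the TODO-string warnings
  files_unchanged:             # template files that are allowed to differ
    - CODE_OF_CONDUCT.md
EOF

# Re-run the linter if nf-core/tools is installed.
if command -v nf-core >/dev/null 2>&1; then
    nf-core lint
fi
```

Indentation matters here because the file is YAML: pipeline_todos and files_unchanged are both children of lint.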
Okay, I'm just going to clear that and we'll keep moving on, just because we do have a lot to cover. Right, so what I have next on my run list is this nf-core modules update. As we saw in those warnings, a couple of those modules were out of date. The nf-core tooling also has options to manage modules. So this is the next command in this list: commands to manage Nextflow DSL2 modules, the tool wrappers. What I'm going to type in is nf-core modules, and we can look at the subcommands for this. We can see here that we've got things like list, info, install, update, remove, and patch, as well as some tools for developing new modules. To start off, we'll be looking at these tools for pipelines, the first of which is list. So again, I'm just going to go nf-core modules list, and I now have the option to look at my local or remote modules. I'm just going to look at the local modules. What this will do is print out the modules that I have, where they came from and the version that they are, as well as some descriptions down the side there and a date. So that's really cool. I also know from my linting test that some of these are out of date. So now, going back to the nf-core modules command, I want to update one of my modules: nf-core modules update. I now have the choice: do I want to update all modules or a single module? Because I want to show you the functionality for a single module, I'm just going to go for a named module. It'll now ask me which module I want to update. I'm just going to go for FastQC. Do I just want to update it, or do I want to preview the differences? I'm going to preview the differences to find out what they are. You can see here that there aren't too many significant changes; you just have this main.nf.test that was created. Because of that, I'm actually just going to click no. I don't want to update that right now.
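The list and update steps above can be sketched as the following commands, run from the pipeline root. Both are interactive in the way described in the session, so nothing is changed until you confirm. The guard is only there so the snippet does nothing outside a pipeline or without nf-core/tools installed.

```shell
MODULE="fastqc"   # the module picked in the session

if command -v nf-core >/dev/null 2>&1 && [ -f modules.json ]; then
    # Show the modules installed in this pipeline, with their versions.
    nf-core modules list local
    # Update one named module; the tool offers to preview the diff first.
    nf-core modules update "$MODULE"
else
    echo "Run inside a pipeline with nf-core/tools installed: nf-core modules update $MODULE"
fi
```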
This is a little bit of functionality that I don't think has been fully implemented, so I'm a little bit nervous to do that on the fly. Let's just leave that for now. But if you're updating your modules, you could use this functionality to do that. It is a really good way to update your modules as you go, especially if a newer version of a tool has been released. At the same time, there are other aspects of the nf-core modules functionality that you might want to try out. I want to show you how to look at the info. So again, you can go nf-core modules info. Is the module locally installed? Yes. Please select a module: in this case, I'm going to go for MultiQC. And this prints out to the command line what the module expects as inputs and outputs. Again, you could jump back over to the website, go into the resources for modules, look for this module, MultiQC, and preview it there. But if you prefer just using the command line for this type of thing and don't necessarily need it rendered on the website, you can just do this in the command line. Taking a wee jump to some other functionality: we'll come back to modules, as well as subworkflows and how you can add them using the nf-core tooling, very shortly. I want to take a quick jump over to another part of the nf-core tooling, which will help explain the nf-core modules in the long run as well. So what I wanted to do is just clear the terminal. You'll see that we're still in this workflow directory. What I wanted to do is add some new parameters to my pipeline and then show you that the linting will pick up that I have added these new parameters. It'll tell us that it's not quite right and also suggest ways that we can try and fix that.
So here, as a part of this nextflow.config, let's say that I want to add some new parameters. One I'm going to call foo, which we're just going to set to chris, and then we're going to add bar, which I'm going to include as a number. Let's just make it three. I'm just going to save this. All this is doing is adding some new parameters, and Nextflow will treat these as parameters because they're a part of the params scope: the params scope is opened up here on line 10 and it's closed off down here on line 63. So I'm just adding some new parameters to the pipeline, in this case the string chris and the number three, and they're called foo and bar. Again, I'm just going to clear that. I now want to show you what happens with my nf-core lint. Again, this is the same set of linting tests that I've run previously. It's just going to be checking my pipeline. We now have another two failed tests. Instead of just turning off the tests for this, we're going to look at these error messages and look at how to fix them. This will be quite a common error that you might come across, especially if you are adding new parameters to an nf-core pipeline. As a part of nf-core best practices, we have this schema. As part of the schema, you have descriptions and information about all the parameters that you're using. This is really important for reproducibility and also for others coming along who use the pipelines. You can imagine how much chaos there would be if you didn't have all of the different parameters defined in the RNA-seq pipeline, for example. So what you can see here is that the message is: params.foo from the Nextflow config not found in nextflow_schema.json. So we can go over to the repository over here. Let's close a few of these. We can see this nextflow_schema.json file. When you open this, you'll realize that this is a massive JSON file with information about all the different parameters that are already included as a part of this pipeline.
Before you do anything, don't touch this file manually. You never have to touch this file manually. If you're touching it manually, you're probably doing something wrong. Instead, what you should be using is this nf-core schema functionality. As a part of this, you have options to build, to output docs, to lint, as well as to validate. What I wanted to show you is this build command. This has started up the nf-core schema build functionality. And again, we've been given a question down at the bottom of the command window. Sorry, this is down at the bottom. So it has found params.foo in the pipeline config, but it's not in the schema. This is the exact error that we've seen from the linting test. Do you want to add it to the schema? Yes, we do. So let's type in yes. And it's the same for bar: it's been found, so do you want to add it to the schema? Yes. This has now been added to the schema to a degree, but it's asking whether to launch the web builder for customization and editing. Yes. So while it asks that, we'll find that these have been added down here at the bottom of the nextflow_schema.json already. I'm going to open this window. So this is another window that's opened on the web, much like the nf-core launch functionality. This time it's a nice GUI that you can use to build your schema. As you can see, all of the parameters that were already included as part of this pipeline are listed here. This is effectively the nextflow_schema.json file, rendered. All of this information has been presented in a way that you might expect to see it using that launch functionality, or close to the launch functionality that we've already looked at. As you can see, we've got the input and output options, we've got the reference genome options, we've got the institutional config options.
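As a rough sketch of the two steps so far: the parameters live inside the params block of nextflow.config, and the schema is then brought in line with nf-core schema build rather than by hand. The scratch file name params_demo.config below is hypothetical and only used so the snippet does not overwrite a real config; in the pipeline itself the two lines go inside the existing params block.

```shell
# Illustration only: the two new parameters, written to a scratch file.
cat > params_demo.config <<'EOF'
params {
    foo = 'chris'   // string parameter
    bar = 3         // integer parameter
}
EOF

# From the pipeline root, let the tooling update nextflow_schema.json.
# It asks, per new parameter, whether to add it to the schema.
if command -v nf-core >/dev/null 2>&1 && [ -f nextflow_schema.json ]; then
    nf-core schema build
fi
```

Newer nf-core/tools releases may expose the same command as nf-core pipelines schema build.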
All of these parameters that exist in the nextflow.config have been included here as a part of the nextflow_schema.json. What we will see right at the bottom here is that we have foo and bar. At this point, we could add some extra descriptions. This is a string. This is a number. What I'm adding in is just arbitrary; it's an example of how you can add the descriptions that you might want to use. You can see that it's already worked out that this is a string and that this is an integer, though I could change it to a number and it wouldn't really change too much. It has given a default value. Of course, you could change this to any number you want. For now, I'll just keep it at three. The same here: we have a default, which is chris. You could change this as well. You can also do things like adding a required flag, much like the output directory, which is required by default for the RNA-seq pipeline. You could make this required every time the pipeline is launched. If you were to change this default to null, as an example, you could then make it required, and then you would have to supply that parameter whenever you execute the pipeline. For now, I'll just keep that as chris. You could also hide this if you want to. You'll see a number of hidden parameters that don't appear unless you ask for them. At the same time, there are extra options that you can use. You could use enumerated values. You could request a specific pattern or a particular format. You have some nice dropdown menus here to help you do that if you choose to. As a rule of thumb, I find it quite useful to investigate and interrogate what others have done in a file like this. I find it really useful to see how others have done it when I'm designing what I'm doing myself.
So for example, you could jump up here and just look at different parameters: how they've been designed, whether they have default values, what they are, how this has been written. I think this is a really good place to start. I really encourage people to imitate what they're already seeing in the nf-core repositories, because some of the pipelines there are just incredibly good. Actually, all of the pipelines there are incredibly good; I shouldn't say just some of them. I think they're great sources of inspiration if you are trying to develop something like this for the first time by yourself. Anyway, I just wanted to point out here, before we move on, that you have options to add parameters at this stage as well as add groups to your document. So here, for example, you could add a group, call it my new group, give it the description this is new, and then you can drag it up and down here to place it. Sorry, it's quite slow, but you can imagine that I'm going to drag it all the way to the bottom. I'm just going to put that in there for now; you could then actually put these two parameters in it. Here, you'll see the JSON file that has been rendered. You can see that it's automatically updated with all the extra information that I've added to the pipeline. When you are finished, you can just hit this finish button here. It says that if you still have nf-core schema build running in the background, it should now update with your new schema, and you can close this window. So that's the important part. If you didn't use nf-core schema build, or it has exited, copy the schema below and paste it in yourself. We've got some nice buttons here to do that, as well as the new information that we could add to the pipeline. But that's great. I'm just going to close this window and go back to my schema. And if we scroll down to here, or use the search and search for foo.
You can see that the information I added in that schema builder has already been added here, which is really cool. So going back to my claim at the start: you shouldn't need to touch this file by hand. Use the nf-core tooling. Save yourself the headaches. The nf-core tooling is here to help you build this. Anyway, I'm just going to make sure that's saved. This is finished. The web browser has closed. I'm now going to go back and use the nf-core linting to show that these tests are now passing. Great. So we still have a couple of warnings there. We have these ungrouped parameters. These are warnings because I just put those parameters in there and didn't actually add them to a group, but that's nothing to worry about. Great. So what I want to do now is jump back to what I referenced a little bit earlier, when I was showing you things like the nf-core modules update that you could use to get rid of some of those warnings. We're going to use the nf-core modules and subworkflows commands to add some new modules and subworkflows to these pipelines. So again, I'm just going to clear that to try and keep things nice and tidy. Okay. So again, nf-core; I'm now going to use this modules command, so nf-core modules. I now want to install a module from the nf-core repository, so I'm going to go nf-core modules install. You can see that I'm slowly building these commands up each time. Tool name: now that I'm trying to install a module from nf-core, it'll be querying the nf-core repository. I can pick a module, and the one I have here that I'll install is cat/fastq. It is now giving me a little bit of code that I can use to include this module in my pipeline. So something that was covered in session one, and a little bit in session two as well, of this workshop series is how you can include modules and processes from other files in your pipeline. So here in my workflows folder, I have this workflow .nf file.
Here we have the base template that you can start building on top of. You'll be able to see the include statements for the processes that ran when I ran that test profile earlier. What I now want to do is include this new module, which I have just added to this pipeline. These are already pretty well organized for me. You can see that I have a section for modules installed directly from nf-core/modules. I can just add this in here, and then add in a few spaces to make the formatting a little bit prettier. At this point, this hasn't actually been implemented in the pipeline. This is just me including it, so that Nextflow knows that it exists from this workflow .nf file. It'll go away and look for it and include it when I launch the pipeline. Here, for example, after FastQC, I can add it into the pipeline, and then add in what this module actually needs in order to run. Because I find it a little bit easier to look at, I'm going to go to resources, modules. This is back on the nf-core website under the modules page. We're going to go for cat/fastq; sorry, I got a little bit confused there with FastQC. You can see that it's expecting a meta map and some reads. At the same time, if I was to look at this under the modules folder, you can see that this nf-core module has been included here, and this is the main.nf for it. You can see that the input is brought in as a tuple, and we're expecting a value, meta, as well as a path to the reads, which it's going to stage. So all I really need to do here is copy and paste this in here. The reason I can do this is because I know the structure of this channel already. If I was to go and check this, you can see FastQC here. So I can go to the FastQC module and I can see that it's got the same input structure as what I'm expecting. That is how I knew it.
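The install and include steps above can be sketched as follows. Two assumptions are baked in: the module is taken to be cat/fastq (process name CAT_FASTQ), and the channel passed to it is taken to be the same one the template hands to FASTQC, written here as INPUT_CHECK.out.reads. The Nextflow lines are saved to a hypothetical scratch file, cat_fastq_snippet.nf, purely so they can be pasted into the workflow script; they are not meant to run on their own.

```shell
# 1. Install the module from nf-core/modules (run from the pipeline root).
if command -v nf-core >/dev/null 2>&1 && [ -f modules.json ]; then
    nf-core modules install cat/fastq
fi

# 2. Lines to paste into the workflow script, kept in a scratch file here.
cat > cat_fastq_snippet.nf <<'EOF'
// Goes next to the other nf-core module includes.
include { CAT_FASTQ } from '../modules/nf-core/cat/fastq/main'

// Goes inside the workflow block, after the FASTQC call.
// Assumption: INPUT_CHECK.out.reads emits [ meta, reads ],
// the same channel structure that FASTQC takes.
CAT_FASTQ (
    INPUT_CHECK.out.reads
)
EOF
```

The include line only tells Nextflow where the process lives; the module does nothing until it is called inside the workflow block.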
It's also because I've run this previously and I know this module reasonably well. Okay, so this is an example of how I have added in a new module. I've used nf-core modules install, and I pulled the module directly from the web. I'm just going to do a little bit of extra typing in here, just because I like to have things nice and organized: a comment like the one above it that says Run FastQC. I'm a big advocate for commenting your code like this. You'll thank yourself down the line when you come back and have to work out what you did. Now I'm just going to clear this again. You'll see that I'm still in this pipeline directory. I want to be back one level so that I can execute this again. I'm going to go nextflow run, then my pipeline name. I'm going to use the profile of test and docker. Then I'm going to set the output directory, which is params, excuse me, --outdir, to results2. Again, I can just launch this and it should run for me. Hopefully I haven't made any mistakes there. So this wee warning that you're seeing here is something I should have mentioned earlier, but I just skipped past it. If you have parameters that have been included but haven't really been used or defined anywhere, you'll get these warnings saying that these values have been detected but aren't used or defined anywhere. This isn't a big deal. Again, this is one of those warnings that helps you keep your pipeline maintained and up to best practices, because you're not adding in parameters willy-nilly without them being defined or controlled properly. Down the bottom here, you can see that this CAT_FASTQ has been added to my pipeline and has run successfully. I very quickly and easily dropped this in without having to do too much myself. I can go away and view what it's expecting, and also see what those outputs look like and where they might be stored. So for example, if I clear that and look at results2.
I haven't defined where any of this should be stored, but actually you can see it there in cat. I should have explained this better. By default, this just gets stored in a folder named after the first part of the name of the process. So for example, because I haven't specified the name of the folder I want this to be stored in, it's going to be called cat. Sorry, this is probably a little bit of a tangent, but you can also choose the names of these folders by defining them in those publishDir settings that we saw as a part of the RNA-seq pipeline. Here, I haven't added anything into my modules.config to tell the pipeline where to store it. So it's just being stored in cat as a default, because that's the first part of the name given to the module. Sorry, that's a bit of a tangent, but I wanted to explain it just because I started. Okay, so I've added a module. I've used the nf-core tooling to do that. You'll see down here that my modules.json has also been updated. You can see that this cat/fastq has been added, so it is now being tracked by the pipeline. What I wanted to show you as well is something that you might not be aware of. I think this is a particularly cool part of the functionality that people aren't always familiar with. Say you wanted to include an nf-core module. It is pretty close to what you wanted, but for some reason it's not quite right. There's something that is missing, or something that isn't working the way you want it to. It's not quite worth making a pull request to the nf-core repository to update the module; it is something very niche to what you're doing. To help with that, you could use the nf-core modules diff. This is a very, very cool piece of functionality. So let's go and look at this in a little more detail. So I've just included this cat module. We've got the main.nf here. Let's say that I just want to remove this, for whatever reason. That's an extra option that's been added to this particular input channel.
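Going back to the output-folder tangent above: choosing the folder yourself would mean adding a block like the one below to conf/modules.config. This is a sketch under assumptions. The process name CAT_FASTQ and the folder name merged_fastq are illustrative, params.publish_dir_mode is assumed to exist as it does in the nf-core template, and the block is written to a hypothetical scratch file rather than into the real config.

```shell
# Illustration only: a publishDir override for one process.
# The quoted heredoc stops the shell from expanding ${params.outdir}.
cat > modules_cat_fastq.config <<'EOF'
process {
    withName: 'CAT_FASTQ' {
        publishDir = [
            path: { "${params.outdir}/merged_fastq" },  // hypothetical folder name
            mode: params.publish_dir_mode
        ]
    }
}
EOF
```

Without an entry like this, the module's outputs land in the default folder described in the session.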
I don't think I need it. This is a very simple example, but you could imagine this could be the same as adding another input. It could be changing something down here in the script. It could be anything. I'm just choosing this because it's a reasonably straightforward example that I can demonstrate quickly. Now, if I was to go for nf-core lint; I'll move into the pipeline again. So I've just moved into the directory for the pipeline. Again, I'm going to go nf-core lint. What we should now see is that we have a test failed because the local copy of the module does not match the remote. This is another test that you get as part of nf-core linting: you can check to make sure that your modules haven't been modified. This is important for reproducibility. If you're saying I've used this module, and here's the version number for it, and you have changed it, you want that to be reported. So what I'm now going to do is fix this warning. And the way I'm going to do that is using this nf-core diff, nf-core modules diff functionality. Again, to show you how to get to this and build up to this command, you can go nf-core modules. Down here; sorry, the function is called patch, not diff. It's going to create a diff file. Sorry, that's my mistake. What I'm going to do is nf-core modules patch. Sorry about that, I've been calling it the wrong thing. For this, it is going to ask me for the tool. I'm going to go for cat/fastq. Of course it knew that this was one of the tools I had installed. And what this patch function has done is create a diff file, telling me what the differences are between what I have locally and what is on the remote. This is really useful because I can keep this in my repository. And as these modules are updated over time, if anything changes that impacts this diff file, you'll get a warning. If it doesn't, then you don't need to worry about it.
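The lint-then-patch sequence above can be sketched as follows, run from the pipeline root after editing the module's main.nf. The module name cat/fastq is an assumption about which module was used in the session.

```shell
MODULE="cat/fastq"   # assumed name of the locally modified module

if command -v nf-core >/dev/null 2>&1 && [ -f modules.json ]; then
    # Fails while the local copy differs from the remote without a record.
    nf-core lint
    # Record the local change as a diff file stored alongside the module.
    nf-core modules patch "$MODULE"
else
    echo "Run inside a pipeline with nf-core/tools installed: nf-core modules patch $MODULE"
fi
```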
Now, what I'm going to do is just clear this and show you, first of all, the diff file. So this is just a diff file. It's basically a tracking file that will be searched for by nf-core when it's doing this linting. This is very cool. This has already been created for me by default in this nf-core modules folder, next to the main.nf and the meta.yml for this particular module. I'm just going to close that and run nf-core lint again. And you can see now that the nf-core tooling is aware of it. It says zero tests have failed, but it also knows that the patch is there and that I don't need to worry about this particular change anymore because I have kept a record of it. The nf-core tooling really helps you keep records and also keep your pipeline up to date. At the same time, I am going to risk it. I think it will be cool to show you how to update a module. But before I do that, what I'm going to do is just show you that you also have nf-core subworkflows, much like the modules. There's functionality here to download and install subworkflows from the nf-core repository, much like a module. You can install these, list them, remove them, update them. All of this is there to make it easier to drop these large blocks of code, multiple modules at one time, into your pipeline without you having to recode all of it yourself. Again, the cool thing about this is that nf-core will help you track all of this so that they can be updated, changed and improved over time thanks to the community contributions. Anyway, that's just a little sidetrack. I won't go down that path. What I'm going to do is just go nf-core modules update. I suspect this might fail. I'm just going for a named module; again, FastQC. It is trying to create this new main.nf.test file, which I don't necessarily think is going to work, because I don't know if I've got the plugin installed.
Again, this is new functionality which hasn't been fully implemented across all pipelines. I'm just going to try and update it. It has updated it for me. So over here, FastQC has now been updated. We've also got this tests folder, which has some information in it about tests that should run whenever the module has been updated. Again, I'm sorry, this is new functionality which hasn't been applied to all of the pipelines and all of the modules yet. But let's just see if this is going to work. I'm just going to try and run this, but I suspect it will spit back an error. Okay, so not all of this has been implemented properly in this version, so we won't worry about that. I apologize for going down that path. What has happened here is, just like I said, that there is new functionality that hasn't been properly implemented. Please don't worry about that too much for now; as this functionality is added to and improved, it will become properly implemented so that you won't get errors like the one I was about to face there. Okay, so that is really everything that you need to know to use the modules that are already a part of nf-core. The final thing I wanted to show you, and this is what we will finish on for today, is in this nf-core modules command again. The commands we've used so far are really for modules that already exist, but you might want some help in creating your own modules as well. For that, we have this section of tooling for developing new modules. So you go nf-core modules create. This is giving me the option to name a new tool or subtool. Sorry, I was going to exit out of there and go into the pipeline repository, but I'm already in it, apologies. I'm just going to run create again. Name of the new tool: my new tool. This gives you the option to specify a Bioconda package that already exists in the Bioconda archive.
For example, if you wanted to try and include BWA, you could start to add in some details here. I haven't actually got any notes for doing this on the fly, so let's just try this in a slightly different way. Is that going to work? BWA, yes. So this has gone away and found BWA for me. Of course, there is already an nf-core module for this, but the tool will go away and look at these repositories to see whether the package already exists on Bioconda. This will help you on your way to creating the module itself. It's now going to ask for some details. This will be your GitHub handle, so I'm going to type in Chris. You can start adding some labels; I'm just going to say process_single. Do you want to require a meta map for a single sample? Yes. This has now been created in my modules local folder, which is here, so I'll close this. Just to help orientate you, this is inside the nf-core pipeline that I've created. We have this modules folder. We've mostly been looking here at these nf-core modules. Here we have this local folder. These are the local modules that you might want to create but don't want to submit to nf-core, for whatever reason. This has, of course, given me a lot more to-dos, but a lot of this is really useful information for writing your own module. It has already told me what the conda environment will be, and what the Singularity and Docker container images will be as well. It has already put in place all of the different blocks that you might expect to see in a module: the inputs, the outputs, the when conditional and the script block, as well as some extra content in there about stubs that can be used to help test the pipeline. To rephrase all of this: you can also use the nf-core tooling to create your own local modules. And this just really helps you on your way.
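The module-creation step above can be sketched like this, run from the pipeline root. The tool name is a placeholder, not the one typed in the recording; the command then prompts for the remaining details (Bioconda package, GitHub handle, process label, meta map) as described in the session.

```shell
TOOL_NAME="mynewtool"   # hypothetical tool name

if command -v nf-core >/dev/null 2>&1 && [ -d modules ]; then
    # Interactive: answers decide the container, label and input structure.
    nf-core modules create "$TOOL_NAME"
    # Inside a pipeline, the new module skeleton is placed under modules/local.
    ls modules/local
else
    echo "Run inside a pipeline with nf-core/tools installed: nf-core modules create $TOOL_NAME"
fi
```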
Of course, you'll just need to do all the thinking about how you need to design this, but this will give you the framework that you need to write a module. And if the module is something that you can contribute, you can go away and contribute to the NFCore repository as well. Again, this is a little bit of an aside, a little bit of bonus content if you are trying to create your own modules as well. But hopefully, whatever you're looking for is already included in the NFCore modules repository. We are at around 1,000 modules. So there are a lot of tools that are already available and hopefully one of them will be suitable. If it's not 100% suitable, you now know how to use patch to create a diff file and modify it. And now, if that still isn't enough, you can go away and create your own using the NFCore modules create functionality. Again, if you do create a module and you think this would be great, you can submit it to the community. Some of them will be more than happy to help you edit anything that you might need to. And there's some really great documentation here for developers for contributing as well. So for example, if you were to be contributing a DSL2 module, there's a lot of really good information in here about how you can do it and what you need to do, including all of the Git commands that you might not be aware of to include and update it and make the module the best it could possibly be. Okay, so that's a lot of information. I feel like now we're probably pushing, I guess we are pushing sort of three hours. Hopefully, what I've said has made a little bit of sense. This is session three of a three session block, which is called the foundational training. This session was primarily for NFCore. Of course, what I have covered here might not necessarily make 100% sense. Like everything, learning Nextflow, learning NFCore, learning any new language or any new piece of tooling does require time.
Hopefully what we've done is we've been able to introduce you to a lot of new concepts and ideas, and give you a quick look at how you can use some of this tooling. Hopefully it's set off some, you know, some alarms or some sort of happy emojis that you can see the benefit of using this and that you wanna start using it as a part of your own pipeline development, or if you are using NFCore tooling, excuse me, if you are using NFCore pipelines, you can see how the tooling will help you design and run those executions on your local system. As always, if you have any questions, do jump on to Slack, yell out and ask those. Even after this training, if you're watching this at a later date, you can jump onto the training channel, ask about anything that you've seen here, and ask them to explain it again or in greater detail. As well as that, if you are looking at a specific feature or tool or pipeline or whatever else, hopefully there will be a channel that's right for you. So again, thank you for attending and we'll see you again soon.