have had a good rest and managed to absorb all of the new knowledge and training from the first couple of days, and are ready for the climax of this course. Day three is a little bit different to the first couple of days.

I should introduce myself. My name is Phil Ewels. I work at Seqera Labs and I am a lead developer advocate for Nextflow and nf-core. I try to look after the communities, stimulate growth, make sure that everyone has a good time, and basically make the whole experience as good as possible for you. I'm also the co-founder of nf-core and I've been writing Nextflow pipelines for, I don't know, lots of years now, as one of the early adopters in Sweden.

So today is a little bit different to the first couple of days. On Monday and Tuesday we really focused on core Nextflow training: the concepts of the language, how to write Nextflow and how everything fits together. Today we're going to look at nf-core: what nf-core is, what the community is and how it's different to Nextflow. Then we'll talk a little bit about the tooling that's available for nf-core and work through some material on how to create pipelines from the nf-core template using the nf-core tooling. Then there will be a bit of a break, and Harshil Patel, who is from the nf-core core team, will jump in and take you through the wonderful world of DSL2 modules, nf-core modules, which are shared sub-parts of pipelines, and how to build pipelines using them. Finally, we'll wrap up with Evan Floden, the CEO of Seqera Labs, who will talk about Nextflow Tower, which is the final icing on the cake of this whole Nextflow ecosystem. So we started the first two days with a core of Nextflow, today we add a layer of nf-core, and then Evan will finish with a layer of Nextflow Tower.

Just a quick mention of a couple of things while I've got everyone's attention. We originally planned this training as a lead-in to the Nextflow Summit and the nf-core Hackathon, which kick off next week. So if you haven't already been spammed enough by us about this, do drop over to summit.nextflow.io to see the details. It's too late to register for in-person attendance, of course, but you can still register to attend online for both the Summit and the Hackathon. Make sure you register pretty quickly if you want to join; registration closes on Friday, but you'll be able to follow both the Summit and the Hackathon online. If you go to the nf-core website, under events, you'll find more details about the Hackathon. We're really keen to get as many new faces into the Hackathon as possible; it's a perfect way to take all this training that you've done and spend some dedicated time getting to grips with it and actually doing something with it. On the nf-core website you'll find all the details of how it works, how to join and what the groups will be.

One other announcement about something similar but slightly different: the nf-core and Nextflow mentorships. This is done through funding from the Chan Zuckerberg Initiative, so thanks to them. We've run one round of mentorships already and we've just opened applications for round two. That's both for mentees, the people being trained, which is free for them, and also for mentors, which is a paid position, so you'll get a small stipend for helping out. The first round went really, really well.
You can check out the blog post that we wrote, which goes through the different pairs and the kinds of things they worked on; we were really happy with how it went. Round two is twice the size: we're going to take on 10 mentors and 10 mentees. So please do have a look if you're interested. The closing date for applications is November 1st.

Right, back to now. If you go to the training page on the nf-core website for this training, we've got the schedule with all the streaming links and everything. You'll see that we've got a couple of links which are slightly different for this section. We're using a different Gitpod environment for all of the nf-core material. When Evan comes on last, he'll go back to the original one from the first couple of days, but while Harshil and I are talking we'll be using a new Gitpod environment, and we also have slightly different written training material. If you follow that link you can find everything I'm going to talk about written down, and you can follow along at your own speed. If you didn't catch something I said, or I said it too quickly, you can hopefully keep track of where we are in this written material. Importantly, there's the big launch Gitpod button, so feel free to go ahead and hit that now and get it spinning up. These links are in Slack as well.

Okay, I'm going to start today with just a few slides giving an introduction to nf-core for those of you who are new to the community, just to give you a flavour of who we are and what we're doing, and then I'll dig into some more hands-on stuff.

To recap what you've been going over the past couple of days, let's try to describe exactly what Nextflow is. Nextflow provides you with a language to write workflows and pipelines in. The single units within those are processes, they're connected with channels, and the whole thing is encapsulated within workflows. For those processes, Nextflow gives you implicit parallelisation, so everything runs in parallel where it can without you having to worry about it. Nextflow gives you the ability to resume a pipeline; it gives you re-entrancy, so you can pick up from wherever you stopped if your pipeline crashes halfway through. And it gives you reusability: you can rerun the same pipeline and get the same result again.

Underneath that language level, Nextflow also coordinates the software level and the compute level. For software, it can use container images for all the packages within the pipeline, so you don't have to worry about installing all the tools that the pipeline runs. For compute, it can interact with HPC schedulers, local servers and cloud, so the pipeline code doesn't need anything specific about your infrastructure. Nextflow also handles integration with code repositories for the pipeline code itself. None of this should be new; you've gone into it in a lot of detail over the last couple of days. The key take-home from all of this is that Nextflow is reproducible and portable, and hopefully you're well on board with that by now.

So that's Nextflow. What's nf-core? This is our tagline: a community effort to collect a curated set of analysis pipelines built using Nextflow. Because just being able to do all of those things with Nextflow doesn't mean a pipeline automatically ends up super reproducible and really portable; at the end of the day, it's a coding language.
So if you write a Nextflow pipeline and you hard-code a file path from your system, it's not going to be portable to someone else. And if you handle the software dependencies in such a way that software versions are not pinned, then the pipeline is not going to be reproducible. So nf-core came about as a community project that sits alongside Nextflow; the two are distinct, but there's a lot of overlap. It came out of a desire to establish a set of best practices, and also out of the possibility of collaborating on and sharing workflows: instead of everyone building their own RNA-seq pipeline that does pretty much the same thing, the hope is that we can come together and work on a single gold-standard pipeline. That's why nf-core has one pipeline per analysis or data type. So it's a community effort to build a curated set of analysis pipelines.

What does nf-core give you? It gives you pipelines which are ready to run off the shelf. If you want to run ATAC-seq or ChIP-seq or whatever your data type is, you can see if there's a pipeline already written and just use it. Simple. You don't need to know very much about Nextflow to do that; you really just have to configure it. Beyond that, we also have a lot of tooling, almost like third-party tooling, I guess: you don't need any of it to run Nextflow pipelines, all you really need for that is Nextflow. But these tools, the nf-core tools, make it easier for you to run pipelines as an end user and also make it much, much easier to develop Nextflow code. Finally, what Harshil is going to talk about in his section is the idea of modules, which is a fairly new thing for nf-core: collaboration and sharing of pipeline code, not at the whole-pipeline level but at the single-unit level. This is something that has really exploded within nf-core over the last year or so, and it is now a major selling point for using nf-core.

Something I should probably make more of a big deal about, because it's not obvious to everyone right away: we have nf-core pipelines, but the tools and the modules especially are for anyone with any Nextflow pipeline. We actively encourage people to build pipelines using this stuff even if you never intend to put them onto nf-core, even if they're internal only and private, even if they're very specific, even if you're just testing and playing around. You can still use these tools, you can still use the nf-core template, you can still use all the DSL2 modules and collaborate at the module level, and it's very much encouraged. So please don't be put off by thinking, well, I don't ever want this pipeline to be part of nf-core or on the nf-core website. That's fine. This is all still super relevant.

A final point about the community. If you're familiar with other places which host pipelines or code, it's important to point out that we operate in a slightly different way. We're not a code repository, so it's not the kind of thing where you can turn up with a pipeline that you've already written and ask for it to be put onto nf-core. We really ask you not to do that. We want you to come with an idea for a pipeline, tell us about it, check that everyone agrees it's a good idea to put into nf-core in the long run, and then maybe even find some collaborators, or find someone else who's already proposed the same idea, and work together with the community from the get-go.
There are a couple of other things which are quite specific. We insist that you use the nf-core template if you're going to work with nf-core, and that's for lots of reasons: standardisation is a really big part of what we do, and much of our tooling is based around this idea of common boilerplate code. And then finally, there's the idea that we only have one pipeline per data type. So if you come in and say, the RNA-seq pipeline doesn't do what I want it to do, I'm going to build a new pipeline that does this, we'll say no, and ask you to please go to the existing pipeline and add that functionality there. That way, anyone coming into the nf-core community doesn't have to choose between pipelines; they just look up the data type they have and it's very obvious which pipeline they should use.

If you look at what pipelines are available today on the nf-core website, you'll see a big list. There are 39 which are released, and the release with the little green tick is important because this is the main quality-control check for the nf-core community. To get your first release with nf-core, you have to go through a lot of code review, and this is where you really make sure that the pipeline does indeed adhere to the best practices and guidelines that we lay out. So once a pipeline has a first stable release, you can be confident that it follows best practices. We also have a lot of pipelines under development which don't have a first release yet. They range from very early development through to pretty complete pipelines which may well be in use in production but just haven't quite got through that first review, so there's still lots of usable code there. And then some pipelines are archived. This usually happens if the functionality that the pipeline provided has been brought into another pipeline, or if it's outdated and is never going to be updated again. We archive things, we never delete them, so that people who used a pipeline in the past can still go back and rerun it for reproducibility.

I mentioned these DSL2 modules. These are more numerous. For example, on day one and day two you had a process with salmon index to build a reference index; that would be, in fact is, a module. Then with salmon you can have an alignment step, and that would be another module. So each individual process gets wrapped up into a DSL2 module, and we have this massive library, which has grown like crazy over the last year or so and shows no sign of slowing down. Many, many common bioinformatics tools are already there, and that means you can assemble a pipeline very fast. It's a really strong selling point, and Harshil will talk about this in much more detail. So if you go to the nf-core website you'll see a big old list of pipelines you can pick through, and there's a similar page for modules.

We started nf-core in 2018 and we've had a really steady increase in community size since then. This plot shows the visitor statistics for the website, and you can see a little kick just at the far right there: that's this training, I think that's you lot coming to check out the nf-core website. It basically shows no sign of slowing down at this point. We have over three and a half thousand people on the nf-core Slack, which is crazy. So at pretty much any time of day or night, if you have a problem with a pipeline, you will find someone lurking on the nf-core Slack and you'll be able to get help.
And this is really one of the key selling points for nf-core. We have nearly one and a half thousand people now who've actively contributed to at least one pipeline, be it through writing code or creating issues, bug reports, things like that. So the community is really big and is still growing fast, which is super exciting. We are spread pretty much all over the world now. We started off in Europe, which is where Nextflow started and where much of the initial community kicked off, but we've got a strong presence in the Americas now and also increasingly in Asia and Australasia. We're really interested in increasing the inclusivity and geographic spread of the community. This is part of the reason we're doing things like the mentorships with CZI. Also Marcel, who you heard from yesterday, is based in Brazil; he's a developer advocate, and he's really trying to get more people interested and excited about Nextflow in South America. And Chris, who ran the training sessions in the APAC time zone, is trying to get people on board in Australasia and Asia. So please do try to spread the word; we're really excited with the way this is going.

The final thing in my slides is just to mention that there is a paper which came out in 2020. Parts of it are a bit dated now, but if you're curious about how we built the community and why we did things the way we did, it's quite a good read. I'm a bit biased, but the supplementary materials have quite a lot of detail on how we set up the back end of the community, which could be interesting, especially if you're thinking of doing something similar. So yeah, check it out.

With that, let's get stuck in and actually do some coding. So, big green button. It's quite a long link, but like I said, there's an easy way to find it through the event page, and I think it's linked in Slack as well. Click launch Gitpod, and then you can run Gitpod either in your web browser, or, what I've done, have it connected to my local installation of VS Code, which should be exactly the same; I just prefer it slightly. So now I am logged into Gitpod. Note that this is a different Gitpod environment to the one we were using for the first couple of days. We've been doing some quite intense work on nf-core tools right up until the last minute, by which I mean less than 24 hours ago: we were up late pushing the version 2.6 release of nf-core tools. That ushers in additional functionality paving the way for subworkflows, which are going to be a huge thing for nf-core; more on that later. As a result, we wanted the latest, leading-edge version of the software for you, so we're using a slightly different Gitpod environment. When Evan comes on later, he'll be back in the original one.

This Gitpod environment is based on the nf-core/tools repository, and there's a load of files here on the left-hand side that I basically don't care about. I'm going to move this browser window out of the way so I can follow the materials myself, and the first thing I'm going to do is bring up the terminal here and resize it. And Harshil, you need to mute yourself on Zoom! Now I'm going to make a new directory for myself: I'll go to my home directory, make a training directory, go into it, and run pwd to find my current working directory, and I'm going to grab that path.
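For reference, the working-directory setup is just a handful of shell commands typed in the Gitpod terminal, roughly:

```bash
cd ~             # go to the home directory
mkdir training   # make a new, empty directory for this session
cd training
pwd              # print the full path, ready to paste into "Open Folder"
```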
I'm going to go into file and open folder. If you're doing this in the web browser, it's like a little kind of burger icon just above the explorer. You should be able to find file through the same place. And there was a shortcut key as well, which depends a little bit on your operating system. But open folder, you should get to a dialog. It looks like this. And I'll place that in. And then VS code is going to reload. Yes, I trust your authors. Great. So now on the left, we've got no files here, which is good. If things are ever slow, you can hit that button to refresh the file explorer. But there's nothing ever. There's nothing in this folder. So kind of surprising. Great. Okay. So we've got a slightly cleaner environment to work on. This get pod environment, like I said, comes with NFCore preinstalled and also NextFlow. If I look at NFCore version, I can see that it's got version 2.6. And if I do NextFlow info, we can see that I think it's NextFlow 22.04. If you are installing NextFlow, these tools yourself, NFCore tools especially, is actually written in Python. So it's not groovy or anything like that. And it's available through the Python package index here. So you can just do pip install NFCore. It's also on Bioconda. So you can do condo install NFCore. If you've got Bioconda channels and everything set up properly. So you can install it pretty much however you like with typical Python packaging. There is a Docker container, but I would avoid that for now for this tutorial if that's okay. Okay. Great. So hopefully you can follow this along either in get pod or locally. So if we look at NFCore help, minus, minus help, get an overview of all the different commands which are available here. So this is kind of broken up into a couple of sections here. You've got different sub commands available for people running pipelines for users. And then you've got commands available for developers. Most of the time we're going to spend on this bit. But just to kind of point this out, we've got NFCore list, for example, which gives you all the available NFCore pipelines. It's the same as the website page, basically. But the key difference here is if I do NextFlow whole NFCore RNAseq. So NextFlow grabs this copy of the public GitHub repository for me. It's doing the same thing as if I did NextFlow run NFCore RNAseq just without actually running it. So it's going to fetch a copy for me. And then if I do NFCore list again, we can now see that it tells me I've just pulled this copy of this pipeline. So if it could also say a week ago or six months ago, and it also tells me that it is running the latest release of NFCore. So this is quite useful just to keep an overview of which pipelines you have locally and whether you need to update them. And also on the kind of user front, we've got things like NFCore download, which is good for anyone running on a system which is totally disconnected from the web. You can run NFCore download to fetch a pipeline and its configs. And if you want all the singularity images, so that you can then transfer that onto your system. That's used a lot. And NFCore launch, which gives a graphical interface for launching pipelines where you can fill in all of the different parameters. And that shows you kind of help text as you go along and validates the inputs as you go along and stuff. We'll touch on that a bit more later. But what we're really interested in right now is creating a new pipeline. So I'm going to go right ahead and I'm going to do NFCore. I'll do close. 
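If you want to set this up outside Gitpod, the install and the commands mentioned so far look roughly like this; the versions shown in the session (tools 2.6, Nextflow 22.04) may of course differ for you, and nf-core/rnaseq is just an example pipeline:

```bash
pip install nf-core                     # from the Python Package Index
# or: conda install -c bioconda nf-core # via Bioconda, if your channels are set up

nf-core --version               # check which version of nf-core tools is installed
nf-core --help                  # overview of the user and developer subcommands
nf-core list                    # list nf-core pipelines and any local copies you have
nextflow pull nf-core/rnaseq    # fetch a pipeline's code without running it
```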
I'm going to do nf-core create. With most of the nf-core subcommands, you can give command-line flags to make them run, but if you don't give them any flags, they will prompt you for everything as you go along. So generally, when you're running interactively like this, I would just do nf-core create and it will tell you what to do. One of the benefits of this is that you get inline validation: if I type 'this is a rubbish name', it tells me I can't call my pipeline that, which is probably for my own good. So I'm going to call this one demo, and I'll put my name in as the author. With the newer versions of nf-core tools, you can choose to make a pipeline without, for example, the nf-core logo and branding, and if it's not a genomics pipeline, you can choose not to have reference genomes and so on. This is quite a nice new feature for those of you working slightly outside the realms of nf-core: you can choose which parts of the template you use. I'm not going to do that right now; I'm just going to say no, give me the full pipeline. But it's good to know it's there.

Right. So it's created a folder for me, nf-core-demo, and it's even told me which commands I should run next to get my code up onto GitHub. It also reminds me that I shouldn't just go ahead and start writing a pipeline; I should go and join the community first. If I look in this folder up here on the left, there are a lot of files it's created. I'll walk through some of those quite quickly, but the first thing I want to point out, if I cd into the directory: this is already a fully fledged Git repository. If I do git status, you can see it's all tracked. I can do git branch, and you can see that we've got the three branches that nf-core always works with. When you run a pipeline with nextflow run and a URL or Git repository, it fetches the default branch. What we do in nf-core is say that the default branch is master, and that should always have the latest stable release, so if people don't specify a version, they get the latest stable release. All of the development work happens on dev. And the TEMPLATE branch is used for automated template synchronisation.

If I do git log, you can see how this works: when I created the pipeline, it automatically made a first initial commit, which is shared across these branches, and this contains the pipeline template, completely unmodified. The TEMPLATE branch you never touch; you carry on developing on dev and master. But when there's a new release of nf-core tools with a new version of the template, we have automation within GitHub Actions which creates a new commit on the TEMPLATE branch with just the updated template. Git then knows what's changed between those versions of the template, and it opens an automatic pull request into dev, which hopefully brings over just the changes that have happened within the template and leaves all of your custom code unaffected, in theory. So that's what this branch is, and that's why you shouldn't touch it.

You can also see here that it tells you how to push to a remote. If you go to GitHub, you can click New Repository; remember not to tick 'initialise with a README' or anything, you just want it blank, because we've already got the first commit here. Then you add the new remote that you've created online and do git push, and remember to do git push --all so that you get all three of these branches up onto the web. Okay, great.
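A rough sketch of the same steps done non-interactively; nf-core create also accepts everything as prompts if you give it no flags, and the GitHub remote URL below is a hypothetical placeholder:

```bash
# Create the pipeline skeleton from the nf-core template
nf-core create --name demo --description "A demo pipeline" --author "Phil Ewels"

cd nf-core-demo
git branch                      # master, dev and TEMPLATE are created for you

# Point the repo at an empty GitHub repository (placeholder URL) and push everything
git remote add origin git@github.com:<your-username>/demo.git
git push --all origin           # pushes all three branches, not just the current one
```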
So nf-core sets you up properly with Git and all of this configuration. What else have we got in here? Some of these files you will never need to touch, and some of them are obviously where you're going to write your pipeline. We've got things like the editor and Prettier configuration files, which you shouldn't need to touch. Some files are auto-generated, like nextflow_schema.json, which you shouldn't edit manually, and some things, like the lib directory, just contain core boilerplate nf-core code, which again you shouldn't need to touch. In bin, we have scripts used by your pipeline, and in assets we've got things like the samplesheet for the test data, so there are some files there you might want to edit. conf has lots you will edit, because you're going to have to come in here to change base.config, which has all the defaults for how things should run, and you've also got modules.config, which is how you configure the DSL2 modules, which Harshil will come on to talk about. Then for documentation we've got docs; these are files you should work on as you go along to make sure that your pipeline has really top-level documentation. The modules directory is where all of the DSL2 modules get put. And finally, the main meat of your pipeline is within the workflows directory; this is pulled in by main.nf here. main.nf is quite short, but the main workflows end up in workflows. So that's a very brief outline of the files that have been created.

Just to prove to you that this is a real, functioning pipeline right away, before I've done anything, I can try to run it. If I go up a directory, I can do nextflow run, give it the directory path, which is nf-core-demo, and say -profile test. We have a specific test.config file which is defined within the profiles scope, so if I do -profile test it pulls all of that in, and that has all of the parameters required to test the pipeline, including input data; this is one of the things you need to maintain as you build your pipeline. I'm also going to say run it with Docker, which we have on Gitpod, and I'm going to set the output directory to test_run. So that kicks off, loads this new pipeline, and hopefully runs it for us, and we can see the files starting to pop up up here.

As it goes, it starts telling us about the run. It shows the version of the pipeline; we've started on version 1.0dev because it's a new pipeline, and every time you do a GitHub release and update this in the pipeline, it'll be reflected here. It shows us some of the variables which should be familiar from the past couple of days, things like projectDir, where the pipeline is, and launchDir, where I'm running it from. Then it shows me which parameters have been changed versus the pipeline defaults; these have all been set by -profile test, apart from the output directory, which I set manually. And then off it goes. You can see it's running some processes down here: it's checked the samplesheet, which came with the pipeline for the test profile, and it's now running FastQC on a bunch of very small files. Obviously these processes themselves are not super meaningful; you might be building a pipeline that has nothing to do with genomics, in which case running a genomics test doesn't really make any sense.
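The launch command from this part, roughly as run in the session (Docker is available in the Gitpod image; the output directory name is just the example used here):

```bash
cd ..   # back up to the training directory, one level above the pipeline

# Run the freshly created pipeline with its built-in test profile and Docker
nextflow run nf-core-demo/ -profile test,docker --outdir test_run
```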
The test profile is just a starting point; don't overthink what's actually being run here. You can remove these processes, these modules, and you can remove this test data and replace it with something more appropriate to you. It's just there so that there's some placeholder content and you can see how things are meant to fit together. If we look in the test_run folder, we can see the output files being generated: we've got our FastQC results in here, and, there you go, MultiQC has finished, so we've got a MultiQC report, and we've got our pipeline reports here, nice HTML files and things. And we can see down here that we've finished. So we're off to a good start; everything is working so far, and that means if I do another demo later and it breaks, it's my fault.

Right. We have a lot of people within nf-core and a lot of people working on pipeline code, and they might come from different backgrounds, different places, and have very different tastes in how they're used to writing code. So how do we maintain quality and standardisation across all of these hundreds of people? We use automation a lot, and we use automated code-quality checkers. These basically come in two flavours. The simpler ones just look at code formatting. This is important because we want to reduce non-meaningful differences: it makes it much easier to review pull requests if the only changes are the actual code that people have changed, rather than whitespace that's been changed all over the place, and it just makes the code much easier to read. It's one of those things: once you start using code linters and formatters, it's difficult to work without them. We use Black for Python code, which is the most common formatter for Python, and we use a tool called Prettier, whose logo is down here, which handles YAML, JSON, Markdown and so on.

Just to show this in real life, we can open a Markdown file here; you can see I've got some Markdown here. I can start deleting things and removing whitespace, and maybe I can make a list and just number every bullet point one, one, one, things like this. This is valid Markdown; it will render to HTML with no problems. However, the formatting and the syntax here are not standardised. So what happens is we've got Prettier. I've got it installed as a plugin in my code editor here in VS Code, and hopefully it will work; it worked the first time I did this this morning, it didn't work the last time. I'm going to hit Cmd+S to save the file, and Prettier will run, analyse the file, realise that there are some problems with it and fix them for me automatically without me having to touch anything. So I'll save the file, in fact I'll do it up here so you can see when I've saved. And there you go. Oops, it's put the extra whitespace back in around here, but it didn't renumber the list. Why is that not working? I'll do something random like that. Okay, so one, one, one is allowed, but not non-sequential numbers. So your lists get automatically fixed, and if you've got Markdown tables, they get automatically realigned, things like this. It saves you a lot of time when you're writing code: you don't have to think about this kind of stuff, whether to use single or double quotes, it just happens automatically, and it helps us because it makes it easier to review code. So those are the code formatters; a linter is just a tool which checks your code.
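If you don't have the editor plugin set up, Prettier can also be run from the command line; a rough sketch, assuming Node.js and npx are available and that the Prettier configuration shipped with the template is in the repository root (docs/usage.md is just an example file):

```bash
npx prettier --check .              # report files that don't match the formatting rules
npx prettier --write docs/usage.md  # rewrite one file in place
```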
Then we also have a linter which is written specifically for nf-core. This one is custom-written by us, and it has a whole load of checks on code quality to make sure that you haven't done something within your pipeline which violates any of the nf-core guidelines, that you haven't deleted or edited a file which shouldn't be deleted or edited, things like that.

So let's have a look at some of the things it does. One of the most obvious ones you'll come across straight away is a check that looks for TODO statements. The reason we have these is that when you look at this newly created pipeline, it's great that it does so much work for you, but it's also kind of overwhelming, especially if you're not used to looking at pipelines like this: there are lots of files, there's loads of stuff, where do you start? So to help you get started, if you look in a config file, for example, I think I saw one up here, here you go, you've got these code comments lurking in here that say TODO nf-core. This is highlighted in orange because I've got a plugin in VS Code doing that; it's just a regular code comment, and if I change it like that, it goes back to the normal colouring. But it flags things that you should probably do or check, and you can work your way through all the files in the pipeline looking for these. When you're happy that something is done and customised, you just delete the comment line. Because we use these TODO comments, you can also use plugins to get an overview of the whole pipeline. I think it just needs to scan the entire workspace, so it's not showing all of them at the moment, but this Todo Tree plugin in VS Code will give you a complete list of all the TODO comments, and you can work your way through them, ticking them off one at a time.

It's quite easy to miss these, or maybe we've updated the template and there's a new part you should customise with a new TODO comment. So one of the things that we lint for is exactly this: the linting looks at all the files. I'm going to run nf-core lint on the pipeline here, hit enter, and it's going to look at my whole pipeline and see if it's up to scratch. It's a little bit slow on the first run because it's fetching some files off the web for the DSL2 modules, and it's also collecting all of the Nextflow config, so the next time I run it, it will be a bit quicker. It spits out all this output: we've got tests at the module level and the pipeline level. I'm going to ignore the module ones for now; that's been fixed already, so in the next release of tools there won't be any warnings for modules. But for the pipeline, you can see I've got a bunch of warnings, and sure enough, these are all the TODO strings I mentioned. If I had missed any, they'd be super obvious to find this way. I can cheat slightly, and don't do this yourselves, but I'm going to find all the occurrences of TODO and replace them so that I no longer have any nf-core TODO statements. Now if I run nf-core lint again, hopefully all of those warnings will have gone, and it should be a little bit quicker as well. Yeah, sure enough, no pipeline warnings left: the number of passed tests has gone up and the number of warnings has gone down a lot. Good news. So this gives you an idea of how it works. Warnings are tolerated; they're okay.
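In command form, the steps from this part look roughly like this (run from inside the pipeline directory):

```bash
grep -rn "TODO nf-core" .   # find any template TODO comments you might have missed
nf-core lint                # run all of the nf-core pipeline lint tests
```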
You should try to fix warnings up before you release your code, but they're okay as you go along; they're not a deal breaker. Failures, on the other hand, are something that needs to be fixed. I can give you an example of that. This is the nf-core Code of Conduct, and for an nf-core pipeline it's a file you shouldn't edit. So I'm going to break the rules and edit this file. Now if I run nf-core lint, it should give me an error. There we go. A couple of things to note here: it tells me what's wrong, which is useful, and it also has this key here, and this name in the lint output is basically a unique identifier. If you're running this locally on your computer, you should be able to alt-click it; it's a hyperlink that will open in a web browser and take you to the documentation. It doesn't work so well in Gitpod, unfortunately, but I can quickly show you where it goes anyway. If I go to nf-core, tools, docs, latest, pipeline lint tests, it would have dropped me straight into this web page, but you can also find it manually if you want. You can see that this is documentation about this specific lint test: it tells me all of the files which should be unchanged, and that for some of these files I'm allowed to add stuff at the bottom. So when you get failures, it's pretty quick to go in, figure out what's going on and fix them.

Importantly, when you have a failure, it also gives you a system exit code of 1, which is bad, and that means we can use these lint tests for continuous-integration automated tests. What I mean by that: if I go to the small RNA-seq pipeline, I'm going to pick on Alex because he won't mind, and I go to a pull request that Alex has created here about adding something, and scroll right down to the bottom, you'll see that there are a bunch of automated tests that run on GitHub using GitHub Actions. Some of them have failed: it's trying to run the pipeline with some test data and that hasn't worked, so it's got a red cross. And some have succeeded and have the green tick. Sure enough, there's one called nf-core linting. It has passed, and I can click on Details here and see exactly what it ran: it runs nf-core lint, and you see exactly the same output that we've just been looking at, with warnings and so on. This was a pass, so there are no failures, which is nice. So this means that when you create a pull request for an nf-core pipeline, this automation runs, and anyone reviewing the code will see a little red cross straight away if anything's wrong. It makes sure that we don't miss things, which would be easy to do in big pull requests, and it helps you too, because you know what's wrong before you put in the submission, so that cycle of fixing things is much, much quicker.

One last thing on this: sometimes you might have a good reason to edit a file like this. Maybe you wanted to delete it; you're not part of nf-core, you don't care about the Code of Conduct, or whatever else. So some lint test is failing, but you don't mind. Thankfully, you're able to customise the nf-core linting, because otherwise this test would always fail and you'd just never be able to use it. We have this .nf-core.yml file, which sits in the root of every pipeline repository.
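A sketch of the lint configuration being described here, appended to .nf-core.yml in the pipeline root; the test names (pipeline_todos, files_unchanged) follow the tools documentation, but treat the exact keys as something to check against your tools version:

```bash
cat <<'EOF' >> .nf-core.yml
lint:
  pipeline_todos: false        # skip the TODO-comment lint test entirely
  files_unchanged:
    - CODE_OF_CONDUCT.md       # allow this one file to differ from the template
EOF
```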
In this config file, we can say that we want to ignore certain tests. I'm just going to copy this from the docs that I've got on the side. I can say that for linting, I want to take the lint test called pipeline_todos and set it to false, which means I just don't want to run that test at all. And for the lint test called files_unchanged, I want to ignore specifically this file. The tests vary a little bit in how much you can customise them: I could just write false and skip the entire test, but here I'm going to skip specifically this file so that it carries on checking all the other files. So if I save that and rerun nf-core lint, everything should be fine: we've got some ignored tests, but we don't have any failures anymore. Great.

Hopefully everyone is still with me at this point; hopefully someone would have told me by now if the streaming wasn't working or you couldn't see my screen. We're going to change tack slightly now and move on to my next topic, which is Nextflow parameters. Just to give you an example of what I'm talking about: if I run nextflow run, give it the pipeline name, and add --help, and I can do this with any nf-core pipeline, it runs the pipeline but just prints the usage help and exits straight away. When I do this, it tells me about the command-line options that this pipeline has. What I should have done, and I forget every time, is also add --show_hidden_params, which shows the full list including all the boilerplate parameters, which are usually a bit boring and which most users wouldn't want to change. What you can see in this output is that we've got a typical pipeline command, and then we've got all these parameters. If you remember the nextflow.config file, when you set these things they're just strings or numbers or whatever: there's no additional information about the type of a parameter, there's no help text or description or anything. But here in the help output we have that kind of information: typing, defaults, descriptions, and it's even grouped; we've got booleans, all kinds of things here.

So how do we do that? There's no way to do it within the Nextflow config file, as of yet. So we set up a new standard within nf-core, which has now spread to the rest of the Nextflow world really, and created a new file called the Nextflow schema. This follows JSON Schema, which is a standard structure for JSON files of this type, and it means that we can define our properties, which are our parameters. So --input should be type string, and if we scroll down a bit further and find a boolean, here we go, igenomes_ignore is type boolean. We've got our short descriptions and our longer help text, all held within this JSON file, and this is what's being used for the help output. It is really, really helpful, because this additional information means we can do a whole bunch of stuff. We can print help text like this. We can also render website documentation: if you go to any nf-core pipeline page, you'll see a tab that says Parameters, and this uses the same JSON file to give us nice, rich help text. And we can use it to build graphical interfaces.
So you can launch pipelines on the website using a form to fill in all of these different parameters in a rich way. Evan will talk about Nextflow Tower later, which has a similar kind of infrastructure. So it lets this third-party ecosystem of tools use parameters in a more effective way. Really importantly, it also allows us to validate inputs from the end user right away. If I do nextflow run just like I did before when the whole pipeline ran, but this time remove the output directory parameter at the end, which is required, and run it without that: the JSON schema says that parameter is required, so right away Nextflow exits before it does anything. This idea of failing fast is really, really useful. Before we had this, your pipeline would either run silently and not do what you wanted it to, or it would run halfway through and then crash. So instant validation of parameters is a really helpful thing.

Right. Now, when I showed that schema file, some of you might have come out in a bit of a cold sweat. It's a big JSON file, and we've only just started work on our pipeline, and already we have 280 lines of JSON: very complex, deeply nested, specific keys and everything. It doesn't look nice to edit. The good news is that you should not edit it by hand; if you're looking at this file in a text editor, generally something has gone badly wrong. We have written tooling around nf-core and the schema, so you never really need to touch it by hand, and you can handle it all through a nice graphical interface.

To demo this, I'm going to go into my nextflow.config file and just add some new parameters: one called demo, which is a string, a nice boolean, and a number. Nothing really special about these, but of course they're not in my JSON schema file yet. So I go back into the pipeline directory. There is a subcommand called schema, which is dedicated to working with Nextflow schemas, and if I do --help on schema (something went a bit weird with Gitpod there), you can see there are also sub-subcommands under it. The one we're really interested in now is called build, so I'm going to do nf-core schema build. This pulls out all the parameters from the pipeline using Nextflow, compares them to the JSON file, and checks whether there's any mismatch. Here we've got some new parameters from the pipeline, and it asks: do we want to add these to the JSON file? Yes. Great. And straight away, that has updated our nextflow_schema.json. Looking at the changes here, I should be able to find them, yeah, there it is: if I scroll down to the bottom, here are the new properties that have been added at the end of the file. It doesn't say very much about them yet, but they are there, so we're off to a good start. Next it asks: do I want to launch a web builder to customise them? Yes, I do.

Right, this is a bit weird; it's a Gitpod thing. Normally at this point the command-line tool would kick off a browser window, but Gitpod doesn't really know what browsers you have installed and things like that, so it tries to be clever and uses lynx, which is a command-line web browser. We don't want to do that, because that would be a nightmare.
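While the command sits waiting for the web builder, here is a recap of that last bit in shell form, as a sketch; the parameter names are the hypothetical demo ones from the session, and a second params block appended to nextflow.config simply merges with the existing one:

```bash
cd nf-core-demo   # back into the pipeline directory

# Add a few demo parameters to nextflow.config
cat <<'EOF' >> nextflow.config
params {
    demo         = 'my demo string'
    demo_boolean = false
    demo_number  = 42
}
EOF

# Compare the pipeline's params against nextflow_schema.json, offer to add
# anything missing, then offer to launch the web-based schema builder
nf-core schema build
```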
So I'm going to press Q to exit out of lynx; yes, I want to exit. This is what it normally looks like. Thankfully, it gives us a URL here that we can copy out, which is also useful if you're running on a remote system or something like that: you can open this web URL wherever you like, on your phone if you want to.

Right, so I'm going to pop this URL into my web browser and have a look. Here we have all of the different parameters from the schema in a graphical editing interface. I'm just going to collapse all of these groups, and you can see the new ones down here at the bottom. I can make new groups, so I'll call this one 'demo group', and if I zoom out a little, I can drag and drop these around to reorder them. I'm going to drag my new group down to the bottom here, and then I can pick up parameters and drag them into the group, or between groups, or whatever. If you have lots of parameters, it's usually quicker to click this magic button over here, which shows you the ungrouped parameters and lets you add them all in one go, and then I can still reorder them. It's been a little bit clever: it's spotted that this one should probably be an integer, and sure enough you can see this is a numeric field. It's got this one wrong though: it has a default string value of 'false', but we know it's a boolean, so I'm going to click boolean here. I'll zoom back in. So, boolean, and now it can only be true or false. I can also write a description for each of these, a short description, and if I click here I can write longer-form help text, and it even gives me an option to preview what it will look like on the command line and on the website. So this is a really good way to document all of the parameters in your pipeline, rather than writing super long Markdown files. I can also set icons, which is great for building graphical interfaces, be it help or launch interfaces; I can pick a test tube, hand sanitizer, a weight-training bar, whatever is relevant for your option. This is how you get that nice, rich formatting of all your parameters, and it's worth spending some time on it because it really helps people running your pipeline. Importantly, I can tick whether it's a required parameter, so the pipeline won't launch if it's not set, and I can also hide it from the default help output.

There's also this Settings button over here, and this lets us do some extra cool stuff. Booleans are kind of boring, so there's nothing special to set for those. But if I go into the integer field, you can see that I can now set a minimum and a maximum value, and I can even give a list of enumerated values, so I can say that this has to be set to either 2, 4, 6, 12 or 24, separated by pipe characters. Launch interfaces will then probably show a drop-down box here, and if I try to set anything other than those values when I launch the pipeline, it will fail with a validation error. Strings have even more: we can do enumerated values again, but I can also set regular expressions, which is a really flexible and powerful way to sanitise and validate your inputs. The example in the placeholder here checks that this is a .csv file, so the value has to end in .csv. If you're not used to regular expressions, you can get some help with that.
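As an aside, this is roughly the kind of entry the web builder writes back into nextflow_schema.json for the integer example; the values and icon here are hypothetical, and you normally never edit this file by hand, the builder and nf-core schema build manage it for you:

```bash
cat <<'EOF'
"demo_number": {
    "type": "integer",
    "description": "A demo numeric parameter",
    "minimum": 2,
    "maximum": 24,
    "enum": [2, 4, 6, 12, 24],
    "fa_icon": "fas fa-dumbbell"
}
EOF
```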
Another settings field is very new, so most people probably won't have seen it: it's called format, and it might grow more options in the future, but for now it's a way to define whether a parameter should be a file or a directory, or either. If I select file path, it gives me some more fields, and I can say what kind of file it is by its MIME type; again, click here to get a big old list. If it's a .csv file, I can use text/csv. There's also a very new option here for a schema for the file itself, which I'm not going to talk about right now, but it's for validation of the contents of the file. This is really useful for tools which integrate with Nextflow and nf-core pipelines, because, as Evan will touch on later when he talks about Tower, Tower for example uses this to know that a parameter takes a samplesheet or a spreadsheet and can give you really nice, rich interfaces around that, and you can imagine all kinds of ways to use this information.

Great. Okay, I'm just going to quickly minimise these windows so you can see what's going on. Actually, I can drop that in. There we go. The command that I ran is still running in the terminal here; the whole time it's just been sat there waiting in the background with a little spinner going around. When I'm finished, I can hit Finished over in the web browser, which is on the nf-core website. The tool will see that something has happened, take the schema which was built on the website, and save it into my pipeline, which I just think is super cool. Sure enough, if I go into Git, sorry, source control, and look at nextflow_schema.json, we should see all of the nice things that we've just been setting, in a group. There we go: we've got the type information, we've got our enumerated values, all of this has been saved into that JSON file for us. A really powerful, and hopefully very easy to use, way to customise your parameters. Again, this works for any Nextflow pipeline; it doesn't have to be nf-core. You can do this with the example scripts from the past couple of days; you still just run nf-core schema build.

Right, with that, I am going to call it a day. We're running over a little bit, so we're not really going to have much of a break, just a couple of minutes. I'm going to hand over to Harshil, who is going to talk all about DSL2 modules; they are amazing, powerful things. But I'm finished for the day. Thank you very much for watching. Hopefully I'll see some of you at the Summit next week, be it online or in person. If not, please do join Slack and become an active member of the community; it's really fun. nf-core is a really powerful ecosystem, and hopefully I've convinced you of that. I'm going to jump in and see if there are any questions I can help with now, but otherwise, join us back in a couple of minutes and Harshil will continue with DSL2. Thanks very much.

Okay, maybe we should get started. Thank you guys for tuning in today. It's the last session now of what's been a crazy few days, for a number of people who have done amazingly well to put all of this together. It's awesome working with a community like this where we have so much going for us. What I'm going to do is introduce you to nf-core modules; it's essentially our way of compartmentalising Nextflow workflows on nf-core. But before I start doing that, I'd like to introduce myself. My name is Harshil Patel, and I'm Head of Scientific Development at Seqera Labs.
I've been a long-term user of Nextflow and have been contributing to nf-core from the very early days. It's just absolutely amazing how the framework has evolved and grown over the years, and the willingness of people to learn and get involved with the framework has been phenomenal.

For those of you who don't know, Nextflow initially started with the DSL1 syntax. The syntax is the way that you write these workflows, and DSL1 was quite monolithic in that you had to write everything in one big main script. So if you wanted to write a complex pipeline and reuse components within it, you had to physically copy the code into different places. A couple of years ago, well, longer than a couple of years ago, at one of the community conferences, Paolo decided he wanted to implement a DSL2 version. It took a little bit of time to get there, and eventually we ended up with a preview of DSL2, which then turned into a stable DSL2 release. DSL2 in this context is mostly about the ability to reuse components of Nextflow workflows. Say, for example, you write a process, which we consider a unit of Nextflow, as you've learned over the past few days: the ability to reuse that process within the same workflow and in other workflows, in the sort of include-type fashion that you get with conventional programming languages, is essentially what DSL2 gives you. I'll explain some of this in a bit more detail.

This is what a typical workflow looks like. This is the nf-core/rnaseq pipeline, one of the most popular workflows we have on nf-core, and probably also one of the most widely used. That's partly because RNA-seq tends to be one of the most common sequencing protocols scientists perform to figure out what's different between conditions in terms of gene expression. You can see there are a number of steps here where, say, FastQC, even within this one workflow, is called before and after adapter trimming, and there are other key pieces of functionality here that could potentially be reused across nf-core. In genomics, there are typically very common steps performed in pipelines up until a point, for example alignment, and after that you tend to diverge depending on the type of 'seq' that you've performed: for RNA-seq you do quantification, for ChIP-seq you call peaks, for ATAC-seq you call peaks, for DNA-seq you call variants. So up until a point there's a lot of common functionality between pipelines. The idea behind DSL2, and the fact that it became available, was very appealing to nf-core in general, because we have 60 or 70 pipelines now, and we would love to share functionality across them in a way where we maintain everything in a single location and these units are then installed within pipelines so we can reuse them. Another benefit is that you update these modules, these scripts, in one location and everyone benefits from that.

So we created this Git repository on GitHub called nf-core/modules, which at this point is just hosting module scripts. A module in our definition is a unit of Nextflow: a single process that performs a particular task. For example, that could be a module for FastQC that does QC on FastQ files. You can also have modules for tools like samtools index, which is a tool plus a subcommand, but which performs a particular task.
Similarly, you can have one for BWA-MEM, and so on. These modules have accumulated on nf-core/modules over time, and we've refined the way that we use them. We've had some help from Paolo to adopt modules on nf-core as well, which has been incredibly useful. At this point, nf-core/modules is just individual units; however, we are hoping to extend this so that we can also have subworkflows that we can share between pipelines. A subworkflow, in our definition, is a chain of modules. Say, for example, you have a simple subworkflow that sorts a BAM file, indexes it, and then runs a few stats steps on it at the end. That subworkflow can easily be called within the same workflow, say if you are sorting a BAM file multiple times, and you can use it across workflows as well; I'm pretty sure most genomics workflows will be sorting a BAM file at some point. So this becomes even more powerful, because you're not only sharing individual units, you're able to plug chunks of functionality directly into pipelines and hopefully automate the way that this all works. A workflow is an end-to-end implementation, so that would be a combination of modules and subworkflows, in the way that the RNA-seq pipeline is an end-to-end workflow.

When we started discussing how we would actually implement nf-core modules a few years ago, we had to agree on some key requirements, because whatever we try to do on nf-core, we try to do properly, first time round. It just means less effort later, and things are less prone to breakage, although, judging by the fact that we had to update all of the modules last night and over the past couple of days, as well as release nf-core tools, that doesn't always hold; in fairness it wasn't actually our fault, it just required some foresight to incorporate subworkflows into nf-core/modules, which we've now done anyway, and that will be cool to work on at the hackathon next week. But there were some key requirements for nf-core modules. They have to be reproducible; this is key to modules and also to science in general. They have to be flexible, so users and developers can adjust them according to their needs in different pipeline contexts. They need to be portable, which is partly down to Nextflow and its awesomeness in terms of running with ease on different platforms. They need to be standardised as well; we try to make things as simple as possible, which leads to the last point there, automation, because the more you standardise things, the more you can automate them. If you start using custom bits of logic and things get out of shape across the repository, it becomes very difficult to automate updates. This was actually quite apparent when we had to update all 630 modules the day before this event: we needed to do all of that restructuring in order to go to the next step and start using subworkflows on nf-core/modules, but if things hadn't been standardised, it would have been a lot trickier and more difficult to do. Documentation is also key; it's something that's very close to our hearts on nf-core. We love documentation for everything. It makes things more transparent, easier to find, and easier to use. So all of these things came together.
The way that we've adopted nf-core modules has evolved over time. We've got better and better with the syntax, and it's been rolled out across more pipelines on nf-core, so we're able to share this functionality. We now have 632 modules and 150-odd contributors, and it's growing very fast. It's amazing, because anyone in the Nextflow community can take advantage of these modules and install them directly in their pipelines, as I'll show you in a second. Sub-workflows are coming soon; they're going to be one of the key topics of the hackathon we have next Monday to Wednesday, where we're really going to try and iron out and get some sub-workflows working on nf-core. Configuration is also quite important. When we went about developing nf-core modules, we wanted to make sure they were flexible. What this means is that, rather than having a module with hard-coded parameters that can only be used by one pipeline developer, we wanted to write modules in a way where you can override any non-mandatory arguments that you would pass to the tool. That makes it easier for downstream users to tweak the behavior of the pipeline using simple configuration that they feed in on top of the standard configuration provided by the pipeline; it makes things more flexible, and it puts less pressure on pipeline developers to create releases, which can take some time. All of this configuration is generally set in nf-core pipelines in the modules.config file, which sits in the conf folder of the pipeline repository (there's a small sketch of what that looks like below). As Phil took you through nf-core/tools, there's a bunch of commands and automation that we've added to deal with a lot of this functionality, not only for pipelines but also for modules. I've collated a bunch of resources here that may be useful for you; I'm going to dump this slide deck into Slack as soon as I finish this presentation, and all of these links work. I'm not going to be able to cover everything regarding nf-core modules, but I'll try to give you a good enough overview of what you can get from it. For anything else, there's a mammoth talk here about contributing to nf-core modules; some of the principles may be slightly outdated, but in terms of testing and adding modules the same principles apply. If you want to get in touch, we have Slack, which you've obviously found already, and there's a modules channel there where the modules team and a bunch of other people in the community who are doing exactly this hang out, so feel free to come and talk to us. We also have a sub-workflows channel if you want to find us there as well. Thank you in advance to the nf-core community, the BioContainers community, and my awesome colleagues at Seqera Labs. Now I'm going to jump right into a demo. For those of you wondering why I was chuckling earlier: after Phil's comments about how we managed to scrape everything together last night, I was looking for the appropriate GIF, found this one, and shared it with the other people helping out during the event. Sorry, Phil, I won't do it again. In terms of what I'm going to show you, I've attempted to write a lot of this up in this document here.
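To make that configuration point from a moment ago concrete, the per-module arguments in an nf-core pipeline are typically set through ext.args in conf/modules.config, roughly as below. The process selectors and the arguments themselves are illustrative.

```nextflow
// conf/modules.config — a sketch; selector names and arguments are illustrative
process {
    withName: 'FASTQC' {
        ext.args   = '--quiet'
        publishDir = [
            path: { "${params.outdir}/fastqc" },
            mode: 'copy'
        ]
    }

    // a downstream user can override the same selector from a config passed with -c
    withName: 'FASTP' {
        ext.args = '--qualified_quality_phred 20'
    }
}
```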
That written material follows on from what Phil was showing you, and there's a modules section right at the bottom; I'm basically going to follow it through, taking snippets out and showing you what I'm doing. I've got it on another screen, so I'm going to close this for now, open up a Gitpod environment for nf-core tools, cancel that prompt, and open it in the browser, so this should take a second or two to warm up. Now I'm just going to come out of this directory to the directory above and create a pipeline, so let's see if we've got nf-core tools installed, which we do, with this beautiful output from Rich. The first thing I'm going to do is create a pipeline here. Phil showed you how to do that; it's simply nf-core create. I'll give it a workflow name, "trainingpipeline", and choose not to customize anything. OK, so now we have a pipeline. I'm going to switch to that folder as well to show you what it looks like, so I'll open the folder in the Gitpod workspace. The top-level directory here is just the repository, and what I'm also going to do now is get the tests running. I want to make sure this pipeline is working, so I'm just going to run a test in this directory, which will take a second, and in the meantime I'll briefly explain what the structure looks like. You've got a bunch of boilerplate files here that you can use for various purposes; there's a changelog, and in terms of modules there are three main directories. There's a modules directory, and modules can be separated into local and nf-core modules. Local modules are those you typically want to keep close to the pipeline because they're quite niche to the way you're writing that particular pipeline and potentially not amenable to sharing, unlike the modules on nf-core/modules, which can be used in multiple pipelines. So you've got modules local to your pipeline, and then you also have nf-core modules, which are installed directly from nf-core/modules on GitHub. All of the modules themselves are hosted on GitHub, and with the tooling I'm going to show you we're basically querying the GitHub API and installing these modules directly into the pipeline, so we have one source of truth in nf-core/modules, and the copies in the pipeline are then maintained and managed via the tooling we've got. Then we have a sub-workflows directory; again we've got a local folder, so this sub-workflow may be something specific to this particular pipeline, and hopefully next week, drum roll please, we'll have an nf-core folder here as well, where we can install sub-workflows directly with the nf-core tooling, which, as I mentioned, will allow you to share chunks of functionality rather than individual components. We then also have a fully fledged workflow here; this is like the pipeline itself, where you include either modules or sub-workflows directly in this main workflow in order to run the pipeline from end to end. You can also have multiple workflows, depending on how you set the pipeline up: in some instances you may have one workflow for Illumina analysis and another one for Nanopore. They're completely distinct, but there may be aspects of the same pipeline that allow you to share functionality between the two of them.
Great, so we've got the tests working. Let's check everything looks OK in terms of the linting, so this will go away and check that everything adheres to the standards and other things we've set in the nf-core/tools package. Hopefully we get all passes, which we do, because we released this yesterday and fixed a lot of it then, so we don't have any failures here, which is great. There are loads of warnings here for TODO strings, which Phil probably alluded to in his talk; these are essentially guidelines for places where you may need to come in and change the pipeline template, and we put the TODO strings there so you don't forget, which is why they come up in the linting. OK, now let's try and add some stuff to this main workflow. First of all I'll copy in a process that I've put on the website. This is just a process that takes as input a channel with a meta map, which you'll see quite a lot on nf-core and which I'll briefly explain in a second, as well as paths to some files, in this case just FastQ files. At the moment I've just copied this process into the main script, so it's not doing anything there; it's a bit like defining a function, and in order to actually use it you need to invoke it in the main workflow. Let's first see what the contents of this channel look like. I'm just going to put the magic -resume at the end, so it will only rerun tasks that haven't been cached. The meta map is a bit like a Python dictionary. It's something we've adopted across nf-core, and it's become very important to the way we share modules, because you can imagine that when you pass data around a workflow, you may want to associate that data with metadata. For example, in an RNA-seq pipeline we need to pass metadata like strandedness through the entire pipeline, but that metadata will be different for a ChIP-seq pipeline, where you may need antibody information, or for another pipeline where you might need patient information. The meta map gives you that flexibility: you have an arbitrary dictionary-type structure that you can pass around the workflow, and you can add and change its entries depending on your use case. In this case the meta map is just a couple of key-value pairs: an ID, and whether this particular sample is single-end or not. This one is not single-end, which is why we've got two sets of reads here; this one is single-end, so you've only got one read. If we want to share modules, we need a flexible way to pass this sort of metadata around different pipelines so we can reuse the modules in different pipeline contexts, and that's where the meta map comes in. If you hear about it in the nf-core context, that's exactly what it is: just a dictionary for passing around sample information, essentially. So that's what that looks like.
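As a sketch of what that looks like in practice, the items flowing through these channels are just two-element tuples: a map of metadata plus the files it belongs to. The sample names and file names below are made up.

```nextflow
// a sketch of the [ meta, reads ] tuples flowing through an nf-core channel
workflow {
    ch_reads = Channel.of(
        [ [ id:'sample1', single_end:false ], [ file('s1_R1.fastq.gz'), file('s1_R2.fastq.gz') ] ],
        [ [ id:'sample2', single_end:true  ], [ file('s2.fastq.gz') ] ]
    )

    ch_reads.view()   // prints each [ meta, reads ] pair
}
```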
Based on the output of that input check channel, you can see that we've configured this correctly, because the process also takes as input a meta map and some reads. So now all we need to do is invoke that ECHO_READS process within the main workflow, so I'm just pasting that in here, and what should happen is that the process takes as input a channel with two elements, the meta map and the reads, and just echoes the names of those read files. There you go: it's printed the names of the files and it's working exactly as expected. Now, the problem with having the process defined in the main script is that we can't really reuse and share it properly, but you can if you put it in a module script. So what we can do is go to this local folder, because this is a custom module we want to add, create a new file there and call it echo_reads.nf. All we do is cut the process out of the main script, paste it in there, and then tell Nextflow where to find it, because we've moved it out of the main script, by pasting in an include statement. So now, instead of calling that process from the main script, we're calling it from a separate module, and you can then share this module and do whatever you want with it; it becomes, as I mentioned, a unit of functionality. There you go, it's done exactly the same thing, but now that process lives somewhere else. What you can also do is invoke the same process more than once: here I'm using exactly the same process but calling it twice in the same workflow, by including it under different names, and for those of you who are still awake, this should print the reads twice. I guess this highlights how powerful it is to be able to use these modules in different contexts. This was a simple example, and there are almost certainly more complex ones which require configuration and lots of other things to be taken into account, especially when you're running standard bioinformatics tools, but having modules that are shareable in this way is very nice: you have a central location where someone can update a module, and you can reuse it directly within pipelines without having to keep updating the logic yourself. You may even get others coming in to contribute, updating the version of the tool or the way it runs, or fixing something that was broken before. So it's advantageous not only for nf-core modules, but also for Nextflow pipelines in general, because it allows you to write structured workflows in your own setting as well.
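Roughly, what that looks like on disk and in the main script is sketched below. The process body is a stand-in for whatever was copied from the training material, and the alias names are just illustrative.

```nextflow
// modules/local/echo_reads.nf — the process, moved out of the main script
process ECHO_READS {
    input:
    tuple val(meta), path(reads)

    output:
    stdout

    script:
    """
    echo ${reads}
    """
}

// ...and back in the main workflow script: include it once, or twice under aliases
include { ECHO_READS as ECHO_READS_ONCE  } from '../modules/local/echo_reads'
include { ECHO_READS as ECHO_READS_TWICE } from '../modules/local/echo_reads'

workflow {
    // a stand-in for the [ meta, reads ] channel from the input check
    ch_reads = Channel.of( [ [ id:'sample1', single_end:false ],
                             [ file('s1_R1.fastq.gz'), file('s1_R2.fastq.gz') ] ] )

    ECHO_READS_ONCE  ( ch_reads )
    ECHO_READS_TWICE ( ch_reads )
}
```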
There are a number of other really powerful commands for modules in nf-core tools as well. If I do nf-core modules list local, that will go away and tell me exactly which modules I've got installed from nf-core/modules and where they were installed from; I already showed you that there are three nf-core modules installed in this pipeline. You can even install modules from forks of the repository. We added that functionality because a number of members of the community wanted to maintain their own forks of nf-core/modules and install directly from there, because they wanted more control over what they were doing, which is completely fine, so you can also install directly from your own fork. What we're tracking here for version control is the git hash that each module was installed at, and that is tracked within this modules.json file. Much like the parameter schema that Phil showed you earlier, you shouldn't have to edit this file; it's simply here as a record of where you've installed all of these modules from and at which version. nf-core tools is what uses this to update, install and remove modules; it will interact with and change this JSON file as and when it needs to, and you shouldn't need to update it manually. So that's the modules and where they've come from, and that's the modules list local command. Let's now try to update all of the modules in the pipeline, which is another command. This is really nice: back in the day, when I was developing prototypes for DSL2 with rnaseq and so on, I had to update all of these modules by hand, and it was painful. This particular command was a revelation for me, because when you have 20 or 30 modules and you want to update them all at once, it does exactly that. You can update them, you can view the diffs between what's in your local repository and what's on the remote, to decide whether you want to accept the changes and to see what has actually changed, or you can just brute-force update everything. Here it's just telling us that everything is up to date, but we know that already because we released nf-core tools yesterday and everything had to be up to date at that point. For a more realistic example: whenever we release nf-core tools, every pipeline gets an automated sync pull request like this one, which means the template updates, anything that changed in the pipeline template in nf-core tools, need to be merged into your pipeline as well, in an automated way. It's really nice because it means we can send this out to 60 or 70 pipelines to make sure they're all up to date. When we released yesterday we also restructured the way we're using modules, which was a big change to the modules repository and to the way modules are installed in the pipeline itself, and I literally just had to run one command to update all of the modules currently in the pipeline; the modules.json itself was updated automatically, and you can see I didn't have to do anything. When you start looking at production workflows and doing things at scale, it really makes sense to have this sort of automation in place. OK, so that's the update command. You can even list the modules available in the remote repository, in this case nf-core/modules on GitHub. That may not be very useful on the command line, because there are 600-odd modules you don't want to scroll through, but on the website, under /modules, you can search for all of this directly; it will take you to the right modules, and you can search based on tags and all sorts of other information as well. So let's try and install a module now, bearing in mind that we're still working in the pipeline context here: nf-core modules install, and let's install fastp, which is a very popular trimmer in the genomics space for adapters and all sorts of other things. So now, with a very simple command, we've installed this fastp module. The module sits on GitHub in the nf-core/modules repository; nf-core tools queries the GitHub API, fetches the module, and installs it wherever it's supposed to be installed within the pipeline.
Here you can now see that we've got this fastp module installed in the pipeline. It has probably gone through various iterations of updates from various members of the community: bug fixes, adding optional outputs, and really extending the functionality of even just this one module. There's a meta.yml associated with it, which I'll show you later when we create modules; it's the file you fill in to add information about the module, such as documentation, licensing, input and output information, and author contributions. Now let's try and wire this module into our pipeline. You can see it has exactly the same structure that we're using for the ECHO_READS process, where you've just got a meta map and some reads, so what I'm going to do is get fastp running here with exactly the same method I used for ECHO_READS. What nf-core tools also does is print a nice include statement that you can copy directly from the terminal to include this module in the pipeline, but we still need to invoke it within the pipeline as well, so we'll copy that from the website and paste it in. You can see we're just passing exactly the same channels, which is another strength of DSL2: these channels aren't exhausted, as they were with DSL1. In the fastp module there are also a couple of boolean inputs, which determine whether you want to save trimmed FastQ files or merged ones; we're not interested in those at the moment, so I'll just set them to false, and then we can try to run the pipeline and see if it works. You can also get module information directly from the command line: I'll come out of here in a second, and in a new terminal there's a command called nf-core modules info, which will tell you pretty much all of the information that's in that meta.yml; it goes away, parses it, and tells you more about the inputs, the outputs, and even how you can install this particular module in your pipeline. And there we go: fastp is now running, it's been wired into the pipeline, and we're still echoing the reads as well.
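For reference, the wiring looks roughly like the snippet below. The exact inputs depend on the version of the fastp module that gets installed, so treat the two booleans and the channel name as illustrative.

```nextflow
// workflows/<pipeline>.nf — a sketch of wiring in the installed module
include { FASTP } from '../modules/nf-core/fastp/main'

workflow {
    // ch_reads: the [ meta, reads ] channel from the pipeline's input check
    FASTP ( ch_reads, false, false )   // save_trimmed_fail = false, save_merged = false

    FASTP.out.reads.view()             // trimmed reads, still as [ meta, reads ]
}
```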
Say, for example, you want to make a change to this module and you don't necessarily want to contribute it back to nf-core/modules: you want the bulk of the functionality that comes with the module, but you don't want to create a pull request to update it. We're obviously always very welcoming of updates to these modules if they're of general benefit, but it may be that you want to do something custom. So let's just add a random echo in here to say that these are my changes, and save it. Now let's try running nf-core modules lint fastp, and what should happen, and what does happen, is that the linting fails, because nf-core tools has detected that you've made a change in this module that doesn't match what's on the remote. Given that we're versioning modules and tracking those versions in modules.json, it's very important that whenever you install a module it's installed at a particular version and you're able to track any changes that happen; otherwise it breaks the whole reproducibility story and there's no point in the versioning. The main reason for doing it this way is that you need to know what version of a module you've installed and be able to track it, especially when you want to update the module, because you need to know exactly what hash you have and where you want to update to. There's a really cool tool in the package that allows you to patch a module in this situation. The linting is failing because the local module you've changed doesn't match what's on GitHub, so what you can do is run nf-core modules patch fastp, and that will create a diff between what's on the remote and what you have locally, and dump a fastp.diff file right next to the module to record what the difference is. What also happens is that a patch line is added to modules.json to track this diff, so when the module gets updated, the patch will automatically be re-applied for you. If we rerun the linting, for all modules this time in fact, which is quite nice, we don't have any lint failures any more, because we've accounted for the fact that we've changed this module locally. So that's patching modules, which is very powerful if you want to use a module in your own specific way, and a lot of people have actually started using that particular command, which is awesome. Now let's try and create a module: nf-core modules create. Sorry, I forgot to increase the font size, I've only just realized looking at it; let me make this window smaller. So, nf-core modules create. Remember that we're now running this command in the pipeline context, and the way nf-core tools knows this is that we have a hidden .nf-core.yml file that tells it what type of repository we're running the command in. Because we're in the pipeline context, we can create a module here and give it a name. It then asks whether we want to use a different Bioconda package name. What this does is that, if the first part of the tool name matches something on Bioconda, nf-core tools will automatically populate the container fields for you when it creates the module: it will say, OK, you're using a module called samtools/index, go and look up samtools on BioContainers, and then fill in the container definitions in the module automatically. The API was broken yesterday, so I'm not going to risk it in the demo today, but typically that's what it will do; you don't have to go and look for any containers, it populates them for you. For now I'm just going to skip that, give myself as the author, and say that by default I want to give this process, this module, a label of process_single. Where this comes in is that every nf-core pipeline has a base config file in which we set some sensible defaults for groups of resources: process_single is just a single CPU and a bit of memory and time, process_low is a bit higher, process_medium is higher again, process_high is double process_medium, and then you have long and so on. What's nice is that you can then create modules with standard groups of labels. If we look at the fastp module, you'll see that it's using process_medium, and that's potentially because fastp can be multi-threaded, so you give it a decent number of threads to work with; similarly, you may have another tool that only requires a single CPU, in which case you'd just give it the process_single label. So this also allows us to standardize the labels we're using across nf-core modules.
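The resource groups behind those labels live in the pipeline's conf/base.config and look roughly like this. The exact numbers differ between pipelines (and the real template caps them with a check_max helper), so the values below are illustrative.

```nextflow
// conf/base.config — illustrative numbers only
process {
    cpus   = 1
    memory = 6.GB
    time   = 4.h

    withLabel: process_single { cpus = 1  ; memory = 6.GB  ; time = 4.h  }
    withLabel: process_low    { cpus = 2  ; memory = 12.GB ; time = 4.h  }
    withLabel: process_medium { cpus = 6  ; memory = 36.GB ; time = 8.h  }
    withLabel: process_high   { cpus = 12 ; memory = 72.GB ; time = 16.h }
    withLabel: process_long   { time = 20.h }
}
```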
p it can be multi-threaded and so you give it you know a decent amount of multi-threading for it to work in that context similarly you may have another one that only requires a single cpu and so there you would just give it label process single and so this allows us to standardize the labels that we're using in nf core modules as well of my window too small okay and then this asks whether you want to add the metamap information to this particular module not all modules will require passing around a metamap so some modules will won't be dependent on sample information so you don't need to pass that metamap as input to that particular module and in those cases you can just select yes or no so here i'm going to select yes and what you notice is that nf core tools has now created this this module directly in my local folder for me to then go and finesse in my own time so here i've got the demo modules.nf if we look at this particular module you see there's loads of to-do statements that mainly there is a guide to help you figure out whether you've done everything or updated the module in the right places a lot of it can potentially be deleted when you get more confident with the whole process and so it's a bit like the to-do statements that we have in the pipeline template they're there to so people don't come in and look at this and just get confused by what's here essentially and so yeah that's what this looks like and the reason that when you run nf core modules create in the pipeline context the reason it's created the local module is because this module is something potentially specific to your pipeline itself and it allows you to automate the creation of that directly in the pipeline you can then amend and change the functionality and so on but still using the same structure that we're using on nf core modules so what i'm going to do now is show you how this looks different if i were to run exactly the same command in a clone of the nf core modules repository so if i come out of the pipeline repository now and i clone the nf core modules repository from github and go into that directory and what i might do actually is change the folder all we are in there top level is i don't know why it does that one-off thing yeah so now we're in we're in the top level folder of nf core modules again we've got nf core yaml file that tells nf core tools that now instead of a pipeline repository we're running in a modules repository and so the nf core tools commands will have to be adjusted based on the context that we're running this this particular part i mean before we had a bunch of auto detection for this to look for particular scripts and stuff which is a bit fragile and so in this case what we decided to do was just to have something a bit more explicit that we can say yes this is a modules repository or this is a pipeline repository and so now if we check out a branch because we don't want to edit stuff on master we see all of the same things and now that's when nf core modules create so if you remember in the pipeline context we only created one local module within the pipeline itself when you run it in on nf core modules what you so the prompts are the same but now we've created more files and the reason that is that typically whenever someone comes in and clones nf core modules and wants to work with that or even a fork of their own they typically want to contribute these modules back to some sort of source repository and add tests and do that type of stuff and so here we've got two files that 
So here two files have been added in this modules folder, which we can have a look at. This is the main script; it looks exactly the same as when we added it to the pipeline, because we have a modules template in nf-core tools that we use for both. And there's a meta.yml here that you need to fill out with information about the module; a number of downstream tools, like nf-core modules info, parse this file and dump it to the CLI, and this information is also used on the website for searching, with the tags and keywords that you provide. So it's quite a valuable file, and anyone contributing to nf-core/modules has to fill out this information about the module as well, just to make things easier overall for everyone and to standardize things in general. Once you're comfortable, you'd go about developing the module, but we've also created a number of files in this tests directory, so let's have a look at that: tests/modules/demo. We've created these three files here, and this one is a template script that you would use to test this particular module. Now, this is very important, because any module that gets added to nf-core/modules has to have unit tests; we need to make sure it's working. As I showed you, there are 150 different contributors, and if someone comes in and updates a module we need to make sure it still works, so whenever you open a pull request to nf-core/modules, a number of CI tests get triggered for all of the main containerization options, Docker, Singularity and Conda, and the module is run in all of those different settings to make sure it works before the pull request is merged. There's also some linting that runs on the pull request, to make sure that things like Prettier and Black are happy before the module is merged in, and that allows us to keep a really nice, tidy code base in general. When you're happy, you can add tests for the module; here I'm just linking in some test files. We've created loads of files over time in this nf-core/test-datasets repository, where we've collated and tried to standardize the way we organize this data as well. Here we've got a genomics folder, and the majority of the modules on nf-core/modules use test data from SARS-CoV-2, because we've had heaps of that around recently, as I'm sure some of you may have been affected by. So we've got some genome files, we've got Illumina SARS-CoV-2 data and Nanopore data, and also metagenomic data, so there are various different types of files, and over time we've built up this resource of minimal test data, which in itself is really valuable; it's very hard to generate minimal test data like this that actually works for testing. In fact, at some point we were talking to other communities, like the Snakemake community and potentially Galaxy too, about sharing this; maybe that's something for the future. What this test is doing is just running a simple workflow: it imports the demo module and then runs it with the test data that we provide. The nextflow.config file alongside it is just a simple file, mainly there to get a sensible publishing directory when you run this module.
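A sketch of what that generated test workflow looks like is below. The real template references its inputs from a shared params.test_data map pointing at the nf-core/test-datasets repository, so the direct URL here is just illustrative.

```nextflow
// tests/modules/demo/main.nf — a sketch of the generated test workflow
nextflow.enable.dsl = 2

include { DEMO } from '../../../modules/demo/main'

workflow test_demo {
    input = [
        [ id:'test', single_end:true ],   // meta map
        file('https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
    ]

    DEMO ( input )
}
```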
You can also update and amend that nextflow.config file if you need to use, say, process selectors to get specific behavior when testing the module, for different outputs or different arguments and that sort of thing; you can customize it, and I won't go into much detail there, but it's possible. One of the key files when you want to add a module is the test.yml. For all of the unit testing we do, we use a tool called pytest-workflow; it's a Python tool that lets us use tags like these, demo or the demo module name, plus the command that you want to run. In this case it's just saying: I want to run tests/modules/demo, the demo module, with this entry point. You can have various different entry points; at the moment there's just one entry point for the demo module there, but you could have a second one, and in that case you would just append an additional section here for that entry point. For each entry point you may have different output files, and those output files may look different depending on how you run the module. So if we have one entry point for a particular set of arguments, and it generates, say, a test BAM file, we can then do MD5 sum checks on that BAM file to make sure it stays exactly the same as the module is updated over time; we can also do a bunch of other checks, like looking for file contents, or making sure certain files are present in a directory, and so on. This is really powerful for unit testing; pytest-workflow is really powerful. There's also another tool that is becoming quite well known in the Nextflow and Groovy space, called nf-test, and in future we may look at assessing that; it's more native to Nextflow and Groovy, and it looks promising in terms of the functionality it can offer, so we'll look into that as well. So that's it: when you add a module to nf-core/modules there are basically five files you have to play around with to get the tests passing for your specific context. You give the module a name, you put the appropriate input and output channels and script section in, as you would for any process, and then you add the appropriate tests based on the test data you use. In the resources I shared there are some really extensive talks about this and how to go about the entire process, so maybe you can follow up on that later down the line. But just to show you how easy it is to run the tests in a very portable way: say we want to run the tests for samtools/index and make sure it's working locally. All you really have to do, and in my case I just set TMPDIR to my home directory, is say that I want to use Docker, run pytest with these standard pytest options, and the only thing you really need to change is the module name here: I want to test that samtools/index is working. What happens in the background is that pytest goes away and, based on the tags I've specified, finds the samtools/index module and runs through all of the test workflows specified for it, making sure that the files generated by those workflows match what has been specified. So you can see it's just going through all of these different workflows; it will check the MD5 sums of the output files and eventually come back with an overall result of whether this has passed or failed.
There we go, everything's passed, so this module is working; if I change it, I rerun the tests, and these tests will also be run on the pull request when you create one to nf-core/modules. If I want to run them for a different module, I just change the tags. This makes it very easy for everyone to test when they're contributing modules; we wanted something with a low bar to entry that anyone could use, and this fits the bill. A lot of the automation we have, nf-core modules create for instance, creates all of the necessary files for you in all of the correct places, and all you really have to do is substitute in whatever you need to customize in that module, hopefully with minimal overhead rather than doing everything from scratch, which again goes back to how important automation is for a lot of this. So hopefully that gives you a good overview of how we're using nf-core modules on nf-core to standardize a lot of what we're doing, not only with nf-core pipelines but with Nextflow in general as well. Thank you for tuning in and giving me your undivided attention; I'm sure you didn't look away once throughout this thrilling talk. I look forward to meeting some of you in person next week and to interacting with you in the nf-core community, and if there are any questions, feel free to ask us; we're very willing to help. Thank you. I think we've got a break now for 10 minutes, so we'll come back at eight o'clock, I believe, for Evan's talk. So there's a break now; thanks again, bye bye.

OK, welcome back everybody. As a quick reminder, my name is Evan, I'm the CEO and co-founder of Seqera Labs, and I'm going to be doing the final session today, which is focused on Tower, Nextflow Tower. A quick reminder of where you can find this information: we're looking at the training material, and we're going to be up to section 12. There's a lot of stuff that isn't covered directly in there, because this is going to be a bit more of a demo, and also a bit of background on Tower itself, which hopefully makes a little more sense now that we've seen how the nf-core pipelines are developed and so on. So, some background on Tower. After developing Nextflow for several years, we realized there was a key problem we were trying to solve: Nextflow is fantastic if you want to use it by yourself or in a single application, maybe from the command line, but when you want to take your pipelines and maybe scale them out across a team, share them and make them available to other users, call them as a service, or keep a history of all of your executions, it becomes very difficult to do that from a command-line tool, and Tower is really the solution for that. So what I'm going to do first is show you the basics of it. You can find all of this at tower.nf; I'll just log out here, and if you search for Nextflow Tower or go to tower.nf, you'll find this. The way Tower is set up is meant to be flexible, a little bit like Nextflow, in the sense that you can install it in any location, but this is the hosted version here.
When you run the pipelines, when you essentially do the execution from within Tower, that is separate from where Tower is installed, so it has the same full portability as Nextflow, in the sense that you can run things in different locations. This is the hosted version, which I'll call Tower Cloud; you can come and try it, and if you sign in here you can do so with your GitHub credentials or your Google credentials. When you sign in, the first thing you'll see is a collection of pipelines. When we started this, the first thing we were trying to do was obviously the monitoring, and also having a history, basically a database, of your analyses, and I'll go through the evolution of how this works. When we started, we wanted to be able to take things from the command line and have a way to monitor them. So what we can do here, if you follow the training script, is take a Nextflow pipeline that we've been running before, like script number seven, and all we need to do is add -with-tower on the end of it; by doing this, when we run the pipeline we can then monitor it inside the user interface here. I'm just going to go to my personal workspace. The way this is set up is that you've got this concept of organizations and workspaces; you can see this community/showcase one, but you also have your own personal workspace when you sign in, so if you want to follow along you can sign in and you'll have one too. The way you link things up when using the -with-tower option from the command line is with a token, which you use to connect the run to Tower. I'm just going to go ahead and launch that pipeline, so when I run script seven with -with-tower, I can then go to the runs page, and as the run starts up it becomes visible for me here. I can monitor its status from here, but I can also jump into it and see some information: you can see script number seven started running, it's the exact same thing, and I can jump in and monitor it live. It's very minimal in the sense that this pipeline only runs for a few seconds, but I can follow the status of the tasks, and I've got a history of the command line, the parameters, the configuration and so on. You can see some basic information that's available to us, you can follow the status of the tasks or the processes, and we've got some basic aggregate statistics for time, CPU, memory, etc. What's really nice is that we've got a task table here, so we can quickly go through and find a particular task; maybe we want to find the Salmon task that ran, for example, on the lung sample, and I can select that here, find it, and drill into it. I should point out that I'm running this from the original Gitpod that we were using, the one you can find in the training notes. If you want to try this now with your own Gitpod environment, the only thing you need to do is create a token, which you do just once inside Tower, and then you take that token and export it into your environment, and that's good to go; then you can use -with-tower. Just to show you from Tower's perspective: you can go to the user icon here, look for your tokens, and create one there to make it available.
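As a minimal sketch of the same setup done through configuration rather than the command line, Nextflow has a tower scope in nextflow.config; the token value below is a placeholder for whatever you create under your user icon, and the endpoint only needs changing for a self-hosted install.

```nextflow
// nextflow.config — enabling Tower monitoring from config; equivalent to
// exporting TOWER_ACCESS_TOKEN and adding -with-tower on the command line
tower {
    enabled     = true
    accessToken = '<token created under your user icon in Tower>'
    // endpoint = 'https://api.tower.nf'   // default hosted endpoint
}
```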
So that's, I guess, the basic point we started with for Tower, in terms of its monitoring. Over time, though, it became clear that there was a much bigger opportunity: not only monitoring, but also launching those pipelines with Tower, so that you're not so dependent on keeping this environment open. Here I'm running from a Gitpod environment, which means it could in essence be pulled away, or I could be running from my laptop, in which case I'd need to keep it open; there's a whole bunch of things that have to take place, with Nextflow sending data back to Tower, but we're still essentially responsible for Nextflow running on this machine. What we wanted to think about was removing that part, so that you could think about basically calling Nextflow as a service, and the way we do that is to launch the pipeline itself from within Tower. So I'm going to jump into a different workspace here, where I can essentially build the Nextflow command line, to a certain extent, launch a pipeline, and go through it from that stage. I'm going to do a quick launch. I want to point out that I'm showing you all of this from the user interface, but the exact same thing can be done from the API and also from the Tower CLI, which I'll show you in a moment. You can see that I'm defining the pipeline I wish to launch: I can put the name I want here, and I can add some labels, so maybe I want to associate this with a particular project and the RNA-seq data that I want to run. The next thing we choose is a compute environment; I'll explain this later on, but it's essentially how you decide both where Nextflow runs and where the compute for the tasks goes. Then I can put in a git repository. I want to continue the example that we had, so let's say the rnaseq-nf example; you just copy the git repository URL, which could be public, private, hosted wherever you've got it, and paste it in here. You'll notice this little blue circle on the right-hand side: that is Tower going to GitHub, looking up the branches, the commits, the tags and so on, and making that information available. From there we can see the particular revisions we've got, so we can say we want to run a particular version; all of that corresponds to the releases or the branches available in the repository, so again it's providing a UI for doing that. Let's say we wanted to run version 2.1. You'll notice that we now have a working directory, which is essentially the location of the work directory itself, where the intermediate files will go, and this compute environment I've set up here is in eu-west-3, which is Paris, which means I created a bucket in Paris for this. The final thing we can see with regards to setting up the pipeline is the config profiles. If you remember from yesterday, in the Nextflow configuration you can create profiles, which are a way to group config together, and here Tower is basically going to parse the nextflow.config file of this repository, so it knows which profiles are available for it: Docker, Slurm, batch and so on.
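As a reminder of what those profiles look like inside the repository's nextflow.config, here is a small sketch; the profile names are typical but the values (queue, bucket) are illustrative.

```nextflow
// nextflow.config — a sketch of profiles grouping configuration together
profiles {
    docker {
        docker.enabled = true
    }
    singularity {
        singularity.enabled = true
    }
    slurm {
        process.executor = 'slurm'
    }
    batch {
        process.executor = 'awsbatch'
        process.queue    = 'my-batch-queue'       // illustrative
        workDir          = 's3://my-bucket/work'  // illustrative
    }
}
```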
When I go to launch the pipeline, it's going to show me all of those and I can choose what I need. It's important to point out that, given we're running here on AWS Batch, we don't need to tell Nextflow about a particular executor profile; the configuration for that execution is handled by Tower itself. OK, so in the first part we've got the information we want to put in, essentially what pipeline we're going to run and the basic settings, and we can simply select launch. What takes place in that case is that the pipeline itself, essentially the nextflow run command, is submitted, in this case to AWS Batch. This is a slightly different approach from before: previously I was running Nextflow on this machine, whereas in this case I'm submitting the Nextflow run job to some sort of compute, which can be a cluster, it can be cloud, it can be Kubernetes, essentially anywhere, and from there the individual tasks are submitted to that compute as well. Let's have a look at what that looks like in practice. I've got this community/showcase workspace here; you can go in and start trying things out, and if you want to log in and get a feel for this, I'd recommend just running some pipelines; they're all available there, and there's some free compute to give things a go. It's important to note that this is running on compute which we're providing here; you can connect your own, and there's a cost associated with that. If I select this pipeline that I triggered a little while ago, you can see we're now running the nf-core/rnaseq pipeline: you can see the version we're running, we've got parameters, configuration, and datasets, which I'll mention in a moment; datasets are a way of linking in sample sheets that we're going to use for input. We've also got an execution log and some reports, which I'll mention too. Scrolling down, you can see we can monitor the status of the tasks; I'm on the final task here, which I think is probably MultiQC running, and I can follow the status of those and even look inside the processes, and you can see there are quite a lot of processes in the rnaseq pipeline. We've got aggregate statistics on CPU, time, memory, read/write, and cost as well: there's actually a database on the back end which looks at the costs associated with each cloud provider, in each region, for each instance type, so we can basically say, OK, this task is using half of that virtual machine, that virtual machine costs so many dollars or cents per hour, and we can add all of that up. There's also information on cores, so we can see we're using two cores at the moment out of a maximum of 48 that we've used, and we're running one task now with a maximum of 24, and we can see memory efficiency and CPU efficiency. When you launch in Tower you get a lot more information, because there's more information that we can send back.
What we're able to do is, for example, jump into the task table and say: show me the wild-type sample, and I want to find out what happened when we ran STAR align. I can jump into that task, and as well as the information we get from the script section, which is essentially the script block of the Nextflow pipeline evaluated for this task, we can jump into the execution log and see the actual log for that task. This information is typically quite difficult to get at, because the task is containerized and run somewhere remote and you'd have to go and find the log, maybe in some bucket, but here we can access those logs directly. The task, remember, is a containerized task; each of these tasks, in this case around 200 but it could be 10,000, runs inside its own container, so this task ran inside the STAR container (it's STAR align, after all), it ran on this queue, it requested two CPUs, six gigabytes of memory and three hours, and on the compute side we were using the AWS Batch executor; it got placed on an r4.large machine type in the US region using spot pricing, and therefore we get an estimated cost for the task itself. We can then compare the resources we requested with the resources we actually used, and from that we can start to optimize. When a pipeline is complete, we can visualize that across the whole pipeline; let's go up to the top here, you can see this one's not quite... ah, it just finished now, so that's fantastic timing. When a pipeline is complete, we can see a bit more information, for example the reports. This is a way to define outputs of a pipeline that you may wish to look at, for example an HTML report; we can see here the MultiQC report which comes out of many of the pipelines, and you can access that information directly. We can also surface PDFs or images, and structured data as well: here's a quick image comparing our samples, here's the Salmon output for example, and here's some structured data we have for all of this, so it's very easy to jump into those things. I want to draw your attention to this section here on the path. Tower is not a SaaS tool where you upload your data into our environment and then run things there; it's much more of a transparent view onto your own compute. Even if you're using the cloud version of Tower, this is still your path, this is your bucket, the compute is taking place in your environment, and Tower provides a view into that information. That's a very important distinction in terms of the way this runs, and it should become a lot clearer when we look at compute environments themselves. If we also want to look at the resources used as part of this execution, you can see we've got CPU across those processes, as well as memory, and we can start to optimize based on previous runs; I'll show you some of that in a moment as well.
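Those per-task requests, the two CPUs, six gigabytes and three hours mentioned above, come straight from directives on the process in the pipeline code. The sketch below shows where they are declared; the container tag and values are illustrative, and in nf-core pipelines they normally come from the module file and the process labels in conf/base.config rather than being hard-coded like this.

```nextflow
// a sketch of where a task's container and resource requests are declared
process STAR_ALIGN {
    container 'quay.io/biocontainers/star:2.7.10a--h9ee0642_0'   // illustrative tag
    cpus   2
    memory 6.GB
    time   3.h

    input:
    tuple val(meta), path(reads)
    path  index

    output:
    tuple val(meta), path('*.bam'), emit: bam

    script:
    """
    STAR --runThreadN $task.cpus --genomeDir $index \\
         --readFilesIn $reads --outSAMtype BAM Unsorted
    """
}
```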
So that was, let's say, a basic launch, where we created the pipeline entry ourselves. What happens if you have more complex pipelines that you want to make available to people, maybe a pipeline that you're running very often in production? Well, earlier you'd have seen, in the nf-core section, the Nextflow schema, and when you place that schema inside your repository you essentially get a UI for free, and that is built into Tower. For example, if I go to launch a pipeline here, say the rnaseq pipeline, this whole form, thanks to the fact that there is a schema and it's part of the repository, means I get this whole UI for free. You can see there's the input, there's the output directory, and so on, and the icons, the text, the help messages and examples can all be defined as part of that schema. There's also a key part of this when you define, for example, a particular MIME type for the data entry point: I can say here, for example, rnaseq, find that particular dataset, and from that dataset simply launch the pipeline, which makes it very easy to manage the data and kick things off. Obviously I've also got booleans and other kinds of fields here. Another thing I wanted to point out concerns the datasets themselves: inside the showcase you can see we've got some sample datasets, mostly just test data. The structure of these is typically built around the concept of sample sheets, where you've got CSV, TSV or similar text files, and those files typically have columns we can reference; you can see we've got a sample ID in the first column and the locations of the files. This could really be anything; Nextflow can read many, many different file types, so you don't have any issue there, and the credentials for accessing these are essentially part of the compute environment, so in this case it's public, but it could be private. You've also got this concept of the metadata itself, which is essentially a way of saying how to process the sample sheet. You can add these via the API, but one direction we're thinking about taking this in is expanding a little on the work that Phil has previously done with the SRA Explorer, where you can search a public dataset: I could search, for example, for human liver microRNA in the SRA, find the samples that are interesting to me, maybe compare these particular samples as well as these other ones, and then create a sample sheet. I can essentially create these datasets themselves, download the metadata for them, in this case as a TSV file, and then go to Tower and upload it. By uploading that (you can also add these via the API), and setting the first row as a header, I've now got sample IDs, I've got information such as how these were sequenced so I can process them differently, and I've got the locations, either an SRA accession or, in this case, an FTP URL; Nextflow can handle all of that, and that can then be the entry point for my pipeline, essentially my sample sheet. We're thinking about expanding this out beyond the SRA to other public datasets as well, so that's a little aside and a pointer to where we're going in the future.
The other thing I mentioned before is that, as well as getting this UI for free, we saw that we could create these reports, and when you're defining your pipeline here you can specify what those reports are going to be, so I'll show you an example of one. If I go in here, one of the more advanced options is what we call a Tower configuration file, and inside that file we can say exactly what we want to report on: we can say OK, we want some reports, we want one that's the MultiQC report and we can give it this name, and we want to find, for example, the DESeq2 plots, which we'll call DESeq2 of course, as well as any other files we want. By specifying this, those outputs are made available to people when they're looking at their runs. So that provides the basics of that: we've now seen how you add your own pipeline, how you make pipelines available, and roughly what happens when you launch them. But the real magic of the system is the way it can connect into any compute, and to reiterate that I'm going to jump into the other workspace I was in before, where I've got a few more compute environments set up. I'm going to say I want to launch a pipeline; let's say I want to launch this one, I'm going to run ATAC-seq, and I'm going to run a particular version of it, let's say 1.2.2. In this case I'm running in eu-west-3, so this is in Paris, and I've got my Paris bucket and so on, all the same as before. But now imagine that my data is somewhere else; maybe my data is in a different AWS account, or a different region, or a different cloud provider. I can simply switch this out and say: I don't want to run on AWS now, actually I want to run on Azure in East US. Switching this out, you can see that everything else can stay the same, you run the same Nextflow; what happens is that the working directory will be different, and we can now reference input files in that location instead. We could do the same thing and run on Google Life Sciences, or the same thing with Slurm, and so on; each time we're essentially connecting in and submitting the Nextflow job into that environment, with everything that comes along with that. All of this is driven by the concept of compute environments, and these compute environments can be set up here: you can see examples of the ones I've set up for Slurm, Azure, AWS and Google Life Sciences, and you can create any of these. So let me create a compute environment; I'll show you one for the cloud providers and the kind of setup for some of the other ones we have. You can see there's AWS Batch, Azure Batch, and, this is actually very recent, Google Cloud Batch, which is coming out of beta soon, and then there are the schedulers and Kubernetes. If we select any of these, and I'm going to select Amazon Batch, you can see the basic setup: we've got the credentials, and from those it's basically going to use the API and determine what we need. So I'm going to create a compute environment now in eu-west-1.
Back on the Amazon Batch compute environment: it's going to show me the work directories, meaning the buckets, associated with those credentials. Let's say I choose this one here. Ideally I would always select a bucket which is in the same region, and if you're going to set this up, I would also set up the bucket with a lifecycle policy. These are typically temporary files, maybe cache files, kept for one day, one week, maybe one month, and at the end of that you just have a policy which deletes all of that stuff as it goes.

With regards to Batch, when I set this up I've got a couple of options. I've got this Forge mode, which is going to create all of the resources for me. So I'm going to say I want 200 CPUs max in my queue, or you can do 1000, whatever you want, and then we've got a couple of options like EBS auto-scaling, Fusion mounts and so on. Finally we can choose some settings around how we want to do this, for example EFS or FSx. We've also got something in here called resource labels, which is a way to add a resource tag to the compute environment. This gets propagated through to the AWS resources, so you can start to look at Cost Explorer, for example, and know the exact cost associated with this as well. That's the basic cloud setup. You can go pretty deep into it, but typically it just needs to be set up once, and there's a lot of documentation on how to do that. It's very similar for the other platforms, with maybe even slightly fewer options. Azure Batch is pretty simple as well: you just choose the VM types, the VM counts and so on. And Google Cloud Batch is even simpler: you choose a location and a working directory, and you've got a couple of options, but it's pretty simple to set up.

Thinking now about the schedulers: this is typically for people who have got an on-prem Slurm cluster, Grid Engine cluster or LSF cluster. In this case you've got two options. Tower here is running in the cloud and needs to connect into that cluster, and you can either use SSH to connect in, or you can use what's called Tower Agent, which is just a small piece of software that runs on your cluster and connects back out to Tower itself. Either way, you set up a couple of options: you choose the working directory, the launch directory, a couple of settings about who you are on that cluster when you're running, and which queue you wish to use. This is typically something like a Slurm queue: you're going to run the NextFlow job on that compute queue, and that makes it available as well.

All in all, this is to say that you can create these compute environments, and when you go to run a pipeline you can simply launch that pipeline and choose whichever of those compute environments is available to you.
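One practical aside on that work bucket before we move on to the collaboration side. I mentioned giving it a lifecycle policy so that intermediate files get cleaned up automatically; with the AWS CLI that could look roughly like this, where the bucket name and the 30-day window are just example values:

    # Define a lifecycle rule that expires everything in the bucket after 30 days
    # (my-tower-work-bucket and the 30-day window are placeholder values)
    cat > lifecycle.json <<'EOF'
    {
      "Rules": [
        {
          "ID": "expire-nextflow-work",
          "Status": "Enabled",
          "Filter": { "Prefix": "" },
          "Expiration": { "Days": 30 }
        }
      ]
    }
    EOF

    # Apply the rule to the work bucket used by the compute environment
    aws s3api put-bucket-lifecycle-configuration \
      --bucket my-tower-work-bucket \
      --lifecycle-configuration file://lifecycle.json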
A lot of this stuff, and a lot of the setup here, is particularly aimed at teams who want to collaborate, or at doing this across your organization, and all of that can be managed from the participants settings here. You can see here that we've got someone who's an owner, and I think you could make them an admin, which is essentially full rights. A maintainer is someone who's typically adding or modifying pipelines, but not the compute environments. A launcher is simply launching the pipelines which are available there, and the viewer role is pretty much self-explanatory. You've also got the concept of an organization, very similar to GitHub or other git systems, and inside of that you can go into the workspaces, and it's the workspace where the security boundary is defined. Each workspace has its own compute environments, its own participants and its own credentials, and you can create as many workspaces as you want inside of those organizations. You can see that we can add credentials for the workspace, including cloud provider credentials, Kubernetes, Bitbucket, CodeCommit, GitHub, GitLab and so on, plus container registries, and connect all of that in.

That's more on the config setup; we've got secrets management as well. Secrets are typically things that you're going to need inside of your pipeline code: maybe you need an API token which one of the tasks is going to use, or maybe you've got a license key for some of the software that you run. That can sit as a secret, so you never have to put the secret in the code itself, and it's not stored in the logs either.

Other than that, the other key thing around all of this is automation. We have a lot of people who want to run thousands or tens of thousands of NextFlow pipelines over a given period of time, and a lot of that can be driven from the API or from automation. A typical use case is triggering a pipeline based on data coming off a sequencer, or on something entering some state. We've got a couple of options for doing that. We've got actions: here you can trigger the execution of a pipeline based on a commit, so this webhook here basically means that if I commit to this GitHub repository, it's going to fire off the execution of the pipeline with these settings. That's more of a CI/CD-like setup, something for when you want to do a full run of your pipeline while testing it before going to production. The other key one here is a custom endpoint, a hook, which essentially defines an endpoint that you can then call from outside. I can hit this endpoint with my token, the same kind of token I had you make at the beginning, and that will trigger the execution of the pipeline with these parameters. This can be really important if you want to link Tower into other services and make it available there as well.

These are kind of an easy way in; really they're easy access to the full API. Tower has got a full API which is available here: if you go into the docs you can find it. So I'm just going to go into the docs, look up the API section and look at these endpoints, and you can see the full API itself here: the credentials, everything I'm showing you, setting up the organizations, setting up the users and so on.
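To give a flavour of what calling that API looks like, here is a rough, read-only example of listing recent workflow runs in a workspace. The access token and workspace ID are placeholders you would set yourself, and the base URL and path are assumptions taken from the public Tower Cloud API documentation rather than from the demo, so do check the docs for the exact endpoints:

    # List recent workflow runs in a workspace via the Tower API.
    # TOWER_ACCESS_TOKEN and WORKSPACE_ID are placeholders; the base URL and
    # path are assumed from the public API docs, not shown in the demo itself.
    curl -s \
      -H "Authorization: Bearer $TOWER_ACCESS_TOKEN" \
      "https://api.tower.nf/workflows?workspaceId=$WORKSPACE_ID"

The same token and set of endpoints cover everything shown in the UI, which is what makes the webhook-style triggers and other automation above possible.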
If you are running Tower and you've got a group of people, maybe in your lab, in your organization, in your university, you can come along here, set up your organization, add people in, and essentially end up working collaboratively together on it.

Another option here is the NextFlow Tower CLI, a command-line tool for interacting with Tower itself. So instead of doing nextflow launch, sorry, nextflow run, where you're running the command yourself, you can use the Tower CLI's launch command, and in that case you're essentially offloading the management of the NextFlow run to Tower. Tower will take care of it, you can go and monitor it, you can get all the information, and you can interact with Tower from there: you can see the pipelines, you can see the actions. And I think one of the really key things here is around doing infrastructure-as-code-style setups, because you can import and export pipelines from the CLI. They're essentially defined as JSON, so you can just keep those files in a git repository, and that's kind of exactly what we're doing here: inside of the community showcase, all of these pipelines are essentially defined in a git repository, and when we make changes to them we can import and export. That means we can define this whole environment in an open way, and it's pretty easy to do as well.

I think my time is just about up on that. If you want to try this out, the model is that you can give it a go on Tower Cloud, which is available there. We also have commercial deployments, so if you're an organization that wants to install this on your own cloud and so on, you can get in contact with us at Secura and we can help you out with that. There's also a community channel for this over in the NextFlow Slack, under #tower-help; we'll put a link to the Slack channel there.

Otherwise, I would say thanks so much, everybody, for your time. This is the end for us. I think we started this at five o'clock in the morning, well, five o'clock in the morning on Monday, so it's been a pretty mammoth effort running three trainings in three time zones across these three days. But I will stop there. Thanks so much, everyone, for your attention. It's been fantastic to see so many people join this training event, and it's been a really amazing effort from everyone on the team here as well, so thank you so much for it. I'll let the other folks come on to camera now and say hello, but yeah, the biggest thank you to everyone for your attention.

Thanks very much, Evan. It's a fitting conclusion, I think, to see how Tower brings together all these different components and brings all of this under one roof, so to speak.

Yep, not much to add. Thanks Evan, thanks Harshal, thanks Marcel, and thank you everyone for attending. It's been a really successful three days. I don't know if many people realize this, but when we started organizing this we were thinking a handful of people might attend, in preparation for the Hackathon, so maybe 30 or 40 people per time zone, and we ended up with, what, over 800 in total. So it's fantastic to see just how much interest there is out there and how much enthusiasm there is for learning NextFlow and NFCore, and hopefully this has been a useful few days for you all. We will send out an email in the next couple of days with a survey asking for a little bit of feedback, so if you have any suggestions about how we can improve the format, or anything we could do differently, or anything about the content, that would be
really, really helpful. If you have anything specific, feel free to drop it into the Slack channel now as well; it would be really appreciated. Otherwise, hopefully we'll see some of you in person next week in Barcelona, see others of you online joining the Summit remotely and the Hackathon, and we'll hopefully see you all on Slack soon and getting involved with the community. Thanks very much. Thanks, folks.