Good morning everybody and welcome to day three of the Nextflow and nf-core online training that we're running in October 2022. My name is Phil Ewels, I work at Seqera Labs and I'm the lead developer advocate there. My job is to look after the Nextflow and nf-core communities, try to get as many people involved as possible and make sure that everything works well within the community. I'm also a co-founder of the nf-core community.

Today's training is going to be a bit different to the last couple of days. Up until now you've been doing a real deep dive into how to write Nextflow code, more of a coding-language tutorial, so hopefully you've got a really solid foundation in Nextflow itself. Today is going to be more about the community and the tooling: how you can get involved with nf-core, but also just how to use the nf-core tooling.

To give you a quick overview of what we're going to go through today, this is the schedule on the events page. I'm going to kick off with some slides introducing nf-core and what nf-core is, then go through how to create pipelines and work with Nextflow schema and the nf-core schema tooling. We'll have a little break and then Harshil Patel will take over from me. He's also on the nf-core core team and he's going to talk about DSL2 modules, which will make more sense later on. We'll have another short break and then Evan will come on, Evan Floden, CEO of Seqera, to talk about a totally different topic: Nextflow Tower and how you can use it to manage workflows. That's a nice cap to the end of all this training.

We're also on Slack answering any questions. Just as with the other days, it's the same Slack channel, so drop in there if you have any questions and say hi to introduce yourself; it's nice to see who else is following the live streams. As we go along I'll do my best to respond, plus we have a couple of people on hand, Chris and James, who have been chatting with you yesterday and the day before and will be able to help with any questions.

Right, so let's kick off. First, a quick recap. Nextflow, which you've covered all of, is a language that allows you to connect different processes within a workflow, stick them together with channels and encapsulate the whole thing within one or more workflows. Hopefully you understand how processes in Nextflow give you implicit parallelisation. Nextflow also gives you re-entrancy, by which I mean you can resume a workflow and it will jump back in where it left off, caching all completed tasks in a smart way. It also gives you reusability of your workflow: you can reuse it again and again, tweaking parameters so that it runs exactly how you want it to run each time. Nextflow itself is the language, and underneath that language the runtime can manage the software stack and the compute stack for you: it handles software requirements using containerisation, interacts with all your different compute environments, submitting to cloud or cluster, and manages sharing of the pipeline code itself via your code repository of choice.
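Just as a reminder of what that looks like in practice from the last two days, here are the kinds of standard Nextflow options involved; the pipeline path is only a placeholder:

    nextflow run my-pipeline/main.nf -with-docker    # run each task inside a Docker container
    nextflow run my-pipeline/main.nf -resume         # re-run, reusing cached results for completed tasks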
So you've gone over all of that in lots of detail yesterday, and hopefully you're now completely sold on the fact that Nextflow is both highly reproducible and highly portable, and that's why you're here. So if you've covered all that, what's the difference between Nextflow and nf-core? nf-core is a sister project to Nextflow; the two projects are distinct, although there's a lot of overlap between them. We kicked nf-core off in around 2018, after one of the very first Nextflow user meetings and hackathons in Barcelona.

The idea behind nf-core is that anyone can write a Nextflow pipeline however they like; it's a coding language at the end of the day. So although you can make pipelines highly reproducible and highly portable, you can also make pipelines that are not very portable: you can hard-code your own personal file paths, use whatever unversioned software you like, and so on. nf-core was started to establish a set of best practices for writing Nextflow pipelines and to form a community around sharing the development of those pipelines. Because Nextflow pipelines are so portable, it means that, really for one of the first times in bioinformatics, you can share workflows around, different people can collaborate on the same workflow code, and you can actually form a community in this way.

So that's what nf-core is: a community around Nextflow, focused on tooling and content. There are three main things we provide to people who get involved with nf-core. The most obvious is pipelines: off-the-shelf pipelines which are ready to run your analysis with. You can jump in, see the pipelines we have and run them on your own data; you don't really need to know anything about Nextflow or Nextflow coding, you can just jump in. We also provide a lot of what I'd call supporting tooling: tools both for people running pipelines and for people developing them, to make that process easier and a bit more user friendly, and to carry some of the nf-core community-specific functionality. And finally, a new one, the first time I've had it on this slide: we have modules and modularity, pipeline components, and that's what Harshil is going to talk about a bit later today.

If you're familiar with some other workflow repositories, there are a few things to point out about how nf-core is different. We are very, very strongly community focused, and that really means that instead of coming with a pipeline you've already written and adding it to nf-core, you come with an idea for a pipeline and then develop it with the community. You propose a pipeline and, hopefully before you even start coding, get feedback and maybe collaborators at that early stage; it's a community effort to develop a pipeline. Everyone must use the common nf-core template, which allows standardisation and also enables tooling which I'll go through, like automatic synchronisation so that you receive updates to the boilerplate code, which are again collaborative and community driven. And everything is done through this collaborative coding model on GitHub, where we use issues, pull requests and code reviews to ensure high quality.
That's the foundation of how the community works: we rely on other people reviewing your code to make sure it works. If you look at the pipelines we have in nf-core today, as of last night we have about 39 which have a green tick, which means they have a stable release. This is a special status for nf-core: the first release of an nf-core pipeline has to get a really thorough review of the entire pipeline, so it's our rubber stamp, our seal of approval. Once you have your first release, we're confident that this is a stable, best-practice pipeline. We have another 24 pipelines under varying levels of development; some are at very early stages, but some are quite mature pipelines which may already be in use in production but just haven't gone through that first release process yet, or are on their way there. And then we have six which are archived. We typically never delete a pipeline: we want it to stay around so that people can come back and rerun it for reproducibility in the future, but maybe development on that pipeline has stopped for whatever reason, so we archive it to make it clear that development efforts should go elsewhere and that that pipeline is now static.

So those are the off-the-shelf pipelines. In addition to those, we also now have modules, these components of pipelines, and today we have 632 modules, which is mind blowing. Basically each module corresponds to a single process, like the ones you've been writing over the past few days. So you have salmon index, for example, to create your reference index; that would be a module, because it's a single process for a single task. And then you have the salmon alignment step, and that's another process, so that's another module wrapper. We're going to do a deep dive on these later, how to use them and how to create your own, but these DSL2 modules are a fantastically powerful addition to nf-core. It means you can build a high-quality workflow in minutes, and it also means that you can now collaborate with others even beyond the pipeline level: if different pipelines share components, you can collaborate at that level as well. If there's a fix to a Picard MarkDuplicates module, then all of the pipelines using it can receive that fix, which is phenomenal.
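Just to give a flavour of what one of those looks like: a DSL2 module is essentially a single process in its own file that you include wherever you need it. This is only a rough sketch, not the real nf-core module, which carries extra conventions like meta maps, version reporting and standard labels, and the container tag here is made up:

    process SALMON_INDEX {
        container 'quay.io/biocontainers/salmon:1.9.0--h7e5ed60_1'  // hypothetical container tag

        input:
        path transcriptome_fasta

        output:
        path 'salmon_index', emit: index

        script:
        """
        salmon index -t $transcriptome_fasta -i salmon_index
        """
    }

    // then, in a workflow file, you would pull it in with something like:
    // include { SALMON_INDEX } from './modules/salmon/index/main'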
If you go to the nf-core website, you'll see a big button that says Pipelines, and you can search through all these different pipelines and see if any grab your attention. I imagine many of you have already done this; it's probably part of the reason you're here. In terms of the community itself, we kicked off in around 2018, and looking at the web traffic you can see pretty linear growth in visitor numbers over that time. I was looking at this last night and I think it's quite funny: you can see a couple of peaks in this plot, this one just after 2021. I'm not completely sure what that is; I wondered if it was the nf-core paper, but now I'm looking at it I wonder if the timing is wrong, so I'm not totally sure what that peak was. And there's another peak right on the far right, and that is you guys, this training, as we've seen a big bump in traffic to the nf-core website. But you can see that, peaks aside, it's a really nice steady progression and the community is just getting stronger and stronger the whole time.

To date, we have just a little over three and a half thousand people on the nf-core Slack. This is different to the Nextflow Slack and is basically meant to be focused on the nf-core pipelines rather than Nextflow syntax itself. We have close to 500 people within the nf-core GitHub organisation, and nearly one and a half thousand people have actively contributed on GitHub, whether that's writing code or creating issues to report bugs and discuss features. I have to refresh these numbers every time I give a talk, because even when just a few weeks go past there seems to be a significant bump. It's a massive community now, and that's really good for you, because it means whenever you run into trouble running a pipeline or doing any development work, you can hop onto Slack and there's pretty much always someone around, even in the APAC time zones: people to help, to test, and to give feedback on your work. Especially if you're used to fairly small bioinformatics communities, this is a real strength.

This is a map roughly showing our global distribution, again based on web traffic, and global representation is something we're really actively trying to push now. Some of you may have seen Marcel speaking on Slack and giving one of the other training sessions; he's based in Brazil, so he's really focused on trying to improve our representation in South America. Chris, who was talking yesterday, is our APAC representative, so he's focused on you guys, trying to get everyone in Asia-Pacific up and running with Nextflow where possible. And through our Chan Zuckerberg Initiative funding we're really trying to push the inclusivity of the community.

Right, I don't have a lot more to say before we get stuck into the real meat of this training. Just to point out that we have a paper for nf-core; there's a paper for Nextflow and there's a paper for nf-core. It's back from 2020 now, but it talks a little bit about how we created the community, and especially if you dig into the longer supplementary methods there's lots of background information about how we run everything and how we do everything within nf-core, if you're interested.

And with that, I think we can kick off. Today is a bit of a step change. The last couple of days you've been following the Seqera training material for Nextflow, with that same long training document and the same Gitpod environment. Today we're following material purely for nf-core. This is actually the first time we've run this particular training; Harshil and I have done various nf-core training over the years, but this is the first time we've done this exact format, so you are the guinea pigs. If you go to the nf-core website, you can find a web page dedicated to this content that we've been building up over the past days, so you can follow what I'm doing, and if you missed anything I said or didn't understand something, you can see it written down. It's not a very friendly URL, apologies for that. If you pop over to the nf-core website you should be able to find the same page through the documentation: go to Docs and then down to Tutorials. (Changing my screen resolution has dropped a button off the page; we need to fix that bug.) Underneath Tutorials, you should see one called Creating with nf-core.
And you'll get to this link. This is all the different material we have here, and really the most important bit is this launch Gitpod button. If you click that, it will kick off Gitpod. I've already got my workspace running, so it's telling me I can use that one. It's just the same as before, but you can see we're now basing the Gitpod image off a different repository, and we're doing that just to get the very latest version of the nf-core tooling. We actually did a release just a few hours ago, so for that bleeding-edge copy we need this different Gitpod environment. So start that, and run it in the web browser if you like. What I'm actually doing is connecting Gitpod to my local VS Code installation, so I'm sat here still inside VS Code, but running on Gitpod, if that makes sense; that's how I like to do it. I'm going to drag that over here so I can follow my notes as I go along.

Right, hopefully that's a good enough general intro, so let's kick off. First things first: hopefully everyone can follow along with me now. This is a vanilla Gitpod environment; I haven't done anything to it since I clicked the Gitpod button. Because of the way we have Gitpod configured, we've dropped into the nf-core/tools repository, which would be useful if we wanted to develop the tools code, but that's not what we're doing today. So my suggestion, and you don't have to do this, is that we start off with a fresh directory. The way I'm going to do that is to go to the home directory in Gitpod, make a new folder called training, and cd into that training folder. If I print my current working directory, it's /home/gitpod/training.

Then I'm going to change the file explorer on the left. I don't really care about the current directory, so I want a totally new one. I'm going to go to File... no, not Add Folder, Open Folder. I did this in my browser session; I think the keyboard shortcut is something with Shift and Ctrl... oh dear... yes, Open Folder, there we go. If you're in the web-browser version there's certainly an option in the dropdown menu, or you can use that keyboard combination; it's also written down in the docs. Put in this path and hit OK, and then VS Code will refresh itself and reopen with this new path on the left. I'm going to say yes, I trust the authors, because I wrote it and there's nothing in here. And now the explorer over on the left is completely empty, because we're sat in an empty directory.

Okay, next thing: I don't really care about any file editors at the moment, so I'm just going to maximise the terminal window, and now I'm working in the Gitpod terminal from this session. One final thing: if you create files somehow and they're not appearing on the left, remember there's this little button here which refreshes the explorer. If you're confused, and it's happened to me once or twice that I ran various commands and files didn't immediately appear, remember you can hit that button.

The Gitpod environment comes with nf-core tools pre-installed, so if you're using Gitpod you don't need to worry about installing Nextflow or nf-core or any of these tools; you have everything you need already.
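Just to recap the terminal side of that setup, the commands were simply:

    cd ~            # go to the home directory
    mkdir training  # make a new working folder
    cd training
    pwd             # should print /home/gitpod/training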
If you're running on your own system at home, you can find information in the docs about installing nf-core tools. nf-core tools is actually written in Python and it's published on the Python Package Index, so you can just do pip install nf-core. It's also on Bioconda, so you can install it with conda if you want to as well. And I would advise against using the Docker image of nf-core tools for now.

Okay, so let's just double check we have nf-core. Let's do nf-core --help, and this should give some output that looks a bit like this. It tells us we're running nf-core tools and which version we've got; 2.6 is the version that was released last night. Make sure that's the version you're running. We pushed it out in a bit of a rush, partly because we're changing some of the ways we handle modules and DSL2 and we wanted to make sure we were showing you the latest, most up-to-date version. So make sure you're running version 2.6, especially if you've previously had this installed.

Then we can see the different subcommands that come with nf-core. The help text here is grouped into different sections, and there are commands for users: people running Nextflow pipelines, whether they're nf-core pipelines or not. The nf-core tools are designed primarily for nf-core, but we hope that most commands work with most pipelines. So if you want to download a Nextflow workflow, you should be able to run nf-core download on your own custom Nextflow pipeline. It may not work super smoothly, but especially if you're using other nf-core tooling like the template, it should work with any Nextflow repository. The same goes for launch, and also for the Nextflow schema stuff which we'll be doing later. All of this tooling should work with any Nextflow pipeline, not just nf-core workflows.

So we can do, for example, nf-core list, which shows a list of all the available nf-core pipelines. I've got the resolution set really high, so it's not going to be very pretty, but this is basically a command-line version of the website, showing all the different pipelines. One nice thing about it is that it also shows me whether I have local copies of any of these pipelines. So if I do nextflow pull nf-core/rnaseq, or I could have done nextflow run nf-core/rnaseq, Nextflow will pull a copy of the pipeline from GitHub for me automatically and save it in the Nextflow cache. Now if I do nf-core list, you'll see it says I have pulled this pipeline. I did it just now; it might say two weeks ago or something, and it checks whether the version I have locally is the same as the latest release. It's a little bit of extra functionality because the command-line tools know what I have on my local computer.
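Written out, the commands from that section look something like this; pip and conda are alternative install routes, and in Gitpod everything is already installed:

    pip install nf-core              # or: conda install -c bioconda nf-core
    nf-core --help                   # check the version and list the subcommands
    nf-core list                     # list all nf-core pipelines
    nextflow pull nf-core/rnaseq     # cache a local copy of a pipeline
    nf-core list                     # now also shows when the local copy was pulled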
Okay, the other big one I'm not going to talk about much is nf-core launch. If you're running Nextflow pipelines, I recommend you go and have a look. I wrote most of it, so I'm biased, but I think it's a really, really useful tool. It's either a command-line wizard, which takes you through all the different parameters for your pipeline, whatever pipeline it is, and prompts you for things as you go along with embedded help text and validation of parameters; or, if you prefer, you can launch it as a web graphical interface and fill in all of your parameters in a web form. It looks very similar to some of the interfaces within Tower, which Evan will talk about later, but this one runs purely on your own computer, and the command-line wizard works completely offline. So that's a nice, rich interface for launching workflows.

Anyway, you came here to write pipelines, so let's do that. We want this one: nf-core create, which creates a new pipeline using the nf-core template. With any of these subcommands you can also run --help, so we can see all the specific options for nf-core create. The nf-core tooling mostly uses interactive prompting on the command line if you don't supply parameters, so if I don't give it a name on the command line, it will ask me for it. Generally what I'd recommend is just using the nf-core commands with as few command-line options as possible if you're running interactively like I am here.

So I'm going to do nf-core create, and it prompts me for a workflow name. It does some validation here as well: I'm going to start typing "my favorite workflow", and that's not a very good name for a workflow. In fact, the nf-core guidelines say the workflow name has to be lower case and without any punctuation, so it protects against that; you can see there's built-in validation here. I'm going to call my workflow demo, give it a little description, "This is a new pipeline for training purposes", and my name, Phil Ewels.

This next prompt is a fairly new feature. If I select yes here, it will walk me through different parts of the nf-core template and I can switch them on and off. That can be the nf-core branding, so I can generate a pipeline without the nf-core logo and so on; if you're never intending to submit your pipeline to nf-core and it's super bespoke and custom to your setup, that can be a good option. Also, if you're building a pipeline that's not genomics, you can use this to strip out the parts of the template which are specific to genomics, which is good if you're doing imaging or proteomics or ML or whatever. But for now, I'm just going to stick with the defaults.

That's it: it runs and creates everything, a new pipeline called nf-core/demo, and it created this folder up here for the pipeline. It also reminds you to come and talk to the community before you start coding, because it's always good to get sign-off on a new nf-core pipeline as early as possible.
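If you'd rather supply everything up front instead of being prompted interactively, you can pass the details as options; the exact flags can vary between tools versions, so check nf-core create --help:

    nf-core create --name demo --description "This is a new pipeline for training purposes" --author "Phil Ewels"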
Okay. If I do an ls, you can see nf-core-demo has appeared, and in the file explorer up here as well, and you can see it's created a lot of files and a lot of code. One of the first things to notice is that this isn't just a flat directory: it's actually been initialised as a Git repository. If I do git status, you can see it's a valid Git repo, and if I do git branch, you can see it has even created three different Git branches. These are the standard branch names that we use for all development within nf-core. Typically, if you do nextflow run with a GitHub URL, it will run the default branch, and we want the default to be the latest stable release, so we use a branch called master for that. It should always be the default and should only ever contain the latest release; you don't do development work on it. Development work happens on the dev branch. And then the TEMPLATE branch you basically never touch: that's a special branch used for automatic synchronisation of the underlying nf-core template.

If I do git log, you can see that the create command actually made a commit of the vanilla template, if you like, the nf-core template without any changes, and it made that commit across all three branches. This is for the automated synchronisation. If you create a pipeline and add it to nf-core, within the nf-core organisation on GitHub, then every time we do a new release of tools you'll get an automated pull request to your pipeline with all the changes to the boilerplate code in the template. That's what this is used for. If you're running a pipeline outside of nf-core, you can still do that: down here there's a command called nf-core sync, which uses that special branch, looks at the latest version of the template available and will propose updates to your code to try to bring in all those fixes. That's why nf-core create sets this up.

Note that if you're creating a new repository on GitHub, GitHub will give you the option to initialise it with a README and so on. Make sure you don't do that, because we've already got the initial commits here. If you scroll down a bit, the documentation just under the part about creating a new repository tells you how to do git remote add with the GitHub URL, and then you can just push the commits that we have here already. Hopefully that makes sense.
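As a quick reference, the commands for poking around the new repository and wiring it up to GitHub look roughly like this; the remote URL is a placeholder for your own empty GitHub repository:

    cd nf-core-demo
    git status            # a valid Git repository, created by nf-core create
    git branch            # master, dev and TEMPLATE
    git log --oneline     # the initial template commit, present on all three branches
    git remote add origin git@github.com:YOUR-USERNAME/demo.git   # placeholder URL
    git push --all origin # push all three branches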
Right, there are too many files here to go through one by one right now, so I'll leave it as an exercise to the reader to have a poke around. Many of these files you will never need to touch. There are things like the Gitpod config file, which tells Gitpod how to run if you want to launch Gitpod to edit your pipeline; you don't ever need to change that. Things like the code of conduct: if it's an nf-core pipeline, you should never need to change this. But then a bunch of files should hopefully look familiar to you from the training you've already done. There's a main.nf file, which is the main entry point for the workflow; in nf-core pipelines these days the main.nf file is very, very short, it basically just imports other files and runs them. You can see the nextflow.config file here, which is very important; this is something you're going to be working with a lot. Up here you can see we've got files for GitHub automation. We have assets, for example the test sample sheet with the test data in it. We have custom scripts sitting in the bin directory, so there's a Python script that we ship with the template to check the input sample sheet. The conf directory contains all of your Nextflow configuration; this is an important one. The base config has the defaults for all the different processes, so if you want to tweak how things are running, this is where you go, and if you want to tweak how the DSL2 modules are being run, you do that in modules.config; I think Harshil will talk to you about that later. Docs contains all the documentation: these are the Markdown files that hopefully you'll be developing and writing as you write the pipeline. Very important, and they'll be rendered on the nf-core website if it's an nf-core pipeline. The lib directory you should never need to touch; that's just some helper functions that we bundle out of the way. Modules and subworkflows are everything Harshil will talk about with modularisation. And workflows is where the real meat of the pipeline happens: if we look in demo.nf, you can see this is where all of the actual channel handling goes on, importing the DSL2 modules, and this is where the real workflow building happens.

First things first, before I start changing anything, I want to prove to you all that this is a valid Nextflow pipeline. I haven't touched any of this code yet. If I go up one directory and do nextflow run, I can give it nf-core-demo, which is a local file path in this case; when you do nextflow run, it can be a remote repository name or a local file path. I'm going to do -profile test,docker. That will run using Docker, which I have installed in Gitpod, and it will use the test config profile: the nf-core template ships with this special config profile which knows what all the inputs are and has some remote test data that it downloads and tests with automatically. You'll probably want to change it as you build your pipeline, because it's very generic at the moment, but we always insist that you have a profile called test for automated runs. Then I'll set an output directory for all the results, which is where the published files go, and call that test_results. If I hit run, you'll see that Nextflow kicks off and will hopefully run our brand-new demo pipeline without any errors. Fingers crossed. Nice.

You can see we immediately get a whole bunch of stuff from the template here. There's some nice nf-core ASCII art at the top of your run. It tells you the version of the pipeline that we've created; we start on version one, but as you do new releases of your pipeline you'll bump the version number here. It tells you some information about how we're running, like the directory we launched in and the directory that the pipeline itself is sat in, and these should look familiar to you from the last couple of days. It tells us about the input and output options that we ran with and our reference genome; again, these are configurable at the pipeline level. And it tells us some of the other core settings. This text at the top of an nf-core pipeline run shows you the parameters which you have adjusted from the defaults, so this list will look slightly different every time you run, depending on how you're running.

Here we can see it kicked off with a process that checks the input sample sheet looks good, which it did. It's run FastQC four times; it's a small test dataset, so it's already downloaded and analysed it. It runs this versions process, which is a special nf-core one: when we run tools within nf-core workflows, we tell them to print their version number and we collect those. That means we always get the accurate version number even if you're using some other kind of software packaging; we don't just trust that the version is what the pipeline code says it should be, we collect them dynamically and put them into a YAML file. Then all of that gets funnelled into MultiQC. And if we look into test_results up here, you can see we've got our output files: we've got our nice MultiQC report, we've got our FastQC analysis results and we've got our Nextflow pipeline reports.
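For reference, the full command from that demo run was along these lines:

    nextflow run nf-core-demo -profile test,docker --outdir test_results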
Great, it works. You have your pipeline. Why did you need to do two days of training when you could just do that? Of course, this is where the fun begins. Although that's great, you saw when I started looking through these files that it's pretty overwhelming: there are thousands of lines of code here. How do you know where to start? You're not alone; we have some help here.

If I pick the main workflow file here, demo.nf: those of you with eagle eyes may have noticed that this line popped out when I opened it before. It's a special comment line, and it's highlighted because in VS Code I have a TODO-highlighting plugin installed, which comes with the Gitpod environment. If I edit it so it no longer says TODO, it's not highlighted any more; it's just a regular code comment. We have these scattered throughout the nf-core template, and the idea is that they are little flags telling you where to get started with customising the template. Here, it's telling us to add all the file path parameters for the pipeline to this list. And as we scroll through... let's see if I chose a good example. I did not. As you look through the pipeline, you'll find these TODO comments scattered all over the place, telling you where to start and what to do. The reason I have this TODO plugin installed is that, because it's a standard TODO comment, you can use plugins like this one to show you where all the TODO comments are and just work your way through them all. So it's very easy to get started.

A couple of other points about how we do coding within nf-core. We love standardisation; it's a really key thing. The template synchronisation with Git works well if all the code looks as similar as possible, and we're all about collaboration and contribution from many different people, and people write code in different ways and have different styles. So one of the things we use a lot are code linters. This may or may not be something you're familiar with from other projects. If I pop into the YAML file here... that's not very full... if I pop into the README, that's a better example. There's a lot of Markdown here, and Markdown is pretty forgiving: I can remove this white space, I can mess around with the Markdown in various different ways, for example I can number that list item one again, and it's still valid Markdown. It will still render fine, there's no problem with it at all, but it's not very standardised. So what we do is use a tool called Prettier, which is a linting and formatting tool. Prettier works on YAML, JSON, Markdown and a few other languages, and we have other linters too: for example, for Python code we use Black, which is a standard one. These tools check how the code is formatted, lint it against certain code-style rules and check that it's valid, and you can also have editor plugins which automatically correct the code based on those rules. You can see that when I save here, the code updates itself automatically. I haven't done anything else; all I did was Cmd+S to save. It's added the spacing back in, so there's a standardised amount of white space around headings, and it's renumbered my Markdown list.
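If you want to run the same formatters manually rather than through an editor plugin, and assuming you have them installed, the command-line equivalents are roughly:

    prettier --write .    # reformat Markdown, YAML and JSON in place
    black .               # reformat Python code, e.g. the scripts in bin/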
And this is really, really nice because it means all the code is really consistent, and once you get used to reading it and once you start using these code linters, you'll never go back. But if you're new to it and you don't have them installed locally, you might find that when you push code to GitHub you get test errors saying that Prettier failed and linting failed. Don't worry if that's the case: all that's happened is that there's probably just some white space somewhere that needs tidying up, and you need to run these tools. But I would definitely recommend getting these tools installed in your code-editing environment, be it VS Code or any of the others; they work with basically all the major editors, so that you don't need to think about it and the stuff just runs.

So those are code linters. We also have a linter specifically for nf-core that we've written from scratch, which doesn't check the code formatting and white space so much, but checks the nf-core guidelines. We have a whole load of automated tests in there which run through and look at your pipeline code, compare it to what we're expecting from the template, compare it against predefined workflow rules and check that your pipeline is doing what we want. This is good for you because it means you get immediate feedback as you're writing your pipeline about whether stuff is correct or wrong. And it's also really good for anyone who's reviewing your code and your pipeline, because if all the nf-core lint tests are passing off the bat, we know the workflow is already in pretty good shape and we just need to think more conceptually about what it's doing.

This helper tool is within nf-core tools again; with --help you can see it, it shows up here as nf-core lint, and this is one you'll probably find yourself running a lot. So let's run nf-core lint on our new pipeline. It does a bit of thinking in the background, downloads some files that it needs to run comparisons and things, and then spits out a whole load of output onto the command line for us. Harshil will be talking about DSL2 modules, so I'm going to ignore that section completely for now. But up here, in the pipeline-level lint results, you can see we've got a whole load of pipeline test warnings, and actually all of these are about those TODO comments I talked about earlier. The point of this lint test is just to warn you that there are still some left in the workflow template which you haven't yet addressed. As you go through making those edits, just delete the TODO comment above each one and all of these test warnings will slowly start disappearing, until eventually, when you've finished them all, they all turn into passes; that number will be a bit higher and you won't have any warnings left.

To prove this to you, I'm going to cheat horribly and do something you should never do, which is just change all of them without actually taking any action. So I'm doing a global find and replace here on the entire project, and we're replacing all of these nice TODO comments with the text "demo", just so that they're not found by the linting. Cool. Right, now if I run nf-core lint again, it will run through all the same tests and have a look, and sure enough, we've still got those module-level warnings, but the pipeline warnings have all gone, and it's gone from 178 up to 179 tests passed. Good. Hopefully that makes sense.
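Just for reference, the lint run itself is a single command, run from inside the pipeline directory; I believe recent tools versions also accept a directory option, but check nf-core lint --help:

    cd nf-core-demo
    nf-core lint
    nf-core lint --dir /path/to/nf-core-demo   # alternative, runnable from anywhere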
Warnings are okay. We try to get rid of all the warnings when you do a release of your pipeline, because if it's all green across the board then that's ideal, but warnings are okay; they're tolerated, they're allowed. If you get failures, that's less good, and if you have a failure in the nf-core linting, that needs to be resolved before you can merge your new code into the pipeline.

I can demo this by just editing a file that I'm not meant to edit; it's the easiest way. The code of conduct is one of the files where I said you shouldn't really touch it, you shouldn't bother changing it, it's the nf-core code of conduct. So if I do change it and then rerun nf-core lint, it's going to complain about it and tell me that I shouldn't have done that. And so now you can see that I have a test failure. It tells me what went wrong, and in this case it actually tells me I can automatically fix it by running this command. Each of the pipeline lint tests has a specific name; this one is files_unchanged. And actually, if you alt-click on this, it's a hyperlink, even in the command line, thanks to the amazing Python Rich library. So if I click that... let's see if it works within Gitpod. No. Okay. If I was running in a regular terminal, not within Gitpod, just on my local system, then this URL would open in a nice, smooth way and take me directly to the documentation about this specific lint test. Let's see if I can quickly find it manually, just so you can see what it looks like: tools docs, go to the latest version. These don't look super pretty yet, but will hopefully be improved in the near future. It should give you a direct link to this page, and then you can see the documentation with a list of the files which must not be changed from the template. Some of the files mustn't be adjusted at all, but for some you can append additional content underneath at the bottom, so those ones you can stick more stuff into. And there are other lint tests documented here too: these are files which mustn't be deleted, this is how the MultiQC config should look, this is the one about the TODO comments. All of these are documented here, but basically you run nf-core lint and generally you just hit that link and it will take you to the documentation.

Right. So I said that you wouldn't be able to merge your code into your repository with this failing test. The reason for that is that we use continuous integration extensively. If I go to the small RNA-seq pipeline, and I'm going to pick on Alex because he won't mind, then, here we go. Alex here has created a pull request on the small RNA-seq pipeline within nf-core. It's a long pull request, so we don't need to think about what he's actually doing here, but what is important is what's within this pull request. A pull request is how we merge code into a repository on GitHub; it's how we do our collaborative pipeline coding, and we can all review other people's code through this interface before it gets merged. You can see right at the bottom it says some checks were not successful, and these are a list of automated tests which ran on the code that he pushed. Some of them have a green tick, so the nf-core linting which we were just talking about did actually pass here, but some other tests failed, and because those have a red cross, this pull request cannot and should not be merged until that is resolved.
If you click on the Details link next to any of these, it will take you to a web page showing you exactly what happened. This is the CI run, and if I click here, it shows the actual nf-core lint run, the exact same output as running nf-core lint, and you can see what the warnings and the failures were, exactly as if you were running locally. So this is the real power of this tool: when you create a new pull request and someone else comes in to look at it, if there's a little green tick there, they know that everything within the pipeline is up to nf-core standards.

One final thing on linting. Maybe you are building your own pipeline which is never going to be part of nf-core, and you don't care whether it has the nf-core logo properly formatted, or you don't care about having the same code of conduct as us because you're not part of nf-core. For any number of reasons, for any nf-core lint test, you might have a reason why, actually, you're okay with it failing, but you still want to be able to use this tooling and the continuous integration wherever you have your pipeline. How do you get around that? You can see that there's a counter here for the number of tests ignored, and that's the key. You can go into the config file at the base of any nf-core pipeline, called .nf-core.yml. It's pretty empty from the template; it just says that this is a pipeline repository. But you can go in and add a lint section to this file, and then give the name of any lint test. This is the same name that was shown on the left here, so files_unchanged is the name of a lint test. I could just set files_unchanged to false to disable the whole test, but with certain specific tests, this one included, I can instead give it a list for more fine-grained control and say: actually, I only want to ignore this one file, I still want to pick up any other files that have been changed. I save this, come down here and rerun nf-core lint, and hopefully everything will pass again. And now we have two pipeline tests ignored, so it's just telling me that I've ignored something, but the failure has gone; we have zero tests failed and we're good to go.
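As a rough sketch, the kind of addition being described looks something like this in .nf-core.yml; the repository_type line is what the template already puts there, and the exact keys are described in the lint documentation:

    repository_type: pipeline
    lint:
      files_unchanged:
        - CODE_OF_CONDUCT.md
      # or, to skip the whole test rather than a single file:
      # files_unchanged: false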
Right. How are we doing? Hopefully everyone is still with me at this point. The final thing I'm going to talk about before I stop is Nextflow schema. Nextflow schema is something that started off within nf-core, which is why it's part of this demo and part of the nf-core tooling, but it can be used with any Nextflow pipeline. So the stuff I'm going to talk about now is also valid for all the training examples you did yesterday: any Nextflow pipeline you write should work with Nextflow schema and with the commands I'm going to show you. It's just a historical fact that it comes from nf-core originally, which is why the command is nf-core schema and why you need the nf-core tooling.

So if I go back up a level, I'm going to run the pipeline again, nextflow run nf-core-demo. With any nf-core pipeline, you can run the workflow with --help, the same as you would with any other command-line tool, and Nextflow will run the pipeline but just exit immediately and print some help text to the command line, which is nice and easy. When I do that, it prints out all the different parameter options which are typically used, and it hides a load of them by default, the boilerplate ones. We don't have very many here because I haven't added anything yet, but you can see that if I add --show_hidden_params, the pipeline comes with this huge list of different parameters for configuring all these different aspects of a pipeline.

Now, something you might notice is that this help text includes a couple of things which it hasn't previously been possible to express. It tells us that these parameters should be strings, whereas this one is a Boolean, a true/false, and it tells us that this one should be an integer. So within this help text we have parameter typing information and we also have help text. This is currently not possible to do within a Nextflow config file; you can't add this information there. That might change in the future, but for now there is no way to have this in a Nextflow config. So the way we do this is that we basically created our own new standard. If you look within the pipeline source code here, you'll find a file called nextflow_schema.json, and we can look in here: it has a list of all the different parameters within the pipeline. This is --input, and it has information that says it's a string; here's the description, which is the short help text; there's a longer help text, which can be written in Markdown; and it has a bunch of extra keys here to help configure how these parameters are handled.

This has a bunch of good things about it; it lets us do various things. It lets us print help text to the command line, as you just saw. It also lets us render the same information within a web UI: if I go to the rnaseq pipeline page on the nf-core website, you can see there's a tab saying Parameters, and there's one of these for every single nf-core pipeline. It prints some really nice help text here, all based off that JSON file: we've got the input, we've got the type, we've got the description and we've got the longer help text. So it allows us to render this kind of rich help text wherever we want, and there's also a command-line option for doing pretty much the same thing if you'd like, printing Markdown.
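To give you an idea of its shape, a single parameter entry in nextflow_schema.json looks roughly like this; it's abridged and written from memory rather than copied from the template, so treat it as a sketch:

    "input": {
        "type": "string",
        "format": "file-path",
        "mimetype": "text/csv",
        "description": "Path to comma-separated file containing information about the samples.",
        "help_text": "You will need to create a sample sheet with information about your samples before running the pipeline.",
        "fa_icon": "fas fa-file-csv"
    }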
It also lets us validate the pipeline inputs, which is really, really important. So if I do nextflow run again, I'm going to go back to my previous command here, down at the bottom of the screen, where I did -profile test. When I ran that before, I specified the output directory, and outdir is a required parameter in the Nextflow schema. So if I delete that and try to run again, Nextflow will run the pipeline, but the very first thing it does is check all the parameters against this JSON file with some custom code, and it will immediately spot that our required parameter is missing, throw an error, not launch at all and just exit. So it fails quickly, which is really important for long-running processes. So it allows us to do validation, and finally it allows us to build nice, rich interfaces. I can do nf-core launch, that command I told you about earlier, and it looks at this pipeline and says: great, I found a Nextflow schema, so I can give you a user-friendly graphical interface. I can do that either by launching a web-based interface or on the command line, and here it takes us through all the different parameters, tells us which ones are required, won't let us continue because it does validation as we go along, and has all the help text and everything. So all of these rich interfaces are built on top of that JSON file.

That's great, but if you started to panic a little bit when you saw this JSON file and imagined trying to edit it, you're not alone. You should never touch this file by hand; you will almost certainly break it. I do every time I try. It's very long, it's quite complex, and especially as you build more and more parameters into your workflow it will get longer and longer. So you shouldn't ever touch this JSON file manually. But what we do have is tooling, nf-core schema, which helps you to work with it. I need to speed up a little bit. So if I go into the nextflow.config file, I can put in some additional parameters here; I'm going to drop some new custom params in. There we go. Then if I run nf-core schema build, which I can run as many times as I like at any point, it looks at the nextflow.config, reads all the different parameters from Nextflow itself, looks at the current nextflow_schema.json file and says: hang on, there's a new parameter here that you've added and I don't recognise it, do you want to add it to the pipeline schema file? Similarly, if you renamed something or deleted a parameter and it was still in the JSON file, it would prompt you to ask whether you want to remove it. So I'm going to add my new parameters here, and it has rewritten the JSON file now, updated it with the new parameters added. That's done automatically, but we probably want to customise them to make them nicer, so it asks whether I want to launch the web browser for customisation, and yes, I do.

Right, this is a weird thing with Gitpod. Normally when you do that, it will kick off a new browser tab on your computer with the editor, but because we're in Gitpod, and Gitpod doesn't know about web browsers and things, it uses a command-line web browser called Links, and that's quite confusing. So I'm going to press Q, and it asks if I want to quit; I'm going to say yes. Thankfully our tool is still running in the background, so normally you'd never see that unless you're running on Gitpod, and thankfully it prints the URL to the screen here, so this is what would be loaded normally.

Right, so this is our nice graphical interface for customising the Nextflow schema. Here you can see all of the different parameters and all the different help text and everything in this rich interface. If I collapse all the groups, and grouping is another idea that comes with the schema, it's not within Nextflow itself, you can see our new parameters which were added automatically; they're sat at the bottom and they're ungrouped. So what I can do is add a new group, and I can drag and drop these things around. It's complaining a bit on my small resolution... there we go, managed to get it down. So we've got a new group down here and I can drag stuff into that new group, or, if I've got a lot to do, I can click this button and select all of the ungrouped parameters and move them all in one go. So our new parameters are now within this group, and I can set an icon, got a little picture of a hand, and write that this is a group of parameters.
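As a sketch of that earlier step, the new parameters just go into the existing params block in nextflow.config; the names here are invented for the demo:

    params {
        // ... existing template parameters ...
        demo_string  = 'hello'
        demo_boolean = false
        demo_integer = 42
    }

Then running nf-core schema build from the pipeline directory picks them up and offers to add them to nextflow_schema.json.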
Then for each parameter I can go through and set an icon, which is used on the launch interfaces, the web interfaces. So I can pick a test tube for that one, and I can pick, I don't know, a cog for this one, or let's do some weights; whatever makes sense for your pipeline and your parameters. You can see most of the ones which we already have come with slightly more logical icon choices. Then I can write a short description here and I can set the type. This one is clearly meant to be a Boolean, with the default set to false, so I can click on that and select Boolean. And this one it has guessed is an integer, so it's already set that to integer for me. I can get rid of the null values there, I can tick whether I want something to be required, so that it would throw an error if it's not specified, and I can click on this to write longer-form help text. Anyone who's used to writing Markdown on GitHub will find this familiar: I can write some help text here and click preview, and you can see this is what it will look like when I print the help text on the command line and this is what it will look like on the website. In the terminal it's not rendered as Markdown, but on the website it renders it as Markdown; take my word for it.

Importantly, the final thing on here is a little cog for additional settings. This lets us do extra special stuff, and what we see here depends on the variable type. If it's a Boolean, we can't really do very much, so there's nothing there. If it's an integer or a float, we can set a minimum and a maximum, and we can set a list of enumerated values, so the user has to pick from, say, 2, 6 or 8, and if they enter anything else it will throw an error. Okay, it's complaining because I said it has to pick from these numbers and the default wasn't one of those numbers. If it's a string, I have the most options of all, so I can set enumerated values again, and I have to separate them with a pipe character, not a comma, apologies. I can set a regular expression which the input string must match, also for validation; the example here is that if I wanted a .csv file, then I can say that the input parameter must end with .csv. And specifically when inputting files, for example sample sheets and .csv files and things, I can set the format here, which is very new, and say it must be a directory path, a generic path, which could be either, or a file path. If I select file path, I have additional fields here and I can say that this is a CSV file: if I click this link it will show you a big, massive list of all the different standard MIME types, so I can type in text/csv. I can even set a schema for the file itself, but that's very new and not very well supported yet. This bit is really important, especially if you're running on Tower or on some other tool which has rich input forms, because it tells that tool, through the JSON file, that this isn't just a string, this is actually a file, and the file should look like this, and that then allows rich interfaces to be built around it.

Right, if I just minimise my windows here, you will see that the console is still running in the background; this is where I've been sat, so that little spinner is still going around and around, and it's just waiting for me to hit save. I think this is really fun to show at the same time. So I'm in the web browser here, this is on the nf-core website, but I'm done, I click Finished, it saves it, and boom: the command line in the background saw that I saved my schema and it has saved everything into the JSON file.
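Behind the scenes, those settings end up as standard JSON Schema keywords in nextflow_schema.json, something along these lines, with mostly made-up parameter names:

    "aligner": {
        "type": "string",
        "enum": ["star", "salmon"]
    },
    "min_length": {
        "type": "integer",
        "minimum": 1,
        "maximum": 500
    },
    "input": {
        "type": "string",
        "format": "file-path",
        "mimetype": "text/csv",
        "pattern": "^\\S+\\.csv$"
    }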
If I go up one level now and rerun the help command with --help, we should see the new parameters that we added, with nice help text and everything, showing that the schema is doing its job properly. Sure enough, there's our demo group with the new parameters, their types, defaults and everything there. Brilliant, that's worked perfectly. And you can rerun nf-core schema build as many times as you want: do a little bit of editing, click Finished, run it again, do a bit more tweaking, add parameters, remove parameters, just rerun it whenever anything to do with your config changes, and play around with it and customise it as much as you can, because that gives a good interface for people using your pipeline.

Right, with that I'm going to stop. We're going to have a quick break, not very long, because I think I probably ran over a little bit, and that's me finished for today. So if you have any questions, ask on the nf-core Slack; I'll jump in and try to help as well, alongside the helpers who I'm sure are doing a brilliant job of answering all of your questions. In a few minutes Harshil will jump in and start talking to you about DSL2 modules and all the power of collaboration that they bring. Thanks very much for your attention so far, and I'll chat to you on Slack. Thanks, everyone.

Hi guys, welcome to the nf-core training for today. It's nice and cold and dark in England at the moment, early in the morning; I hope you're all doing well. Phil and I and a bunch of others had a bit of a late night trying to get all of this together yesterday, in preparation for the hackathon and a bunch of other stuff that we need to sort out, so if anything breaks then blame Phil, because it's always his fault. I'm going to try to give you an overview of nf-core modules and how we are using them now, especially in the context of Nextflow DSL2. nf-core modules initially started off as a repository where we could share these modules across the Nextflow community.

Maybe I should introduce myself first. I'm Head of Scientific Development at Seqera Labs, and I've also been a long-term contributor to nf-core since the very beginning, pretty much, as well as to the Nextflow community. We're doing really cool stuff at Seqera Labs, and it's great to still be able to be here and contribute and do all of these really cool open-source things in the community as well, with nf-core. It's an awesome community, as Phil probably introduced earlier, and we're growing rapidly. As a result of that, we are starting to collect a set of best-practice pipelines on nf-core; we're hitting almost 60 to 70 pipelines now, and we need to be able to share functionality across pipelines. If you look at this particular diagram here, this is for the rnaseq pipeline, and there are routes through this pipeline that could potentially be shared with other pipelines: there are modules like FastQC, Trim Galore, even MultiQC, and others that won't just be used in a single pipeline. With Nextflow DSL1 everything was kind of monolithic, in that you had to put everything in the same script, and if you wanted to use the same process more than once you had to physically copy it into the main script. With the implementation of Nextflow DSL2, what that allows you to do is to reuse these components within the same workflow and across workflows.

So, just a brief crash course into the terminology that we defined as soon as we started working with modules. A module is a unit of Nextflow: a single process that performs a specific task in a containerised job. So, for example,
FastQC: that's one process that performs a particular task, and that is what we would call a module. Similarly, you may have tools with sub-commands, like samtools index, and that is also potentially what we would call a single module — a single tool, a single command. Now, these modules can also be chained together: there may be bits of functionality in your pipeline where you want to run samtools sort, samtools index and then maybe samtools stats, and so on, and this particular sub-workflow, which is what we call it, could be used in other portions of the pipeline as well, because you may want to sort a BAM file multiple times in the workflow. This is really where the strength of DSL2 comes in, because not only can you reuse the same components within the same pipeline, you can reuse them across pipelines, and this is why it became very appealing to us to really start working on this and setting up the infrastructure where we can share these modules across pipelines. A workflow is just an end-to-end pipeline, and that in itself will be composed of modules and sub-workflows — an end-to-end pipeline would be something like the nf-core RNA-seq pipeline, which runs a particular analysis. I won't go into much detail about this, but we had long, drawn-out conversations about exactly what and how we would go about implementing NFCore modules. We love automation; we need the pipelines to be portable, so they work for as many people as possible out of the box; they need to be practical — people need to be able to contribute these modules if they can — and also flexible, so that other people can install them in their workflows and use them straight away. We need them to be reproducible — again, very key to any sort of Nextflow pipeline development and to science in general, to get the same results. We need documentation for these modules as well, and we need to be able to standardise them so you don't see 50 different modules all looking different. We need lint tests and ways we can check this over time to make sure everything looks the same, and this also makes it easier to update everything further down the line: if we're using the same syntax, the same standards and all of that sort of stuff, then it's easier to roll out these sorts of updates without having lots of things in a repository that look completely different to each other. So, in terms of how this all started: we started having conversations in July 2019, when Paolo was toying with DSL2 and really coming out with some good preview versions of it, and I initially created this repository at ISMB in 2019 thinking that we need to do something, and if it's there we will do something. But it turned out that I got distracted and we didn't touch it for a while. We had a few sprints in between — the quick in-person hackathon, which was the last one before the pandemic, we managed to squeeze that in — and since then we've mainly been having remote hackathons. We had one at QBiC where we again made a bit of a dent, but we needed a proof of concept, something we could use to then really roll out modules across NFCore. At that time the RNA-seq pipeline needed a bit of love — it was still DSL1 — and we'd had various people working on that pipeline; this is one of the testaments to having it on GitHub, that other people can take over, so the pipeline doesn't just die with the in-house bioinformatician when they leave, for example. So I picked up the pipeline and went about converting it to DSL2.
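To make that module / sub-workflow terminology concrete, here is a minimal sketch in Nextflow DSL2 — the process bodies, container tag and names are illustrative, not the actual nf-core modules:

```groovy
// One process performing one task in a container = a module
process SAMTOOLS_SORT {
    container 'quay.io/biocontainers/samtools:1.15.1--h1170115_0'   // illustrative tag

    input:
    tuple val(meta), path(bam)

    output:
    tuple val(meta), path('*.sorted.bam'), emit: bam

    script:
    """
    samtools sort $bam -o ${meta.id}.sorted.bam
    """
}

process SAMTOOLS_INDEX {
    container 'quay.io/biocontainers/samtools:1.15.1--h1170115_0'

    input:
    tuple val(meta), path(bam)

    output:
    tuple val(meta), path('*.bai'), emit: bai

    script:
    """
    samtools index $bam
    """
}

// A chain of modules = a sub-workflow, reusable within and across pipelines
workflow BAM_SORT_INDEX {
    take:
    ch_bam                                    // channel: [ meta, bam ]

    main:
    SAMTOOLS_SORT  ( ch_bam )
    SAMTOOLS_INDEX ( SAMTOOLS_SORT.out.bam )

    emit:
    bam = SAMTOOLS_SORT.out.bam
    bai = SAMTOOLS_INDEX.out.bai
}
```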
Through various iterations of updates and changes and syntax adaptations, I kind of knew where we wanted to go, but I didn't anticipate where we would end up, and the only way you can really go about doing that is making changes, and more changes, and more changes — asking questions of the community, getting opinions, then asking others to use it — and eventually you refine what you had initially. So we went about doing that, and in December we added this to the pipeline template and decided we wanted to switch all pipelines to DSL2, and by March 2021 the 45-ish modules that I added initially from the RNA-seq pipeline had become 105 within a few months as people started picking it up — so we almost doubled, within the space of a few months, these different modules for mainstream tools like samtools and so on. We added CI tests to make sure they work all the time, as well as test data, and then we also went about adding commands to nf-core tools, the Python package that was introduced earlier, where we really attempt to automate some of this stuff and standardise how we use these modules. Today, from 105 we now have 632 modules, which is phenomenal — the community is very active in terms of adding these modules, for their own use and for general use — and we've now also got around 150 contributors contributing to NFCore modules. So it's really growing, and there are people coming in from outside the community just to adopt the standards we're using, even for NFCore modules, because these modules are tested, they work, we've got CI tests, so they're very appealing if you want to start straight away constructing pipelines in your own setting. We keep them up to date with best practice — there may be breaking changes every now and then, but only when we really need to make them, so apologies if anything breaks. We have a massive toolkit that we've built up around the GitHub API, and we've got sub-workflows coming soon — in fact that will probably be one of the main focuses of the hackathon next week — so rather than just installing single modules in your pipeline, we'll have additional tools in nf-core tools (and proof-of-concepts that we've already added) that allow you to install entire sub-workflows in your pipeline, so you get a chunk of functionality rather than just a single module. There were various things that were important, especially when I started out writing and developing this, and one of those was the ability to override non-mandatory arguments, because it would be nice for these tools to be as flexible as possible for the end user, rather than having hard-coded parameters that are very difficult to change once they've been added to the pipeline. I wanted something where end users could add their own non-mandatory options, or change these non-mandatory options, if and when they needed to tweak the behaviour of the pipeline. What that means is that rather than creating another release specifically to accommodate these different parameters, you now have the option of using exactly the same release and just tweaking the non-mandatory parameters for your use case, and that's been really nice, because it means we've been able to reuse the same pipelines just by providing an additional configuration file. You can see at the top there: for the Trim Galore module in the RNA-seq pipeline, if you wanted to add an additional non-mandatory argument, it's pretty simple to do that, though as I mentioned I won't go into much detail.
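A minimal sketch of that kind of configuration override — assuming, as with current nf-core DSL2 modules, that the module picks up its optional command-line flags from the ext.args directive; the flag values here are purely illustrative:

```groovy
// custom.config — passed on top of an unchanged pipeline release with `-c custom.config`
process {
    withName: 'TRIMGALORE' {
        ext.args = '--quality 30 --clip_r1 5'   // hypothetical non-mandatory options
    }
}
```

So the pipeline release itself stays untouched; only this small configuration file changes between runs.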
You can also change resource requirements and container definitions with this same sort of mechanism. We've got a massive toolkit that I'm going to demo to you now, and hopefully show you the utility of all of this and how useful it can be when interacting with modules, and there's a bunch of resources — I think these slides will become available when the training is finished, we'll put them up on the website, so these links should work and you can click on them. There's a bunch of resources here to hopefully cover what I haven't, and you can find us on all of these different communication channels. I'm going to say thank you in advance so I can get out of the slides, and let's close that. So now I'm going to jump straight into the demo — I hope you can all see my screen. I've got a split screen with the NFCore website and the NFCore tools repo, and what I'll do now is go through the same document that Phil was going through earlier. So if you go to the NFCore website — let me make this bigger — go to Docs, then come down to Contributing, and then Tutorials, Creating pipelines with NFCore: this is what Phil was doing earlier, and I'm basically going to try and do the same thing. In my other tab I'm going to load up a Gitpod environment, so I've just come to the NFCore tools GitHub and I'm going to create a Gitpod environment that we can use, and just find the section — we'll be going through the NFCore modules section. I'm going to do this directly in the browser, so let Gitpod just warm up; it's quite early in the morning. So now you can see you've got nf-core tools installed here, and I'm just going to create a pipeline quickly: I'll come out here, go to the top-level directory and create a pipeline from the template — let me make it a little bit bigger. So now I've got this demo pipeline that I've built in seconds, and I will open that folder so we can see what's in it — I'll just cancel that and get the browser back again. Now you can see the top-level directory here is this NFCore demo pipeline that we've just created, and what I'm going to do here is show you how we can use modules, quickly running nf-core lint first. The linting checks that everything looks okay; there are these TODO strings that Phil mentioned in his talk, which you can start to address and get rid of as you amend this pipeline — as you run the lint and remove these TODO strings, they will disappear from these warnings. This one is something we need to fix, and we'll probably fix it in the next release to get rid of it. So, what we need to do here: you can see that, to add a process to the main workflow, the way things are organised is that you have workflows, sub-workflows and modules — this is the terminology I introduced earlier. The workflows folder holds the end-to-end workflow, a sub-workflow is just a chain of modules, and these are the modules here: if a module isn't going to belong in another pipeline, you can put it in this local folder, and then you have others like these that have been installed from NFCore modules, for example MultiQC, FastQC and a couple of others. So let's try and add a process to the pipeline template — the vanilla pipeline template. We'll go to this demo.nf file, which is the main script that we're actually executing here.
I will copy in that process, and let's run the workflow as it is — nextflow run, dot, with the test and docker profiles — and that should take a couple of seconds. What this is doing is running Nextflow; the dot here just means run in the current directory, so it picks up the main script and runs that automatically, and it will use the test profile in this particular pipeline, which has a minimal test data set of FastQ files set up by default, and it will use Docker as the containerisation engine. Any published results will be put in this output directory called test_results. This will take a second or two to finish — you can see all the parameters that have been printed here to the screen, to show you exactly what you set when you ran this pipeline, which is pretty nice because it shows you directly which parameters you changed compared to the defaults. So as that ticks along, I'm just going to revert back to a single screen — I'm going to shift this over here so you guys can follow along as part of the training online — and I'm going to make this slightly bigger so you can see what's going on. I'm going to stop sharing and move this across to make it easier for you to see what's going on. Is that better, guys on the inside? Sweet, okay, cool. So you can follow along on the website as I do this — I thought it would be nice to do it side by side, but if it's difficult to read then there's no point in doing it that way. Our pipeline has now finished: this is the vanilla pipeline, working out of the box, that we've just created, and now we want to add a process in. This process, all it's really doing is taking the output of this input check reads channel, which we can view to see what it looks like — I'm just going to tack a view on the end and resume, so it only re-runs processes that aren't already cached. Now what should happen is that the contents of this channel get dumped to the screen here, and that will show us how we need to configure this echo process to take them as input. You can see, when you view the channel, that you've got this meta map at the beginning — it's like a Python dictionary, but in Groovy — and it contains an ID, whether the sample is single-end or not, and then the paths to the FastQ files. So this echo process here takes as input a channel of the same structure, which is the meta map and the reads, and it's just going to print the reads to the screen. And here, where I resumed the pipeline now that I've uncommented this, nothing should happen, because I've only copied the process in — we actually need to invoke the process in the main workflow, which we've got further down in the document. So this will just run the pipeline as it is, from cache, and if we want to invoke this process we have to stitch it up in the pipeline. So here, under FastQC, what I'll do is call the process that we've added; it will take the output of this input check process and just echo the paths of the reads to the terminal. There you go — so this is just a simple process, just to highlight how you can add processes to the existing pipeline template. Now, say you wanted to add a module: let's create a module and move that process across. Instead of having the process directly in the main script, it's much nicer to import it from somewhere else, and then you can reuse it multiple times.
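A sketch of the kind of throw-away process being copied into the template here — the process name and echo line are illustrative, but the input shape matches the meta map plus reads structure shown by viewing the channel:

```groovy
process ECHO_READS {
    input:
    tuple val(meta), path(reads)      // meta map [ id, single_end, ... ] plus FastQ files

    output:
    stdout

    script:
    """
    echo "Sample ${meta.id} (single_end: ${meta.single_end}): $reads"
    """
}

// Later, in the main workflow, it gets wired up to the sample sheet channel, e.g.:
// INPUT_CHECK.out.reads.view()           // dump the channel contents to the terminal
// ECHO_READS ( INPUT_CHECK.out.reads )   // hypothetical wiring, mirroring the demo
```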
I'm just going to create a local module here: I'm going to cut this out, paste it in there, save the file and come back. But now that I've pasted it into an external file, I need to tell Nextflow where that process sits, and the way you do that is with another include statement, which includes that process from the particular module file we've just created. So now, if we resume, that process isn't coming from the main script, it's coming from that local module — the behaviour is the same, we should still see the module being called, just from a different location. That should finish in a second. But now, the true strength of DSL2, as I mentioned earlier, is that you can reuse these modules, and you can call the same module more than once by aliasing it. So I'm going to remove that initial call, and instead import this particular module under two different names; then, with minimal adjustment, we just rename the calls to whatever aliases we used, so we can reuse the same channel and call it twice. Now we're invoking this process more than once, and what should happen is that the files get printed twice instead of once, because we're calling the same process more than once — and because it's exactly the same process, we're not having to change it. You can have this same sort of behaviour with sub-workflows as well, and you can import these modules in the same pipeline or in different pipelines. And here you can see that these FastQ files have in fact been printed more than once — they've been printed twice to the terminal. Okay, so hopefully that highlights how easy it is to add a module to the pipeline template. Now obviously, depending on your use case — whether you're working with FastQ files or not, and whether your pipeline is completely different — it may be more or less easy, because you may need to configure the channels and the inputs differently, but it shows you how you can use modules directly with this vanilla template. You can also list the modules that you've got installed in your pipeline, and this is where we start looking at the nf-core modules commands that we have written and made available to you through the nf-core tools package. There are a number of options here for doing various things, like creating modules, creating test YAMLs — which I won't go into in much detail for the rest of the time — and you can lint, bump versions and also test these modules. For now, let's just see which modules we've got installed in our pipeline: nf-core modules list local. You can see the NFCore modules that we've got installed in this pipeline by running this in the directory — we've got local modules, and from nf-core we've got dump software versions, FastQC and MultiQC. All of these modules are being tracked in this JSON file; this modules.json file is how we eventually decided we would track the versions of these different modules. You don't need to touch this JSON file — nf-core tools interacts with it directly, and it should update this file whenever you install a new module or update a module — so this is just a way for nf-core tools to track what you've got installed in this particular pipeline, which then makes it easier for you to update things over time and also to get the versions of the modules that you're using.
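For reference, the aliasing pattern described a moment ago looks roughly like this — the module path, alias names and upstream channel are illustrative stand-ins for the ones in the demo:

```groovy
// Import the same local module twice under different aliases ...
include { ECHO_READS as ECHO_READS_ONE } from '../modules/local/echo_reads'
include { ECHO_READS as ECHO_READS_TWO } from '../modules/local/echo_reads'

workflow {
    ch_reads = INPUT_CHECK.out.reads        // hypothetical upstream channel: [ meta, reads ]

    // ... and invoke the identical process twice, without changing the module itself
    ECHO_READS_ONE ( ch_reads )
    ECHO_READS_TWO ( ch_reads )
}
```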
So let's run the nf-core modules update command now. This is quite nice: in this case you'll see that you can update all of the modules in a single go, so if anything changes on the NFCore modules repository, nf-core tools will query the GitHub API, see what's changed, and just automatically update everything for you. You have the option of looking at the diffs for what's changed between what you have locally and NFCore modules — it's up to you, you can dump that to a file or to the terminal — or you can just brute-force update everything. In this case, what it's showing you is that all of these modules are up to date, and that kind of makes sense, because we only released tools yesterday and we made sure everything was up to date, and since then nothing has changed on the NFCore modules repository, so everything is looking good here. But when things do change over time, this particular command in the NFCore modules toolkit is really nice, because it does it all for you in an automated way, and all you really have to do then — if any channels have changed, or anything else has changed within the module — is potentially tweak a few things to get the pipeline working again and be up to date with what's on NFCore modules. You can also remotely look at the modules that are available on NFCore modules, but this is a massive list now — as I mentioned, there are 632 modules — so rather than that, we've added all of this to the NFCore website: there's a modules page, which you can see here, where you can search for these different modules by various attributes and so on, to make it easier for you to find them. So now: we've looked at the local modules, we've updated the modules, we've just done the remote modules — now let's try and install a module from NFCore modules. So if we do nf-core modules install, I only want to install the fastp module, and that's it. You'll notice now that the fastp module has been added to this modules.json file — this gets updated automatically — and the module itself is downloaded and added directly to the nf-core modules folder, so you don't have to go and grab anything: you just need to tell the toolkit what you want installed and it will go and grab the latest version of the module for you. You can even pin it to a git SHA as well, if you want to stick to a particular revision of a module. And if we look here, this is the module that we just downloaded and installed in the pipeline: it's tested, it's fully functional, and what's nice is that it's all maintained and updated centrally, and then you can use and install it directly in the pipeline. So let's now try and call fastp in our pipeline. To do that we need to include it, and here you can see there's an include statement that's been printed to the screen: if we just copy that out and include it here, we are essentially importing this fastp module and making it available to the main script. Then we also need to invoke it within the workflow — copying directly from the website again — and all I've really done is pass in the same channel of reads, because this module also just takes a meta map and the reads, plus a couple of Boolean values for whether we want to save the trimmed-failed FastQ files and whether to save the merged FastQ files. In this case we're not interested in those, so for simplicity I've just set them both to false. So now let's try and run the pipeline again with fastp in it; eventually you'll see that fastp is now being called, and it should finish and run with that as well, and that will give you all of the information about this particular module too.
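The wiring being copied in here looks roughly like this — a sketch assuming the fastp module signature described in the talk (reads plus two Booleans); the include path and upstream channel name are illustrative:

```groovy
include { FASTP } from '../modules/nf-core/fastp/main'

workflow {
    ch_reads = INPUT_CHECK.out.reads     // hypothetical upstream channel: [ meta, [ fastq_1, fastq_2 ] ]

    FASTP (
        ch_reads,
        false,                           // save_trimmed_fail
        false                            // save_merged
    )
}
```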
So there's another way of grabbing information about the input channels and so on for this particular module, as well as the other metadata that's been added to the meta.yml file for the module — it's just a nice, easy way of accessing this information directly in the terminal, especially when you're stitching these pipelines together. So now we have fastp working. Say, for example, you want to change this module, but it's a custom change and you don't want to commit it to NFCore modules because it's something only you'll be using. In that case, what we can do is patch the module itself. Say, for example, we add an echo in the script section here and we run the linting for that particular module: what you'll see is that you now get a failure, because the local copy of the module in the pipeline is now different from the one that's on NFCore modules. That's because you've made this change, and nf-core tools is clever enough to know that you've changed this module relative to the standardised version you installed from the NFCore modules section. So we need to do something about this if you want to keep this module in the pipeline, and what you can do in that case is just run a really nice command: nf-core modules patch. What that does — you can now see in the fastp directory there's this fastp diff file, and this is the diff of exactly what you've changed in that module; it's a record of that change, which then also gets tracked here in the modules.json. This allows you to make amendments to existing NFCore modules and still have nf-core tools track and update everything — it knows where to incorporate these changes for you — so that's a really nice tool that gives you a bit more flexibility in terms of how you use NFCore modules. You can also lint all modules at the same time — I just showed you the one for fastp individually, but you can lint all of them at once — and now you can see that the lint failure we were getting for fastp has disappeared, because we've patched this module, so we don't have any lint failures there any more. We can also create a module: if we don't see the one we need on NFCore modules and we just want to create it, we can. Because we're running this in a pipeline repository, it prompts you to add a module — I think we have part of that broken at the moment anyway, so I wasn't going to bother trying it in the demo — and then it just prompts you for other information that you can add: we want to use a meta map, and that's it. What you can see here is that that demo module has been created for you, directly in this local folder. It comes with a bunch of TODO statements where you can really start filling out this module — there's some best practice that we've noted down here, and things to take note of — and then once you're done with that, you can remove all of these TODO statements and hopefully add this module to the pipeline. There is also another way to create a module: I'm going to come out up here, and what I'm going to do is show you what happens if you create a module in a clone of the NFCore modules repository. All I'm really doing is cloning the NFCore modules repository locally — you might want to do this if you want to contribute your own modules to NFCore modules. So here, this is just a clone: you'll see a bunch of modules that all live directly in NFCore modules, and in fact this is where nf-core tools retrieves the information from, via the main GitHub repository, when a module gets installed in a pipeline. So let's now check out a branch, because we don't want to change master directly here.
Now if we run nf-core modules create in this directory — even in this directory — and add our demo module here, what you see is that more files have been created. When you create a module in the pipeline context, only one file is created, because it's a local module for that pipeline — it could be a module that you want to customise and add there directly. However, if you run nf-core modules create in a clone of the modules repository itself, you get many more files created, and that's because you're creating a standard module based on a template we've added to nf-core tools, to allow you to contribute to NFCore modules. If we quickly go through those files — we might need to change directory here, into modules — you'll see all the different modules that we have available, and each module also has its own tests, which is why you can see a number of files have been created in the tests folder as well. So you have a main script, with all of the same comments you would have had when you created it in the pipeline, and you've got a meta.yml that allows you to add a description for this module and various other things — this is the information that gets displayed when you use nf-core modules info — so it allows you to document the module itself: the inputs, the outputs, what the process does, the licence and so on. Those are the files you can see here. If we go into the tests folder, there's a demo folder here as well, under tests/modules, and there are a few files that have been created for you there; this just lets you really start creating this module and filling it out to contribute to NFCore modules. When you do contribute it, this is what lets us make sure the module is working — it allows us to do CI tests and that sort of stuff, unit tests to make sure it's working — and we use pytest-workflow for that. I won't go into the details; I've put a link at the bottom of the tutorial that you can follow to get more information about this sort of stuff. But here, the main script for the test is essentially just calling the main script of the module that we put here, and it runs a minimal workflow to test that the module works. We have an extra config file, which at this point just sets a sensible publish directory, and then we've got a test YAML, and this YAML is what pytest-workflow uses to run this particular workflow — you can see the command that it uses — and what's nice is that for all of the output files produced by this particular module, we can check their integrity in various different ways: we can check md5sums, we can check that files exist, we can check for contents and so on. That means that, over time, as you update this module you can track why things are changing and whether things are breaking or not, and that's very powerful. We have these sorts of tests for most of the modules now, and it just gives you a bit more confidence: when someone makes a change to a module, is it broken or not? We want to make sure that it isn't, so that whenever someone contributes or updates a module it's always working, and this approach allows you to do that. So I think that's kind of it for me in terms of what we can offer with modules and the functionality that we have in nf-core tools — it's constantly growing and expanding, and we've got sub-workflows, as I mentioned, coming next week hopefully; we're working on prototypes of that, which is partly why we had to restructure everything over the past few days.
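For reference, the pytest-workflow test layout just described looks roughly like this — a minimal sketch with a hypothetical DEMO module and an illustrative test-data path, not the actual generated files:

```groovy
// tests/modules/demo/main.nf — a minimal workflow that simply exercises the module
nextflow.enable.dsl = 2

include { DEMO } from '../../../modules/demo/main.nf'

workflow test_demo {
    input = [
        [ id: 'test', single_end: false ],          // meta map
        file('data/test_R1.fastq.gz')               // illustrative test file
    ]
    DEMO ( input )
}
```

The accompanying test YAML then lists the command used to run this workflow and the expected output files, with checks such as md5sums or file contents, so pytest-workflow can flag any change in the module's behaviour.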
So stay tuned. Hopefully you were able to follow this — apologies for switching things around in between — but hopefully you got the drift of everything, and thank you for attending. Awesome, thanks a lot Harshil, I think I can take it over from here — maybe you can pass the spotlight over to me and see if I can get on there. I just noticed there was one issue there, Harshil: it may be an issue with the clocks in your background, the time never changes — maybe you need to get those fixed for next time as well. Also, thanks for joining, everyone, for the third and final session today. My session is going to be mostly focused on Tower itself, so I'm just going to quickly go back and share with you, and go back into the environment that we had previously. Going into here, I'm going to give you a little bit of background on Tower first, then I'm going to give you a demo of the application and show you the various ways we can use it and how it can apply to what we're doing. So, stepping back a little bit: I'm going to go first to tower.nf — you can find this if you type in tower.nf or cloud.tower.nf, or search for Nextflow Tower — and this is the hosted version of Tower. So what is Tower itself? You can think of it as a full-stack web application for the management of your pipelines, and the way this works is that it's essentially a service which is, in this case, running in the cloud, and we can use it to launch Nextflow pipelines but also to monitor those pipelines — and, increasingly, we can think about it in terms of other applications as well: things like the data management side, or the creation of the infrastructure, or the sharing of results. It comes together in a way which is very similar to the philosophy of Nextflow, in that with Tower you can install it pretty much anywhere: you can put it on your laptop if you want, you can install it in the cloud, you can install it on your cluster. And then, separate from that, is the actual computation — where the analysis runs, basically where your Nextflow pipeline is executed — and that can also run essentially anywhere that Nextflow can run, so you get all the benefits of this kind of flexibility. When you're looking at this, it may look like some sort of SaaS application, or one of the genomics-in-the-cloud applications of which there are a lot out there, but in reality what we're doing with this is almost like a pane of glass onto your own compute. So Tower Cloud here is available for you to try: you can log in, you can sign in here — I'd recommend just signing in with either GitHub or Google credentials the first time — and once you do that, there are a couple of screens you'll get. The first thing you'll see when you log in is a list of pipelines: you can see there's a list of pipelines which are already available, and these are ones which are part of the showcase here. The way this works is a little bit like GitHub or other applications you have for repositories, in the sense that you've got organisations — in this case I'm in the organisation called community, and I'm inside this workspace which is called showcase — and you can navigate around using this button here. So, for example, I could go to this workspace... sorry, organisation:
you can see I've got these different workspaces here. The community one is available, and everyone also has their own personal one, so inside here I've got my own one, which is available here. The basics of this are that if you want to set things up, you'll have your own personal workspace which you log into, and if you want to create your own organisations — particularly if it's for your lab, or your university, or your company — you can come over here and create a new organisation. So if you go to new organisations you can add one here, and feel free to add your own — for example the Seqera Labs one here: you can write information about it, you can add that information, and, importantly, you can manage team members. So you can, for example, add people to the organisation, give them particular roles in the workspaces, and create things like teams — grouping people together to make it easier to manage — and you can even have collaborators and the like. What we're going to do to start with is look at — almost going back a little bit into the history of — why this came about. So I'm just going to go back now to the Gitpod environment that we have here, and you can see that this is essentially script number seven that we have; if you follow the material there around Tower, we're up to the section on how to use Tower. We originally thought about this as wanting to be able to monitor our jobs and have a full history of, essentially, our Nextflow executions in a database, also with a UI — something we could walk away from and monitor from anywhere — and that was the whole idea of Tower to begin with: the first concept of Tower was around monitoring and tracking, and being able to have a kind of database of your executions. The way this works is that if you go to section number 12, getting started with Tower, the first basic thing we're going to do is run our pipeline exactly as before and add on to the end of it this -with-tower option. What that's going to do is that Nextflow will submit the task and run information to Tower, which we're then going to be able to visualise and monitor as the job is running, and when it's finished we're going to know about that — it's going to let us know. Obviously we're running very small jobs at the moment, but you can imagine you'd be running super long ones when this goes through. Now, the prerequisite for this to work is that you have to tell Nextflow which user you are in Tower, and the way you can do that is to create a token. So you create a token inside Tower first, and then the first thing you need to do is simply export that token — you actually don't need this line here for our work, but we do need to add in this line here, where you export the Tower access token — and then we're going to run that. The way you can find that access token is that if you go to the right-hand side here and select Your tokens, you can just add a token and you'll get given one — for example I could create one through here if I hadn't already created one before.
Then you can just simply copy that down — I'm just going to delete that token afterwards, given that this is being live streamed and so on. Okay, so once you've got your token you can just copy it to the command line and export it here. So imagine that we've got that set up and we've exported our token: now we can simply run Nextflow exactly as we had before. Here you can see I've got script number seven, which is going to run our pipeline; I'm going to save this and then just run it, and all I'm doing is adding -with-tower onto the end of this line and launching it. Then I'm going to jump through to my Tower instance and have a look at what's happening there. In the runs page here you can see that it has instantly popped up and I can see how this is running: if I jump into it I can see the Nextflow command line that was run here, I get some basic information on the parameters, on the configuration, and on any reports we have — in this case there are no reports defined, but this is a very small pipeline I'm running through. You can see that we get some basic information, and we can follow the status of those tasks as they run. You can see, in this case, the working directory, that there was one container — this is the container that we used — that it was running locally, and here is the version of Nextflow, and so on. We can dive a bit deeper than that: we can get information on the processes, so you can follow the status of those processes as they go through, and aggregate statistics on wall time, CPU, memory, etc. — there's a cost estimate too, I'll show you that later on — and what's going to be really useful is the task table. Now we can very quickly find things: imagine that I had many of these tasks — I can find, for example, what happened when I ran Salmon on the lung sample. I can find that here, I can see the exact command which got executed — so this is the equivalent of that .command.sh file — and I can see some execution statistics, and all of this is stored in a task table in the back end which we can use. Here's the execution log, and so on; this is more fully populated when you launch from Tower itself, and you can see some metrics too. It's important to know that this could also be run in a shared workspace. At the moment it's been run in my personal workspace, but if I wanted to make this available to you, I could run my pipelines there and then you could see them. The way you could do that is that if you created an organisation — I'm going to go down now into this Seqera Labs one; I've got one here which is just a workspace that we use for testing, and you can see that we're running some pipelines here at the moment — what I could do is go in and actually run that pipeline I ran before into this workspace. It's run in the same way, but then everyone can see it: imagine that we're collaborating and want to see when something is finished — you no longer need to tell me when it's done, we can all work together in a collaborative way. So by going inside this workspace — say this is the one I wanted to use to verify pipelines — I can export this ID as the workflow... sorry, the workspace ID, and then we can see that. I'll show you how to do that from the Tower documentation as well: if you go to the Nextflow Tower help you'll find the information there.
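The monitoring setup being described here can also live in configuration rather than the shell — a minimal sketch, with placeholder values for the token and workspace ID, of the tower scope that Nextflow supports in nextflow.config:

```groovy
// nextflow.config — the equivalent of exporting TOWER_ACCESS_TOKEN and adding -with-tower
tower {
    enabled     = true
    accessToken = 'eyJ...your-token...'   // placeholder; usually taken from $TOWER_ACCESS_TOKEN instead
    workspaceId = '123456789012'          // placeholder; optional, sends runs to a shared workspace
}
```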
On the left-hand side here you can see the different aspects of this: there are the users and the workspaces, as I mentioned a little bit before, and here's the section on workspace management — you can see there are different aspects around that as well. You've also got this concept of shared workspaces, which is what I was mentioning before — the idea is that it makes things available for everybody to use. What I was trying to find here is the basic thing, inside Getting Started — okay, this one's there: you can see there's this export of the Tower workspace ID. So I'm going to copy that and take it into my environment here, and then I'm going to put in that same ID number that we have from before. So I'm going to take tower1: I'm just going to jump over to workspaces, copy that ID and paste it in there. Now, every time I run — because of this, and this can also be set from a config as well — it is going to be running in that shared workspace, and everyone's going to be able to see what I'm doing. So I can go back into this workspace here and see this information, and in a few seconds hopefully this will pop up — there it goes, script number seven — and you can see that run; it just takes a few seconds to load, and we can monitor that as well. Okay, so that was the very basic start of Tower, and that's what we started with in terms of monitoring, but it got to a point where we wanted to do something so much more powerful. It's kind of annoying that I have to go here and run from the command line: what happens if I want to launch that pipeline execution in a way where I don't need to have a terminal open, or where I don't even need to have this running in this location? Maybe I want to submit the Nextflow job from an API, maybe I want to automate it or make it usable as part of a service — then it becomes a little bit difficult to be dealing with the command line here and using Nextflow in this way. What would be fantastic is to be able to have, essentially, saved pipelines that I can come to, maybe with a user interface as well — some way that Tower could manage the execution piece, not just the monitoring piece — and we started with that. There are a couple of use cases we can go through, and we added the ability to launch pipelines from Tower as well. So what I'm going to do is go to the community showcase — you should all be able to see this if you log in; shout out in the Slack if you've got any issues with that — and I'm now going to go through and choose a pipeline that I want to run. Keeping with the theme of the other days, we're going to run a pipeline — in this case we've got the RNA-seq pipeline that we can run, and these are the full pipelines which are essentially there from nf-core. If I select this one here, you'll notice that we get this user interface, this ability to see all of this, and that comes about from a file which is inside the repository for the nf-core pipeline.
We can actually look at what that looks like: I'm just going to go to the actual pipeline — so I go to Pipelines and find the RNA-seq pipeline here — and inside this repository you can see the information for the schema itself. You may have seen that previously; that's basically something you can do from that piece, and that's how that schema is generated. Now let's get into how we actually do this ourselves, if we want to add our own pipeline here and not just run this one in Tower. If you want to run a pipeline, I'm going to go to the Quick Launch here — you should be able to see this in your own environment — and what we do here is a little bit like building the command line for Nextflow; it's the equivalent of building that up. So what I can see here is: I've got the run name — this is a name I could give, something informative about the run; I can add labels, and these might be labels related to a project that I'm running, or maybe just because I want to tag it as RNA-seq; I could choose a compute environment — I'll show you this in a moment, we'll cover the compute environments later, but that is essentially where the compute runs, where the Nextflow job runs and where the tasks themselves run; and then obviously we've got the key information, which is the pipeline that I wish to run, and I want to run that same pipeline which we developed over the last couple of days. So I'm going to go back here — not to the nf-core one, but into the nextflow-io one — and I'm just going to look for the RNA-seq pipeline; it's a little bit simpler, so we can see a little more of what's going on. All I need to do here is copy the GitHub repository — this is exactly like the other day, when we ran nextflow run and put in the Git repository — so I go over here and paste in the repository name. Now, remember this blue circle: what that's doing is going to GitHub or GitLab or Bitbucket, or wherever you have your pipeline stored, and it's retrieving for us all of the revisions. So here you can see the releases, and also the branches we put in there — you'll see dev and DSL2 and hybrid, etc. — and if I go into here you can see these are the same ones, so it pulls those up for us, but it also shows the releases, so the different releases which are available become visible here as well. I don't need to pick one; otherwise it's just going to use the default. The next one is the work directory — in our case in S3, but it will be dependent on the compute environment that you use — which is where the intermediate work files are going to go. And then we've got profiles, and the way this works is that Tower reads the Nextflow config and shows us our profiles — remember how we had -profile and we could write, for example, batch, or any kind of config — so this is coming from there: it's essentially parsing that config file and showing us the profiles that are available, and you can see we've got docker, slurm, batch, etc. — all of these things are available there.
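The drop-down being described is populated by parsing a profiles block like the one below — a minimal sketch with illustrative settings and a hypothetical queue name, not the actual configuration of the training pipeline:

```groovy
// nextflow.config (sketch)
profiles {
    docker {
        docker.enabled = true
    }
    slurm {
        process.executor = 'slurm'
    }
    batch {
        process.executor = 'awsbatch'
        process.queue    = 'my-batch-queue'   // hypothetical queue name
    }
}
```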
Now, one thing to remember is that Tower is going to take care of the actual compute piece, so we don't need to specify again that we're going to be using AWS Batch here: Tower is already going to add some config for the execution piece. Of course we can override that — we can add something else to it if we wish — but the default piece of saying "run on AWS Batch" doesn't need to be done, because that's what the compute environment piece here is doing, so for the most part this field will be used more for other settings or parameters. There are also parameters that you can add in here, and this could be a JSON or a YAML file that you would otherwise have passed in: it's the equivalent of the params-file option, so when you're running Nextflow you can say run, then the script or pipeline that you need, and then a params file pointing to some file — for example I could have my params.yaml containing those values — and this is the equivalent of what's happening here. There are a couple of other options too: you can add some Nextflow config, some Tower config — all of these things are available. So: we're giving it a name, we're giving it some labels, we're choosing the compute environment and choosing which pipeline to launch — basic stuff. Now, when I launch this, it's going to take me through to a monitoring screen, and what's taking place here is a little bit different from before: when I launch this pipeline now, instead of Nextflow running on, say, my local machine, the Nextflow run job is being submitted — in this case — to the cloud, to AWS Batch, and from there the individual tasks are also being submitted onto AWS Batch, so you've got this kind of "Batch squared" approach. The great thing about this is that we can submit the Nextflow job onto a normal instance — that is, not a spot instance — and it means that from there we can basically walk away and completely forget about our job; we can leave this running, and you can see that we have pipelines which run for a really long time. Let's go and have a look at the RNA-seq run, which I triggered off a few minutes ago, and this is inside Tower. Here, this is the actual full RNA-seq pipeline: you can see that we get the command line here, the parameters, the configuration — this is quite cool, there's a lot of information here — whether you've used any datasets, which I'll come to in a moment, and you've got the execution log, also the Nextflow log, the execution timeline, etc. You can see everything that you'd see from running that pipeline on the command line, and there are reports, which we'll see in a second as well. Going through this, you can follow the information and those tasks as they go through — there's a lot more information when you launch from Tower; in this case we know a lot more about what's going on. We can, for example, stream those logs back live, in terms of the actual task logs, and we have a little bit more detail on what's happening here. You can also go through — and you can see that this pipeline has a lot of different processes in it — and see all of that information: we get statistics here on time, CPU, memory, and cost as well. There's a database on the back end with all of the costs of all the cloud providers — all the instances, all the regions — and what we do is we basically say how much of that instance you are using for that task, and then just sum it up over the time. So, you know, if this task has been running for half an hour and it's using half the instance, you can take the instance cost and multiply a half by a half,
so you end up with roughly a quarter of the hourly instance cost, and we sum all of that up. It's an estimate — probably a slight underestimate — but it gives you a rough idea of how much you're spending there. We have some work coming up, which we'll be releasing at the summit next week, which provides you the actual cost: it places cost tags onto those jobs so you can look back retrospectively at exactly how much something cost. Here you can see the number of cores that I'm running in this pipeline in real time, how many tasks are running, and then the memory and the efficiency of the CPU as well. Going through this, you can see the task information here — the information on the individual tasks themselves. So I could, for example, look for, say, wild-type replicate 2, and I want to know exactly what happened when I ran STAR align: I can select that task, I can see the exact command which was run there, and I can also see the information on the execution, so I can see it ran for 2 minutes 39 here. Remember that the task itself runs inside a container — here you can see that it ran inside the STAR container — it ran on this queue, it requested 2 CPUs, 6 gigs of memory and 3 hours, and then I can look at that and say, okay, it actually got placed on this particular machine type, in this region, on the spot model, and it cost this much — and that is how we get those cost estimates. We can also compare that to the actual memory that it used — we can look at how much it requested versus how much it used — and we can start to optimise the pipelines based on that information, and based on that you can get a visualisation of it when the pipeline is complete. So let's jump through to see a completed pipeline: I'm going to jump through to a pipeline which we ran earlier, from yesterday, and you can see here that we've got a little bit more information when the pipeline is complete. You can see the reports from Nextflow — this is a way of defining, essentially, the outputs of the pipeline that you may wish to share with people, maybe use outside, maybe send in an email. Here you can see we can define a MultiQC report — you can define any HTML here — and you can see the information from that report as well. So here's our report: this makes it nice and easy just to find it, whereas typically it might be sitting in some bucket or the like, so having that information there is really useful for us. We also have the ability to have PDFs or any kind of images there — I can open that one up as well, and here you've just got some images which were created from that pipeline — as well as things like the structured data itself. This is a TSV file, in this case, which has been output: I can choose the delimiter there, I can search through it as well if I wish to find particular genes, and I can open it and download it too. What else is available when we have this? When the pipeline is complete we can also see the information regarding the metrics: here is the CPU, and you can see the CPU allocated — how much we requested — and the memory, and what percentage of the allocation was used. You can see that in some of these tasks we're actually requesting a lot more memory than has actually been used by the task; in some cases there's a minimum on AWS Batch of one gigabyte, so you can't really
get better than that in terms of that optimisation. You can also look at time here: if you work on a cluster, it can sometimes be beneficial to request only the amount of time that you actually need, otherwise you get a lower score in terms of your prioritisation, and you can even see things like read/write as well. Okay, I want to touch on something quickly before I go on to the compute environments — because that's really where some of the magic is happening and how this is all possible — and that is the datasets themselves. When you run a pipeline — some of these ones from the community showcase, particularly with regard to the input schema and the schema for those input files — you'll notice that when I launch, say, this RNA-seq pipeline, I have this ability to select the input: I can just search for RNA-seq here and I get this file, this version of this input data, provided to me. The way this works is that it's a way of handling datasets, in particular datasets that are sample sheets, or structured data that's in this kind of structured form. If you navigate to Datasets you'll be able to see this — for example nf-core RNA-seq test — and this is a description of that dataset, but we also have in here the actual file itself, which I believe is a CSV or a TSV file, and here you can see we've got the sample ID in this first column, we've got FASTQ 1 and FASTQ 2, and the metadata around how that's handled and how it will be processed. What's possible is that by uploading a dataset here, you can then use it as an input to your pipeline, and as for the direction we think this is heading in — I'll point out another fantastic little tool that was developed by Phil, who was speaking earlier: SRA Explorer. The concept here is that we are able to find datasets — essentially metadata — and from that we can import them into Tower. So, for example, if I search for some samples, I can find these particular samples here, select them and, for example, create a dataset, and from this dataset I end up with this TSV file: we've got the identifiers of those samples and, importantly, we've got the file locations themselves — here you can see we've got some FTP links, or we've got the SRA, which we could also use, and we've even got the ENA identifier here, which can be used. What I can do is download that TSV file, and inside Tower — you can do this by API as well — I could, for example, upload that in here, and then, by selecting this first row as a header, all of a sudden that could be the input of my pipeline. So we've gone from this very quick way to find datasets, to constructing a sample sheet — in this case from the SRA — which can then be used as an input to your pipeline. Now, it's important to remember that for this to work, your pipeline itself — the way that you write the pipeline — should match up: you would have to say that the accession is the ID, for example, and that you want to use the SRA URL as your file identifier, but then that could be fully processed in parallel, and there are some tools for doing that as part of the nf-core pipelines for downloading FastQ files and so on. Okay, so that's a little aside on datasets; we're looking to evolve this quite a lot as well, and we'll continue going down this direction to help people out.
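A sketch of the kind of structured sample sheet being described — the column names and values here are purely illustrative, not the actual showcase dataset:

```
sample,fastq_1,fastq_2
CONTROL_REP1,ftp://example.org/ERRXXXXXX_1.fastq.gz,ftp://example.org/ERRXXXXXX_2.fastq.gz
TREATED_REP1,ftp://example.org/ERRYYYYYY_1.fastq.gz,ftp://example.org/ERRYYYYYY_2.fastq.gz
```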
With regards to the compute: so far, when I launched this pipeline I didn't really mention it too much — I basically just launched into that environment. I'm going to switch across to a workspace that I have here for my verified pipelines; when I go to launch a pipeline it's exactly the same, I've just got some more compute environments set up in this one. When I go here I can select, for example — let's say I want to launch the same RNA-seq pipeline for us — and I can now choose a compute environment that I want to launch this in. Notice how my work directory here is an S3 bucket in Paris: that's because this compute environment is tied to that bucket, and ideally you want your compute and your bucket in the same region — in fact I think it's mandatory. What we can do, though, is select the same pipeline, maybe a specific version of this pipeline, and if I want to run that now on Azure Batch in East US, that's completely possible — you'll notice that my work directory is now updated. Likewise, I could run it on my Slurm cluster, so I could connect into the compute resources that I have on-premise and Tower will submit the Nextflow job onto those resources, and I get all the benefits of using Tower there. So this makes it possible to connect in any of this compute, and the way that's done is by setting up these compute environments. Setting up a compute environment is typically a one-time process, something you do once, and they are defined from this compute environments section here: you can see I've got Slurm set up, I've got Azure Batch, and I've got Google Life Sciences. When you create these, you can add a compute environment here, and when I add one I can choose the actual platform: we have support for Amazon Batch, for Azure Batch, for Google Life Sciences; we also have the schedulers here — Altair PBS Pro, Grid Engine, LSF, Moab and Slurm — as well as Kubernetes, and there are specific options there for EKS, GKE and the normal version of Kubernetes as well. I'm going to show you what it looks like to create this on the Batch platform, and then we can look at one for the schedulers — they're all quite similar between them, let's say. So with Batch we've got a lot of options for how we can do this. The minimal requirements: I'm just going to choose a region — in this case let's go with Singapore — and then from here I can choose a working directory; this is going to show me all the S3 buckets that are visible with those credentials. Now, as I said before, you should select here a bucket which is set up for that region, and the other thing that I'd recommend doing is this: if you're going to use this as a work directory — you're just using it for caching for one or two days, maybe one or two weeks, and you don't need that data any more after that because you've moved your outputs with publishDir or somewhere else — then I would set up a lifecycle policy on that bucket, a policy that says after one week, delete that data. This way you don't end up with a lot of cloud costs for storage of temporary files which could have been removed. So here I'm setting up a bucket, I've set up the region, and then I've got a couple of options: I can choose to use Forge mode, which will create everything for me — it will create me a queue and a compute environment — and I can say I want 1,000 CPUs
So here I'm setting up a bucket, I've set the region, and then I've got a couple of options. I can choose Forge mode, which will create everything for me, a queue and a compute environment, and I can say I want 1000 CPUs and essentially set that up, and that's good to go from there. If I create that now, it will add all those resources and that environment, and Amazon Batch will be available for me. One thing I didn't point out is the credentials themselves: they need to have particular permissions, and if you go into the credentials section here, where you add the credentials for the provider, you can see the permissions they should have. There's a description of how to add those permissions and you can see the exact policy which is required, so if you're familiar with AWS you can just copy and paste that into a policy and make sure the user has those permissions; we have full instructions in the Tower docs on how to do this if you're unsure. That creates the compute environment and makes it available for people to run with.

If you want to connect your Slurm or your on-prem environment, maybe you've got Grid Engine, you can set this up by selecting the platform that you've got, so in our case let's select Slurm. You've got two options for connecting into your cluster. You can use SSH, in which case you provide Tower with the SSH key, and Tower will then be able to essentially log into your cluster and do the nextflow run piece. Or you can use what's called the Tower Agent, which is a little piece of software that runs on your HPC cluster and lets Tower connect through it: you simply run that little agent, you put the connection ID in, and then instead of Tower connecting to the cluster, the agent connects out to Tower and you get that connection in this way. The rest of the options are very similar, except that obviously you're not specifying the resources now, because it's actually Slurm that's going to be doing the execution. You specify the work directory for that execution, you specify the launch directory, which is where the nextflow run command is going to take place, and if you're going to be using SSH you've got the username, the username on the cluster that you have, so basically the account you're going to be logging in as. Then there are a couple of optional settings: you've got the head queue, which is where the Nextflow job itself will run, and then the compute queue, which is where the actual pipeline jobs run, and this can also be overridden by the pipeline configuration, as in the sketch after this walkthrough. So again, once that's set up, you have the pipelines here that you can run, you can do a quick launch: let's say I want to run that same thing on Slurm, I choose the pipeline that I wish to run, I kick it off, and it's good to go.
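As a rough illustration of that queue override, this is the kind of fragment you might add to a pipeline's nextflow.config; the queue and label names here are invented for the example rather than taken from the demo setup.

```groovy
// Hypothetical nextflow.config fragment: override the queues that the
// Tower compute environment would otherwise use for pipeline tasks.
process {
    executor = 'slurm'
    queue    = 'compute'          // default queue for pipeline jobs

    // send only the heavyweight tasks to a different partition
    withLabel: 'process_high' {
        queue = 'highmem'
    }
}
```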
Okay, other aspects which are important here: inside all of these, let's just go to another one, let's go to some testing pipeline here, the permissions boundary for everything I've shown, the launching of runs, the datasets and so on, is driven inside of this workspace. Inside the workspace, this testing workspace for example, there are participants, and those participants have different roles. Here you can see that we're all admins because it's just a development and testing environment, but I could also be an owner. An admin basically has full permissions; a maintainer is someone who can add new pipelines and change their settings, but they can't add compute environments; a launcher is someone who can launch pipelines which already exist; and a viewer can just view things, as the name suggests. So those permissions are defined there.

The other important things on the setup side: you can add credentials here, and those credentials can link into, for example, AWS in terms of the cloud providers. You've also got source code management, which is how you connect to your repositories; maybe you've got something in GitHub in a private repository, and you can add your GitHub credentials there to access that. There's also support for SSH and for the Tower Agent credentials, which I mentioned before. You can think of credentials as one piece of the configuration of your infrastructure, the configuration of the Nextflow job. Often you'll also have what we call secrets, which might be API tokens or passwords or keys that you need, and they're used inside of the pipeline, maybe by a tool inside that pipeline. As an example, in here we're running DRAGEN, and the DRAGEN username and password are stored in the secrets section; we've also got a license for running Sentieon, and an NCBI access key here. By having those things as secrets, if we go into the Nextflow secrets section, we can reference them either in configuration or inside an actual process; there's a process directive for secrets which gives us a placeholder for the secret. What happens is that when Tower runs the pipeline, it places those secrets into a secret store, so with AWS Batch that's Secrets Manager, or on your HPC it becomes an environment variable, and that means the secret itself is never actually stored in the code and never stored in the logs. You get this handling of secrets without having to publish them anywhere, which can be very useful for those kinds of applications; a minimal sketch of what that looks like in a process is shown below.
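As a minimal sketch of what that looks like in a pipeline, here's a process that declares a secret and reads it as an environment variable in its script; the secret name NCBI_API_KEY and the query are placeholders for the example, not the exact process from the demo.

```groovy
// Hypothetical process using a Nextflow/Tower secret: the value is injected
// as an environment variable at runtime and never appears in code or logs.
process QUERY_NCBI {
    secret 'NCBI_API_KEY'

    output:
    path 'hits.xml'

    script:
    """
    curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=nextflow&api_key=\$NCBI_API_KEY" > hits.xml
    """
}
```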
Okay, I think those are the main pieces with regards to Tower. There's a lot of development going on, I should point out. One thing that starts to become quite useful is automation: so far I've shown everything from the user interface, launching pipelines, creating compute environments, workspaces and so on, but what's more useful is to consider how we can use actions here.

Actions are a way to trigger a pipeline based on an event, and we've got a couple of event types available here. There's a GitHub webhook here, so if I commit to this repository it will fire off the execution of the pipeline based on these settings. This is a way to trigger a run based on a commit, and you can imagine this would be useful in CI/CD settings where you maybe want to test a release of the pipeline and make sure it actually runs against a full data set in a distributed way; things like GitHub Actions are great, but they're not designed for really long-running, heavy-lifting jobs. Then you've also got a launch hook here on the RNA-seq pipeline, and the way this works is that it creates an endpoint for you: I create a pipeline and essentially define the webhook as the action, and it will submit the pipeline if I hit that endpoint. So if I copy and paste this, I believe it should actually work: if I place my token in here, it will launch the execution of the pipeline. This is a really great way of connecting Tower to other elements, so basically you can launch a pipeline in Tower from some other environment as well.

What can also be great is doing this whole thing from the API. The launch hook is like an easy endpoint into the API, but the full API is also published as an OpenAPI specification, and if you go into the settings here, into the endpoints, you can see everything I've described: actions, compute environments, credentials, launch, organizations and so on. All of that is possible from these endpoints, which makes it very flexible, and there are a lot of organizations building on top of Tower: maybe they're building their own very specific, vertical front end that they wish to use, and they're using Tower as a back-end management service for the pipelines and other things.

I should also point out that there's a Nextflow Tower CLI, and this is extremely useful for situations where you're comfortable using the command line but still want all the benefits of Tower: running remotely, having access to all the information afterwards, the logs, the automation and so on. Instead of running nextflow run, you can use this command line tool and do a tw launch; you can, for example, list the pipelines that are available inside your workspace and you can see the actions. What it's even more useful for, though, is setting up the infrastructure and the resources. With the CLI you can import and export pipelines, so all of the pipelines in this showcase workspace are actually defined in a Git repository, and we just use the command line to import and export this whole environment, which means we can set the whole thing up completely reproducibly and in a fully automated way, really treating it almost like an infrastructure-as-code setup; a rough sketch is shown below. So I hope you got a good flavor of that.
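As a rough sketch of that automation from the command line, here's the sort of thing you might run; the environment variables are placeholders, and the exact Tower CLI subcommands and flags may differ between versions, so treat this as illustrative rather than definitive.

```bash
# Hypothetical launch hook call: POST the endpoint URL that Tower shows for
# the action, authenticated with a Tower access token.
curl -X POST \
  -H "Authorization: Bearer $TOWER_ACCESS_TOKEN" \
  "$LAUNCH_HOOK_URL"

# Hypothetical Tower CLI (tw) usage: the same operations from a terminal
# instead of the web interface.
tw pipelines list            # pipelines available in the workspace
tw launch nf-core/rnaseq     # launch a pipeline through Tower
tw compute-envs list         # inspect the compute environments
```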
The model for this is that there's Tower Cloud, which is available here, and you can just log in with your GitHub account, and we also have Tower Enterprise, which is mostly focused on organizations that want to install Tower in their own environment and want professional support and the like, and that's what we offer as part of Seqera Labs. I'd recommend giving it a try. If you've got any questions on that, put them in the Slack; there's also a channel in the Nextflow Slack, under tower-help, so feel free to ask there as well.

My time is up, and I think our time together is up, but I'd like to thank everyone for taking part, and also thanks a lot to the instructors, particularly Chris and Harshil, and the people who've been getting up at five o'clock in the morning, or four o'clock in some cases. It's been fantastic to have you all on board. Did you want to come and say some final words, Chris? Otherwise I'm happy to wrap up.

No, I think you covered it nicely there. Just my really warm thanks as well to everyone for attending, and I really encourage anyone who does have questions to drop those in the Slack; it's a really great community and someone will get back in touch with you really quickly. Sorry, I think everyone might have been on spotlight for all of that, but hopefully everyone heard me okay.

Awesome, thanks so much everyone, I really appreciate your time, and do stay in touch. I look forward to seeing you around the community.