Thank you very much for joining us for day three of the Nextflow and nf-core training. My name is Phil Ewels, I work at Seqera Labs and I'm a lead developer advocate, so that just means I'm basically in charge of trying to make sure that the nf-core and Nextflow communities are big and happy and developing in the right way. So it's my pleasure to be here with you on day three. Today is going to be a bit of a change in pace compared to what you've done in the last couple of days. We're focusing less on the core internals of Nextflow and how to write Nextflow code, and more on the nf-core community and all of the ecosystem that exists around Nextflow. I'm a co-founder of nf-core, but it's a big community with lots of people involved, and basically it's got lots of really useful stuff for you. So hopefully, with the foundation you've built over the last couple of days, this will make a lot more sense and you'll be able to really get stuck in. So let's kick off. I'm going to start with a very high-level, very quick recap of where we are with Nextflow. Nextflow is a programming language where you chain together processes using channels and encapsulate those within a workflow. Those processes have implicit parallelization. Nextflow also gives you re-entrancy, by which I mean you can resume a workflow and it picks up where it left off, using cached results for tasks it has already finished. And Nextflow gives you very high levels of reusability and reproducibility. Underneath that language level, Nextflow handles software and also compute interactions. On the software side it installs everything for you, basically, with containers or Conda; on the compute side it talks to your HPC batch scheduler, your cloud compute, your local server or whatever it is. And then it also talks to wherever you're managing the code for the pipeline itself, so Git or GitHub or whatever. Hopefully the main takeaway for you is that Nextflow is very reproducible and gives you pipelines which are very portable between different people and different systems. Hopefully you're already sold on this; that's why you're at the training. So how is this different to nf-core? nf-core is a community effort, a separate project that sits side by side with Nextflow, with a lot of overlap of course, and it was born around 2018 after a Nextflow meeting in 2017. The idea at the time was that you can write whatever you like with Nextflow; it's a programming language at the end of the day. So although you can build very reproducible, very portable pipelines, you can also write whatever you want. You can hard-code paths, which would break portability. You can write things in all sorts of different ways, and different people will build pipelines in different ways. So nf-core was started as a way to bring together people building similar pipelines, to develop a set of best-practice guidelines, and to collaborate on a single unified set of pipelines which can be used by anyone, anywhere. And importantly, we wanted to reduce duplication: we could see lots of different people writing very similar pipelines in Nextflow, and we thought, well, if these pipelines are so portable and anyone can run them, why not get together, all work on one pipeline and make that the gold standard. So the nf-core community, for someone coming into it, offers a few different things.
The most obvious thing is the pipelines themselves: off-the-shelf pipelines where you can go in, pick the one you need for your analysis, and run it. There's also a lot of tooling; I think of it almost like third-party tooling — you don't need it, but it can help you and make your life a bit easier and more user-friendly, both for running Nextflow and for developing Nextflow code. And then there's something Harshal is going to talk about in the next session after me: the modularity and reusable components within pipelines. This is a newer part of nf-core that comes with DSL2 in Nextflow, so you have DSL2 modules. One of the key differences between nf-core and other repositories you might have come across is a very strong emphasis on community. We really want you to come and talk to us and be part of the community before you start on your pipeline. We don't want you to sit there, write a pipeline, and then come and say, "I want to add it to nf-core." We want you to come with an idea, see if anyone else has proposed it, see if you can find anyone to collaborate with, and really collaborate and communicate with the community as you go along. We have a standardized template which comes with all the boilerplate code for an nf-core pipeline, and tooling that uses that template. So we say that you must use the nf-core template, and we rely very heavily on the Git and GitHub system of creating issues and using pull requests, and through pull requests having peer review of code; that's how we maintain standards. So: collaborate, don't duplicate. We have one RNA-seq pipeline, and if you want a new feature, you add it to the existing pipeline rather than creating a new one. Looking at the pipelines today, we have 39 which have a little green tick, meaning they've had a GitHub release, and 24 under development. The first release is particularly important because that's where we have very high standards of community review. So if a pipeline has a first release, we're saying it's got the nf-core stamp: it should be super reproducible and good to go. Lots of pipelines in development haven't got that far yet, but some may already be pretty fully fledged pipelines that may well be in production use in various facilities; some may be earlier stage — it's a case-by-case basis. But the idea is that they're all on their way towards at least a first release. And we basically never delete pipelines; we archive them, so you can always keep using the old versions. So there are some which are no longer under development, for whatever reason, and those are archived. As well as pipelines, we have these DSL2 modules. Modules are sub-components of a pipeline: you can build a pipeline by sticking modules together, and they're basically individual processes. As we move towards this new system of building pipelines with DSL2 and using the module system as much as possible, we're growing a big library of shared DSL2 components which you can use for your pipelines. Today we have 632 modules — or at least we did last night, I don't know if any have been merged since then — which is a phenomenal number. So in the training you had a process for salmon index, creating a new reference index; that would be one DSL2 module, and salmon align would be another DSL2 module.
So it's not 632 different bioinformatics tools, but it's a very large number. This has really only been going for about a year or so now and the curve keeps climbing steeper and steeper, so there are new modules going in all the time, which is really good to see. Then there's the nf-core website; hopefully you've all been there, all the training material and so on is there. You'll see there's a big button to view pipelines, and you can scroll through all of them and find the ones you want. On the topic of community: I said we started around 2018, and this is a plot of visitor statistics for the nf-core website. If anyone has any idea what happened near the start of 2021, I'd love to hear your ideas. And there's one other blip — I don't know if you noticed it, on the far right: just at the end there we've got this training, and we've seen a big peak in traffic. But you can see it's been a nice steady climb in the size of the community and the number of people using nf-core and visiting the website, and that shows no sign of slowing down. Today we have over 3,500 people on Slack, which is phenomenal. It's a really strong point of our community that you can go onto Slack and get help pretty much any time of the day or night now; there are usually some people around to offer help, and a lot of people really enjoy interacting on Slack. We have nearly 500 people in the GitHub organization, and one and a half thousand people who have made issues or pushed code and have made some direct contribution to a pipeline. And just a phenomenal amount of code: over 10,000 pull requests now across all of the different nf-core pipelines in the last four years, so it's a huge code base. We are geographically pretty well spread. Like Nextflow, we originated mostly in Europe, but we also have a strong presence in North America, and we're really trying to push global diversity and inclusivity now. We've got funding from the Chan Zuckerberg Initiative, so Marcel, who may or may not be around on Slack, is based in Brazil, for example, and Chris is the representative for Asia-Pacific; we're trying to push representation around the world as much as possible. Just to wrap up the general introduction: we have a paper that went out in Nature Biotechnology in 2020, so if you're curious, give that a read. It's a very short paper that gives some background on how and why we created the community, and if you dig into the supplementary methods there's quite a lot of detail on how we've set up the back end and how everything works. Parts of it will be out of date now, but the concepts are still pretty much the same. With that, let's get started. You came here to learn how to build pipelines and how to use the nf-core tooling, so let's do that. There's a not-super-friendly link down at the bottom here, but if you go to the nf-core training page — the one with all the streaming links — we've got the schedule and everything. How am I doing? Twenty minutes, I'm a bit ahead of time, great. So we've had the general intro; I'm going to talk about creating pipelines now, and about the schema. Then there'll be a short break, Harshal will come on and talk about DSL2 modules, and we'll wrap up with Evan talking about Nextflow Tower. Anyway, above those links you'll see the nf-core training, and the written tutorial material is all here.
So you can click on that and get through to everything I'm going to talk about, and Harshal's material too; it's all written down here. Most importantly, there's a big link here, which I think I also posted in Slack. This page will stay up: on the left, if you go down to Documentation, Contributing and then Tutorials, you'll see "Creating pipelines with nf-core". That's the one we're going through today, so feel free to follow along with me, and I'm just going to move this browser into another tab so it doesn't take up all the space. If you hit "launch Gitpod" — you should be familiar with this from the last couple of days — you'll get a Gitpod environment that looks like this; I've got VS Code running in my browser here. We're not going to do so much coding in my session (we'll do a bit more with Harshal), but we'll be using the terminal quite a lot. Because of the way this image works, we actually start inside the nf-core/tools repository, which we don't really care about at all; we just want the environment, because it has the software installed. So the very first thing I'm going to do is clear out the explorer on the left and make a new directory to work in. I'm going to go to my home directory and make a new folder called training. If I go in there and find the file path, what I can do is change what the explorer shows, so it's no longer the tools repo but this new directory I'm now sat in. So I go to File and then Open Folder — I did that a bit quickly; for me it's Ctrl+Shift+O, though if you're on Linux or something it might be a different keyboard shortcut — then you put in the directory we've just made, hit OK, and everything reloads. Now you can see we're in the training folder and there are no files in here, so it's nice and clear. That gives us a bit more of a blank slate to start with. Just a quick note: a couple of times when I was doing this before, I was creating files and they weren't showing up on the left. If that happens, hit this little button here to refresh the explorer and they should appear; it might not happen for you at all. Right, I'm going to make the terminal nice and big. The reason we're using this particular Gitpod environment is that it comes with everything pre-installed. So if I do nf-core --version we can see that nf-core/tools is pre-installed, and Nextflow should be pre-installed as well. Okay, so we've got Nextflow 22.04 and nf-core/tools version 2.6. If you're following this on your own system, or at a later date, you'll need to install nf-core/tools yourself. nf-core/tools is written in Python and hosted on PyPI, so if you go to PyPI and search for nf-core you should find it, and then you can do pip install nf-core. It's also on Bioconda, so you can install it from whichever your favourite channel is. There is a Docker image, but for the purposes of this tutorial I would advise against using it if you can.
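For reference, a minimal sketch of those install and version-check commands (assuming a working Python environment; pinning a specific version is up to you):

```bash
# Install nf-core/tools from PyPI (or: conda install -c bioconda nf-core)
pip install nf-core

# Confirm that both tools are available on the PATH
nf-core --version
nextflow -version
```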
Right, okay, so let's have a look at nf-core tools: what can it do for us? Running nf-core --help will spit out a nice help text. Wave your hand and shout, by the way, if the text is too small or you can't read the code here; someone will tell me on Zoom and I'll zoom in a bit, but for now I'll crack on. So if I do nf-core --help you'll see there's a bunch of subcommands, and they're grouped together here. We've got commands for users, which are primarily for people running nf-core and Nextflow pipelines, and then commands for developers, which is what we're going to be more interested in today. When you have a bit of time, have a poke around, because the user commands will probably be useful for you as well. I'll quickly show you a couple. nf-core list basically tells you what nf-core pipelines there are; it's equivalent to going to the nf-core website, but a nice extra feature is that if I run a pipeline — if I do nextflow run or nextflow pull, say pulling the ChIP-seq pipeline here, which is also what happens when you just run nf-core/chipseq — Nextflow clones that repository into your Nextflow home directory. If I do nf-core list again, it now tells me that I have a local copy of this pipeline, it looks at when I downloaded it and last pulled it, and it compares that against whether it's the latest release or not. So it's a nice quick check to make sure the pipelines you're running are up to date. Another one that's quite nice is nf-core launch, which gives you a more user-friendly way to run pipelines. So if I do nf-core launch, let's do ChIP-seq again — see if this works, I haven't tested this one ahead of time — you can see I can choose which release of the pipeline to run, so I'll run version 2.0, and then I can choose either to launch a web-based graphical interface for the pipeline, where I can fill in all the different parameters (we'll talk a bit more about that later), or to do the same thing on the command line, which works offline. This launch tool works with any Nextflow pipeline, so it should also work with the pipelines you built over the last couple of days. It takes you through everything, shows you all of the various parameters and gives you help text, so it's a bit more user-friendly; I'll go into how it all works in a little bit. Right, anyway, that's a high-level view of what nf-core tools does, but what we're really interested in here is creating new pipelines. So let's look one more time at the commands for developers. The top one, the one you're most interested in right now, is nf-core create, which creates a new pipeline. When I run nf-core create I can use various command-line flags to do things, but with most of the nf-core commands, if you don't use the flags you'll be prompted interactively on the terminal, which is kind of nicer. For example, if I type in an invalid workflow name it will tell me to use a less ridiculous name; it has to be lowercase with no punctuation and so on, so you get in-place validation as you go along. So let's call my pipeline "demo", give it a little description, and say who I am. At this point you can choose to turn certain parts of the nf-core template on and off. If you want to remove the nf-core branding — you don't want a big nf-core logo at the top — you can say so. If you're building a proteomics or machine-learning pipeline and you don't need the genomics stuff with reference genomes, you can say so and it will cut that bit out of the template. So you can customize a bit interactively here, but for now I'm just going to say: give me the full thing, a regular pipeline.
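As an aside, if you'd rather skip the interactive prompts, the same thing can be done non-interactively with flags; a rough sketch, where the name, description and author are obviously just placeholders:

```bash
# Create a new pipeline skeleton without the interactive wizard
nf-core create --name demo --description "Demo test pipeline" --author "Your Name"
```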
Not quite sure what's going on here... I found a new bug: I clicked yes, that's why. Sorry, let's try that again. I said I wasn't going to customize it and then I did. So: demo, test pipeline, and no, I don't want to customize it. There we go. Do read what it says here. Firstly it tells you to come and join the community, which I've hopefully already hammered home, and it also tells you how to get started with your pipeline on GitHub or whatever remote code repository you use. If you look in the file explorer at the top left, you can now see loads of files: it has created a pipeline for us in a directory called nf-core-demo. I'll quickly touch on a couple of these files, but I won't go through all of them exhaustively. The first thing to note, in relation to these example commands here, is that if I go into this directory you'll see it has actually initialized a Git repository for me. I can do git status and it shows there's nothing going on, and I can do git branch and you can see it has created three branches for me, which are the standard branch names we use within nf-core. master should always be the default and should always have the code from the latest release, so if someone runs the pipeline with Nextflow or downloads the default branch, by default they'll be running the latest stable release. Development work happens on dev, and TEMPLATE is used for automatic template synchronization. You never touch that branch; it always just contains the vanilla template, without any of your customizations, from the different nf-core/tools releases. Every time there's a new tools release, if your pipeline is part of the nf-core GitHub organization, an automated pull request is created which brings the changes for that release into your pipeline. That's how we keep all of these 60-odd pipelines synchronized, and if we fix a bug in the template it goes out to all of the pipelines in an automated way. If you're developing a pipeline outside of the nf-core community you can also run nf-core sync locally, which does the same thing the automation does, so you can still get those improvements as well. So: three branches. Typically what you would do next is go to GitHub and create a new repository there. Make sure you don't initialize it with a README or anything — you don't want any commits in there, just the bare repo. Then you come back to the command line here and add your origin, using the example commands you get from GitHub to paste in, and you push these commits up to GitHub with push --all, which pushes all three branches. That's quite important: all three, not just the master branch. If I do git log, you can see it has already created one commit. This one is quite important, because it's that vanilla commit I talked about, which has just the template before you made any modifications. It's shared across all three branches, and it's needed for the automated synchronization: it's via this commit that Git basically knows which bits you've modified from the template and which bits have changed within the template itself. That's why it's really important that we initialize the Git repository for you and make sure you have that first commit; otherwise you're in a world of pain with merge conflicts later on.
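Putting those Git steps together, roughly (the GitHub URL is a placeholder for the empty repository you just created):

```bash
cd nf-core-demo

# Point the local repo at the new, empty GitHub repository
git remote add origin git@github.com:<username>/nf-core-demo.git

# Push all three branches: master, dev and TEMPLATE
git push --all origin
```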
Okay, great. Let's have a quick look at the files, just very fast; I'm going to skim through some of these. Many of these files you will never need to touch and should never edit, and some of them you will need to edit. .github has files for automation with GitHub Actions, so generally you don't need to touch that. assets has files that the pipeline might need; we've got a sample sheet here, for example, for the test config profile, -profile test. bin is for any pipeline scripts you want to run: anything in here is added onto the PATH for your processes at pipeline level. conf you do need to pay attention to, because this is where all of your configuration for the pipeline goes; base.config, for example, has all the defaults, and modules.config has the configuration for the modules that you're using, which Harshal will talk about in his section. docs — ideally you add to these documentation Markdown files as you go along; having really good documentation is one of the nf-core guidelines. lib is just stuff you don't need to think about; it's part of the template, with helper functions. modules and subworkflows are the shared code components, which is what Harshal is going to talk about. workflows is your main workflow, where all the meat of your pipeline goes: this is where you set up your channels, import your modules and run the actual main workflow. That is then imported by main.nf, which has very little in it and basically just pulls in that workflow — that's because you might have several top-level workflows. Then many of these files in the root you don't need to touch; they're things like config files for Gitpod, config files for nf-core, config files for code linting and so on. Obviously you'll want to update the citations and changelog as you go along, and you'll need to edit nextflow.config as you go to add new parameters and things. Okay, just to convince you that what we've created here is already a functional pipeline: I'm going to go up one directory, so I'm sat here, and I'm going to do nextflow run nf-core-demo -profile test,docker. That uses a grouping of config: we have a profile called test which provides sample data, downloaded off the internet at runtime, and provides all the parameters the pipeline needs to run. Every single nf-core pipeline must have a test profile which runs a very, very small test data set, and that's what we use for all the automated testing. But it's also great for you, because the first thing you can do with a new pipeline is run -profile test: if that works, you should be well set up, and if it fails, you know something is wrong with your local system, because we test this all the time. And then I'm going to tell Nextflow where to publish the output files, so I'm going to say the output directory should be test_results.
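So the full command being run here looks like this (Docker must be available; the output directory name is just what I chose):

```bash
# Run the freshly created pipeline on the bundled tiny test data set
nextflow run nf-core-demo -profile test,docker --outdir test_results
```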
Right, so Nextflow should now kick off, pick up our brand new pipeline, and we'll see if it crashes — hopefully not. You can see the work directory has popped up on the left, along with the log file and our test_results folder. You'll see that the header from all the nf-core pipelines looks basically the same. We've got the version of the pipeline that you're running; I've just created it from the template, so it starts at version 1.0dev, and you bump this version number every time you do a GitHub release. Then it tells us the core Nextflow options, some of which should look familiar from the past couple of days, like the project directory, where the pipeline is and where I launched the workflow from, things like that. And then it shows me which of the pipeline's parameters I've modified from the defaults: here -profile test sets these parameters for me, basically the input sample sheet, and I set the output directory for where the test results should be saved. Right, and then you can see it's kicking off. The template comes with a sample sheet check, then it runs FastQC, and it pulls out the software versions from the tools which have run. It does that dynamically, because we don't trust that you're running what we expect — you could override however the pipeline gets its software, you could be using a local install or something — so we get the tools to print their version on the command line and collect those for reporting. Then it finishes up and puts all of that into a MultiQC report. If we go into test_results, you can see here's the MultiQC report, here are the FastQC reports, and here are the Nextflow pipeline reports. Great, happy days: we have a pipeline and it works. So we're off to a good start, and our starting position now is that if it breaks from here on, it's something I've done wrong. Now, if you look at all these files, there are a lot of them and it's quite overwhelming. Some you should edit, some you shouldn't, and understandably, when you first get hit with this you don't really know where to start. But we do a few things to try and make it easier for you. For example, if I go into the workflow file here and scroll back up to the top, you can see a special comment line marked "TODO nf-core". The highlighting here is because I have a plugin installed in VS Code which highlights them; if I get rid of that, you'll see it's just a regular comment line, so it's not executed at all. But it's a little bit of help text telling me this is one of the places where you should be editing the code from the template, this is where to get started: go here, add this. Each of these TODO comments has a little description of what's expected, and they're scattered throughout the template; we try to put them in all the places we know you're going to have to do something. If you have plugins, they can pick them up in a nice way: this VS Code plugin called Todo Tree looks through all the source code and picks up all of these TODO statements. So you can see demo.nf there, and especially if you're running locally it will find them all. You can use this as a to-do list if you like and work through all of them; it gives you a good starting place and makes sure you haven't missed anything.
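Those markers look roughly like this in the template files (the wording here is illustrative, not copied from the template):

```groovy
// TODO nf-core: Remove this example input check and replace it with your own channel set-up
```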
So, in order to standardize the code as much as possible, we do a lot of testing, and I'm going to quickly talk about how this works. I'll start with the simplest part, which is that we have lots of different people writing and working on these pipelines. If you're new to collaborative coding it can be quite a change in the way you work, because whether you're writing YAML or Python or JSON or whatever language, you can write perfectly valid code, and someone else can come along and add to that code with a different style, and that's quite jarring. The other thing is that we want to keep code diffs as small as possible, because that makes all the template synchronization stuff much easier. So, to standardize the way we write code, we use code linters, or code linting tools: these validate the formatting of code and can also automatically fix the formatting to standardize things for us. We use a couple of them: for Python files we use one called Black, which is very commonly used in the Python community, and then we use one called Prettier, which handles YAML files, JSON files, Markdown files and so on. If I go into the changelog here, you can see this in action. Here I've got a heading with a bit of space around it; if I delete that space and do some other stuff — make a numbered list and just put number one all the way down — this is perfectly valid Markdown. It will render fine with basically any Markdown renderer, but it doesn't match the linter's style. Now, I have the Prettier plugin installed down here, so hopefully when I hit save it will run, see that there are some formatting changes needed, and automatically fix them for me. So if I hit Cmd+S now to save the file... moment of truth... okay, it's not working because I'm running in the browser. I can also try it via Cmd+Shift+P to open the command palette... no, it's not going to work for me. Okay, we'll do it in the terminal. If I were running in my local VS Code you can set all this up so that when you hit save it automatically formats the code for you; I was using my local VS Code when I ran this session earlier and it worked pretty well. So I'm going to cd into nf-core-demo at the bottom and run prettier --check on the file path. This checks the files and says, hang on, I found something wrong in the changelog; then I can do prettier --write and it will fix that file for me. Sure enough, there you go: it has put the spacing back in around the heading. And now if I run the check again, everything's fine. Great. Because this is a command-line tool, we can run it every time you push any code to GitHub or open a pull request, to check that the code you've written meets our style guides, and then it's easy to fix. And like I said, if you have your VS Code environment — or whatever code editor you use; there are plugins for basically all the major editors — set up properly, it just runs automatically every time you hit save and you never need to think about it. Once you get used to these code formatters you'll probably never go back; they're really good. So that's for standard formatting: whitespace, use of single quotes versus double quotes, things like that. Unfortunately there isn't one for Nextflow code yet. I would love one, but that's quite difficult to write.
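For reference, the two Prettier commands used here, run from inside the pipeline directory:

```bash
# Report any files whose formatting doesn't match the standard style
prettier --check .

# Rewrite the offending file in place to fix the formatting
prettier --write CHANGELOG.md
```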
But what we do have is a code linter for nf-core. It doesn't do code formatting, but it does a whole load of automated code checks. So, sitting in the pipeline directory here — if you remember, when I ran --help one of the options was nf-core lint — this is the tool we run on the command line to do these tests. I can run nf-core lint and it will check the pipeline we've just created. There's a little bit of setup the first time; this bit is quicker the next time you run it. It runs a whole array of different tests on the code. We've only just created this pipeline, so hopefully there won't be anything too bad — there shouldn't be any failures, we'll see. And here we go: it has sped through a whole load of different tests, at pipeline level and also at module level. These are in yellow because they're warnings, and warnings are tolerated; that's okay. We try to clean up warnings before a pipeline release, but it's okay to go ahead with them. We can also see that a whole load of tests passed and are not shown by default, but we have zero tests failed, which is good news. If you look at what these tests are, you can see it says it found a TODO string — the same TODO comments I was talking about earlier. This lint test basically just makes sure you haven't left any in the pipeline, so when you run nf-core lint you'll see these jump up and you can go and sort them out. So warnings are okay. Let me cheat a little bit now — you should never do this, of course — and search-and-replace TODO through the entire pipeline just to get rid of all those comments. Okay, replace, and if I run lint again, hopefully all of those pipeline-level warnings will have disappeared. These module lint warnings are a bit annoying and make my demo a bit less clean; they'll be gone in the next nf-core/tools release. But you can see all the pipeline-level warnings have now gone, so that's great. Now, some of these files I said you shouldn't edit. For example, the code of conduct: this is the nf-core code of conduct and there's no real reason you should be editing it in your pipeline. There are other files where editing them would basically mess up the automation and things like that. So there are lint tests to check that certain files have not been modified or deleted. To demonstrate, I can just add something to this file and hit save, and if I run nf-core lint again you now see that I've got a failure: one test failed, and it says this file does not match the template when it should. Something to point out with these lint tests is that each one has an identifier; this one is called files_unchanged. If I'm running locally and I alt-click this in the terminal, it should open a hyperlink that takes me to the web browser — I'll see if this works in the browser... no, it doesn't. Unfortunately one of the quirks of the Gitpod environment is that some of these web-linking things don't work so well, so I'll just show you directly. If I go to the documentation — we'll hopefully make this a bit prettier at some point, but it's built with Sphinx — you can see all of the different pipeline-level lint tests, and if I search for the one called files_unchanged you can see the documentation for that specific lint test, what it does, and basically how to fix it. You can see there's a big list of files here, and these are the files that you mustn't mess around with or modify. Now, there are some files which you can modify, but where the start of the file needs to stay the same.
So you can append extra content to those, but you can't delete stuff from them. You can see there are pipeline lint tests and there are also module-level lint tests, and those are for the DSL2 modules that Harshal will talk about later. So that documentation is a good place to start, and if you still have problems after that you can always come onto the nf-core Slack, where we have a channel called #linting where you can ask for help. The final thing to say about linting, really, is that the nf-core lint command is used extensively within automation, within continuous integration (CI) testing. To give you an example, I'm going to pick on Alex, because he won't mind. I'm going to find a pull request — this is on the small RNA-seq pipeline on nf-core. It's a long pull request, but if I scroll down to the bottom you can see a bunch of automated CI tests here, some of which are actually failing: it's trying to run the pipeline with the test data and something's gone wrong, so that's a failure, and some are passing with a little green tick. I can find the nf-core linting check here, and if I click "details" it shows me exactly what has been run. I can see it runs nf-core lint — sure enough, the same command we were just running locally — on the code within this pull request. It's throwing up some warnings, but there are no failures, and because there are no failures it gets a little green tick and the pull request is allowed to continue. If there were any failures it would get a red cross, and it wouldn't be possible to merge the pull request. So having this automated code linting ensures that we're not merging in bad-quality code or accidentally breaking anything, and it also means that as we update the pipeline template and the guidelines, it forces you to stay up to date, because we'll add new lint tests to check that your pipeline looks how it should. So it ensures that all of the pipelines stay up to date and stay best practice. Now, I mentioned at the start that you can create an nf-core-style pipeline without the nf-core branding, and you can use the nf-core tooling for your own pipelines outside of the nf-core ecosystem; that's very much encouraged, we want to make these tools as useful as possible for everybody, not just for people within the nf-core organization. So there might be cases where you actually do want to delete a file, or have your own code of conduct, for example, or change something, and you have a valid reason why a test should be failing and you're okay with that. You can still run the nf-core linting and still use the continuous integration, but everything would be failing, which would be kind of bad and would defeat the whole point. The way we get around that is with a configuration file that tells the linting code to ignore certain tests. In your pipeline you'll find a .nf-core.yml file, which is by default almost empty — it just says it's a pipeline — and if I go over to the docs here I can copy and paste a chunk of config code.
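A minimal sketch of the kind of lint configuration that gets pasted into .nf-core.yml — the test names are real lint identifiers, but which files you list is entirely up to your pipeline:

```yaml
repository_type: pipeline
lint:
  # Disable the check for leftover "TODO nf-core" comments entirely
  pipeline_todos: false
  # Allow this one file to differ from the template, but keep checking the others
  files_unchanged:
    - CODE_OF_CONDUCT.md
```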
So this tells the linting that the pipeline_todos test we saw earlier should be disabled entirely, and for some of the lint tests we have more fine-grained control — this is one of them — where we can say it's okay for this one file to fail but still test all the others. It's a case-by-case basis as to how finely you can tune each test. So if I hit save on that and run nf-core lint again, you can now see it tells us that we're ignoring a couple of tests, and sure enough we have zero tests failed, because it ignored that file. So you're good to go: you can continue working on your pipeline and using this automated testing strategy. Right, hopefully everyone is with me at this point and I haven't lost anybody. That is the overview of creating nf-core pipelines, how they look and how we develop with them. Next I'm going to touch on the Nextflow schema. To give you an example of the kind of thing we use this for: if I do nextflow run nf-core-demo again, but instead of actually running the pipeline I add --help, you can do this with any nf-core pipeline and it will show you some help text about how to run it. It just uses Nextflow to do this — it's within the pipeline code — to print the help text and then exit. So here we've got all the different parameters which are available for the pipeline: you can see there are input, outdir and email options, reference genome things, and in fact a whole load of other boilerplate options which you mostly don't need to change on a regular basis, but you can use this to show the full list. Now, if you remember when you were editing the config file yesterday, there's not really very much extra context in there. If I go to nextflow.config and set up syntax highlighting, you can see that these parameters are just strings or null — they don't really carry any additional information. And yet down here in the help text we can see that some of these are strings, and if I ran it with the hidden parameters shown you'd see that some are Booleans (true/false) and some are numeric (integers or floating-point numbers), and we also have a description of what each parameter is. All of this information has to be stored somewhere, and we can't do it within nextflow.config. So to provide this, we developed a new standard within nf-core — which has since, I'm very happy to say, spread to the wider Nextflow community — and it's this JSON file, always called nextflow_schema.json. It's a flat JSON file, and its sole purpose is to define additional information about the parameters in the pipeline. We can go down to the reference genome, for example — params.genome, --genome — and it says that this is a string, it has a short description of what the parameter does, and it has longer Markdown help text. If I search for a Boolean, you can see this one is a flag, so that's a Boolean, and it's also hidden from the help text by default. So you can see how, with this file, we build up all this additional information about the parameters.
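To give a feel for what this looks like on disk, here's a trimmed-down sketch of a couple of parameter entries in nextflow_schema.json (the wording of the descriptions and help text is invented for illustration):

```json
{
  "genome": {
    "type": "string",
    "description": "Name of the iGenomes reference to use.",
    "help_text": "Longer Markdown help text explaining where to find valid reference names.",
    "fa_icon": "fas fa-book"
  },
  "monochrome_logs": {
    "type": "boolean",
    "description": "Do not use coloured log output.",
    "hidden": true
  }
}
```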
This is really, really helpful. Not only can we generate help text on the command line like this, we can also build web pages from it. Every nf-core pipeline page on the nf-core website has a tab that says Parameters, showing you all of the different options for running that pipeline: what the type is, the description, and the longer help text. So we can render richer information based on this schema file. Just as importantly, it's not just documentation: we can also render launch interfaces. That command-line wizard I showed you earlier, which took me through each of the parameters, is based on the schema, and it also validates the values as I go along — it says this one is required, I have to enter it, it won't let me continue. And if I go to any of the pipelines, say ChIP-seq, you can see this launch button gives me a rich launch interface on the nf-core website. When I submit it, it gives me a Nextflow command — nextflow run with a parameters file — so I can run that with no additional software, just Nextflow, or with an nf-core command on the command line, and I can also launch directly into Tower. Evan will talk about Nextflow Tower a bit later on, and you'll see a very similar launch interface there, also based on the same schema file. So by describing all the parameters, we can build this extra ecosystem of tooling around pipelines to make them more user-friendly. Finally, we also use the schema to validate the parameters at launch time. In the past, if you ran a pipeline with a spelling mistake or something, it might only crash halfway through, which is quite annoying if it has been running for a long time. But if I do nextflow run again — this is the same command I ran earlier, but the outdir parameter is required, so I'll remove that and run it — Nextflow will look at all the parameters I've given at runtime, compare them against the schema file, validate them, and exit immediately saying that I'm missing a required parameter. Here we go, and you can see that's very fast. Failing fast is very, very important, and the Nextflow schema allows us to do that. That's great. However, I imagine some of you saw this JSON file and are feeling a bit nervous. It's a big JSON file — and this is just the template, I haven't added anything; it's a minimal pipeline without very many parameters — and it's quite complex, highly nested, and very easy to break. Thankfully we've come up with some user-friendly tooling to work with these files. Generally, you should never have to manually edit this file, ever, really. So I'm going to close that and do a quick demo — I need to speed up a little bit. If I go into the Nextflow config file for the pipeline (it's annoying, I haven't got my usual setup here), go up to the top, I can just add some new parameters. So I've got a new parameter which is a string, this one is obviously numeric, and this one is a Boolean — but Nextflow doesn't know about those types, it just treats them all the same. Then I can go into the pipeline directory and do nf-core schema build. This is not a one-time-only command: I can keep running it again and again, whenever I want to keep tweaking the schema.
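Just to make concrete the kind of thing added to nextflow.config a moment ago (the names and values are made up for the demo):

```groovy
// Inside the params block of nextflow.config
params {
    my_string_param = 'hello'   // a string
    my_number_param = 42        // numeric
    my_flag_param   = false     // a Boolean
}
```

Running nf-core schema build afterwards is what pulls these new parameters into nextflow_schema.json.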
So it looks at the schema file, and it also looks at the Nextflow config (using Nextflow itself), and it compares them. Here it has said that it found some Nextflow parameters which are not in the schema — do I want to add them? It works the other way as well: if you renamed or deleted some, it would say this one is no longer in your pipeline, do you want to remove it? So it has updated my schema file for me right away. The new parameters are now in there, but they're not customized, so there's no special information about them yet. And then it asks whether I want to launch the web builder — yes, I do. Right, this part is specific to Gitpod and it's a bit weird: normally when you run this locally it will open a browser tab automatically, but Gitpod doesn't really know what browsers you have available, so it falls back to a command-line, text-based web browser called Links. I'm going to hit Q to quit that, because I don't want to use it, and say yes, I do want to quit. This is normally what it looks like; thankfully it tells us the URL, so I can just copy that into a new tab over here. This is on the main nf-core website, but with a custom ID that relates to the command I just ran, a cache ID. And sure enough, here are all the parameters from our pipeline. All of these came with the template, and if I collapse all these groups you'll see down at the bottom we've got the new parameters I just added, along with their names and default values. With the schema we can group parameters together, which is really helpful for usability, so I'm going to add a new group — a demo group — and I can also add icons, which show up in the web interfaces; let's have a little sparkling-hand icon. I'll add a short description and also longer help text, and I can preview what they'll look like on the command line and so on. I think this supports Markdown — on the website it would be rendered as Markdown. Click save. Right, and then I'm just going to zoom out a little to make this easier: I can grab this little handle here and drag things around to change the ordering. So I'll drag that group down to the bottom, and I can move any of the other groups or parameters around as well. Then I'll zoom back in, and I can drag my new parameters up into the group I've created. If you have a lot of them, it's often quicker to use this little button here, which shows you all the parameters which have not yet been added to a group; I'll select all of them and bulk-move them into the new group. And then, sure enough, we can go through and set different icons — of course you would pick things that are relevant to whatever the parameter is about — which show up in the navigation and make it a bit easier to visually scan. We can also set the parameter types, along with the description and help text again. It has guessed that this one is an integer already, so you can see it's a numeric field, which is great. This one looks like a Boolean, so it probably is; I can switch it from string to boolean and now it can only be true or false rather than being free text. And you can see a number can be either a floating-point number or an integer.
I can set whether parameters are required for running the pipeline — whether they have to be supplied — and I can also hide them from the help text by default. This little cog opens up additional settings for each parameter. Booleans don't really have anything extra you can do with them, so there's nothing there, but with integers I can set minimum and maximum values, and I can also enumerate the possible values that should be allowed. So I can say it has to be 2, 4, 12 or 20, and now if the user enters anything other than these values when they run the pipeline, it will be rejected; it has to be one of these options. Strings have the most options of all. Again, we can enumerate possible values, and we can also set up a regular expression to validate the exact text that should be entered: the placeholder here in the background would be for checking that a string ends in ".csv", for a CSV input file, but you could have anything here — it could be a pattern for an email address or whatever. And then, very new, only added a few days ago, we have formats. This will probably gain more options in the future, but right now you can use it to say that a string should correspond to a file path, a directory path, or either. If I select file path I get a couple more options: I can set the MIME type, which is basically the file type — you can click this link to get a big list, so if it's a spreadsheet and should be a CSV file I can say text/csv — and you can also attach an additional schema. I'm not going to touch on that now, because it's very new and there isn't much tooling around it yet. But these settings are really important, and Evan will touch on them later when he talks about Nextflow Tower, because they're used by the surrounding ecosystem and tools to know which parameters in your pipeline correspond to sample sheets and other data sets coming into the pipeline. Right, so I can hit save. And then I'm just going to pull this window out for a second and put it side by side, because I think this is really cool. Over on the left I've got my command line running, and you can see this little spinner just sitting there interactively, waiting for me to finish editing. Whenever I'm done, I hit "Finished" over on the right, and — there we go — it sees that the website has finished, downloads the edited schema I was just creating, and saves it locally to nextflow_schema.json. So it's really quick and easy: run it on the command line, it does all the command-line stuff, customize in the browser, click finish, and it all gets dumped back into the JSON file. And sure enough, if I go to the Git interface here and look at the diff for nextflow_schema.json, you can see the new parameters have been added down here, they're grouped together, and all of the stuff I was adding in the web interface has been written into the JSON file for me. Really helpful. Right, with that, we have four minutes for a quick break, and that's basically the end of my training for today. It's been a pleasure to talk to you all; thank you very much, and dump all of your questions into Slack — I'll head over that way now and see if I can help out as well. The next session will be led by Harshal, who's going to talk about DSL2 modules; he'll come on in about three minutes or so, starting at about four o'clock, and after that we'll have Evan talking about Nextflow Tower.
So thanks very much for your attention, and I hope to catch up with you on Slack soon. I will see you in person at the Nextflow Summit — oh, I should advertise that before I forget; I should have said this at the start. The Nextflow Summit is next week, and we have the nf-core hackathon. Registration for that is still open until Friday if you want to join the hackathon online — that's what this training was originally set up for, to prep people for the hackathon. So if you go to summit.nextflow.io you can click register and still sign up for the remote hackathon, and you can also sign up to follow the Nextflow Summit remotely, as it will be available online as well. So the Summit is coming up, the hackathon is coming up, and you can find more information about the hackathon on the nf-core website too: how it will work, the different groups and so on. The other thing that's brand new and totally different is that we have the second round of nf-core mentorships; applications opened just a few days ago. So if you're interested in being either a mentor or a mentee for this programme, have a look at the web page under About and then Mentorships — we also sent out a few tweets earlier today — and please do apply. The positions are funded for the mentors and free for mentees, so it's a really good opportunity; the first round was great. Right, that's all folks. Harshal will be on in a couple of minutes. Thanks very much. Okay, okay. Good afternoon everyone, and thank you for tuning in to the second set of talks that we're hosting in the EMEA region on nf-core in general: the community, the tooling, and also the modules, which I'll be covering. My name is Harshal Patel, I'm head of scientific development at Seqera Labs, and I'm going to be talking to you today about nf-core modules. I haven't slept since I gave the APAC talk this morning, so please excuse me — you might have to put up with me; the others have had a nice power nap in between, so you might just have to bear with me through this. So, let's crack right on. This is what a typical pipeline looks like — in this case it's the nf-core RNA-seq pipeline. It's probably one of the most widely used pipelines we have on nf-core, and it's also potentially one of the most common applications in genomics, where scientists sequence mostly bulk RNA from samples to compare what's different between conditions, for example. This is a tube-map representation of that particular pipeline, and as you can see, there are particular modules highlighted along the way. Now, these modules are very likely to be useful to share across pipelines in nf-core. The majority of the pipelines we have that do genomics analysis run FastQC, for example; some of them will use Trim Galore; a lot of them use MultiQC, which we know is down to Phil Ewels and his coolness when it comes to this sort of stuff. But there are also other tools that would be really nice to share across pipelines, and that's generally where modules come in. So, for those of you that don't know, Nextflow initially started with the DSL1 syntax, which was very monolithic: within the same script, you had to physically copy and paste the same process code over and over again, as many times as you needed it. Whereas with DSL2, we now have the ability to import functionality via modules.
And so in this way you can write a process once, which is really useful because it's just a unit: in essence, the unit of Nextflow is the process, because it's the minimal definition of a Nextflow pipeline — a single process. We came up with a bunch of terminology right at the beginning when we went about converting our pipelines to DSL2. For a framework like nf-core it's incredibly important to share functionality, so it was also important for us to adopt DSL2 as soon as possible and figure out ways to make implementing pipelines easier. A module, in our terminology, is just one unit — in this case, a single process. A subworkflow is a chain of modules: say, for example, you have samtools sort, samtools index, samtools stats and samtools flagstat; that chain of modules can then be reused when you, say, process another BAM file in your workflow. So that's what we call a subworkflow — a larger chunk of functionality that you can include directly in pipelines. A workflow is the end-to-end implementation, so that would be a combination of modules and subworkflows. When we went about trying to figure out how to do this, we had lots of discussions about the best way to attack it, and we kept coming back to some key principles that would be important if we wanted to implement modules properly. Reproducibility is key, not only for modules but for science in general, so we wanted to make sure we ticked that box. Flexibility, so a module can be used in different contexts: we also have the ability to provide non-mandatory arguments to these modules and pipelines, which allows users to customize the way they'd like to run the pipeline without the pipeline having to make a new release, and that flexibility is very important. It has to be portable, which is one of the benefits of Nextflow. Standardized, which we love in nf-core: it just means it's easier to update lots of things at the same time, without customizations in place that break things. Documentation, again, is something very dear to our hearts — we like good documentation to make sure that everything is crystal clear, and if it isn't, we ask for contributions or suggestions on how to improve it. And automation: when you have 60 or 70 pipelines and 600-odd modules, doing things by hand is physically impossible. So whenever we design these sorts of things, we need to factor all of this in. And so, as Phil mentioned earlier, we've really gone from strength to strength with nf-core/modules. It's a repository that hosts single processes for various mainstream bioinformatics tools: you can imagine one for FastQC, another for samtools index, and we've got a gunzip module as well. These individual modules now add up to 632. And it's not just the modules we've added: we've also added tests for these modules, to make sure they work every time you use them. That's quite important, because otherwise you could just add a module and not know whether it actually functions, and if someone comes along and wants to change a module, we need to make sure that whatever changes they've made actually make sense and the module still works. These modules are kept up to date with community best practices.
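To make the DSL2 import idea concrete, here's a minimal sketch of how a shared module gets pulled into a workflow (the include path follows the nf-core layout; the parameter name and channel wiring are simplified assumptions):

```groovy
// Import a shared process once, then call it wherever it's needed
include { FASTQC } from '../modules/nf-core/fastqc/main'

workflow DEMO {
    // nf-core modules expect [ meta, reads ] tuples as input
    reads_ch = Channel
        .fromFilePairs( params.input_reads )               // hypothetical parameter
        .map { id, reads -> [ [ id: id ], reads ] }

    FASTQC ( reads_ch )
}
```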
When we went about trying to figure out how to do this, we had lots of discussions about the best way to attack it, and we kept coming back to some key principles that would be important if we wanted to implement modules properly. Reproducibility is key, not only for modules but for science in general, so we wanted to make sure we ticked that box. Flexibility, so a module can be used in different contexts. We also have the ability to provide non-mandatory arguments to these modules and pipelines, which allows users to customize the way they'd like to run the pipeline without the pipeline having to be re-released, so that flexibility was very important. It has to be portable, which is one of the benefits of Nextflow. Standardized, which we love on NFCore: it makes it easier to update lots of things at the same time and not have customizations in place that break things. Documentation, again something very dear to our hearts: we like good documentation to make sure everything is crystal clear, and if it isn't, we ask for contributions or suggestions on how to improve it. And automation: when you have 60 or 70 pipelines and 600-odd modules, doing things by hand is physically impossible. So whenever we design these sorts of things, we need to factor all of this in. As Phil mentioned earlier, we've really gone from strength to strength with NFCore modules. It's a repository that hosts single processes for various mainstream bioinformatics tools: you can imagine one for FastQC, another for samtools index, and we've got a gunzip module as well. These individual modules have now added up to 632. And it's not just the modules we've added: we've also added tests for these modules to make sure they work whenever you're using them. That's quite important, because you could just add a module and not know whether it's actually functioning, and if someone comes in and wants to change that module, we need to make sure that whatever changes they've made actually make sense and the module still works. These modules are kept up to date with community best practices.

For those of you that saw the carnage over the past few days, trying to update the modules repository in preparation for this training session: we released a new version of NFCore tools late last night and then put all of this together around that. We're also going to be working on sub-workflows at the hackathon next week. That's going to be a game changer, I think, because rather than installing individual units and stitching them together in your pipeline, you'll be able to install larger units of functionality directly into your pipeline, and we're hoping to build the tooling around that and get some proofs of concept together that will make it much easier to share this sort of functionality. As I mentioned earlier, it was key for us to make this as flexible as possible. So with the current implementation of DSL2, non-mandatory arguments are all passed via a config file, in this case a standard file called modules.config which sits in the conf directory of the pipeline. End users can customize how they want to run these tools via custom configs directly; they don't need to rely on the pipeline authors releasing a new version of the pipeline to tweak its behaviour, they can do it themselves. And there are all sorts of other advantages as well in terms of overriding resources and container definitions.
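As an illustration of that modules.config pattern in current DSL2 pipelines, the process name, arguments and publishing settings below are just examples of the kind of thing that can be overridden without touching the module code:

    // conf/modules.config - illustrative only
    process {
        withName: FASTQC {
            ext.args   = '--quiet'                 // extra, non-mandatory tool arguments
            publishDir = [
                path: { "${params.outdir}/fastqc" },
                mode: 'copy'
            ]
        }
    }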
As Phil was showing you earlier, the bread and butter that we use to automate a lot of this and really roll out this sort of functionality is the NFCore tools package. We've put a lot of time and effort into it, and it's quite pretty if you look at it, partly due to adopting some of the Rich library stuff that Phil got on board quite early. This allows us to do a lot of things quite easily, and that's what I'm going to be taking you through in a second. There are loads of resources available here, talks about adding modules and contributing to modules. I won't be able to cover this in comprehensive detail, but I have done so in some of these talks; one of them is quite a mammoth talk, and in fact bits of it may be outdated because it was from October 2021 and a lot has changed since then, but the core functionality is still pretty much the same. I'm going to put these slides in the Slack channel when I've finished, so you should be able to click on these afterwards. And yeah, come find us on Slack: there's a modules channel, there's a sub-workflows channel now, and there's a modules team you can tag if you need any help or advice, but typically just come and approach us in the appropriate channel and we're all willing to help. I'm going to thank you in advance and get on to the demo now, so I don't have to open PowerPoint again. Okay, can you guys see my browser? Just want to confirm before I go on. Yeah, you can see my browser. Okay, I assume you can; if you can't, shout and I'll panic and do something else. So the page that Phil gave you earlier is essentially what I'm going to be going through here; I'm just going to copy and paste and go along, so hopefully you'll be able to follow. The NFCore modules section in that document is where I'll be starting off, so I'm just going to copy this out from here. The NFCore tools link is where we're spinning up this Gitpod environment, as Phil mentioned, so I'm just going to wait for that to load up. I'll cancel this and open it in the browser.

All right, so this should look familiar to what you had earlier with Chris; it's just warming up. Okay, the first thing I'm going to do is come out of where NFCore tools is installed and create a pipeline from scratch. Let's just check that nf-core is installed, and again I'll make this bigger and clearer. Okay, so nf-core is installed. Let's run nf-core create. This will create a vanilla template for us. We'll call it demo, a training demo pipeline. I won't claim ownership. Do you want to customize? I don't want to customize. There's that big warning that Phil mentioned. And so now we've got this NFCore demo pipeline sitting here, with all of the right things replaced in all of the correct places. I'm just going to open this folder directly, so File, Open Folder, I've got demo, and this should refresh. Now we're at the top level of this folder in the terminal here, as you can see. What I'm going to do now is make sure this pipeline runs: nextflow run . runs it in the current directory, with the test profile and Docker, and this will probably fail due to validation, if I'm correct. Yes, because that parameter is mandatory, so let's set it. And now the pipeline should kick off. Because I didn't set it the first time, it created this null folder, which we can delete. The pipeline will take a second or two to go through its paces and run these simple processes. The main reason we've still got FastQC in there, because this came up in the chat earlier, is that we need something to run in this template just to test it out, and the most obvious choice for that was FastQ files, since a lot of NFCore pipelines are genomics. That's what we chose to use here, because we want to make sure that the pipeline actually does something and works, and we've got minimal steps here to do that. As this is running through, you can see the way this particular pipeline is structured: you have this modules directory, and you can have local modules. These, by definition, are modules that you want to keep in your repository, something specific to your particular pipeline that may not necessarily be useful for others or worth adding to NFCore modules, where we can share them. So those are modules that are more customized and specific to your use case. We also have NFCore modules installed within the pipeline. You can see this custom dump software versions module that we use in most NFCore pipelines just to collect software versions from each of the modules at the end of the workflow, so we know exactly which versions of tools we've used, as well as FastQC and MultiQC; these have been installed directly from NFCore modules. There's a subworkflows directory; again, sub-workflows are chains of modules, and within that you can have local sub-workflows and, hopefully hot off the press after next week, NFCore sub-workflows that you can install directly into the pipeline. And then you've got a workflows directory. So three levels, as I showed you on that terminology slide: the workflow itself is the main end-to-end implementation of these modules and sub-workflows.
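So the layout being described looks roughly like this, simplified; the exact contents depend on the template version:

    demo/
      conf/
        base.config        # default resource settings
        modules.config     # per-module options
      modules/
        local/             # pipeline-specific processes
        nf-core/           # processes installed from NFCore modules
      subworkflows/
        local/             # pipeline-specific sub-workflows
      workflows/
        demo.nf            # the end-to-end workflow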
Right, this pipeline has finished. Now let's run a quick lint to make sure everything looks good in this pipeline as well. The linting, as Phil mentioned earlier, is just to make sure that everything adheres to best practices, things look right, and they're put in the right places within the pipeline. All right, everything's good: zero test failures, and we've got a few warnings. Most of these are TODO strings. These are things that you need to change in this pipeline template; they've been put there for a reason, because they need customizing, and the whole point is to direct you to where you need to change certain aspects of the template so you don't forget. Now, going back to the training material, let's try and add a simple process to the main workflow. So I go to this demo.nf in the workflows directory, copy the process that I've put in the training docs, and put it in the main script. What I'm also going to do, in fact, is tweak the caching for one of these processes so it runs through quicker. By default we don't cache this particular process, which just means MultiQC runs every time at the end of the pipeline, because there could be changes in the files that need to be staged for MultiQC to run. In this case we're just going to remove that behaviour, because it means that whenever we resume the pipeline, it finishes quicker. Right, so we've added this process in. At this point it won't do anything if we run the workflow: we've just added a definition here. We need to actually invoke this process within the workflow definition. So here's the workflow definition; let's make some space and copy and paste the next bit in, which is us calling, or invoking, that echo reads process. What this is actually doing is taking a channel produced by this input check process, and the structure of that channel is that it's got a meta map. This is quite widely adopted on NFCore: we use it a lot to pass around sample information and IDs and that sort of thing. It's a bit like a Python dictionary, and it gives you a lot more flexibility in how you pass information around your workflow. Rather than having a channel with 10 or 15 different elements that you need to pass around and preserve within the workflow context, you can just have one map, and you can amend and change it however you want within the workflow. That gives you a lot more flexibility, especially if we want to reuse these modules: if I want to track strandedness in the RNA-seq pipeline, that may not be applicable if I want to track antibody in the ChIP-seq pipeline, so one fixed channel structure won't work for both. The meta map lets you customize the information you pass around these pipelines. So before I actually run this, let's have a look at what it looks like so you get a feel for it: let's just view the channel we're passing as input to this echo reads process. And I'm going to tag on -resume at the end, which means that Nextflow will not rerun tasks that have already completed successfully. I don't know why that process wasn't cached, but okay. Did I save it? And what you can see here now is the structure of the channel in this output. So here we've got a map; in this case it's just an ID, which is the sample name for this particular use case, and a flag saying whether this particular sample is single-end or not. That's what a Groovy map looks like.
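To make that concrete, the view output looks roughly like this; the sample names and file names here are illustrative:

    // in workflows/demo.nf, just for inspection
    INPUT_CHECK.out.reads.view()

    // prints something along these lines:
    // [ [ id:'sample_paired', single_end:false ], [ sample_R1.fastq.gz, sample_R2.fastq.gz ] ]
    // [ [ id:'sample_single', single_end:true  ], [ sample.fastq.gz ] ]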
Whenever you hear "meta" on NFCore, that's basically what we're referring to. And then here you just see a channel of paths. In this case this sample is not single-end, which is why you've got two sets of reads here; this one is single-end, which is why you've got one. So in this example, all we're really going to do is pass this meta map and the reads to this echo reads process, and that should print the read names to the terminal here, because we've added this debug true directive. There we go: now they're printing the names directly from the pipeline. So now, this process sitting in the main script isn't really reusable where it is at the moment. What we can do is add a local module for it: we add a new file called echo_reads.nf and just cut the process out of here and put it in there. But we also now need to tell Nextflow where this file lives, so we need to import it from where we've moved it, and here you see this include for echo reads. Let's run it again, and it should give the same results; Nextflow should now find that process, rather than in this script, from this echo reads file. That's one of the strengths of using and importing modules in this way: you can isolate where they sit and you can import them in various different contexts. You can import them in sub-workflows, in workflows, however you want, in any combination. What you can also do, so here you see now it's being found and printed from there, is import the same module twice. Here I'm importing the same module under different names, and then I can just copy this out and refer to the names I used to import it. So here I'll be doing "once" and "twice", and for those of you that are still awake or have had coffee, you'll know that this should now print these sets of reads twice, because we're invoking that particular process twice. There we go, now it's printed them twice.
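The include-with-alias trick looks roughly like this; the file name and alias names are just the illustrative ones from this demo:

    // workflows/demo.nf
    include { ECHO_READS as ECHO_READS_ONCE  } from '../modules/local/echo_reads'
    include { ECHO_READS as ECHO_READS_TWICE } from '../modules/local/echo_reads'

    // inside the workflow definition, invoke the same process twice under its two aliases
    ECHO_READS_ONCE  ( INPUT_CHECK.out.reads )
    ECHO_READS_TWICE ( INPUT_CHECK.out.reads )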
So hopefully that highlights that it can be relatively easy to add your own processes to the template. There is some customization that relies on the fact that at the moment this pipeline template is set up to deal with FastQ files, but once you've got your input sorted, it should be quite easy to start testing and playing with this. It's up to you how much you keep and how much you strip out, and it gives you a really nice blank canvas from which to start developing your pipelines. There are various other files in this template which I won't go through in much detail, but we've got GitHub Actions and other things automatically added to the pipeline, which saves you having to set all of this up yourself. And for me personally, I always find it easier to delete things than to scratch my head figuring out what I need to add, and why, and how to get it working; that's one of the advantages of using this pipeline template. There are also a bunch of other functions in the NFCore modules tooling that we can use to make things easy for us. We can list local modules. So, as I mentioned earlier, we've got this modules folder, and in this pipeline we have three NFCore modules. That's exactly what's being listed here: it's telling us we've got these modules, they came from this repository, they have this version, and we use the git hash to track the versions of the modules we install in the pipeline. It even shows you the commit message and the date. These hashes are actually stored in a special file called modules.json. Pretty much like the nextflow_schema.json where you store your parameters, you shouldn't need to touch this file; it's specifically there to scare you away from editing it, because you shouldn't need to make amendments to it. The whole point of it is to track the versions of the NFCore modules, and any other modules you may have installed within the pipeline. When NFCore tools updates, removes or installs modules, it automatically updates this file for you; that's the idea behind all of that, and hopefully the automation means you don't have to do any of this manually. You can update the modules in a pipeline all at once, and in fact I can show you an example of where I've done that and how easy it is. So here, I just say I want to update all of the modules. In this case we're all good to go, because we just released a new version of NFCore tools yesterday along with all of the module changes we made, so everything is already up to date and there's nothing to do. However, looking at a production pipeline like the RNA-seq pipeline, I needed to update all of the modules in that pipeline just now, and it was literally a single command: I updated all of those modules directly within the pipeline as a result of all this restructuring. That gives you an idea of how useful this sort of automation can be if you use it and adhere to some standards in the way you adopt these modules. You can also list remote modules, but that isn't very useful as an output, because there are 632 of them and you'd probably go cross-eyed going through all of those pages. However, we do have a modules page on the NFCore website where you can search for modules associated with a particular tag or name and so on, to narrow it down and help you find the modules you want to use. We can even install modules. So, we've got a pipeline now, and we created a custom local module which was just echoing things; let's try and add a real module from NFCore modules, one that's sitting on GitHub in the NFCore modules repository. Let's add fastp. It's a really popular tool for adapter trimming, and you can see that with that single command, we now have this fastp module directly installed within the pipeline. It has a main script here that various people in the community have contributed to, to adapt it, finesse it, and make sure it's working and flexible. And then there's a meta.yml here that holds information about the tool: licensing information, DOIs, descriptions, the input and output channel information, as well as the authors that have contributed to this module. So now let's see if we can include this module in our pipeline. When this was being installed, you can see there was an include statement printed here, so all you really have to do is copy this and put it here. Then we need to invoke the module itself, like we did with that echo module, so I'm going to copy this again from the website. Now let's get it going again and see if the pipeline works.
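Wiring in an installed module looks roughly like this. The include path is whatever the install command prints for your version of the modules repository, and the extra value inputs depend on the module version; the two booleans here are explained just below:

    // workflows/demo.nf
    include { FASTP } from '../modules/nf-core/fastp/main'   // path as printed by the install command

    // inside the workflow definition
    FASTP ( INPUT_CHECK.out.reads, false, false )   // the two boolean inputs hard-coded off for now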
It's always nice to test the pipeline as you're adding things in, because then you get a feel for where you've broken things. If you add massive chunks of functionality and then it breaks, it becomes very difficult to debug, so this iterative adding and changing of things is a really nice way to develop. So here, let's see what the channels look like for this particular process. We've got the meta map and the reads, like we had with the echo reads process, but we've also got two additional input channels, which are boolean values for whether you want to save the trimmed FastQ files and whether you want to save the merged FastQ files. For our purposes we're not really interested in any of that functionality at this point, so I've just hard-coded those as false for now. But you can see that fastp has now also run and we've managed to wire it into the pipeline. You can also get the information about a module that I showed you in the meta.yml directly from the command line: you get a nice output that tells you what the inputs are, what the outputs are, and even how to install it. So there's a really nice way of getting information about a particular module as well. Now say you want to change fastp and apply some custom edits to this particular module, anywhere in the file. For argument's sake, I'm just going to put an echo statement in the script section. With the version control and everything we're doing via NFCore tools, we shouldn't just allow this, because we're tracking versions in that modules.json: if you change a module in the nf-core folder, we need to know that it's been changed. So if we now attempt to lint this one module, what we want to see is that the test fails, because the local copy of this module no longer matches what's in the remote repository on GitHub. And that's a good thing, because we're trying to track hashes and versions; if we change something, we need to know that it's changed. So either you delete this, update it on NFCore modules and then reinstall the module, or there's another way: you can patch this module. There's another command for that, and what happens is that a diff of what you changed is stored alongside the module, and in the modules.json file there's also a path stored, relative to where that change has been made. This is really nice if, for example, you want to customize NFCore modules but you don't necessarily want to contribute the changes back. I mean, all contributions are always welcome: if you're benefiting and you think others will benefit, that would be awesome, and that's kind of why NFCore modules has been so successful until now. But if it's something really custom that you want to keep local to your pipeline, you can do that too, while still getting the advantages of using a fully-fledged, production-grade module from NFCore modules. What you can also do is lint not just individual modules but all of the modules you have in your pipeline. And here now we don't have any lint test failures, which we were seeing before, and the reason is that we've patched this module, so the linting is no longer failing because we've accounted for the changes we made in the proper way. Okay, let's create a module now.
To create the module, and bear in mind that we're running this command in the root of the pipeline repository itself, it's nf-core modules create, and we'll give it a name: demo module. It says it could not find a Conda package. By default, if NFCore tools understands the name of the tool, say for example it's samtools or bedtools or something like that, it will attempt to automatically query the Bioconda API and insert the paths of those containers into the module for you, so you don't have to go looking for the containers. And by default we use BioContainers everywhere on NFCore, because they're really useful: they exist already, and there's a whole bunch of automation and infrastructure set up where they host BioContainers, which are Docker containers with Conda packages installed in them, as well as identical Singularity images hosted by the Galaxy project. So when we're using these pipelines, we have container definitions for both the Docker and the Singularity counterparts, and that way, when you're using Singularity, rather than doing a conversion from Docker to Singularity, you're pulling the container directly from the Galaxy repository. There are lots of advantages in adopting this. It also means we don't have to host and maintain our own containers: whenever someone updates a Conda recipe or pushes a new version of a tool, that automatically creates another Docker container, which then also creates a Singularity image, and we can just pull that directly and use it in pipelines without having to do much. But here I'm just going to say I don't want to enter a Bioconda package, I'm going to hide behind someone else's identity again for the author, and it's process single for the label. These process labels are all set in this profile-based config; you'll see here process single, process low, process medium and process high. We use this a lot in NFCore pipelines, because it lets you group together the way that you want to run particular types of modules. If you have modules that only require a single CPU and aren't going to take very long, you can give them a label of process single, for example, and I might be able to show you an example of that back here. So by default we have labels that are standardized, and on any NFCore module this means that, depending on the requirements of the module, whether it's memory hungry or CPU hungry or both, you can set the appropriate label. This may not suit all requirements, because some tools in some settings will just go off the scale in terms of what they need to run, STAR being one of the usual suspects there, but for most instances it should fit what you require, and we tend to be overly generous with these resource settings. A lot of that comes down to the fact that NFCore pipelines are used by lots of people in different settings on different genomes, and rather than having the pipeline break, we tend to be generous with the resources that we set.
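Those labels map onto defaults in the base config, roughly along these lines; the numbers are illustrative, and the real template also wraps them in a check_max() helper so user limits are respected:

    // conf/base.config - illustrative defaults for the standardised labels
    process {
        withLabel:process_single {
            cpus   = 1
            memory = 6.GB
            time   = 4.h
        }
        withLabel:process_low {
            cpus   = 2
            memory = 12.GB
            time   = 4.h
        }
        withLabel:process_medium {
            cpus   = 6
            memory = 36.GB
            time   = 8.h
        }
        withLabel:process_high {
            cpus   = 12
            memory = 72.GB
            time   = 16.h
        }
    }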
So in this case I'm going to be nice to the planet and just say I want a single CPU for this particular module, and I want to include meta map information. And so now this module is created - where is it? - under modules, local, demo module. You get a bunch of TODO statements, pretty much like the pipeline template, but we also have a module template, and this is created from that. We have guidance and information and other things that may be quite easy to forget, especially if you're just starting out with this sort of thing; worst-case scenario, if you know what you're doing, you just delete them and it's fine, but otherwise it serves as a guide to remind you what to do and where to do it within this particular module. Everything is standardized again: it's all best-practice templates that we use on NFCore. With a single command you can spin up a process directly, then just fill out the bits you need to add or take out, and it's as simple as that. So this module I created in the pipeline context, because I wanted to add a module local to my pipeline. There's also another way you can create a module: if I come out of my pipeline now and clone the modules repository from GitHub, go into that modules directory and create a branch, because I don't want to work on master, I can create a module here too. So it's nf-core modules create again, and I'll give it a name; let's give it the same name we did before. Again there's no Bioconda tool for it to find, it's process single, and I want a meta map. But now, instead of the one file that was created when we made this module within the pipeline itself - in fact, maybe I'll come out here and open the folder directly, so let's open the folder here. No, I don't want to do that, I want to open it in the browser, so let's open it here again. Right, so now we're in the root of this modules repository, and what you see is that instead of one file being created, we've now created or edited six files as a result of creating this module. The reason for that is the way this is detected - where is it? - this .nf-core.yml file, which basically tells NFCore tools that this repository is a modules repository. So NFCore tools picks this up and automatically creates these files because we're in the modules directory. And the reason we've created or edited six files here, rather than the one we did earlier, is that when you contribute a module to NFCore modules, there are certain files and tests that we need for you to be able to contribute that module there. So this, in effect, acts as a quick start for you to begin filling out and adding your module: the tests and all the files you need to edit are already added here for you. If we have a quick look at those: we called it demo, so there's the demo main.nf. The template is the same one we used for the local module, and you can see loads of TODO statements that you need to fill out when you add this module, as well as a meta.yml with all of the information for the module. We use that on the NFCore website, and in nf-core modules info, to render the information about the channels and the licenses and that sort of thing, so you'd have to fill that out.
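To give a feel for the shape of that template, a generated module main.nf looks roughly like this skeleton. The tool name, package and container are placeholders, and the exact boilerplate (the conda and container handling, the versions block) varies between NFCore tools versions:

    process DEMO_MODULE {
        tag "$meta.id"
        label 'process_single'

        // TODO: point these at the real Bioconda package / BioContainer for your tool
        conda "bioconda::yourtool=1.0"
        container "quay.io/biocontainers/yourtool:1.0--h9ee0642_0"

        input:
        tuple val(meta), path(input)

        output:
        tuple val(meta), path("*.out"), emit: out
        path "versions.yml"           , emit: versions

        script:
        def args   = task.ext.args ?: ''
        def prefix = task.ext.prefix ?: "${meta.id}"
        """
        yourtool $args $input > ${prefix}.out

        cat <<-END_VERSIONS > versions.yml
        "${task.process}":
            yourtool: \$(yourtool --version)
        END_VERSIONS
        """
    }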
But we also add tests. We go into this tests directory, modules, demo, and here we have a simple script that runs the module you've just added, and we have loads of test data available in the NFCore test-datasets repository for these modules, accumulated over time, which allows you to pull in various types of data to test them. Again, it's quite standardized, and so it really makes it easy for you to share and create these modules directly on NFCore modules. So here what we're doing is importing the module we've created and building a small test workflow around it, and you can have any number of these. Say, for example, you also want to test this module on paired-end data: you can copy this workflow and add another one specifically for paired-end, with tests for that as well. That way you can not only test that the module works, but also test providing different parameters or different options to this one module, to get some fairly comprehensive unit tests in place. There's an extra config file which you don't really need to do much with, unless you want to change the way each particular invocation of the module runs; say, for example, you wanted to publish the files differently in one invocation of the module compared to another, you could tweak some options within this file. And I guess the most important file is this test.yml: it contains information on exactly how you're going to run this module, and we use pytest-workflow for that. What pytest-workflow does is look for particular tags that have been defined for this module, and it runs the module based on the tags provided on the command line when you run pytest. This is really neat, because it allows us not only to do this locally but also to automate all of it on GitHub Actions and elsewhere, to make sure that the module is constantly working. You can test various things: you can test that a file exists, and you can test the md5sums of files, so that if, say, you add a module to NFCore modules and everything was working when you added it, and someone else comes in later and tweaks a few bits here and there, the md5sums will be checked against the output files again to make sure that the module still works. If they don't match, then you need to figure out why and either change the md5sum or fix your changes. In that way, we can guarantee that everything is always working. There are various other things you can do here as well, like checking file contents, checking that files don't exist, and other logic you can use. So this is quite nice, and it allows you to unit test these modules.
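To make the test side concrete, that tests/modules main.nf is just a tiny workflow that imports the module and runs it on files from the test-datasets repository. This is a sketch: the include path and the test_data key are illustrative, and it assumes the shared test-data config is loaded as it is in the modules repository:

    #!/usr/bin/env nextflow
    // tests/modules/demo/main.nf - illustrative test workflow
    nextflow.enable.dsl = 2

    include { DEMO_MODULE } from '../../../modules/demo/main.nf'

    workflow test_demo_module {
        input = [
            [ id:'test', single_end:true ],                                        // meta map
            file(params.test_data['sarscov2']['illumina']['test_1_fastq_gz'],
                 checkIfExists: true)
        ]
        DEMO_MODULE ( input )
    }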
There's been a lot of chatter recently about nf-test, which is more of a Groovy, Nextflow-native implementation that we could use for this sort of unit testing, and it's something we're really interested in. A few people in the community are playing with it at the moment - Nicholas, I believe, and a few others - and hopefully in the future we may transition to something a bit more Groovy-native for this kind of thing. And just to show you quickly how the tests work, you simply invoke pytest. Say, for example, I want to run it for this samtools index module: all I'm doing is setting a temporary directory and saying I want to run pytest with the Docker profile. These are pretty standard flags that you provide as options, and that's it. What pytest does is find the tag for this samtools index test, which is sitting here - there we go - and run through all of these test workflows. You can see the entry point for the CRAI test, which is what it's testing now here. Then it should go on to the samtools index CRAI test, which is what it's testing here, and then the CSI one, which it will do at the end; it just runs through these workflows and checks that the md5sums match. At the end of it, it tells you that you've done a great job and you deserve to go to the bar and have a beer. I think I'm going to stop there. I hope this has been useful. Thank you for listening, and apologies for any inappropriate humour; I'm a bit delirious because I haven't had my sleep. So yes, thank you, and I'm going to pass over to Evan now, who will take you through the awesomeness of Nextflow Tower. Thank you for listening.

Cool, awesome. Thanks a lot, Harshal. So I think we're going to have a quick break now for five minutes or so. If you're on the European time zone, let's come back at 15:51, reconvene, and then I'll go through a section on Nextflow Tower: show you some of the ways that we can run our existing pipelines on Tower and some of the features you may find useful, which hopefully will make a lot more sense given the background we've covered over the last couple of days. So I'll see you back in five minutes or so. Awesome, thanks Harshal.

Okay, welcome back everyone. I hope you've had a short break and are ready for the final session, where we're going to take a look at Nextflow Tower. I'm going to start by providing a bit of an overview of Tower itself. If you want to follow along, you can head over to tower.nf, or if you type Nextflow Tower into Google you'll come across this page; this is the hosted version of Tower, so you're free to come and try it out. You can log in with your GitHub or Google credentials, which is the recommended way to do it, and once you log in you'll see the pipelines and some of the things we're going to be running through today. So, a little bit of background on Tower. After developing Nextflow for several years, we realized that there was a key missing piece in the development and execution of these pipelines, particularly across multiple environments, and particularly when you want to work not so much on a pipeline by yourself but in a collaborative way: maybe you've got other people who are launching those pipelines, or maybe you need to call those pipelines as a service. You can think of Tower as a full-stack application for the management of those pipelines, and increasingly also for the management of the infrastructure and the data related to those pipelines. One of the philosophies around Tower is that it's not intended to be a SaaS application in the cloud like many others, where you simply run the application and the compute on a hosted server. The way Tower works is, firstly, you can install it in any location: you can install it on your laptop if you want, you can install it in the cloud or on HPC. And where the compute runs is separate again; you've already seen Nextflow's flexibility with regards to where that compute takes place, and the same principles apply here with Tower in terms of execution.
The idea was really to develop a full-stack application that could do, initially, the monitoring of the pipelines, but also keep a full history of execution, so you can go back in time and know exactly what you've run and where it ran. We also wanted people to be able to call their pipelines from an API or from a GUI, so you're not restricted to the command-line interface, which is fantastic for many applications but not great if you want to call these things as services. We also noticed there was some complexity in setting up cloud infrastructure, so part of Tower's role is to make it easy to set up that infrastructure, to create compute environments in the cloud, for example, and to connect that all through to your actual compute. What I'm showing you here is Tower Cloud, the hosted version, so you can go and try this out. What I'm going to do first is build up to the full picture of what we've been building with Tower, starting with the simple pieces. The first thing I'm going to do is exactly what we've done previously with Nextflow: run Nextflow from our Gitpod environment, with Tower essentially acting as a monitoring piece on top of that. If you look inside the training material, there are some notes on how we're going to do that; if you want to follow along, see section 12. I'm going to jump in here and look at the execution side itself. To be able to do this, there are a couple of things you need. First, I'm just going to be running the exact same pipeline, script number seven; I've just enabled Docker there, and you can also run with Docker on the command line if you want. Everything else remains the same. The other thing I need to do is add a token into this environment, or into my Nextflow config, which connects Tower to my user. The way you do that is, if you log into Tower, you can create a token in the top corner here: I go to this right-hand corner, Your tokens, and add a token. When I add a token, it gives me essentially a unique token ID which I can then export into my environment. Going through the training material here, this is all described: you export your token into your environment and then launch. So this is the line that you need to run, and that will connect the Tower instance with your execution.
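If you prefer config over environment variables, the same connection can be made in nextflow.config via the tower scope; the token value here is obviously a placeholder for the one you generate in the UI. With this in place, a plain nextflow run reports to Tower without needing -with-tower on the command line:

    // nextflow.config - alternative to exporting TOWER_ACCESS_TOKEN in your shell
    tower {
        enabled     = true
        accessToken = 'eyJ0aWQ...your-token-here'
    }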
Going on to running it, we can do exactly as before: we say nextflow run, I point it this time at script number seven, and to run with Tower we simply add -with-tower on the end. When this launches, it's going to submit all of the information for the tasks and the run itself, and we'll be able to monitor it in the application here. By default it will just show up in what's called my personal workspace, so I can jump in here and see my own workspace. Here you can see the different runs, the different executions we've got running, and this is a way to monitor them. So I'm going to go back over to Gitpod, press enter, and then head back over to Tower, where we'll see this execution as it comes up live. It takes a second or two to submit that information to the server. I should point out that this Gitpod environment is obviously just running in the cloud, I'm not sure where, while Tower itself is running in AWS in London, but those two things can be completely independent. Likewise, where the compute takes place is independent of where Tower is installed. Here you can see that script seven has come up for us. If I click into it, you can see the tasks have completed very quickly; there were eight tasks in this run. I get some information on the command line that was used, and a list of the parameters. When you're running in this way, with -with-tower on the command line, you do get some information, but it's a little limited compared to what you get when we launch from Tower, which I'll get to in a moment. So you get basic information on the processes, you can follow them through as they go live, you get the aggregate statistics, and this becomes really valuable when you have very long-running pipelines that you need to monitor over hours or days, and you can track all of that information. I'll explain the costs a little later on; those are really more relevant for cloud compute. The really nice thing is having a task table. Very often you want to know, out of tens of thousands of tasks, exactly what happened - for example, when I ran script seven, what happened on the lung sample. I can select that, I can see the exact command which was launched - and if you remember, this is the command from inside the script section of our Nextflow process - and we can get some information on the execution. In this case it ran for two seconds; we know the container that was used, how many CPUs it requested, and so on. This is all pretty basic information, given that we're running locally and just using Tower as monitoring, but it provides that baseline as well. So this is highly useful for monitoring and something I'd recommend doing. I'll also point out how Tower works with workspaces: you noticed that I changed into my personal workspace, and you can change that by selecting here. You can choose different workspaces, and it's a little bit like the GitHub model where you have organization slash project; here we have organization slash workspace. Particularly if you're working with other people within your organization, or maybe within your own lab, you can go across here and create an organization: you go to your organizations, you add one in, and when you create an organization you can then define different roles that people have, so you can invite people in. You can also invite people into those workspaces, which makes it very collaborative and lets you really share your work with others. The security boundary is the workspace itself, so you can have as many people as you want in your organization, and it's really the role that you give them inside the workspace which is the key point.
Okay, without going into too much detail on that, let's consider what happens if we want to launch a pipeline from Tower. Launching pipelines from Tower is a little different, because when we're running locally, the actual Nextflow job - the nextflow run command - is running inside this environment here, so it's on that machine. If that environment shuts down for whatever reason, or if I was running on my laptop and I closed the terminal, then the head job of Nextflow would stop and the whole pipeline would end up shutting down. A much more efficient way to run can be to submit the nextflow run command as a job in itself. Typically we'll put this on a queue that doesn't get shut down, something with a minimal amount of resources that will keep running, and then we can just let it go: we can treat our nextflow run jobs in a way where the head job is almost a service in itself. We can set this up by choosing a quick launch. To do this you need to have a compute environment set up, and we'll go through that in a moment, but starting from the basics, you'd select quick launch here. This is very much like building the Nextflow command line, so you'll see many of the same components. We've got a run name, which is just an identifier that you can choose. You can also add labels: maybe I want some labels because I'm running an RNA-seq pipeline and I know this is a particular version, or some testing that I'm doing, or it belongs to a particular project I'm working on, and I want that information to be recorded; I can add those labels there. The next thing we choose is the compute environment, which I'll leave for later on. The final key part is choosing which pipeline we wish to run, and a revision. I'm going to select the rnaseq-nf pipeline from Nextflow, the same one that we've been building over the last couple of days, and we can just use that as our default. To enter the pipeline, same as in Nextflow, you just copy in the Git repository URL. Here we also have the ability to link in with private repositories, and I'll show you how that works with the credential management. I select that, and notice this little blue circle: this is going to GitHub now, and it's going to show me all the branches, all of the versions, any commit IDs I can select here as well. So it's showing me all of the revisions directly related to this repository, and that shows you that it allows full reproducibility of what you wish to run. I could select a particular version, 2.1 let's say, and then go and launch that pipeline. You'll also notice here that we've got a compute environment and a work directory; in this case we're running on AWS Batch, but I could very easily take that same pipeline and run it in the different environments which I've got set up, and I'll touch on those compute environments later on. Once you've got that set up, you can choose a version, et cetera; if you leave this blank, it will just use the main branch, in other words the latest version. You kick that off.
Once that's kicked off, what happens is the nextflow run job itself gets submitted to that environment and runs. I'm now going to jump into another pipeline which I have, which has been running for a little while: this is the new NFCore ChIP-seq pipeline, which got released about 48 hours ago, so I'm just doing some testing here. You can see that I'm following along with some of these tasks, you can see the nextflow run command, the parameters, the configuration; if I'd used any datasets they would be here, and I'll explain those in a moment. You've also got the execution log here, so I could download the Nextflow log, or I could follow along with it as it updates live. There's reporting functionality, which we'll see in a completed pipeline afterwards. Here you get a little bit more information: I can see more about my run, I can see the compute environment, for example, and I can follow the status of those processes. The grey ones here mean cached, and the reason for that is that when I ran this pipeline - you'll notice on the command line that we've used -resume - because I'm resuming, I haven't had to rerun the tasks which had already completed previously, and they show up as grey here. The blue ones are still running; I've still got a little while to go on this pipeline. Something different in the aggregate statistics here is that we also have a cost, and that cost is obviously going up as the run progresses. We have a database on the backend of all of the cloud providers, all the regions, all the instance types, and we can look at each task on those instance types and say, okay, that's taking up half the virtual machine, the virtual machine costs this much, and we know we occupy it for this much time; that allows us to provide a rough estimate of the cost. Next week we have some new work coming out which provides the actual cost, the one you'd see from your cloud provider itself. Here you can see the load: we're running 41 CPUs currently, with 71 being the maximum we've had running, and likewise with tasks, we've had at most eight tasks running. There's memory and CPU efficiency information here; we're actually doing some work now on optimization, so this is looking relatively good compared to some of the earlier runs that we've done. And again, I've got the task table here. If I want to find out exactly what happened when I ran Trim Galore, and I want to know the exact sample it ran on - I don't know the sample names here, they're actually a little complicated - but if I select this one, for example, I know exactly which sample that was, and this is the command that got launched; all of the information here is the same. What's really nice is that it allows you to get the execution log of the task. This is often really difficult to find: it's something which comes from the container, and it will maybe be in some S3 bucket if you're lucky, and then you have to dig in and find it. Here you can find it easily, and if there's an error, it makes it apparent to you: you can see exactly what's going on in that task, find that information, maybe change some parameters and retry. There's some more information as well if you go into these tabs.
You can also see that in this case, this task ran with this Trim Galore container, it was placed on this queue, it requested four CPUs, four gigs of memory and 16 hours, it ran on Batch, and it ran on a c5.4xlarge - that was the machine it got placed on - in this region, with the spot pricing model. And therefore we can figure out that it cost around $0.07 to run that task. We can then compare how much it requested to how much it actually used, and from there we can start to optimize those pipelines as well; that's some of the work we have coming out next week for the Summit, so stay tuned for that. Okay, I want to go down to a run which has completed and has a little bit more information, because when the pipeline has finished you can get more out of it. I'm going to select one from the community showcase - you can follow along with this; when you log in you should have access to the community showcase, and you should be able to jump in and see exactly what's going on. I'm going to choose the third one down here, which is the NFCore RNA-seq pipeline. When I scroll down, you'll notice that inside the datasets tab I now have the dataset that was actually used. I'll come back to this in a moment, but this is a way you can link the data itself, so you have a versioned record of it, which provides some reproducibility. We also have reports, and this is a way you can define outputs of your pipeline - typically HTML, PDF, structured data, images, et cetera - which makes it very easy for other people to come along and find, say, the exact MultiQC report for that run. I'm going to open this in a new tab, and a little bit like I was mentioning before, this is actually pointing to the real location of the data. So when I open this, you can see the full MultiQC report, but the location of that is in my S3 bucket - and when I say mine, I mean the location where I ran it. We're really moving the pipeline itself to where the data is and running in your environment, and I think this is one of the really unique things about Tower, very much enabled by the portability that Nextflow gives us when we run this. Likewise with PDF files, you can go in here and see images, et cetera, whatever your pipeline generates - and the pipelines from NFCore obviously produce some very, very useful outputs. If you've got structured data, for example this file here, you can choose the delimiter, view it, find the information, download it, and so on. The other key information that's available once a run has finished is the resource usage. Here you can see the CPUs and the usage across the pipeline, with each one of these lines being a process; obviously different processes have different requirements. You can look at memory, you can look at allocation, et cetera, and you could start to manually modify those resources as well, so you save a lot of money in terms of costs when you go to run it again.
What about if you're a developer, and you're developing a Nextflow pipeline and you want to make it available to everyone - maybe people in the wet lab simply want to run that same pipeline, often with their own data? How can you make your Nextflow pipelines a lot more accessible to people? One way is to create what's called a pipeline within Tower, and a pipeline in Tower is a combination of the repository - the Git repository that you have - combined with some parameters, the parameters you choose, along with the compute environment. Those three things come together to define a pipeline and make it available. There's another thing which was mentioned earlier on, when you looked at the Nextflow schema. If you have a schema as part of your Nextflow pipeline, and many of the NFCore pipelines now do, then when you put that pipeline into Tower, you essentially get a UI for your pipeline for free. For example, if I look at the RNA-seq one here, you can see that we've got the input itself, we've got the outputs, you can define exactly what those icons are, you can add help messages, booleans, drop-downs, et cetera, and it makes it very easy to use. Another thing which is really useful here is the ability to define what the input itself is. As you've seen before, there was this ability to define exactly what the structure of that input should be, so here I could, for example, find my RNA-seq dataset, and you'll see it suggests the datasets which match this pipeline. That makes it really easy, because you can upload datasets manually or via the API, and people who come in don't have to go and find where their exact data is. As a quick aside, I'll mention the datasets themselves and how you can use them. These are typically sample sheets, or structured data where you have individual samples; in this case we've got the locations of those files - here it's S3, but it could be any location, wherever those files are stored for where you're launching - and, obviously, metadata as well. They can be as complex or as simple as you like, and from there Nextflow will essentially parse those files and use them as the input. Where we're thinking of heading is continuing down this path, but linking in the ability to tie in different data sources. The inspiration for this comes from another tool developed by Phil, called SRA Explorer. With SRA Explorer you can find metadata: you could, for example, search the SRA for, in this case, human liver samples with microRNA. I can select the samples that I wish to process, add them to a collection, maybe do some manipulation on them, or collect some particular data that I want to run through my analysis. I can then simply export those as a TSV file, basically download it here, and you'll notice we've got accessions, sample IDs, and ultimately the FTP links here - we've got all of the information. From there, in Tower, I can drag and drop that TSV file.
I can specify that the first row here is a header, and now I've got this dataset structured in here, which Tower is able to read and save, and Nextflow can then use it as an input for the pipeline; you can see it becomes available there. So that ties it together, and that's our first attempt at doing some basic data management, but there's obviously a lot of space to explore this and to think about other data sources as well. Going back to running these pipelines: launching a pipeline, when you've got a schema, gives you this UI, like I mentioned. We can find the dataset we wish to use, as before; we can set some options, maybe we wish to skip a particular step, remove the ribosomal RNA, choose a different database, whatever. And then when we launch this pipeline, everything is essentially the same: it goes from that launch step and we can monitor it. Here's one I kicked off earlier; you can go through and trigger that, and if you want to, you can jump in there and monitor this one. All of this here is completely public, so feel free to go into it now if you want to try some of these pipelines: kick off the RNA-seq pipeline and get a feeling for what it's like to run it. If you want to add a pipeline there, you can also add them in, and it's essentially the same as running a pipeline. If you want the advanced options on any of these, you can select the pipeline and then the launch settings, which take you back to this piece here as well. Well, how is all of that managed? You've seen that I've been launching pipelines, and launching them across different workspaces, but where does the compute piece come in? There are a lot of platforms in the cloud for running pipelines; there are even a lot of platforms in the cloud for running Nextflow pipelines. But none of them have the capability to really connect in with the compute where you are running, in your environment. The way we do this: if you remember before, when I was launching this pipeline, I could select the environment - I'll just switch over to the other workspace here. When I was launching this pipeline, I could select a pipeline I wished to run; let's pick this one, and run it now, let's say, in this environment. Notice the working directory here is a bucket, an S3 bucket in Paris. If I select this one instead, though, I could run that same pipeline, maybe with the exact same version, not on AWS Batch in Paris but in East US, and when I select that, you can see the working directory changes again. I could run that same pipeline on Google Life Sciences, in an environment I've set up there, or I could run the same thing on SLURM. In each case this is essentially running the same pipeline - we've got that full reproducibility of the pipeline - and we're really pushing the workload to where the compute is and to where the data is, which is obviously a much more efficient way of running it. Okay, how is that all possible? Well, it's all possible through these compute environments, and compute environments are things that are typically set up once. Once you set up a compute environment, it's available for anyone in the workspace to use for running.
Creating a compute environment here, you can see that we've got support for AWS Batch, for Azure Batch, for Google Life Sciences, and we've got the schedulers as well: so if you've got a SLURM cluster or a Grid Engine cluster you can do that, as well as Kubernetes. I'm going to run through AWS Batch and then also look at how the schedulers work; the schedulers are all very similar.

With regards to Batch, when I select this I've got a couple of options. I've entered my credentials, and those credentials need to be linked to the resources that I want to create; there's a full guideline on the permissions which are required there. I can choose a region, so for example the region I want to run this in, and I can choose a work directory. Using my credentials it's going to look up which buckets are available; in this case these are the buckets tied to my account, so I can select a particular one here. Then I can choose Forge mode, where Tower is going to create the Batch resources for me. How many instances do I want? Max, I want 500 of those. I want to run them on spot instances. I want EBS auto-scaling, which grows the storage attached to the instances as it fills up. I've got a whole bunch of options for Fusion, GPUs, DRAGEN, EFS, FSx, anything you can think of when you're trying to run these things at scale. For the most part, though, there really aren't many settings which are required: it's putting in the credentials, choosing your region, selecting a bucket and launching it.

One thing I would say: when you're thinking about setting up your buckets here, it's a really good idea to put some kind of lifecycle policy on them. Because these hold all of the cached data, the temporary files, it's good to have a policy that deletes that content after, say, seven days or a month, unless you really need it. Ideally you'll then keep a separate bucket in a different location where you actually put your results. This gives you a working directory which you can simply clear out, so you don't have to worry about storage costs getting out of hand; there's a small sketch of that work-versus-results split just below. Creating this compute environment would then make it available for people in the workspace to use.

Let's consider how we can do the same thing with a SLURM environment, or Grid Engine, or any of the other on-prem schedulers. With these I've got a couple of options to connect. The way this works is, again, by submitting the Nextflow job to your cluster, and you can see that we can connect by SSH, or I can connect over what's called the Tower Agent. The Tower Agent is a small piece of software which runs on your cluster and makes a connection out to Tower, as opposed to SSH, where we're connecting in. Which is best to use just depends on the setup of your cluster, and we can certainly help out with setting that up. Here you specify a working directory: this is a location on your shared file system, typically where you're going to have the work directory. The launch directory is where Nextflow runs from. The username is your username on the cluster, the host name is how you log in to it, and the rest of these things are pretty obvious.
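To make that scratch-versus-results advice concrete in plain Nextflow terms: the work directory holds the intermediate task folders and is safe to empty under a lifecycle rule, while publishDir copies the files you actually care about to a longer-lived location. A minimal config sketch, with made-up bucket names:

```nextflow
// Scratch area: intermediate/cached task data, safe to expire with a bucket lifecycle rule
workDir = 's3://my-work-bucket/scratch'

// Final results go to a separate, longer-lived bucket (placeholder name)
params.outdir = 's3://my-results-bucket/project-x'

process {
    // copy published outputs out of the work directory into the results location
    publishDir = [ path: params.outdir, mode: 'copy' ]
}
```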
The remaining fields are the head queue that the main Nextflow job runs on, and the compute queue as well. And really from there, you've got your compute environment available, which you can then launch pipelines onto. Okay, so those are the main pieces with regards to compute environments.

You'll notice that you can also connect in here to different code repositories. One of the settings for doing that is credentials for the different providers, for example GitHub, GitLab, CodeCommit or Bitbucket, and this allows you to connect to private repositories as well. You also have connections in with your cloud providers, so in this case you can add those cloud provider credentials, as well as the Tower Agent and SSH credentials like I mentioned before. Those credentials are typically for the setup of the infrastructure, or for how you're going to run the pipelines yourself.

Then there are secrets, which are essentially things used inside of the pipeline code. These could be an API token which you need for access and which is used as part of the script section of your pipeline, or maybe a license key that you use to activate some software. These can be stored with what are called secrets, and there's a secrets directive in Nextflow. Here in the secrets, for example, I've got an NCBI access key, a password for running DRAGEN, a license key. By defining those, I can then reference them as secrets in my Nextflow code, and from there the actual secret management takes place: with AWS Batch, Tower will export the secret into the cloud's secrets store, and when the container is launched the secret will be injected. So you never have to have any of the secrets inside the code, or stored in the logs, or anything like that; there's a small sketch of what that looks like in the pipeline code just below.

Other factors to think about here are how you can do this with multiple users. I mentioned before that this is really useful for collaborating with people: it makes it very easy for someone to help you debug, but you can also share your pipelines very easily with colleagues, and someone can even take over a pipeline and say, I want to run what you've run but with my own data. Another great thing about the setup is that you can define the role that people have within the workspace. Going back to this verified pipelines workspace, you can see that Abhinav is an owner, but he could also be an admin, essentially for permissions. You've also got a maintainer, who can change the pipelines but not the compute environments; a launcher, who can just launch those pipelines; and a viewer as well. All of that becomes possible and is defined inside the workspace.

A few final things that have become really important are around automation. We have people who are launching Nextflow pipelines from Tower with tens of thousands of runs a month, and the whole point of that is to have Tower running those pipelines as a service. The way we can do that is to automate the execution of the pipelines, and also to trigger them from the API.
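Before moving on to automation, here is a minimal sketch of referencing a secret from a process; the secret name, the process, and the command are invented for illustration, but the secret directive itself is standard Nextflow (secrets can also be defined locally with nextflow secrets set):

```nextflow
process FETCH_METADATA {
    secret 'NCBI_API_KEY'   // injected as an environment variable at runtime, never written to the command logs

    input:
    val accession

    output:
    path "${accession}.xml"

    script:
    """
    # the key is only visible inside the running task as \$NCBI_API_KEY
    curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=sra&id=${accession}&api_key=\$NCBI_API_KEY" > ${accession}.xml
    """
}
```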
Coming back to automation, I've got a couple of options for doing that. So far I've shown you everything in the GUI, but all of this can be driven from the API. If you specifically want to launch a pipeline, we have pipeline actions for that. I'll jump back into the verified pipelines workspace, sorry, into the community showcase here. The actions here include, for example, a commit trigger: if I commit to the Git repository, it'll fire off the execution of this pipeline based on the settings. I can also have a launch hook, which provides a custom endpoint: if I hit that endpoint with these parameters, it's going to fire off the execution of the pipeline. The real value is that you can then connect Tower up to any other application that you're using, so if you have some downstream process, you can do that. There are also some blog posts on how you can do this; you can connect it up so that, based on data entering a bucket, it fires off the execution of this pipeline with these settings. You can link all of this together, and some of it is covered in the documentation here. If you look at the API, you'll see the full set of endpoints, published as an OpenAPI specification: the same API that's used by the front end is the one that's published here, so you can access any of these things. The compute environments, the launching, setting up organizations, et cetera, can all be managed from that API.

The API is not the only thing; there's also the Tower CLI. If I go in here and look at the Tower CLI, this is a fantastic tool, particularly if you wish to manage your infrastructure as code. For example, you can take particular pipelines or compute environments, store their JSON definitions in a Git repository, and import and export them with the CLI. You can also launch pipelines from it, so instead of saying nextflow run, you can say tw launch and essentially launch that pipeline there, providing the parameters however you wish. You can view pipelines and do really any of the other things shown before, actions, et cetera. Where it becomes really useful is the import and export of pipelines, as well as whole workspaces, which makes it really powerful.

A final note here on the model for this: if customers wish to install the whole of Tower themselves, that's what we offer as part of Tower Enterprise. But you can also try this out, and I'd recommend trying it out, on Tower Cloud; that's the way to go to get started. If you have any questions on this part of things, you can also head over to the Nextflow Slack: there is a channel called Tower Help where you can get some help, and we're always willing to help you out and support you. I would also point out that at the Nextflow Summit, in around a week's time, we're going to be presenting a whole bunch of new stuff on this, so I'm really excited to show some work there around optimization, around the management of pipelines, around providing a more overall view of what you can do in Tower, and some of the cost management, particularly when it comes to cloud. So I will stop there.
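To give a flavour of the API part described above, here is a rough Groovy sketch of triggering a launch programmatically. The endpoint path and the payload fields shown are assumptions based on the general shape of the launch call, so check the published OpenAPI documentation for the authoritative names; the compute environment ID, bucket and parameters are placeholders.

```groovy
// Hedged sketch: launch a pipeline through the Tower API from Groovy.
// Endpoint path and payload fields are assumptions; confirm against the published OpenAPI docs.
import groovy.json.JsonOutput

def token       = System.getenv('TOWER_ACCESS_TOKEN')    // personal access token from your Tower profile
def workspaceId = System.getenv('TOWER_WORKSPACE_ID')    // numeric workspace ID

def payload = JsonOutput.toJson([
    launch: [
        pipeline    : 'https://github.com/nf-core/rnaseq',  // the Git repository to run
        computeEnvId: 'YOUR_COMPUTE_ENV_ID',                // placeholder compute environment ID
        workDir     : 's3://my-work-bucket/scratch',        // placeholder work directory
        paramsText  : '{"input": "rnaseq-dataset.csv"}'     // pipeline parameters as JSON text
    ]
])

def conn = new URL("https://api.tower.nf/workflow/launch?workspaceId=${workspaceId}").openConnection()
conn.setRequestMethod('POST')
conn.setRequestProperty('Authorization', "Bearer ${token}")
conn.setRequestProperty('Content-Type', 'application/json')
conn.doOutput = true
conn.outputStream.withWriter { it << payload }

println "HTTP ${conn.responseCode}: ${conn.inputStream.text}"   // response should include the new run's ID
```

This is the same call the launch hooks and downstream integrations mentioned above would make; the Tower CLI wraps the equivalent requests for you.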
And I guess I'd like to firstly thank everyone for attending this course. It's been fantastic to see so many people attending, and thanks also to everyone who's been helping out behind the scenes; there are lots of people who have been getting up early and staying up late to try and cover these different time zones. Other than that, I will pass it over to the folks who want to come on screen for the final words, and say thanks a lot to everybody. As I said, we're always available on Slack. Hopefully you've got the feeling by now that there's a whole bunch of people there to help you out at any time, so we're really excited to have you as part of the community, and we're always willing to support your work.

Absolutely, thanks very much, Evan, that was great. I think Harshal's already gone off to get some sleep. But thank you everybody for joining, it's been a real pleasure. We've seen some great questions coming in, and it's been really fun going through everything and chatting to some of you. Please stay in touch, and remember that we originally planned this training to encourage people to come to the NFCore hackathon, which is going to happen next week. Registration for online attendance is still open, so if you can find the time, it's the perfect moment to strike: jump into the hackathon and put all of these new skills to the test. That really is the best way to make it all stick. All of these streams will stay available on YouTube going forward, so you can always come back and refresh yourself. We'll be sticking around, so if you have any comments or questions, please pop them on Slack. We'll send you an email with a feedback form pretty soon so you can let us know how you think we can improve this next time. Beyond that, thanks very much, and I hope to see you all soon.