So hello, welcome everybody to the fourth day of the hackathon. Today we're going to have a talk by Phil Ewels about tools and guidelines for nf-core. After the talk, we'll share a bit more detail about today's social event. Join us later for a separate Zoom session for the social event at five — I'll give more details about it after Phil's talk.

Cool, thanks for the introduction. Thank you everyone for coming back on day four of the nf-core hackathon. I hope everyone's been having a good time so far. It sounds like there's been a lot of activity on Slack, and there are certainly a lot of commits coming in, so I think we're making great progress. Today I'm just going to talk a little bit about the nf-core/tools package. Those of you who have been involved with developing things for nf-core will be fairly familiar with some of it already, but I thought I'd go over the different commands which are in there. Some of them are quite new, so I'll talk about the newer features in a bit more detail, because they will affect most people who work with nf-core. So, I'll dive in.

For those of you who are maybe just joining now: my name is Phil, Phil Ewels. I'm working in Sweden at the National Genomics Infrastructure, which is part of SciLifeLab — you'll find me under the same name on GitHub and whatnot — and I'm one of the founding members of nf-core. Today is all about nf-core/tools. That's the name of the GitHub repository; I tend to overuse the word "tools" everywhere, but basically what I mean by it is a Python package. You can install it from GitHub, from the Python Package Index, or through Conda. It's a Python module which you can import and use in your own scripts if you'd like to, but I'm not going to talk about that today. The main thing it gives you once you've installed it is a new command-line command, which is nf-core. Not very inventive, but hopefully easy to remember. So, what can nf-core do?
If you just type that on its own, or with --help, you'll get the help text, and that lists all of the different commands that are available to you. So there are multiple different subcommands, and those options at the top — version, verbose and help — work with everything. If you ever want verbose output, the debug log messages, you run nf-core -v and then the subcommand; the flag comes first, before the subcommand.

The different subcommands here fall roughly into two categories. We have a bunch of commands at the top which are all about running pipelines — so if you're a user of nf-core, using nf-core pipelines and not necessarily developing anything — and I'll go through those in a second. And then the ones at the end are all about making pipelines and helping developers.

So, starting off with the tools for running pipelines. You get a picture of me running — I like to put pictures into presentations to throw people off. Quite recently — so, all of these screen grabs are pulled from the nf-core/tools development version, which we will hopefully be releasing; my aim is to release it this week, we'll see if it happens — I've tried rewriting a lot of the help texts. So don't forget that --help exists, because it's usually quite informative.

If you hit nf-core list --help, it tells you what it does. It's the simplest of all the commands: it just tells you what nf-core pipelines exist, so basically the same as the nf-core website, with a few extra things. You can give it extra filter keywords if you want to — if you just want to see pipelines which are related to RNA, you type nf-core list rna. You can also sort by different things; by default it sorts by the most recently released. You can have JSON output if you want to use this programmatically. And that's about it. I don't know why that's there. So if you run it, you'll get output which looks kind of like this.
So just on the command line, it tells you all the different nf-core pipelines and when they were released. But the really useful thing is the last two columns on the right-hand side: whether you have a local copy of each pipeline. That is, have you used nextflow run with that pipeline? If you have, then Nextflow will have pulled that pipeline into your local Nextflow assets folder, which is usually in your home directory. This command looks through all of those, finds them, tries to identify them, checks when you pulled each one and how long ago, and also compares that against the latest release which is available. If it's green, it tells you that you've got the latest release — if you just ran nextflow run nf-core/viralrecon, say, you would be running the latest stable release of that pipeline. And if it's red, it shows you that if you just did nextflow run — nf-core/atacseq in this case — you'd be running kind of a random development version, which is not the latest stable release. So it's good just to keep an eye on this, and it helps you know whether you should be doing nextflow pull on any pipelines. Just a good, quick check to do.

This screenshot, like many others, will actually look a bit different when you get version 1.10, because I've spent the last few days of this hackathon — mostly today and yesterday — rewriting all of these outputs. It's actually a really pretty table now, using a new library called Rich, which was suggested to me, so I've been having fun with that. The content is the same and the outputs are basically identical, so don't be afraid if it looks a bit different.

Okay, next up: nf-core download. It's pretty much what it says. If you give nf-core download the name of a pipeline — here it's nf-core download atacseq — it will grab all of the workflow files from GitHub and download them to your computer. It will also get the nf-core/configs repository and download that too.
It will actually edit the pipeline config ever so slightly so that, by default, it knows the relative path to those configs, which means that all of the institutional profiles work without an internet connection. It takes all of these files which have been downloaded and compresses them. The whole point of this command is that if you're running on a cluster that has no internet connection and you try to do nextflow run on nf-core/atacseq, it won't be able to pull the pipeline for you — you have to do that on a different system. This just simplifies that process: it gives you a nice single archive file which you can easily transfer to your cluster and then run offline. If you run with --singularity and you have Singularity installed, it will also pull the Singularity image for you and put it into that same archive. And it's very nice and gives you the tar command that you then need to use on the cluster, so you can uncompress it very easily.

Okay. And then licences — probably the least used of any of the nf-core commands. Even on the nf-core core team, I believe some people didn't know this existed until we were writing a manuscript and I described it. This has been like a side project — I can't remember exactly what we needed it for, there was one specific thing — and I realised we could do it because we're using Conda environment files, and the Anaconda API has information about all of these packages, so we can fetch the licence information for every dependency within an nf-core pipeline. If you run nf-core licences rnaseq, or atacseq, or the name of any nf-core pipeline, it will fetch that environment file and pull the licences for every single dependency. It will also tell you the versions of those dependencies which are pinned in the environment file. Bit of a niche tool, but kind of cool. Right.
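To make the idea concrete, here's a minimal sketch in Python of what the licences command does conceptually: parse the dependency pins out of a Conda environment file and look each one up in a licence table. The licence data here is a hard-coded stand-in — the real command queries the Anaconda API — and the helper names are made up for illustration.

```python
# Sketch: map pipeline dependencies to licences.
# The real `nf-core licences` command queries the Anaconda API for each
# dependency in the pipeline's Conda environment file; the licence lookup
# table below is hypothetical, hard-coded data for illustration only.

def parse_conda_deps(env_lines):
    """Pull 'name=version' dependency pins out of environment.yml-style lines."""
    deps = {}
    for line in env_lines:
        line = line.strip()
        if line.startswith("- ") and "=" in line:
            name, _, version = line[2:].partition("=")
            # Strip any channel prefix such as 'bioconda::fastqc'
            name = name.split("::")[-1]
            deps[name] = version
    return deps

# Hypothetical licence data (the real tool fetches this over HTTP)
LICENCES = {"fastqc": "GPL-3.0", "multiqc": "GPL-3.0"}

def licence_report(env_lines):
    deps = parse_conda_deps(env_lines)
    return {name: (version, LICENCES.get(name, "unknown"))
            for name, version in deps.items()}

env = [
    "dependencies:",
    "  - bioconda::fastqc=0.11.9",
    "  - bioconda::multiqc=1.9",
]
print(licence_report(env))
```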
Those are all the commands for running pipelines. Now I'm going to go on to what are probably the more commonly used — or certainly the more powerful — commands within the nf-core/tools package, which are all about making pipelines and helping developers.

The number one command, which is really the core, the real crux of nf-core, is the command to create a new pipeline: nf-core create. By default you can run this without any arguments and it will prompt you for a name, a description and an author. Using these three values, it will then fill in a template pipeline, substituting those variables all over the place, and create a skeleton pipeline for you. It also sets that up as a git repository and gives you the commands shown here: you need to cd into that directory, add a remote so you've got a remote repository, and then push everything. This basically completely sets you up for all of the downstream things you want to do, and if you ever want to make a pipeline part of nf-core, it's really going to make your life much easier to start with this. One of the key things it does is that, as well as setting up the git repository, it makes the first commit with a completely vanilla copy of the template, and this helps with the automatic template synchronisation that we do later on.

The skeleton output it generates looks something like this, which will be very familiar to any of you who've worked with nf-core pipelines. It gives you a changelog file, the nf-core code of conduct, the MIT licence, a README file, all this kind of stuff, and also the actual pipeline main.nf file. This is a minimal but fully functional example pipeline right out of the box, and it should follow the nf-core guidelines from the word go. There are a lot of files here, and it's a bit overwhelming, especially for newcomers, to know where to start.
So one of the things we've done is we've littered these files with little comments, specifically formatted as "TODO nf-core". You can see here — this is the docs/usage.md markdown file — we've got a little comment saying: to do, document what you need to do here. This is really useful: many code editors can automatically find and highlight these kinds of TODO statements. So as a starting point, you can just find all of these and work your way through, following the instructions.

Following on from that is the nf-core lint command. With something the size of nf-core, we have all these guidelines that we ask you to adhere to, but manually vetting all of those all the time, across all the different pipelines, with so many different contributors, is just impossible. So we have to automate it. Linting is a term that basically means code checking — it's this kind of meta code that checks code. If you run nf-core lint and give it the directory that contains your nf-core pipeline, it will run a whole suite of tests against it and check all of the things that we expect you to do, and not to do, within your nf-core pipeline. Many of these are stylistic; some of them are hard failures and some of them just give you warnings. In this case you can see that we had 85 tests passed, 36 gave warnings and there were no failures, so that's pretty good. It tells you about all the different warnings, and each one comes with a URL on the nf-core website with the ID of that specific test type. If you follow that URL, you'll find documentation about what the lint test is looking for, why it fails, and hopefully how to fix it. It also gives you a little bit of description text, as you can see here, saying what went wrong — so there is a problem here with the GitHub Actions workflow.
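As a flavour of what one of these lint-style checks could look like under the hood, here is a toy sketch that scans file contents for leftover "TODO nf-core" comments and reports them as warnings — purely illustrative, not the actual nf-core lint code, and the file contents are made up:

```python
# Sketch: a lint-style check that finds leftover template "TODO nf-core"
# comments. The real `nf-core lint` runs a whole suite of such tests against
# a pipeline directory; here we use an in-memory dict of hypothetical files.

def find_todos(files):
    """Return (filename, line_number, text) for every 'TODO nf-core' comment."""
    hits = []
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if "TODO nf-core" in line:
                hits.append((name, lineno, line.strip()))
    return hits

# Hypothetical file contents standing in for a real pipeline directory
files = {
    "docs/usage.md": "# Usage\n<!-- TODO nf-core: document your parameters here -->\n",
    "main.nf": "// the pipeline itself\n",
}
for name, lineno, line in find_todos(files):
    print(f"WARNING: {name}:{lineno}: {line}")
```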
There are two Conda packages where it's checked the Anaconda API and found that there's a newer version of MultiQC and FastQC available, so you can update. And then it also finds all of those TODO strings that I mentioned earlier — so just in case you weren't sure whether you'd got all of them, running nf-core lint will tell you if it finds any. Pretty cool.

You can run that on the command line, and I recommend you do. But this tool also runs automatically with continuous integration on GitHub Actions: every time you push a commit, and every time you open a pull request on the pipeline, this will automatically run on your code. This is a screenshot of the GitHub Actions interface, and you can see exactly the same command with exactly the same output. This is really great, it works beautifully, and it means that when we are reviewing pull requests, if you get a little green tick next to this, you know that there's nothing seriously wrong.

One of the new things which is going to come out in the next version of the template is something I've been playing with quite recently, and I really, really like it — hopefully it will be useful for everybody. One of the problems is that, here, you can see there were no test failures, which means this automatic test got a little green tick; basically there were no errors. But you can see there are lots of warnings here, about Conda package updates being available and things like that. These test warnings are still interesting to see as a pipeline developer. However, if you get that little green tick, the chances are extremely high that you're never going to click through to this interface to read the log output. So these warnings are very often overlooked, until something actually breaks.
So one of the additions we've made is a new step in the GitHub Actions workflow, which takes this output, does something a bit intelligent with it, and automatically posts a comment on your pull request with the results of those nf-core lint tests. Even if you only got passed tests — no failures and no warnings — it will show up saying what the overall result was and how many tests fell into each category. It's basically a rich text output, and this little details section can be expanded so you can see all of the different test results, and again click through to the documentation. So hopefully this will be useful and make those warnings a bit more visible.

Okay, on to the next command: nf-core bump-version. This is not a very interesting command, but it's very useful. Each nf-core pipeline has a version tag, which is very useful for stability and everything, but it's all over the place: it's in the Nextflow config file, it's in the meta files, it's in the script, and it's in the name of the Conda environment. It's just in loads of different places, and if you try to manually update this version number everywhere, you're almost inevitably going to miss one, and that will break all kinds of things. So we automate this: you type nf-core bump-version, give it the directory of your pipeline, tell it the new version that you'd like, and it just goes through all of your code and updates it in all the right places. Simple. It also has a --nextflow flag, so if you need to update the minimum version of Nextflow that's required to run your pipeline, you can do that too.

And then, of course, sync — one of the most feared of these commands, one I've spent a lot of time on and no one else has touched, because it's powerful and a bit terrifying. The sync command is for synchronising your pipeline against the main nf-core template.
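The core of the version-bumping idea can be sketched in a few lines of Python — replace the old version string with the new one across every relevant file in one go, so nothing gets missed. The file contents and function name here are hypothetical; the real nf-core bump-version knows exactly which files and patterns to edit:

```python
# Sketch: bump the pipeline version everywhere at once.
# Toy illustration only — file contents are hypothetical stand-ins for the
# real nextflow.config, environment.yml, Dockerfile, etc.

def bump_version(files, old, new):
    """Return a copy of `files` with every occurrence of `old` replaced by `new`."""
    bumped = {}
    for name, text in files.items():
        count = text.count(old)
        bumped[name] = text.replace(old, new)
        print(f"{name}: {count} occurrence(s) updated")
    return bumped

files = {
    "nextflow.config": "manifest { version = '1.0.0' }",
    "environment.yml": "name: nf-core-examplepipeline-1.0.0",
}
files = bump_version(files, "1.0.0", "1.1.0")
```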
So say you ran nf-core create with version 1.6 of the nf-core template, and then we bring out new versions of tools which have improvements within that base boilerplate code. We want to make sure that all nf-core pipelines get all of those improvements, so we need to synchronise the template files. But of course this is not trivial, because you've added loads of your own code on top of the template and modified it in a hundred different ways. The nf-core sync command automates as much of the synchronisation process as we can. If you've worked with nf-core pipelines and seen mention of this mysterious TEMPLATE branch, this is where it really comes in. I don't think I have any screenshots, because this is very code-heavy and text-heavy, so it's not very interesting to look at. But basically, the TEMPLATE branch should contain vanilla template code — exactly the output of the nf-core create command — and it should have a shared git history with your main pipeline. When we do the automatic synchronisation, we check out that TEMPLATE branch, delete everything, and run nf-core create again with the latest version of the template. That gives us a new commit, and a diff on the TEMPLATE branch between what changed from the older version of the template to the new one. And because we have a shared history with your main pipeline, we can create a pull request from the TEMPLATE branch. Every time we do this, it knows about the previous times it's been merged in, so the more times you do it, the simpler it gets — it knows which parts of the template you've already customised, and that's what makes this possible. So if we do small, regular updates, each pull request is fairly small and it's just a few simple changes. Unfortunately, we're not very good at doing small, regular updates. We always promise to do it.
So what you might find is that when this automated pull request is opened, you have a lot of merge conflicts and you have to fix them manually. But there's a lot of documentation that we've written recently on the nf-core website about that. You can run nf-core sync yourself — it's bundled within the nf-core package that you have installed — and it just works on your local pipeline directory. Again, it needs that TEMPLATE branch to be there. But there's also a magic --all flag, which is used in the automation: when we push a new version of nf-core/tools, this is run on GitHub Actions, and it tries to automatically check out and sync every single nf-core pipeline. It is a terrifying thing to run. We've written this, and rewritten it, twice now, so it's run about three or four times in total, and it gets a little bit better and a little bit more stable every time. Each time it happens, about five, six, seven pipelines fail and I try to figure out why. We're going to do a new release of nf-core/tools, like I said, hopefully this week or very soon, so this will run again and make me sweat for a few minutes, and we'll see how well it works and whether any pipelines actually get synchronised.

For those of you who are keen-eyed, you will have noticed that I missed off a couple of those commands at the start: nf-core launch, for users, and nf-core schema, for developers. That's because this is a suite of new helper commands which is coming out with the new version of nf-core/tools — completely new, but it's going to become a hard requirement for all nf-core pipelines. So once this new version of tools comes out, all of your lint tests are going to start failing until you update your pipeline. That's intentional — I'm sorry about that, but we just want you to update your pipeline. So, what is a pipeline schema? Right, let's go back to basics.
In Nextflow pipelines, we have params — params.foo, say — and we set it to a default value and can use it like a regular variable. So we do nextflow run and we get "hello world". We can then run with --foo, and that default is overwritten within Nextflow, and we get a nice customised output. Now, as everyone knows, we use these params all over nf-core pipelines — that's how you customise every little nook and cranny of execution. In the bigger pipelines especially, there can be a massive number of them. Some of them are completely required just to run the pipeline. It's quite easy to make typos, and it's quite difficult to know what parameters exist and how to use them. At the moment we have manual documentation which we've just written out, and the developers have to make sure to keep that up to date. But with so many parameters, it's easy to miss some, so sometimes parameters end up undocumented or outdated, and this can cause quite a lot of problems.

So, what's coming? There's a new file for every nf-core pipeline, which is called nextflow_schema.json. It's a pretty simple file: we just describe every single one of these params which is used. So here we say there's a parameter called foo; it should be a string — not a boolean, not a number; the default value is "world" — the default here doesn't always have to match the Nextflow config, but it should 99% of the time; and then we can write a description of what the parameter is. There are a few more fields we can add, but these are the core ones. This is written using a standardised format called JSON Schema, which makes it easy to tie in with lots of other systems, and it allows us to automate all kinds of stuff. One of the most obvious things is that we're going to use it for validation of user inputs.
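As a rough illustration, a nextflow_schema.json describing just that foo parameter might look something like this — the field names follow standard JSON Schema, but treat the exact layout as a sketch rather than the definitive nf-core format:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Example pipeline parameters",
  "type": "object",
  "properties": {
    "foo": {
      "type": "string",
      "default": "world",
      "description": "Who to say hello to."
    }
  }
}
```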
So that is: if you do --foo but you give it a number instead of a string — or, that's a bad example, but if it expected a number and you gave it a string — it will fail immediately, rather than trying to run the pipeline and failing at a later stage. Also, if you run with a typo, it can say that's an unrecognised parameter, and if you miss out a required parameter, it will also fail immediately. This automated validation is going to be incredibly useful, and hopefully save you a lot of time, by making it a fail-fast system.

The other thing is that, because we're writing descriptions — and actually also help text — within the schema files, we can reuse this all over the place, because it's programmatically available. Instead of being stuck in a static markdown file, we can pull it out when people run things on the command line, so we can have command-line help: when you do nextflow run pipeline --help, it will be able to use this file and print everything out. We can also use it to generate the documentation on the website, and we can start to build user interfaces — the same text comes up in place after place, but you only have to write it in a single place. So this is much more efficient and much easier to keep up to date.

Writing a massive JSON document is really dull, so one of the first tools we built for this was nf-core schema build. Again, you give it a path to your pipeline, and this command does a lot of stuff, so let me walk you through it. The first thing it does is look at your pipeline to see if it can find a schema file. If it doesn't find one, it generates one based on the nf-core template, which is of course bundled with the tool — and if you use nf-core create from the next release onwards, then when you start your pipeline you will automatically have that schema file there already. If it does find the schema, it just checks it and makes sure it looks valid.
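The fail-fast validation idea can be sketched in plain Python like this — a toy stand-in for the real JSON Schema validation, checking types, unrecognised names and required parameters before anything runs:

```python
# Sketch: fail-fast validation of user params against a schema, checked
# *before* the pipeline launches. Toy illustration in plain Python, not the
# real JSON Schema implementation in nf-core/tools.

TYPES = {"string": str, "integer": int, "boolean": bool}

def validate_params(schema, params):
    """Return a list of human-readable validation errors (empty if all good)."""
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in params:
            errors.append(f"Missing required parameter: --{name}")
    for name, value in params.items():
        if name not in props:
            errors.append(f"Unrecognised parameter: --{name}")
        elif not isinstance(value, TYPES[props[name]["type"]]):
            errors.append(f"--{name} should be a {props[name]['type']}")
    return errors

schema = {
    "properties": {"foo": {"type": "string"}, "max_cpus": {"type": "integer"}},
    "required": ["foo"],
}
print(validate_params(schema, {"fooo": "world"}))                # typo + missing required
print(validate_params(schema, {"foo": "world", "max_cpus": 8}))  # valid: []
```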
There's lots of code in there to validate a million different things — it validates that the schema is JSON in the right format, basically. Once we have that schema parsed, we then start to look at the nextflow.config: it actually runs nextflow config and gets a list of every single parameter that your pipeline uses, and we also scrape the main.nf file. Once we have that huge list of params, we check it against the schema and look for any mismatches. These are then prompted on the command line. So in this example, you can see it found three parameters which were in the Nextflow config of the pipeline but not mentioned in the schema, and it's asking: do you want these to be added to the schema? These might be new options that you just added in the code. It works the other way around as well: if it finds things which are in the schema but not in the pipeline, it checks that they're definitely meant to be there — so if you changed, say, camelCase to snake_case names, it would change that for you. It walks you through all of that, and then it saves the updated JSON file.

That's great, but of course you still don't have any description text or help text, and we haven't ordered anything — and really, we want to do a lot of manual customisation of this file. Again, that's really difficult to do either by hand or on the command line. So what schema build comes with is a web builder tool, which is integrated with the nf-core website. Here it asks: do you want to launch the web builder? Hit yes, and it should open up a web browser for you. Importantly, you can see that the command line here carries on running — you've still got that little spinner going in the bottom left — and it just sits there, waiting for you to use the website, basically until either you exit or it finds an updated status on the website. When that URL opens, you'll be presented with something like this.
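The mismatch check just described boils down to a set comparison in both directions. A minimal sketch, with made-up parameter names — the real command gets its lists from nextflow config and main.nf:

```python
# Sketch: the two-way mismatch check in `nf-core schema build` — params in
# the pipeline config but missing from the schema, and schema entries that
# no longer exist in the pipeline. Toy data only.

def schema_diff(config_params, schema_params):
    """Params missing from the schema, and schema entries gone from the pipeline."""
    missing_from_schema = sorted(set(config_params) - set(schema_params))
    gone_from_pipeline = sorted(set(schema_params) - set(config_params))
    return missing_from_schema, gone_from_pipeline

config_params = {"foo", "outdir", "max_cpus"}  # scraped from nextflow config
schema_params = {"foo", "out_dir"}             # read from nextflow_schema.json

to_add, to_remove = schema_diff(config_params, schema_params)
print("Add to schema?     ", to_add)       # ['max_cpus', 'outdir']
print("Remove from schema?", to_remove)    # ['out_dir']
```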
And you should see all of the different things that you've got in your JSON schema. There are the three new options we added, which are at the bottom, and we can drag and drop them, give them a nice little icon, write some description text (which can be markdown), and write some longer help text (which is also markdown). So it's a really graphical tool. It comes with a whole bunch of keyboard shortcuts as well, so you can move around quickly and do things in batches, and you can preview the help text as you write it. And you can set all the different attributes: whether things are required, the variable types, what the default value should be. This gives you a lot of power to customise it, and you never have to touch the underlying JSON by hand. There are also settings for each parameter, so you can set up things like regular expressions which string inputs are validated against, or an enumerated list of allowed values, so that people can only set values from that list. So this is really powerful, and hopefully very quick to use.

Once you have finished, you save your changes and hit the big button at the top right which says Finished. Then, in the background, that command which was sat there waiting — it's been polling the website every couple of seconds — sees that the status has updated, downloads the edited JSON schema, and saves it into a file. You can run nf-core schema build as many times as you like and it will just keep updating it; and of course, if you don't change anything, it will simply finish without changing anything. So this is both for building the schema in the first place and for keeping it up to date in the future.

Cool. So now you've built your schema as a developer — what else can we do with it? The first thing, which of course is super cool, is that we can start to make nice interfaces to launch pipelines.
This is now pretty well developed, and there are some other cool things coming out with it too. If you go to the nf-core website, some of the newer pipelines — where the developers, no names mentioned, got excited about this unreleased feature — have already started building schemas. If you go to atacseq or chipseq or eager, I think, and maybe one or two others, you'll find that the button that normally links to the code now says "Launch version". This only shows when a schema file is found for the latest release. You click that launch button and it takes you to a new page on the nf-core website. It looks a lot like the schema builder, but now, instead of customising where these things go, you have a list of all the different parameters and you can actually start to fill in values for them. They're treated intelligently: if it's a string, you get a text input; if it's a number, you get a number field; if it's an enumerated list of values, you get a drop-down list to select from. So it's built intelligently, and the JavaScript on the front end validates everything as you go — if there's a pattern, for example on this email field, it tells you immediately: no, that's wrong; or: no, you missed a required input.

When you're finished, you can hit Launch at the top right. Again, it saves a cached version, much like with the schema builder, but now it's caching all the parameters that you just entered, and it gives you these nice commands underneath. So now we have this nf-core launch command with --id and then the cache identifier. If we copy and paste that into our terminal, the nf-core command-line tool sees the ID and checks the nf-core website for it. If it finds it, it downloads the parameters file which you've created, saves it into your current working directory, and offers to run the pipeline for you. It knows which pipeline you built it with.
It knows which version of the pipeline you selected, and it puts all of those parameters into a JSON file which you can run using the -params-file option. There are some options there — for example, if you don't want to use the JSON file, you can tell nf-core launch to build it all as command-line options, --this --that, like you might type manually. And there are a few other options besides. So this is now super user-friendly: you can do all of this through clicks in a very nice interface, which will hopefully be really easy for both experts and newcomers alike, and then you just paste a command into the terminal and off goes your pipeline. Super cool.

The natural next thought, of course, is that it would be rather nice to take these parameters and launch them in other places — such as, for example, Nextflow Tower. That hasn't escaped our attention, so keep your eyes peeled to see if any new buttons appear in the future.

This is great, but everything I've shown you so far requires an internet connection, and lots of people use Nextflow on clusters which don't have internet connections and things like that. So the nf-core launch command also works really well for pipelines offline. You can run nf-core launch on a local directory — this can be any Nextflow pipeline, it doesn't have to be nf-core. Again, it checks the schema, and now you get a command-line wizard which takes you through all of those same options: it prints the descriptions, prints the help text, renders the markdown on the command line (which is super cool), and validates all of your inputs as you go. It checks whether things are required; you can see all the different groups of options here, so you can work through the questions; and this one would be a numeric value, and you can see it's immediately telling you that. So again, it's very user-friendly — maybe not quite as visual as the website, but very quick, easy to use and intuitive.
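The hand-off at the end of nf-core launch — write the collected parameters to a file, then run Nextflow with it — can be sketched like this. The pipeline name and parameters are hypothetical, and the real tool does considerably more; only the nf-params.json / -params-file convention is taken from the talk:

```python
# Sketch: the final step of `nf-core launch` — save the chosen params to a
# JSON file and hand over to Nextflow via -params-file. Hypothetical
# pipeline name and parameter values, for illustration only.
import json

def build_launch_command(pipeline, revision, params, filename="nf-params.json"):
    """Write the chosen params to a JSON file and return a nextflow run command."""
    with open(filename, "w") as fh:
        json.dump(params, fh, indent=4)
    return f"nextflow run {pipeline} -r {revision} -params-file {filename}"

cmd = build_launch_command(
    "nf-core/atacseq", "1.2.0", {"input": "design.csv", "genome": "GRCh38"}
)
print(cmd)
```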
And this runs on any Nextflow pipeline with a schema, and it runs without an internet connection. Pretty cool. Right, that's it. There are other things coming in nf-core/tools that I haven't described — for example, a new series of commands called nf-core modules, which will be for working with nf-core DSL2 module code, but that's still very much in development. And there's a whole load of changes which have just been made this week, fixing things and making things nicer, so when you do try this out, hopefully things will look even spiffier. Just like everything else, we have a Slack channel for tools — so if you're using this and have any questions or suggestions, jump on there — and all the code is on GitHub, like everything else. With that, I hope that was useful, and hopefully everyone learned something new about the nf-core/tools package. I'm happy to take any questions.

Thank you, Phil. Very interesting and exciting developments on these new nf-core tools for building and launching pipelines. Are there any questions from the audience? Please write them into the chat. And I have a question meanwhile: for the schema build and launch, the parameters get cached on the website — for how long are they available there?

Seven days, I think — off the top of my head. If you try to access one after that, you'll be told that it doesn't exist or can't be found. Every time this tool runs, the website checks all the different caches and deletes the old ones. If you're worried about data privacy, of course, use the offline tools, even if you have an internet connection — that's part of the reason we've built those.

Perfect, thank you. We have a question from Alice: could it be possible to have an assume-yes option for the launch command, to launch from a job in a non-interactive session?
Yeah, so I thought about this. With nf-core schema build there is that option — it's not called assume-yes, it's called --no-prompts, I think — and it does basically the same thing: it goes through and makes sure that the schema and the pipeline match. For the nf-core launch tool it makes slightly less sense, because if you're going to run it non-interactively, you may as well just run nextflow run: that takes all the same defaults and does exactly the same thing. If you run nf-core launch without entering anything, all you'll get at the end is a nextflow run command with no options. So it doesn't really make sense to run nf-core launch in a non-interactive way, because that functionality is already available with Nextflow. So no, that option doesn't exist in nf-core launch — but if you can come up with a compelling argument for it, I'd be happy to listen.

With one exception, actually: you can also give nf-core launch a parameters file that you've previously written. So for example, if you've run an nf-core pipeline with a schema once before, and you want to run it again using exactly the same inputs as before but tweaking one or two, then you can supply that file and it will go through and basically overwrite the defaults with whatever you did before. So you can go through, check and edit things, and rerun if you want to. I was going to say you can run that non-interactively, but that's a lie — it still has to be interactive. And again, if you wanted to run it non-interactively, you'd just do it with a plain Nextflow command.

Okay, perfect. Thank you very much for this talk, Phil. We are all really excited to try it out, so we hope for a release soon. Yeah. Yeah.