Hello, everyone, and welcome to today's bytesize talk. I'm very happy to have with me Adam and Sateesh, and they're going to talk about how to convert pytest-workflow tests into nf-test. Over to you.

Hello, right. So we thought it would be useful to give a little demonstration on how to convert an existing pytest-workflow test into an nf-test, so that as you update your modules, you can migrate them to the new testing framework we've started to shift over to, maybe with some extra hints and tips you might like along the way.

To be honest, pytest-workflow has been really good. It's been able to test this massive modules repo, covering all of the modules and some of the pipelines as well. But it's indirect: it uses pytest to call a subprocess, we have to write a little Nextflow pipeline around the thing we want to test, and then we test the files that come out of it. So we're always a bit out of the loop, waiting to see what files are created and then checking those files.

nf-test offers quite a lot of advantages because of its integration. For example, with pytest-workflow we have to write a custom main.nf workflow that wraps up the tool you want to test and runs it as the default workflow. In nf-test, we have these when and then blocks, so we can really finely control exactly what happens in each test. And as I said, in pytest-workflow you can only really test the files that are created and check their hashes, whereas nf-test supports snapshots, which are really useful: they automatically capture data structures and the state of files, and then check that nothing has changed. Also, because nf-test is built on, or is an extension of, Nextflow itself, we can do loads of different assertions. We've only really scratched the surface here, with things like "file contains" and file names, but there are additional plugins for things like FASTA files, so we can start to dig into the granular detail, testing that particular variants appear in the output and so on. And because it extends Nextflow and Groovy, we can start to write our own code to extend it with plugins and extensions.

So firstly, let's have a quick overview of pytest-workflow versus nf-test. At the top we've got this tests main.nf, and that is a workflow: like I said, it wraps up the module, process, or workflow that you're testing with pytest-workflow. We've got a configuration file, which we have to add mostly just to export the files; most of the time that's all it does, but sometimes it adds an extra setting, like an argument to a command, or some additional setting you might be interested in. And then finally we've got the test.yml file itself. This is the actual file that pytest-workflow looks for. It contains the shell command it's going to run, some tags that our continuous integration looks for to work out what the test is testing, and then the assertion statements, which basically come down to: is this file created, does it have this MD5 sum, does it contain these strings?
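To make that comparison concrete, here is a minimal sketch of the kind of test.yml that pytest-workflow consumes. The module name, entry point, paths, and checksum placeholder are illustrative, not taken from the talk:

```yaml
- name: fastp test_fastp_single_end
  # pytest-workflow shells out to Nextflow and runs the wrapper workflow
  command: nextflow run ./tests/modules/nf-core/fastp -entry test_fastp_single_end -c ./tests/config/nextflow.config
  tags:
    - fastp
  files:
    # assertion: the file exists and matches a fixed checksum
    - path: output/fastp/test.trimmed.fastq.gz
      md5sum: <expected-md5-of-stable-output>
    # assertion: the file exists and contains these strings
    - path: output/fastp/test.fastp.log
      contains:
        - "reads passed filter"
```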
When we move to nf-test, the first thing we've done is change where the tests live. They used to be created in tests/modules/nf-core/fastp, to take fastp as an example, and now we're moving them into modules/nf-core/fastp/tests. That makes everything self-contained and a bit easier to write, because you're not doing these massive relative imports that wander around the entire tree; you'll see that when I demonstrate it.

The actual test file is a main.nf.test, and nf-test has a really nice tool for generating a lot of this for you, so it does most of the work. This takes the role of the test.yml from pytest-workflow. The snap file I just mentioned is the equivalent of the MD5 sums and the contains checks: it verifies that the output of the process or workflow is the same. It can be quite sophisticated: you can represent things like values coming out of a process without ever writing a file, or values in the meta map, which isn't possible in pytest-workflow. We've also got a configuration file, but it's actually optional; generally you can copy it straight from the pytest-workflow test if you need it, but often you won't, because you no longer need the file-publishing configuration. Finally, we've got a tags.yml. This is a lot simpler than the equivalent in pytest-workflow: it's just a single set of tags matching the folder. Basically, our continuous integration picks up that tags.yml and checks whether any files in that module have changed, which means that for nf-core/modules, when we make an update, we don't have to test 500 or 600 modules at a time; we can just test the one or two that changed.

All right, time for a little live demo. So here I have kallisto, specifically kallisto/index, so it's a really simple test. We have one workflow here, in the tests/modules/nf-core/kallisto/index/main.nf file: a single workflow, test_kallisto_index, and all it does is take one input file and pass it into the KALLISTO_INDEX process. Here's that relative import I was talking about: we have to go back six levels and then navigate all the way back through the tree. By putting the tests in a tests folder next to the module, it gets a lot simpler.

Okay, so to generate a new test, we can use the simple command nf-test generate process, because we're testing a process, though it could equally be a workflow. We run nf-test generate process and point it at the main.nf file for kallisto/index. Just looking there, it's created a file for me, and I can open it; I'm going to move it to the side so we can see them side by side. In the main.nf.test we've got a name, a script, a process: it's all fairly self-explanatory, and the documentation can walk you through any of these bits if you're not quite following what they do. The important part is this when statement, which says what the test is going to run, and the then statement, which holds the assertions at the end. Now, to lift over our pytest-workflow test, we essentially need to take its input and put it into the KALLISTO_INDEX process. nf-test will run the process for us, so we just need to specify the input. We can literally copy that file and drop it in here. I'm just going to tidy it up a bit and get rid of this comment, because it's just documentation. And that's basically the start of our test. Now there are a couple more administrative things we need to do.
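For reference before those tweaks, the generation step and the kind of skeleton it produces look roughly like this. The exact boilerplate depends on your nf-test version, and the test-data path in the input is an assumption based on the usual nf-core test data, so treat both as illustrative:

```bash
# generate a boilerplate nf-test file for the kallisto/index process
nf-test generate process modules/nf-core/kallisto/index/main.nf
```

```groovy
nextflow_process {

    name "Test Process KALLISTO_INDEX"
    script "modules/nf-core/kallisto/index/main.nf"  // generated path; we make this relative next
    process "KALLISTO_INDEX"

    test("Should run without failures") {

        when {
            params {
                // no parameters needed here; this block gets deleted below
            }
            process {
                """
                // the single input, lifted straight from the old pytest-workflow wrapper
                input[0] = [
                    [ id:'test' ],  // meta map
                    file(params.test_data['sarscov2']['genome']['transcriptome_fasta'], checkIfExists: true)
                ]
                """
            }
        }

        then {
            assert process.success
            assert snapshot(process.out).match()
        }
    }
}
```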
So firstly, the relative import: we just go back one folder and import the main.nf, with script "../main.nf". This just keeps everything nice and self-contained. The other thing is that we've started putting the tests in a tests folder, and nf-test won't do that for you by default, so I'll do it now: create tests, and move main.nf.test into it. It now sits in the tests subfolder of kallisto/index.

Tags: we should add the tags. I've just copied them from another screen to make it easy, but basically we have three or four tags for every tool, and they tell the continuous integration what this test actually is. They say: this is a module; it's an nf-core module; it's part of the kallisto suite; and it's specifically kallisto/index, with a slash. So we now have four tags, and they'll be picked up by the continuous integration.

There's one more thing we do. By default, nf-test adds these assert statements one by one; it goes through each of them in turn, testing the first one, then the second. We've started moving over to something else it provides, assertAll, which runs all of them at once and raises an error if any of them fail. This helps capture every error you've got, rather than just the first. There are use cases for asserting one by one, but in general this is what we've moved to. I think it's cropped off the edge there, but it's assertAll wrapping assert process.success and assert snapshot(process.out).match(), and the snapshot is what we're about to create to match our output. We also don't need the params block, so we can get rid of that one, because there are no parameters to set.

All right, now we can test it. I'm just running nf-test test and pointing it at the module, but you could use a tag to do that just as well. I'm using the Docker profile because I've got it ready on my machine. We start with a clean snapshot here, and there are a couple of snapshot options for refreshing a snapshot or creating a new one. Here we can see the snapshot has been created. It's JSON, so it's not exactly human readable, but you can certainly check it. One thing I've started doing is checking the hash: this is the old pytest-workflow file, and we can see the MD5 of the output it created, and we can actually find that same hash in here. So the hash matches, which says the test behaviour hasn't changed, and we can do a pretty much direct translation from pytest-workflow to nf-test.

And that's pretty much it; now we just need to add our tags. Oh, I forgot the tags file, so we need to add one more thing: the tags.yml. This is only until we extend the nf-core tooling, which we hope to do pretty soon, and which will do a lot of this for you. We add a tag, which is basically the name of the module, matching the tag in the test file, and then the path to the module with a double glob, to say: trigger if any file within that folder changes. We can save that and we're ready to go. You can always run the test again to make sure it still works. And then, okay, green tick.
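Putting those administrative pieces together, the finished test ends up looking something like this sketch, with the relative import, tags, and assertAll in place (the test name, meta id, and test-data key are illustrative):

```groovy
// modules/nf-core/kallisto/index/tests/main.nf.test (sketch)
nextflow_process {

    name "Test Process KALLISTO_INDEX"
    script "../main.nf"          // relative import: one folder up from tests/
    process "KALLISTO_INDEX"

    tag "modules"
    tag "modules_nfcore"
    tag "kallisto"
    tag "kallisto/index"

    test("sarscov2 transcriptome - fasta") {
        when {
            process {
                """
                input[0] = [
                    [ id:'test' ],  // meta map
                    file(params.test_data['sarscov2']['genome']['transcriptome_fasta'], checkIfExists: true)
                ]
                """
            }
        }
        then {
            // assertAll evaluates every assertion and reports all failures together
            assertAll(
                { assert process.success },
                { assert snapshot(process.out).match() }
            )
        }
    }
}
```

And the tags.yml next to it is just the module name mapped onto a double glob over the module folder:

```yaml
# modules/nf-core/kallisto/index/tests/tags.yml
kallisto/index:
  - modules/nf-core/kallisto/index/**
```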
So I'm going to switch over to Sateesh now; he's going to show you how to do a daisy-chained module. All right, I'm going to share my screen. You're able to see my desktop, right? My VS Code? Great.

What Adam has shown us is an example of what we can call a simple module: a process on its own that doesn't require any other processes for its execution. kallisto has another tool within its suite, called kallisto quant, and this is the pytest-workflow test for it. You can see it actually needs two processes: KALLISTO_INDEX, for which we just created the test, and then the KALLISTO_QUANT process itself. If you look at this test file, it actually has two tests, one for single-end and one for paired-end data, and if you look inside either workflow, it first runs the KALLISTO_INDEX process and then the KALLISTO_QUANT process, which takes one of its inputs from KALLISTO_INDEX. So that's how it works in pytest-workflow; how do we do this in nf-test? nf-test provides a setup method, so I'll quickly get started with porting it over.

First I'll generate the nf-test file for kallisto/quant using the generate process command from nf-test, which gives you a boilerplate test file. I've already created my tests folder within the kallisto/quant directory, so I'll first move the test file into it. Once that's there, we'll start on the changes right away. First, replace the absolute paths with relative paths, and add the tags. As Adam was explaining, we add the modules and modules_nfcore tags by default to all nf-core modules, then comes the name of the suite of tools, and then a specific tag for this particular process, kallisto/quant. All right, I'm done adding the tags, and I'm removing the params block, because we don't need to set any params here, and just cleaning this up.

First, let's create a test for single-end data. You can give a test a name, so we'll just call this test "single-end". Now, going back to the pytest-workflow test, you'll see that the KALLISTO_QUANT process actually takes four inputs: the reads themselves, the index, and two more additional inputs. Inputs in nf-test are provided positionally, so we'll create the additional input lines and fill each position. But that only covers the KALLISTO_QUANT process and its when and then blocks; remember, we first need to run KALLISTO_INDEX before KALLISTO_QUANT. The way that's done in nf-test is with the setup method, where you specify a process to run before the primary when block. The syntax is: within the setup block, you name the process that needs to be run, in this case KALLISTO_INDEX; you open a run block and provide the path to the kallisto index module, which is up a couple of folders, at ../../index/main.nf; and beneath that, you provide a process block with the inputs for this KALLISTO_INDEX run. As you'll remember from Adam's demo, KALLISTO_INDEX takes just one input, so it's pretty straightforward.
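In nf-test syntax, that setup block comes out roughly like this. A sketch, with the input already filled in with the FASTA we wire up in a moment; the test-data key is an assumption based on typical nf-core test data:

```groovy
setup {
    // run KALLISTO_INDEX before the process under test
    run("KALLISTO_INDEX") {
        script "../../index/main.nf"  // the index module, relative to quant/tests/
        process {
            """
            input[0] = [
                [ id:'test' ],
                file(params.test_data['sarscov2']['genome']['transcriptome_fasta'], checkIfExists: true)
            ]
            """
        }
    }
}
```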
And if you look at this particular pytest-workflow test, you'll see the input is the FASTA file of the SARS-CoV-2 genome that's being indexed, so we need to provide that as the input; I'll just quickly grab it from my previous test.

All right. So what we have here is the setup method running the KALLISTO_INDEX process: the path to the index module, and within the process block, the input needed to run it. Once we have that, we go back to the pytest-workflow test. The first input for kallisto quant is the reads, so I'll just copy that and provide it as the first positional input. The second input is the output of KALLISTO_INDEX, taking the index from the index process. But if you check quickly, there's a mistake there: the output channel is actually called index, not idx. Now, if you look at the index process's output, the index channel is actually a tuple containing the meta map and the index, but for the input to the quant process we only need the index, not the meta. So we need to map that channel: take the meta and the index, and keep only the index. And the remaining inputs are provided blank, as you can see, just as in the pytest-workflow version. Okay, so we have all our inputs now, and as Adam was suggesting, we wrap all our assertions in an assertAll block. There we go.

So we're getting there, almost. First we have our setup block, which runs the KALLISTO_INDEX process; then in the primary when block we have the inputs needed for KALLISTO_QUANT; and finally we have our assertions. But if you look at the kallisto quant test, it also has an extra config. KALLISTO_INDEX didn't need any additional config, but KALLISTO_QUANT needs these external parameters to be set. So we'll just copy this nextflow.config into the tests folder, right next to main.nf.test. Once it's there, you can specify the config that needs to be applied at the top of the test file, in this case config "./nextflow.config". So now it's got its nextflow.config as well.

We're pretty much there, so let's run it. You can refer to a test by its path, or you can run a test by its tag; in this case I'll use --tag kallisto/quant and run it with the Docker profile. You can see it prints the test name, single-end, and notice the hash right next to the test name: it's quite similar to Nextflow, where each task runs in its own directory. These tests run within a .nf-test folder that's automatically created in the parent directory, and you can check that directory for the work and output of a particular test. In this case, you can see we created one snapshot, called single-end.
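For orientation, here is roughly how that wiring looks in the test file, plus the config hookup. This is a sketch: the exact input signature of the quant module, the test-data keys, and the ext.args shown are assumptions, not the module's confirmed interface:

```groovy
// at the top of main.nf.test, point nf-test at the local config
config "./nextflow.config"

// inside test("single-end") { ... }
when {
    process {
        """
        // input[0]: the reads, copied over from the old pytest-workflow test
        input[0] = [
            [ id:'test', single_end:true ],
            [ file(params.test_data['sarscov2']['illumina']['test_1_fastq_gz'], checkIfExists: true) ]
        ]
        // input[1]: the index from the setup run, with the meta map stripped off
        input[1] = KALLISTO_INDEX.out.index.map { meta, index -> index }
        // input[2] and input[3]: not needed here, left blank as in the pytest-workflow version
        input[2] = []
        input[3] = []
        """
    }
}
```

```groovy
// modules/nf-core/kallisto/quant/tests/nextflow.config (copied from the old test; args illustrative)
process {
    withName: 'KALLISTO_QUANT' {
        ext.args = '--single --fragment-length 150 --sd 20'
    }
}
```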
So now let's quickly verify the snapshot, which just means running the test again. And this is going to fail, because we've already recorded a snapshot, and the output contains some inconsistent files that generate inconsistent snapshots. What do we mean by that? To understand it, let's have a quick look at the KALLISTO_QUANT process itself: you'll see that it emits five different channels.

So here my test has failed because the snapshots don't match, and nf-test gives you a very visual way of diffing the snapshot: the previous snapshot is on the left, and the one being generated right now is on the right, and you can see exactly which files don't have matching MD5s. This is a very common case: if a file contains timestamps, or paths that are specific to that particular run, it will generate a different snapshot each time, and the snapshots won't match.

Looking at the process itself, the quant process emits five channels: abundance, the abundance HDF5, run_info, log, and versions. Of these, the abundance HDF5 and the run_info are the files that don't generate a consistent snapshot, because they change on every run. In these cases there's a decision to be made about what to include in the snapshot. Here, because the abundance file is primarily the truth file, the main output of the quant process, we can include just that channel in the snapshot, and in most cases that will work. In cases where you cannot generate a consistent snapshot even of your main truth output, there are other ways of checking: you can assert on particular contents within a file rather than the entire file, so that you still get a consistent check.

So what we'll do now is change our assertions. Instead of capturing all the output channels with process.out, we'll specify the individual output channels to capture. We'll create a few more assertion lines and snapshot only process.out.abundance; match() is the method that actually compares a previous snapshot to the current one. And just as we can have named keywords for emit, you can also provide named identifiers for the items within your snapshot, so we can name this one "abundance". Because we know the abundance HDF5 and run_info don't generate consistent snapshots, we won't include those. Next we include the log, give that a name, and then the versions, process.out.versions, matched to "versions".

Right. Now, because the snapshot has already been generated, we'll use the update-snapshot option to re-record any snapshots that have changed within the test file. You can see it gives you a warning that every snapshot that fails during this run will be re-recorded. But I also want to point out another thing: when you update the snapshot, the previous entries within the snapshot file are not deleted. This is what's shown in the summary here: it has created three new entries within the snapshot, abundance, log, and versions, matching the named identifiers we specified, but there's one obsolete entry as well, because it's no longer referenced by the test. So nf-test provides another option to clean obsolete snapshots, called clean-snapshot, and what it does is remove any obsolete entries within the snapshot that are no longer present in the test file.
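Concretely, the then block goes from snapshotting everything to snapshotting only the stable channels, each under its own name, and the snapshot file is then re-recorded and tidied from the command line. A sketch, assuming the channel names described above and the current nf-test flags:

```groovy
then {
    assertAll(
        { assert process.success },
        // only the stable outputs go into the snapshot;
        // the abundance HDF5 and run_info change on every run, so they are skipped
        { assert snapshot(process.out.abundance).match("abundance") },
        { assert snapshot(process.out.log).match("log") },
        { assert snapshot(process.out.versions).match("versions") }
    )
}
```

```bash
# re-record any snapshots that changed for this test
nf-test test --tag kallisto/quant --profile docker --update-snapshot
# remove obsolete entries left behind in the .snap file
nf-test test --tag kallisto/quant --profile docker --clean-snapshot
```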
So let's take a look at the snapshot file itself. Before, you could see entries from both the previous and the current, updated version of the test; but now that we've run with clean-snapshot, it only has the entries for abundance, log, and versions. That's what you see within the snapshot.

Great, so now we have created a test for single-end data. But you'll notice the actual pytest-workflow file for kallisto quant also has an additional test for paired-end data. If we are to create another test, you can have multiple tests within the same test file: copy and paste the test block within the same file and provide different inputs. Looking at the tests for single-end and paired-end, though, both require the same KALLISTO_INDEX setup. What we've done so far is provide the setup within one particular test scope, but if you want a common, global setup method for multiple tests within your file, you can move the setup block out of that test scope and put it above all of the test scopes. That makes it global, and the output of that process becomes available to any number of tests you might have within your file.

So now that I've moved the setup block outside of the single-end test scope, I'm going to copy the test block for single-end, call the copy "paired-end", and provide its inputs. Oh, actually, I think I've given paired-end data to the single-end test, so I'll change that here, my bad: for single-end, the input needs to be just one FASTQ file. There we go. So now we have a global setup method that provides the kallisto index for all the tests within this particular test suite, a test for single-end with its input, and a test for paired-end with the paired inputs.

But notice: it might be just one test file, but there's only one snapshot file per test file, so if you use the same identifiers for your entries within the snapshot, they will collide. What you need to do is add some delineation between the test entries within the snapshot. So for the single-end test I'm going to prefix these snapshot identifiers with single_end, and for the paired-end test we'll use paired_end identifiers. And there we go. All right, that should be ready, but since I've changed the inputs, I'm going to update the snapshot. Single-end has passed, and it's running paired-end now. And because we've changed the names of the identifiers within the snapshot, we now have some obsolete entries again, so you can run with the clean-snapshot option to remove those obsolete entries and make sure you have only the entries that are in the current test file.

So just as a recap: you can have a setup method that's global and multiple tests within any single test file, and if you have multiple tests, make sure the snapshot identifiers are unique enough that they don't clash within the snapshot file.
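Pulling the whole thing together, the chained test file at this point has roughly this shape. A sketch under the same assumptions as before (module input signature, channel names, and test-data keys):

```groovy
// modules/nf-core/kallisto/quant/tests/main.nf.test (sketch)
nextflow_process {

    name "Test Process KALLISTO_QUANT"
    script "../main.nf"
    process "KALLISTO_QUANT"
    config "./nextflow.config"

    tag "modules"
    tag "modules_nfcore"
    tag "kallisto"
    tag "kallisto/quant"

    // global setup: runs before each test below, so every test gets an index
    setup {
        run("KALLISTO_INDEX") {
            script "../../index/main.nf"
            process {
                """
                input[0] = [
                    [ id:'test' ],
                    file(params.test_data['sarscov2']['genome']['transcriptome_fasta'], checkIfExists: true)
                ]
                """
            }
        }
    }

    test("single-end") {
        when {
            process {
                """
                input[0] = [
                    [ id:'test', single_end:true ],
                    [ file(params.test_data['sarscov2']['illumina']['test_1_fastq_gz'], checkIfExists: true) ]
                ]
                input[1] = KALLISTO_INDEX.out.index.map { meta, index -> index }
                input[2] = []
                input[3] = []
                """
            }
        }
        then {
            assertAll(
                { assert process.success },
                // identifiers prefixed per test so entries don't collide in the shared .snap file
                { assert snapshot(process.out.abundance).match("single_end_abundance") },
                { assert snapshot(process.out.log).match("single_end_log") },
                { assert snapshot(process.out.versions).match("single_end_versions") }
            )
        }
    }

    test("paired-end") {
        when {
            process {
                """
                input[0] = [
                    [ id:'test', single_end:false ],
                    [
                        file(params.test_data['sarscov2']['illumina']['test_1_fastq_gz'], checkIfExists: true),
                        file(params.test_data['sarscov2']['illumina']['test_2_fastq_gz'], checkIfExists: true)
                    ]
                ]
                input[1] = KALLISTO_INDEX.out.index.map { meta, index -> index }
                input[2] = []
                input[3] = []
                """
            }
        }
        then {
            assertAll(
                { assert process.success },
                { assert snapshot(process.out.abundance).match("paired_end_abundance") },
                { assert snapshot(process.out.log).match("paired_end_log") },
                { assert snapshot(process.out.versions).match("paired_end_versions") }
            )
        }
    }
}
```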
Now that I've run both of these for kallisto quant, one advantage of providing these multiple tags is that I can run all the tests for kallisto, including index, just by providing the tag kallisto. When I do that, it finds all the test files that contain the kallisto tag: first it finds the kallisto index test and runs it, then it finds the entries for kallisto quant, runs those tests, and verifies that the snapshots match. You can see it's running single-end first again. And voilà: with one command, we've run all the tests for kallisto, index and quant.

As a final step, add the tags.yml that Adam just mentioned; it's what allows the GitHub continuous integration to pick these tests up. And once you have that, you can just commit your changes, which is basically the tests folder containing everything you've added. So finally, within my tests folder I have the main.nf.test, the snapshot for the test, a nextflow.config, because it's required only for this quant process, and the tags.yml. And that's pretty much it.

So that was an example of converting pytest-workflow tests for a simple module, and also for a chained module using the setup method. We started this migration during the last hackathon, so there are some more examples with nf-test, such as abricate, which has examples of both simple and chained modules, plus FastQC and HISAT2. Those are just examples; a lot more modules have been updated with nf-test. If you go into the modules repo and a module has a tests folder, that means it has nf-test within it; if it doesn't, please consider creating a test for it.

With this, we'd actually like to invite everyone who has previously contributed a DSL2 module to nf-core/modules: check the module you submitted, and if it already has an nf-test, verify that the assertions there are actually verifying the truth of the module itself; if not, consider creating one. To help with this, we've recently added a step-by-step guide covering everything Adam and I have just shown you, for simple and chained modules. It's currently on the nf-core docs, on the modules page, so you can find step-by-step instructions there. So with that, thank you so much. Reach out in the nf-test channel on the nf-core Slack, and we'll be back with another bytesize on nf-test at the workflow and pipeline level. Thank you so much.

Thank you very much. This was very informative, and I'm sure it will help a lot of people. Do we have any questions from the audience?

I've dropped the pull request in the chat; that's actually the code we were writing today, so you can go and look at all of those changes if you want.

Awesome. It seems you've answered all the questions; actually, my own question you answered at the end. Sorry.

I've dropped the documentation link there as well, so everyone can access it.

Perfect. Then I would like to thank you both again, thank the audience for listening in, and thank the Chan Zuckerberg Initiative for funding our bytesize talks. I hope to see you all next week. Bye bye.