Hello, everyone, and welcome to today's Bytesize talk. I'm happy to introduce Julia and Nicolas, who are going to talk about nf-validation. I'm handing over to you, Julia.

Thank you. Hello, everyone. We're going to explain a new plugin that we implemented for Nextflow. It's called nf-validation, and we use it for pipeline parameter validation, based on JSON schema.

First of all, before starting: why is it important to validate parameters? As you may know, Nextflow pipelines can accept different parameters, either through the command line or through config files, and these are not validated by Nextflow itself. So if, for example, your pipeline expects a string and the user provides a number, your pipeline will run until that value is actually used, and only then will it fail. That's why it's important to have a prior step that validates parameters and avoids such errors. In nf-core this was already implemented using a JSON schema, and all nf-core pipelines have these validation steps in the template, because if you, as a pipeline developer, had to validate all of this manually, it would be a huge chunk of code.

So, as I said, we use JSON schema, and the schema looks something like this. Here you describe all the parameters of your pipeline, with a particular formatting. Under "definitions" we have groups, so you can organize your parameters, for example collecting the input parameters together in one group. Then inside "properties" you have, in this case, "foo", which should be a string, and "bar", which should also be a string. This file can get very long, so the advice is to never edit it by hand. There's another Bytesize talk about this, but in nf-core you have the command `nf-core schema build`, which opens a web tool that helps you edit this JSON schema.
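The schema Julia describes might look roughly like this. This is a minimal hand-written sketch: "foo" and "bar" are the example parameter names from the talk, and the remaining fields are illustrative (real nf-core schemas are generated with `nf-core schema build` and contain more metadata):

```json
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "title": "Example pipeline parameter schema",
  "definitions": {
    "input_output_options": {
      "title": "Input/output options",
      "type": "object",
      "properties": {
        "foo": {
          "type": "string",
          "description": "An example string parameter"
        },
        "bar": {
          "type": "string",
          "description": "Another example string parameter"
        }
      }
    }
  }
}
```

The "definitions" level holds the parameter groups mentioned in the talk; each group then lists its parameters under "properties".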
It's drag and drop, so it's very easy to edit, and you don't need to worry about the formatting and so on.

Another thing that is new with this plugin: this JSON schema can be used for different things. For example, we also use it on the nf-core website. But it can also validate other kinds of files, for example sample sheets, which are usually used in pipelines to provide inputs. A sample sheet is usually a CSV or TSV file with a sample ID, columns providing one file each, and maybe some metadata about the samples. So you can also have a JSON schema to validate this sample sheet. The format is more or less the same as the one I already showed. The structure is a tiny bit different, but you also have "properties", and inside "properties" you have the name of each column of your CSV or TSV. It can also validate YAML files; in that case you have the name of every entry instead. Then you can specify a type and validate different things: for example, if the type is string, it will validate that the provided value is a string, or you can provide a pattern if the value has to end with ".fasta", and things like that. This was a little bit fast, but there are other resources about JSON schema that go into more detail; for time reasons, I'm going to talk about the nf-validation plugin itself.

This plugin takes the code that was started in nf-core. If you have checked the nf-core template at some point, this is how the pipeline template looks: you have a `lib` directory here, where we store some Groovy code, and, for example, this file is the one that validates Nextflow parameters. That code was taken from the nf-core template, and based on it we started the development of the plugin. Using it is very easy.
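To illustrate the keywords just mentioned, a single column (or parameter) entry could constrain both the type and the file extension via a pattern. The "fasta" name here is hypothetical; the keywords are standard JSON schema:

```json
"fasta": {
  "type": "string",
  "format": "file-path",
  "pattern": "^\\S+\\.fasta$",
  "description": "Path to a FASTA file"
}
```

With this entry, a value that is not a string, or that does not end in ".fasta", fails validation before the pipeline runs.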
Like all Nextflow plugins, you enable it by adding a `plugins` scope to your `nextflow.config` with the name of the plugin and the version you want to use. With that, that's all you need; the plugin will be installed by Nextflow. It contains different functions that can be imported in your `main.nf` or in your Nextflow script: you only need an `include` statement with the name of the function, `from plugin 'nf-validation'`, and then you can use it in your script.

We have several functions here, and I will quickly go through them as a kind of summary. We have `paramsHelp`, which is used to print the help message for a pipeline. Let me show you. I'm using a `launch.sh` script because the latest version is not released yet, so that's running my local copy, but usually it would be `nextflow run` and the name of your pipeline. Then we can run with `--help`. This uses the JSON schema that I talked about to print the help message of the pipeline. OK, let me try again if it's not working... yes, perfect. Here you see the help message, with the typical command and then the parameters, organized into their sections, and for each one the name, a description, and the type of the value.

Then we also have `paramsSummaryLog` and `paramsSummaryMap`. These two work very similarly, and they are used to print a summary. In nf-core pipelines, at least, we print a summary of the parameters that differ from the defaults at the beginning of every run, in case a user needs to check what they provided. This is generated with `paramsSummaryLog`, which returns this list of parameters in text format. `paramsSummaryMap` works exactly the same, but instead of returning text, it returns a map.
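The setup described here can be sketched as two small snippets; the version number and the pipeline name in the help command are placeholders:

```groovy
// nextflow.config — enable the plugin (version is illustrative)
plugins {
    id 'nf-validation@0.2.1'
}
```

```groovy
// main.nf — import functions from the plugin
include { paramsHelp; paramsSummaryLog; validateParameters } from 'plugin/nf-validation'

// Print the schema-derived help message and exit when --help is given
if (params.help) {
    log.info paramsHelp("nextflow run my-pipeline --input samplesheet.csv")
    exit 0
}
```

The string passed to `paramsHelp` is the example command shown at the top of the help output.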
Then we also have `validateParameters`, which is maybe the most important one here: it's the one that does the actual validation of the parameters. In your `main.nf` you can call this function, and if you use it before starting the execution of the workflow, it will fail before anything runs if there's an error. For example, here it says that the parameter you provided, the input `samplesheet.txt`, doesn't match the pattern requiring a CSV, TSV, or YAML file, and also that the file doesn't exist, because it validates that the file exists as well.

I'm going to show an example of how this looks. This is the current template that we have in nf-core, without the plugin, and as you see, we use this chunk of code, which initializes and also validates all the parameters. And here I have the same template, but modified to use the plugin: I imported the functions, and instead of the initialise step, which used all the code inside `lib`, the template has been modified so we no longer need `NfcoreSchema.groovy`; I have the code to print a help message, and here the function to validate parameters.

So if I run this pipeline with the test profile, for example, it should validate all the parameters, and first we will see the summary of parameters that I mentioned before. OK, as you see, because I didn't provide the `--outdir` parameter, which is required, I get this error before any execution starts. Now I provide this parameter, and yes, there's the list of parameters that differ from the defaults; for example, you can see which input file was provided. The validation passed, and our pipeline started. Good.

And then the last function we have is `fromSamplesheet`, which reads the input sample sheet and creates a channel.
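Following what the talk describes, the validation and summary steps in `main.nf` might look like this minimal sketch:

```groovy
// main.nf — fail fast on bad parameters, before any process runs
include { validateParameters; paramsSummaryLog } from 'plugin/nf-validation'

// Validate params against nextflow_schema.json; throws an error
// (e.g. missing required --outdir, bad file pattern) before execution
validateParameters()

// Print the parameters that differ from the defaults
log.info paramsSummaryLog(workflow)
```

Calling `validateParameters()` at the top of the script is what replaces the old `lib/NfcoreSchema.groovy` chunk of the template.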
I will leave this one for the end, because Nicolas worked on it, so he will explain it.

There's one more new thing we have with this plugin: you can have schemas inside schemas. What does this mean? In your original Nextflow schema file, for every parameter which is a file, you can have a new key called "schema", which references the path of another JSON schema; in this case, a JSON schema that will validate the input sample sheet. You'll see it when Nicolas explains it in more detail, but whenever the plugin detects this "schema" key in a parameter, it will automatically try to read the file provided by that parameter and validate it using that JSON schema.

I also have a different example here, for rnaseq. If we look at the main code, it's exactly the same as what I showed before: I import the functions, print the help message, and validate parameters. rnaseq was one of the first pipelines that got this input schema; it was just a proof of concept, and it's pretty old, this started being implemented some time ago. Now we have it implemented in a plugin, so all pipelines can use it. Basically, here you have the columns of your sample sheet, in this case `sample`, `fastq_1`, `fastq_2`, and `strandedness`, and this will automatically validate the content of the input.

I was going to try to run this pipeline, but maybe I'm talking too much and it's a bit long. So that's it: now you know that you can automatically validate sample sheets, and this works not only for sample sheets but for any other kind of CSV or YAML file that you would like to validate. It doesn't have to be the input specifically; it will validate any of these files.

Now I will hand over to Nicolas to show `fromSamplesheet`. Yes, let me share my screen.
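The "schema inside a schema" mechanism can be sketched as one entry in `nextflow_schema.json`; the path shown is the conventional nf-core location, and the pattern is illustrative:

```json
"input": {
  "type": "string",
  "format": "file-path",
  "pattern": "^\\S+\\.csv$",
  "schema": "assets/schema_input.json",
  "description": "Path to the input sample sheet"
}
```

When the plugin sees the "schema" key on a file parameter, it reads the file the user passed as `--input` and validates its contents against `assets/schema_input.json`.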
Okay, so I'm going to show you a quick example of how to use the `fromSamplesheet` function. As you can see, I have a simple pipeline here which validates the parameters and then converts the sample sheet given by the input parameter to a channel. I'm going to run it real quick; as you can see, I also use the `launch.sh` bash script, because this version is not released yet. If I run this, you can see I get some output, which is the contents of the channel. This output has been made from the sample sheet CSV: it has the name, the surname, and the likes and pictures of certain persons. For example, the first line has Harry Potter, the full path to a text file with his likes in it, and the full path to a directory with pictures corresponding to his likes.

This is the JSON schema I use to validate the sample sheet. As you can see, all properties are inside an "items" section. You can see name and surname; I added an ID field which isn't in the sample sheet, but if a field isn't in the sample sheet, its value automatically goes to null. Then there's "likes", which has the format "file-path" and checks that the file exists, and also "pictures", which is a directory and also contains the key "dependentRequired": if "likes" is not given but "pictures" is given, the sample sheet validation will fail.

I'll show an example of this. If I remove Harry Potter's likes and rerun the code, you'll see an error which says that the "likes" field should be defined when "pictures" is specified, and it also shows which fields are not defined, because you can add many more fields to it. I think it's a very nice error message.

You can also specify "unique", which can take a Boolean or a list. If it's a Boolean, it will only look at the field itself, so if it's true, all names must be unique. If you give it a list, for example containing "surname", then each value must be unique in combination with the surname.
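Based on the description of the demo, the sample sheet schema might look roughly like this. This is a reconstruction, not the exact file from the talk; the keyword placement follows what is described above ("exists", "unique", and "dependentRequired" are keywords the plugin understands on top of standard JSON schema):

```json
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "unique": ["surname"]
      },
      "surname": {
        "type": "string"
      },
      "likes": {
        "type": "string",
        "format": "file-path",
        "exists": true
      },
      "pictures": {
        "type": "string",
        "format": "directory-path",
        "dependentRequired": ["likes"]
      }
    },
    "required": ["name", "surname"]
  }
}
```

Each row of the CSV is validated as one object in the array: "exists" checks the file is present on disk, "unique": ["surname"] makes the name/surname combination unique across rows, and "dependentRequired" fails the row if "pictures" is set without "likes".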
So, for example, I can't specify Harry Potter twice, as you can see here. It also gives the error for the "likes" field, which is not specified, but as you can see, the combination of the "name" field with the "surname" field needs to be unique, and it shows which combination is the one that clashes.

Okay, so that was a small example of how `fromSamplesheet` works. One small thing to note is that the "unique" and "dependentRequired" fields are only validated when you run `fromSamplesheet`; because these are specific to the sample sheet conversion, they won't be validated by `validateParameters`. All the other schema fields are validated by `validateParameters` as well.

One other nice thing about `fromSamplesheet` is that it creates meta fields which are immutable from the start. I have a bit of code here to show you this: if I try to change the name of every character to Voldemort, it will fail, because you cannot change a value in a meta field. Oh, of course, I have to make sure my sample sheet passes first. As you can see: "cannot put items into an immutable map". This can cause problems in some pipelines which are already built around mutable meta maps, so you can disable it using the optional key "immutable_meta", setting it to false; the default is true, as you can see. If I set this and run it again, I should be able to change the name of every character to Voldemort... or it should do that. Apparently it does not; I don't know why. You can also do it with a parameter; let's see if that works. It does not work. Okay, normally that should work; I think I made a typo somewhere. You can find all of this in the documentation of the nf-validation plugin.

You can also specify which schema to use. By default, `fromSamplesheet` will go to `assets/schema_input.json` for the JSON schema used to convert the sample sheet to a channel, but you can point it at another one using the "schema" option with the path to the schema.
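Putting the options described here together, the channel creation might be sketched like this. The option names follow what is described in the talk, and the schema file name is hypothetical; check the plugin documentation for the exact signatures:

```groovy
// main.nf — turn a validated sample sheet into a channel
include { fromSamplesheet } from 'plugin/nf-validation'

workflow {
    // Default: uses the schema referenced by the "schema" key of params.input
    ch_input = Channel.fromSamplesheet('input')
    ch_input.view()

    // Explicit schema path, with the immutable meta map disabled
    ch_other = Channel.fromSamplesheet('input',
        schema: 'assets/schema_other.json',
        immutable_meta: false)
}
```

With `immutable_meta: false`, downstream operators are allowed to modify the meta map, matching the behavior older pipelines were built around.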
It's weird that it's not working. I'll try it again. Yeah, okay. Weird. Anyway, that's it for the `fromSamplesheet` conversion. Any questions?

Thank you very much. You go ahead. I just wanted to share the last slide, I'm so sorry. No, no, it's fine. So, why am I sharing now? Just a quick thanks to Phil and Kevin, who started this code in nf-core, and also to everyone in nf-core who contributed to this, either by testing, reviewing, or especially documentation. And some important things that you may want to check: the repo of nf-validation lives under the Nextflow organization, and there you have the documentation that I have also been using, which is very nice and pretty extensive. We also have a Slack channel, shared between nf-core and Nextflow, called nf-validation. The last thing to mention is that this will be coming soon in the nf-core template, in the next release: the parameter validation, and also, optionally (not mandatory), obtaining the channel from the sample sheet. And I think that's everything. Thank you.

Thank you again. There was someone who had a question, I think, who wanted to join; you can now unmute yourself, and also start your video if you have a question.

Hi, thank you for this presentation, it was very nice. I was wondering, Julia, will you be adding a schema build command for the sample sheet too? — A schema... what, sorry, can you say it again? — An `nf-core schema build` command for the sample sheet. — Okay, yes, exactly. That doesn't exist right now. Now we have the tooling to create the Nextflow schema for parameters, but not for sample sheets. There is a plan to add it, and it's highly probable that we'll move all this tooling out of nf-core and make it standalone as well, but I can't really estimate a date, because that's quite a bit of work. We'll see. — All right, thank you.
Are there any more questions from the audience? It doesn't seem so. In that case, I would like to thank both of you, and of course the audience for listening, and, as usual, the Chan Zuckerberg Initiative for funding our Bytesize talks. Just one very small thing: this will be the last Bytesize talk before our summer break, and we will let you know when we resume after the summer. So thank you very much, everyone.