Welcome everyone. I'm Franziska Bonath, I'm the host today, and here is Maxime, who is going to talk about coding styles with DSL2. Over to you.

Thank you, then I'll share my screen. So hello everyone, Maxime here. Today I'm doing another talk about DSL2. This time I will try to focus a bit more on some coding style recommendations. I will not talk much about syntax, but more about the organization of the code. So, a quick overview: first, what has changed with DSL2 (you already know the answer to that, since it's my second talk on the topic), then what modules are, and last, what I think we should do with them. To begin, as usual, a disclaimer: these are my own recommendations. I think some other people agree with me on some of these views, but other developers might have different views. We're still trying to forge best practices, still trying to figure out what is easier to read and easier to understand. So it might, and probably will, still evolve; we're getting there, and I might well change my mind on some of these topics. But at the moment, this is what I think we should do. So, what has changed with DSL2? If we follow Paolo's announcement from two years ago on the Nextflow blog (I linked the whole blog post), for me the most important phrase is this one: a module file is nothing more than a Nextflow script containing one or more process definitions that can be imported from another Nextflow script. So this is what has changed with DSL2: modules. And as you guessed, I will be talking about modules a lot in this talk. But what does that actually mean, what is a module? Of course, obviously, a module is a module. And from the definition we just saw, a process can be a module as well.
A subworkflow can be a module, and a workflow can be a module, and they can all be interlinked together, which can be a bit confusing. To make it even clearer, we agreed at nf-core on some proper definitions. For us at nf-core, a module is a single atomic process that can be called from other scripts. A subworkflow is a few chained modules. And a workflow is an end-to-end pipeline. With these definitions, I think it's fairly simple, and we can decide how we want to organize the code and how we want to do things properly. So I will explain what I think should be done. For me, the code should be easy to read, so that it's easy to understand, easy to share, easy to modify, and easy to contribute to, because that's what we want: we are all working in open source science, and that's what this community is all about, sharing our code, sharing our work, and working together to achieve these goals. For me, that's the most important part. So we are going through some examples: I followed the same bit of code from the module level to the subworkflow level to the workflow level, just to explain how I think each should work. Basically, the first statement: as we said, all of the process definitions go into modules, like in the nf-core/modules repository. So we can have a local module within the pipeline, or nf-core modules from the nf-core/modules repository. In this case, I'm going to showcase the Ensembl VEP module. The code is fairly simple for a module. We have, as usual, the tag definition, the label that specifies the resources (which we can adjust afterwards), and then the conda environment or the containers that are specified for the tool. Then we have the input.
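To make this concrete, here is a minimal sketch of what such a module can look like. This is a simplified, hypothetical version, not the actual nf-core Ensembl VEP module; the container, file names, and command-line options are illustrative:

```nextflow
process ENSEMBLVEP {
    tag "$meta.id"
    label 'process_medium'                 // resources decided later in config
    container 'illustrative/vep:latest'    // placeholder, not the real container

    input:
    tuple val(meta), path(vcf)             // sample-dependent input
    path  fasta                            // mandatory reference file
    // (optional extra reference files omitted in this sketch)

    output:
    tuple val(meta), path("*.ann.vcf"), emit: vcf
    path  "versions.yml"              , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args   = task.ext.args   ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    vep --input_file $vcf --output_file ${prefix}.ann.vcf --fasta $fasta $args

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        ensemblvep: \$(vep --help 2>&1 | sed -n 's/.*ensembl-vep *: *//p')
    END_VERSIONS
    """
}
```

Note how everything tool-specific that might vary between pipelines (extra arguments, the output prefix, whether to run at all) is deliberately left to `task.ext`, which gets filled in from the config.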
So as usual for the input, first we have the actual files that depend on the sample that we want to analyze, then the reference files or reference values that are mandatory, and then all of the optional files. Then we specify some outputs. As usual, we have the versions output, which is required for nf-core modules, because that way we can be sure of which tool versions were used. In this case we also have some optional outputs; otherwise you would have regular output declarations. We also have a when statement in nf-core modules. Then we have the script part, which actually just calls the tool, with some extra specifications for extra arguments or other parts of the command, and at the end we just capture the versions. This is the part of the code which is modular, because that's what we wanted to do in nf-core: that way we can share the code with everyone. Then, as a companion to that, we have a modular setup in the config file. For that particular matter, closures are definitely our best friends. This is what I said in my last talk, and that view hasn't changed. With closures you can really dynamically specify what you want. And of course we decided to use a custom namespace, the ext directive, that gives us some specific keys: we use ext.args for the arguments (and ext.args2, ext.args3 for additional ones), we use ext.prefix for the prefix, and we can even be crazy and use ext.when. We can also use closures with other directives at the process level, such as publishDir, which is fairly common to use, I guess. And if you're feeling very, very crazy, you can even go and change the container. This is, for example, what we are doing at the moment in Sarek, still for this module.
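The config pattern just described could look roughly like this. Again this is an illustrative sketch, not Sarek's actual config; all the parameter names (`params.tools`, `params.use_loftee`, and so on) are made up for the example:

```groovy
process {
    withName: 'ENSEMBLVEP' {
        // closures are evaluated per task, so meta and params are available
        ext.prefix = { "${meta.id}_VEP" }
        ext.when   = { params.tools && params.tools.contains('vep') }
        // base arguments plus optional ones, collected in a list
        // and joined into a single string
        ext.args   = {
            [
                '--everything --offline',
                params.use_loftee   ? '--plugin LoF'      : '',
                params.use_spliceai ? '--plugin SpliceAI' : ''
            ].join(' ').trim()
        }
        publishDir = [
            path: { "${params.outdir}/annotation/${meta.id}" },
            mode: 'copy'
        ]
        // and if you are feeling very crazy: swap the container too
        container  = 'illustrative/vep-with-plugins:latest'
    }
}
```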
So here, within Sarek at the moment, I have a condition to specify whether I want this selector to be applied or not, because otherwise we get warnings that the process does not exist, and I just don't like having a lot of spurious warnings. So this part is optional and I'm hoping we will get rid of it at some point. The interesting part starts here: we have a prefix for this specific tool, so this is what will be used as the prefix. Then we have some arguments. In this case it's a bit complicated, but basically we have some base arguments that will be used in all cases, and then some specific arguments that depend on the input parameters we specify on the command line. In the end it's actually fairly simple: it's a list, we join all of the arguments together into one single string that goes directly into the process via the directive. Then, because in Sarek we like to do things differently, we specify a specific container in this case. And then we use publishDir to specify where we want to save our files: depending on the extension of the file, we might have a specific location. And that's all. In the end it might look complicated, but it's actually fairly simple. Then, at the subworkflow level: I think we can actually have several layers of subworkflows. For the first layer of subworkflows, I try to keep it as simple as possible. At this level, I just want to chain modules together and do some tiny channel manipulation if I have to. For example, if you need to remap the output from one module to go into another module, then yes, you can do that at this level. It will look, for example, like this.
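A first-layer subworkflow along those lines could be sketched like this. It is a simplified, hypothetical example, not the actual Sarek code; the include paths, workflow name, and channel names are illustrative:

```nextflow
// First layer: just chain two modules, plus a little channel plumbing.
include { ENSEMBLVEP } from '../../modules/ensemblvep/main'
include { TABIX      } from '../../modules/tabix/main'

workflow VCF_ANNOTATE_VEP {
    take:
    vcf       // channel: [ meta, vcf ] – sample-related data
    fasta     // channel: reference genome

    main:
    versions = Channel.empty()

    ENSEMBLVEP(vcf, fasta)
    TABIX(ENSEMBLVEP.out.vcf)     // index the VCF produced by the first module

    versions = versions.mix(ENSEMBLVEP.out.versions)
    versions = versions.mix(TABIX.out.versions)

    emit:
    vcf      = ENSEMBLVEP.out.vcf
    tbi      = TABIX.out.tbi
    versions                      // channel: versions.yml files
}
```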
So this is the subworkflow that we are using in Sarek to call the Ensembl VEP module and then call the tabix module to index the VCF file. As usual, we begin the subworkflow by taking the input data that is related to the sample, and then all of the reference genome files and optional values that we need. Then, fairly simply, I call the first module, I call the second module on the output of the first one, and I emit everything out. And of course, we gather the versions of all the tools we use. Now, if I go one level up, we are still at a subworkflow, but a higher-level one, because a subworkflow can include other subworkflows. What we can do there is chain modules together, or even chain subworkflows together, or subworkflows and modules, and so on. We can also manipulate channels, and, what I think is good to do here and not at the previous level, we can specify some execution logic with if blocks. It will look like this. In this case I'm calling what looks like three subworkflows, but actually just two different ones: I'm calling the snpEff subworkflow, and I'm calling the Ensembl VEP subworkflow that I just showed twice, because one time I want to use it as it is, and one time I want to use it on the output of the other subworkflow. So I need an alias to be able to use it twice in the same subworkflow. As usual, it takes as input the files related to the sample, and then the reference data and the other values. As I explained earlier, I have this if statement to control the execution of the subworkflows, and then I collect all the files that need to be collected.
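As a sketch, a higher-level subworkflow like the one described could look like this. It is simplified and hypothetical: the names, the params, and the tool selection logic are illustrative, not the real Sarek code:

```nextflow
// Second layer: chain subworkflows, alias one to use it twice,
// and add execution logic with if blocks.
include { VCF_ANNOTATE_SNPEFF                  } from '../vcf_annotate_snpeff/main'
include { VCF_ANNOTATE_VEP                     } from '../vcf_annotate_vep/main'
include { VCF_ANNOTATE_VEP as VCF_MERGE_VEP    } from '../vcf_annotate_vep/main'

workflow VCF_ANNOTATE_ALL {
    take:
    vcf
    fasta

    main:
    vcf_ann  = Channel.empty()
    versions = Channel.empty()

    if (params.tools.contains('snpeff')) {
        VCF_ANNOTATE_SNPEFF(vcf)
        vcf_ann  = vcf_ann.mix(VCF_ANNOTATE_SNPEFF.out.vcf)
        versions = versions.mix(VCF_ANNOTATE_SNPEFF.out.versions)
    }

    if (params.tools.contains('vep')) {
        VCF_ANNOTATE_VEP(vcf, fasta)
        vcf_ann  = vcf_ann.mix(VCF_ANNOTATE_VEP.out.vcf)
        versions = versions.mix(VCF_ANNOTATE_VEP.out.versions)
    }

    if (params.tools.contains('merge')) {
        // run VEP a second time, via the alias, on the snpEff output
        // (this sketch assumes 'merge' implies 'snpeff' was also selected)
        VCF_MERGE_VEP(VCF_ANNOTATE_SNPEFF.out.vcf, fasta)
        vcf_ann  = vcf_ann.mix(VCF_MERGE_VEP.out.vcf)
        versions = versions.mix(VCF_MERGE_VEP.out.versions)
    }

    emit:
    vcf_ann
    versions
}
```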
Same thing for the next one, and then we emit everything back. So I think that if we follow this logic, we get a fairly easy to read and easy to understand organization of modules and subworkflows, and that way it's easy to understand where to contribute, what to do, and how to change and evolve things. Then, the workflow level is where I think we should do everything else. We can still call a single module at the workflow level; for example, you might want to call just the MultiQC module there. Of course, at the workflow level we want to chain several subworkflows, because if we don't do that here, where are we going to do it? And you still want to do some channel manipulation, because this is your main workflow script, so that's where you want to do all of the magic. And of course, the execution logic still happens there. This is what happens within Sarek at the moment: I have my execution logic, so if my input parameters are right, then I'm going to run this. I just call my subworkflows within the main workflow, and I gather all of the versions and the reports that I need. And that's all. So for me, this is how we should organize our code, depending on whether we are at the module level, the subworkflow level, or the workflow level. I also have some small syntax recommendations. We are trying to make proper recommendation guidelines on the DSL2 syntax; we are working on a document with several other people from nf-core, so if you want to contribute, the link is in the title here. And what I would like to say first is that indentation is your friend. I think that's a fairly good statement, and a lot of people coming into bioinformatics will have learned to code with Python.
So I think indentation is probably already deeply ingrained in your habits; let's keep indenting and make the code nice to look at. I like having nice code to look at. Then, in a process, I have seen several different ways to collect several files just to pass them with the same parameter, and I think this is a proper way to do it. I tried to enforce that in all of the GATK4 modules, but some might have escaped me at some point. It's a small one-liner that is easy to understand. Then, for channel assignments, that's something you want to do in a subworkflow or in a workflow. I personally prefer to specify first which channel I want to assign things to, but some other people might prefer to have the channel name at the end. I prefer the first version; I see that some people coming from other backgrounds prefer the second one. I think we really need to figure out as a community what we want to do, but I'm fairly much leaning towards the first version. Some last tips before I open the discussion. Please add comments to your code. It's good for you, it's good for anyone who is going to look at the code, and it's also good for future you, because, believe me, two weeks from now or three months from now you're going to look at your code again and you're not going to remember why you did that. So you want some comments; right now you're your own best friend, so help yourself and do it. Also, I think it's important, and I notice this sometimes: it can be difficult to differentiate between queue and value channels, and I think it's a good idea to have that difference in mind from the beginning when you design your pipeline.
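A few toy illustrations of these syntax points (the names are made up): collecting files into one option, the two assignment styles, and the queue versus value channel difference:

```nextflow
// Collecting several files into one command-line option,
// a common one-liner in GATK4-style modules:
def input_args = input_files.collect { "--input $it" }.join(' ')

// Channel assignment, style 1: the channel name comes first (my preference)
ch_annotated = VCF_ANNOTATE_VEP.out.vcf

// Style 2: the channel name comes at the end, via set
VCF_ANNOTATE_VEP.out.vcf.set { ch_annotated }

// Value channel: one element, reused by every task that consumes it
fasta = Channel.value(file(params.fasta))

// Queue channel: elements are consumed one by one; a queue channel used
// where you expected a value channel is a classic reason why a process
// does not run as many times as you thought it would
reads = Channel.fromPath('data/*.fastq.gz')
```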
Usually we use value channels for all the reference files. And sometimes you don't understand why your process is not executed several times: it's because the channel that you believed was a value channel is actually a queue channel. So we need to be careful with that. Otherwise, a good tip is: don't hesitate to ask questions. We have Slack, we are available there, and I personally like to discuss with other people, so discuss, that's good. And if you're on site with other people, a coffee is always a good way to start a discussion with anyone. So, as usual, I'd like to thank all of the institutions I work for and with on my projects, all of the institutions that are working with us on nf-core, and all of the contributors that we have had so far. If you need more help, as usual, we have some documentation and some previous bytesize talks that you can check out. Otherwise, I'm open for questions.

Thank you very much. I have now opened the possibility for everyone to unmute themselves, so go ahead if you have any questions. Otherwise we also have questions in the chat. Okay, let's begin with a question in the chat. Oh, do you have a question? Go ahead.

Okay, so I guess my first question is: I'm wondering which Slack channel in the nf-core workspace discusses Sarek. I haven't found any; is there a specific one for Sarek, or is it just one of the existing channels?

We have a specific channel to discuss Sarek. We have a specific channel for each pipeline, and we also have specific channels for every main topic.
Otherwise, whenever you don't know where to go, don't hesitate to go to the help channel and we will direct you towards the right channel.

Okay, so it's like #sarek or something like that, right?

Yes.

Oh, I see, I found it. Thank you. So my other one is more of a comment, as a follow-up to what somebody said in the last comment about the user of a pipeline, that DSL1 or 2 is not really important. As a developer it matters, and yes, DSL1 to DSL2 conversion is a lot of work. But I tend to disagree with the first part, which says that for a user DSL1 or 2 is not really important. It is important, because people were forced to actually go back to their code: for those that wrote DSL1 without specifying how their code should run, Nextflow pretty much just forced people by default into DSL2, and the code broke. People were then forced to revisit their work. If it wasn't that important, even for a user, then that would not have been necessary. So I would tend to disagree and say it is in fact important to have pipelines in DSL2; whether it works in DSL1 or not is irrelevant at this point, unless you were thinking about the fact that DSL2 would become the default when you were coding, and had made it very clear in each of your modules or scripts that it was DSL1-enabled. So I just wanted to point that out. But thanks, that was a great talk.

Thank you. There is also a question from Matthias.

Yes, thank you for your great talk. Could you please elaborate a bit more on the extra arguments? Because I noticed, for example, that you wrapped them in brackets in your module example.

Let me go back to that. I haven't seen that so far. Was it this, or not at all? So we use them in the module...

No, it was a part of the code where you had...

So not this one? All right, sorry, I'm trying to find it. No, not this one.
It must be the config, where you provide the extra arguments.

Are you saying for the publishDir thing?

No, exactly here: these params here, you wrapped them in brackets.

Oh yes. So here, what we're doing: all of these extra arguments are in brackets, so the whole thing is a list.

Okay, that's a list. But then, those are not curly braces?

No. I always confuse the names of these, even in French. So yes, it's a list, and for each element of this list I specify either the params directly, or, as here, I have a ternary, an if/else, which is a bit long, sorry. If this statement is true, then this first string will be used; otherwise it will be the second string, which is an empty string.

Okay, so that's an if, and you basically just have the brackets for consistency: if the condition doesn't evaluate to true, then it doesn't matter whether it's in brackets or not. Or is that wrong?

I don't know; I think the brackets are important, because otherwise, if we don't have them, it's not a list. Basically, what I want is to have several arguments. I could make one complicated if, handling all of the different cases, but what I want is one string onto which I can append other arguments on top.

That is completely clear. I am just wondering about the notation, because sometimes you also wrap curly braces around things.

That is when you have a meta map in there.

Okay, you mean like that here?

No, no... there should be some simple explanation for that.

Sorry, it's about when to use a closure. That's, for example, used in ext.prefix. When you use a closure, the variables are evaluated during the task execution.
But if you don't, so for example this ext.args here doesn't have the curly braces, then it is not within a closure: you can't use any variables from the task execution context, and it's evaluated as soon as the config is loaded, right at the beginning of the workflow, before any of the tasks are executed. But if you use a closure, then, one, you can access variables within the task context, anything from the inputs; and it also means that things like params and so on have their evaluation delayed until the execution of the task.

Thank you.

So I don't know if that answered the question; sorry if I misunderstood you.

I didn't make myself clear, but the details are still a bit of a mystery to me in this regard.

Then don't worry, we can have a more detailed talk about that another day. Then there's a question from Phil.

Hi everyone, thanks for the talk, Maxime. I just wanted to reiterate something that came up in the comments on Zoom just now, mostly in case anyone is watching this on YouTube at a later date. There were really good comments about DSL1 and 2, and how running a DSL1 pipeline with newer versions of Nextflow crashed in a slightly nasty way. I want to note that that actually only happens with a fairly specific version of Nextflow. When Paolo switched the default to DSL2 we saw these crashes happening, so we spoke to him, and now the newer versions of Nextflow, since version 22.04.3, should automatically detect whether the workflow is DSL1 or DSL2, so you can go back to just running it without any flags and it should just work, whether it's DSL1 or DSL2. No nasty crashes anymore.

But will we keep DSL1 inside mainline Nextflow, or will it at some point be completely discontinued?

At some point it will be completely discontinued; it's mentioned in the Nextflow blog, though I don't remember the details exactly. I think it's planned for around mid-2023: released versions of Nextflow will stop supporting DSL1, and from that point any DSL1 pipelines will have to be run with older versions of Nextflow.

Yes, but then it's different, because we can still install older versions.

Yeah, so those workflows will still run, just not with the latest version of Nextflow. And if you're interested in converting workflows to DSL2, then we have a Slack channel dedicated to this topic, called dsl2-transition. And the same goes for subworkflows.

Thank you, Phil. There's another question.

Okay, good, I'm going to make it brief. My actual question is about Sarek again: if I understand correctly, the reference genome has to be one of the existing ones, maybe GRCh38, GRCh37 and the like, right? Say I have a scientist who has her own reference genome, which is not one of the standard ones; how do I handle that situation?

It's really simple, we have parameters for that that you can set within Sarek, but can we talk about that more on Slack?

Yeah, for sure. This is going to be a long talk for sure. Thanks.

Just ping me on Slack.

Perfect, thank you.

Are there any more questions from the audience? Phil also posted a link to the Nextflow blog post talking about the end of DSL1 support; I'll put it in Slack as well. Thank you. Also, earlier Mahesh posted a link to the Carpentries training material for coding practices. I will also share this link on Slack.

Maybe I can include that in my slides as well.

Yes. Okay, if there are no more questions from the audience: thank you so much, Maxime, for the talk. And I would also like to thank the Chan Zuckerberg Initiative for funding these talks.
And as always, you can continue these discussions in Slack, in the #bytesize channel or in the channels specific to the different pipelines. And thank you very much.