We will continue now with a summary of the hackathon progress. We'll get started with the hackathon team for new pipelines. Could somebody from the team give a short summary of the progress over the last couple of days? Of course, I will first need to grant you the rights to unmute yourselves; you can also share your video.

I can update you on the progress of SimSeeker. Since yesterday we've been working on a few things. First, I personally have been writing a Nextflow module for validating that input files have the right structure, and I've almost finished it. We have all been working on designing the tool and on some details of how the Nextflow pipeline should work. We are still working on making one of the aligners work, namely parasail; we are getting some weird errors, so if anyone has experience with parasail, it would be great if you could join us. We have also changed our Nextflow repository structure to DSL2. Someone from our team asked me a question on Slack about DSL2, and I know it's not the recommended approach for us yet; nevertheless, I think it's much easier for us to start with DSL2 now, so we are just going for it. We are also working on parameter validation, because there are a few parameters that we need to make sure users specify correctly. And the last item: Pavel is working on making test data, so that the CI can run our pipeline with the test profile.

Perfect, thanks, great progress. Yes, as we've discussed before, using the DSL2 template comes with some disadvantages: DSL2 is still under a lot of development, so it could change at any time. But if you would like to get started with it, of course that's still possible. Regarding new pipelines, would somebody else like to briefly report on their developments? I think that was mostly it; that was the one new pipeline being worked on. We can switch now to the hackathon team for existing pipelines, and I think James will brief us shortly here.

Yeah, so in existing pipelines there are five currently being worked on: Sarek, rnafusion, Eager, MAG, and diaproteomics. For Eager, the one I'm working on with Alex, Maxime Borry, and Ceres, we've fixed a bunch of structural things ready for the next release. We've added a bunch of new figures to the output documentation, and I also wrote a tutorial on how you can use profiles stored in config files for reproducible science. It currently lives in the Eager repository, but Alex and Phil think it might be worth moving to nf-core. So if anyone is interested in how you can use that to ship a settings file with your publication, so that other people can run the pipeline with the exact settings you used, on their own cluster or wherever, ask me and I can show it to you, because I'd like some feedback on it.
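As an aside, here is a minimal sketch of the kind of settings-shipping the tutorial describes: a custom profile in a config file distributed alongside a paper. The file name, profile name, and parameter values here are hypothetical; the mechanism, a custom config passed with `-c`, is standard Nextflow.

```nextflow
// publication.config: a hypothetical settings file shipped with a paper.
// Others can reproduce the published run with:
//   nextflow run nf-core/eager -c publication.config -profile paper_settings,docker
profiles {
    paper_settings {
        params {
            // Hypothetical values used for the published analysis
            fasta             = 'https://example.com/ref/GRCh38.fasta'
            clip_readlength   = 30
            run_bam_filtering = true
        }
    }
}
```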
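Going back to the parameter validation mentioned in the SimSeeker update, a minimal sketch of what that often looks like at the top of a Nextflow main.nf is below; the parameter name and allowed values are made up for illustration.

```nextflow
// Minimal parameter validation sketch (hypothetical parameter and values).
def valid_aligners = ['parasail', 'bwa']

if (!valid_aligners.contains(params.aligner)) {
    exit 1, "Invalid --aligner '${params.aligner}'. Valid options: ${valid_aligners.join(', ')}."
}
```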
Still in progress for Eager: I've just started benchmarking the tool for the upcoming publication, because we're quite close to finishing. One thing somebody may have experience with: when we try to run Eager with the benchmarking profile on AWS, we get lots of weird problems with files not being found, or with paths in the bucket not working. If anyone has experience with that, please let us know; it would be really helpful, because it's been driving Alex crazy for the last three weeks or so.

For Sarek, they've fixed a bunch of issues and added a couple of new parameters, but the bulk of the work has been starting to convert Sarek to DSL2, and they're trying to work out how to make sub-workflows to build the indices. This is with Maxime, Paul, Szilveszter, and Friederike; I'm sorry if I mangled the pronunciation of those names. The other update is that they're running real samples on the latest dev branch and it's not crashing, which is also good news.

For rnafusion, which is with Maxime and Martin, they're just aiming for a release; there are no updates there at the moment. For MAG, they fixed a few issues, improved some of the error handling, and worked primarily on documentation. They also have a couple of outstanding issues. One is a non-reproducible, but recurring, problem with input files going missing; this is being discussed on the existing-pipelines Slack channel if anyone has ideas. The other is an error on the test data with BUSCO when using the scratch parameter, which doesn't seem to be consistent across clusters, so if someone has experience with BUSCO or with scratch, that would be useful. And finally, for diaproteomics, the main progress has been converting the current code to match the nf-core template for an initial release. That's everything so far.

All right, perfect: a lot of development on different fronts and different pipelines. Let's stay in contact in the channel in case somebody has ideas on how to solve the current issues. Okay, so we can move on to the progress of the DSL2 and modules hackathon team. I think Harshil is going to brief us on this one.

Hello. Yeah, so it's been interesting, more battling, I think. We've now come up with a relatively good prototype for the definition of a module file. Gregor has been doing some great work on tracking the discussions, which has been quite tricky because lots of caveats have come up in terms of how we standardize things. For example, how do we standardize publishing results and directories? Not everyone will want all of the files generated by tools, so how do we standardize that? How do we standardize naming the output directories? That kind of stuff. I think we've made a little bit of progress there, which is quite nice. We are now almost ready to merge an example module for FastQC, which will be quite helpful for everyone else adding modules, at least to figure out what sort of format and style we're going for.
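For orientation, a rough sketch of what such a DSL2 module file could look like follows. The exact nf-core/modules format was still being settled at the time of this discussion, so the path, the input and output shapes, and the container below are illustrative assumptions rather than the merged definition.

```nextflow
// modules/software/fastqc/main.nf (illustrative path and content)
process FASTQC {
    tag "$name"
    container 'quay.io/biocontainers/fastqc:0.11.9--0'

    input:
    tuple val(name), path(reads)

    output:
    path '*_fastqc.{zip,html}', emit: reports

    script:
    """
    fastqc --quiet $reads
    """
}
```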
Phil has also updated nf-core/tools to install modules in the directory structure that we discussed yesterday: a modules directory with a software directory inside it. That basically allows us to extend nf-core/modules with other directories in the future, makes things easier to install, and keeps things a bit flexible.

I personally have been dealing more with a local implementation, and it only clicked for me today how powerful sub-workflows are going to be for this kind of thing, because stripping away and essentially compartmentalizing different aspects of your pipeline into sub-workflows will make them so much easier to share. For example, this morning I created a QC-and-trim sub-workflow, which runs FastQC and Trim Galore. You can also provide skip parameters for it, which is quite nice, so you don't have to do much; all the logic is buried away in there, which tidies things up as well. There's also a map-reads one, which runs BWA-MEM and then sorts and indexes the BAM file, as a separate sub-workflow. These kinds of smaller components, you can imagine, will be quite easy to share. Once we've got nf-core/modules up and running nicely, it should hopefully just be plug and play.

Jose has been working on adding BEDTools and other modules, but he's been driven a bit frantic because we keep changing everything and he has to update his PRs; he's been very patient with that. And yeah, lots of people are getting involved. Obviously you're doing the Sarek stuff, so maybe you want to talk about that, Gisela?

Yes, sure. I have a question first, though, about the modules. You've talked about the sub-workflows now: did you decide, or was there an agreement, on whether to add the sub-workflows inside the main.nf or in a separate file?

That's a great question, and I've been thinking about this. Having looked at various pipelines that are already implemented in DSL2, I find it incredibly difficult to figure out what's going on: if you have a very simple main script and all of your other code is buried away somewhere else in the pipeline repository, you have to look at 15 scripts to work out exactly what is included and what you're running. I guess that's one of the biggest downsides of this whole modular structure: if you really want to get to the crux of what someone is doing, you need to dig in, look at all of these include statements, and have a fish around. So my plan was to have a sub-workflows directory in the pipeline, and that would be where you put all of the sub-workflows. I'm still a bit torn as to whether to split those sub-workflows up into separate files or to keep them in one script, because, like I said, one file would make it easier to trace everything. So probably out of the main script, but in terms of splitting the sub-workflows into their own files, I'm not entirely sure yet.

Right, yeah. We can definitely still discuss and see what the best option is there. We were facing exactly this yesterday when trying out a DSL2 implementation for Sarek with Rike and Maxime. We thought it would be nice to have a sub-workflow for building the indices, and then we were wondering whether it would be better to include it in the main.nf or in a separate file. It probably helps the readability of the main.nf to have it in a separate file, yeah. Yeah, so it helps with readability.
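To make the trade-off concrete, here is a minimal sketch of a sub-workflow kept in its own file, loosely based on the QC-and-trim example described above. The file paths, module names, and skip parameters are assumptions for illustration, not the actual implementation.

```nextflow
// subworkflows/qc_trim.nf (illustrative): a sub-workflow in its own file.
include { FASTQC }      from '../modules/software/fastqc/main'
include { TRIM_GALORE } from '../modules/software/trim_galore/main' // hypothetical module

workflow QC_TRIM {
    take:
    reads         // channel: [ val(name), path(reads) ]
    skip_fastqc   // boolean
    skip_trimming // boolean

    main:
    if (!skip_fastqc) {
        FASTQC(reads)
    }
    trimmed = reads
    if (!skip_trimming) {
        TRIM_GALORE(reads)
        trimmed = TRIM_GALORE.out.reads
    }

    emit:
    reads = trimmed // trimmed reads, or the input reads if trimming was skipped
}
```

In the main script this becomes a single `include { QC_TRIM } from './subworkflows/qc_trim'` plus a call like `QC_TRIM(ch_reads, params.skip_fastqc, params.skip_trimming)`, so the main.nf stays short.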
It doesn't help with traceability, but I guess the decision you have to make there is: what's the minimal sub-workflow I could share with someone that offers a specific piece of functionality? So, for example, mapping reads, which is common to so many different pipelines: you run BWA-MEM and then typically you have to sort and index the BAM file. That would warrant being its own sub-workflow, as opposed to tagging other things onto it. So I guess you need to look at it from a reusability perspective as well, making these sub-workflows as atomic as possible, in a way where you can share them and people can reuse them.

A question from Phil: for sub-workflows, is it worth it if it's only a couple of processes? Yeah, that's a good question. It will ultimately come down to a personal decision by whoever writes the sub-workflow, but I would say possibly not. With the map-reads example I gave, you run BWA-MEM, then you have to run samtools sort, then samtools index, then samtools flagstat, then idxstats and stats and all of those other processes. In that instance I think it's worth it, because it buries away all of that code, and you can also imagine people sorting BAM files a lot in their workflows, so they can use that very same sub-workflow in different places. Having said that, the QC-and-trim sub-workflow that I've written is just FastQC and Trim Galore. But the beauty of that, I've found, is that when you specify the take directive in the sub-workflow, you can also specify additional values, for example skip_fastqc and skip_trimming, and all of that logic is handled within the sub-workflow, which is quite nice and really tidy. This will all become a bit clearer when I have a proper implementation together, but when you actually execute the sub-workflow within the main script, you have your workflow name, the input channel, which would be your reads, and then, for example, params.skip_fastqc and params.skip_trimming. That way you can provide the parameter values to it, and it buries away all of the logic around creating channels and so on. If that answers the question. Yes, I think it does.

Okay. Actually, one question that I also had: how does the conditional execution of modules work? I guess it works pretty much like processes, but I was wondering whether it's even possible to have conditional execution of sub-workflows themselves. Yes, I don't see why not; you would just use if statements in the main script for that. All of the process-level skipping would happen within the sub-workflow itself, through the provision of the appropriate Boolean parameters to that sub-workflow.

There are different layers of complexity here, though, because you could also have skipping logic within a process itself. This is something I stumbled on yesterday: say you have Picard MergeSamFiles as a module file. As input, that would take a list of BAM files. But say you have an instance where you haven't sequenced the same sample multiple times, so you've only got one BAM file. Then running Picard MergeSamFiles doesn't make any sense any more, because you've already got your single BAM file, and you need conditional logic there to deal with that as well. In the end, I just decided to soft-link the file, evaluating the size of the input list.
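A minimal sketch of that kind of in-process conditional logic, assuming a hypothetical module wrapping Picard MergeSamFiles rather than the exact code being described:

```nextflow
// Hypothetical module: merge when there are multiple BAMs,
// otherwise just soft-link the single file under the expected name.
process PICARD_MERGESAMFILES {
    tag "$sample"

    input:
    tuple val(sample), path(bams)

    output:
    tuple val(sample), path("${sample}.merged.bam")

    script:
    // A single staged file arrives as a path, not a list, so normalize first.
    def bam_list = bams instanceof List ? bams : [bams]
    if (bam_list.size() > 1) {
        """
        picard MergeSamFiles ${'INPUT=' + bam_list.join(' INPUT=')} OUTPUT=${sample}.merged.bam
        """
    } else {
        """
        ln -s ${bam_list[0]} ${sample}.merged.bam
        """
    }
}
```

Here the decision lives inside the process, which keeps the calling workflow simple; the trade-off, as noted below, is that this is one more place where skipping logic can hide.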
But yeah, there are different levels at which you can evaluate all of this. So, anything that's process skipping would, I'd say, go in the sub-workflow, if that's where you're implementing it; anything that's sub-workflow skipping would be in the main script; and anything that skips something based on logic in the process would be defined there.

Perfect, yeah. We'll try it out then, see how it works, and otherwise we'll use your examples. Good. So we still have the last hackathon team, which is the nf-core/tools hackathon project. Could somebody briefly say what we're doing?

From Phil's side, he was working on nf-core sync to get some more tests in, so we have more tests and some refactoring there. Stephen wrote a function so that nf-core/tools now checks whether a new version is available. I was still working on the automated documentation rendering, where I now have it in HTML, so next I have to do some JavaScript magic on it. And one more thing, just because I'm looking forward to it: Phil wrote a draft of the first newly implemented command-line renderer, which has really, really nice progress bars. It's for linting for now, but we'll see where else it goes. So if you're excited about that, look forward to it.

Nice, thank you for the briefing, Matthias. Okay, and unless somebody else has remarks to make right now, we can close this session and meet tomorrow for the talk. Then, let's hack again!