Hello everyone, and welcome to the first bytesize of 2024. I hope all of you made it well into the new year. As the first presenter this year we have Maxime, and he's going to talk about how to use Slurm as the Nextflow executor. So, over to you.

Thank you very much. Hello everyone. It's nice to be invited to the NGI today to give this talk. I'm Maxime Garcia. I used to work here, so I know the location pretty well; it was kind of nice to be back. I now work for Seqera, as most of you know, I guess. Today I'm going to talk about how to use Slurm within Nextflow.

Just a quick disclaimer at the beginning: this is based on my own limited usage. I did use Slurm when I was sitting here, because that was the executor used on the clusters we were working with, and I have used other executors before as well. More or less, changing from one executor to another is not that complicated: there are different options, but Nextflow takes care of them for you.

As it says in the docs, the executor provides an abstraction between the pipeline processes and the underlying execution system. This allows you to write the pipeline functional logic independently from the actual processing platform. Basically, that is what makes Nextflow portable: the executor component is what really allows you to switch from one platform to another. By default, the executor is local; that's what we use when we run most of our tests, or when I develop a pipeline on my own computer.

I will not be demoing anything today, because I don't have access to my old cluster anymore, so I ran everything beforehand, with some kind assistance from someone who very nicely offered to help me with that.
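As a rough sketch of what switching executors looks like in practice (the profile names and layout here are my own illustration, not the actual config from the talk), the same pipeline logic can target different platforms purely through configuration:

```groovy
// nextflow.config: a minimal, hypothetical sketch of executor selection
profiles {
    standard {
        process.executor = 'local'   // default: tasks run on the launch machine
    }
    cluster {
        process.executor = 'slurm'   // tasks are submitted to Slurm via sbatch
    }
}
```

With something like this in place, running with -profile cluster would submit every process as a Slurm job, while the pipeline script itself stays unchanged.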
I copy-pasted everything that was there, just changing a few parts so as not to show the secrets lying in between the files and so on. But more or less, this is what happened: I will be running the nf-core/rnaseq pipeline on UPPMAX, using the uppmax profile. The config file is linked in the presentation, and I will share the presentation later. The config is pretty huge, but these few lines are the ones that are interesting for me today. First of all, Singularity is set to enabled; with this config it's enabled by default, so we don't have to specify singularity in our profile. Then, in the process scope, we specify that we want the executor to be slurm. And then we specify some clusterOptions, which I will get back to later.

So, let's go. I connect to UPPMAX, I'm on the login node, and I load the modules that I need on UPPMAX to make it work. Then I launch a nextflow run, and of course I'm running rnaseq. That's right: I don't do everything with Sarek, I use other pipelines from time to time. I specify the test profile because I just want to test things out, I specify the uppmax profile because obviously I'm on UPPMAX, and then I set the outdir to results.

And this, unfortunately, fails. I get a first failure, which is interesting. Usually, when I see such a failure, I look at what is there: what are the warnings, what are the errors? Here I can see that it "failed to submit process to grid scheduler for execution", so right away I can see that something is wrong. I can also see that the command executed was sbatch .command.run; that's the command you would run yourself if you wanted to execute a single process.
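For orientation, the Slurm-specific header that Nextflow prepends to .command.run looks roughly like the fragment below. The exact directives and values depend on the pipeline and config, so treat every line here as illustrative rather than real output:

```bash
#!/bin/bash
#SBATCH -J nf-FASTQC        # job name derived from the process name (hypothetical)
#SBATCH -o .command.log     # job output captured next to the other task files
#SBATCH --no-requeue        # Slurm must not silently restart the task
#SBATCH --mem 6144          # from the process memory directive
#SBATCH -t 04:00:00         # from the process time directive
#SBATCH -A none             # from clusterOptions; no project supplied yet
# ...the usual Nextflow task wrapper follows...
```

Everything above the wrapper is generated for you, which is why sbatch .command.run is enough to resubmit a single task by hand.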
Then I can see the actual error: "Batch job submission failed: Invalid account or account/partition combination specified". So let's do a quick debug. I'll have a look at my favourite Nextflow-generated file, which is .command.run. It's a huge file, and it's super useful; it's the first one I look at whenever I debug a failed process. The .command.run file is big, but when I run with Slurm as the executor, Nextflow adds a little extra content that lets this file be executed by sbatch. In our case, we can see all the specific lines that Nextflow adds for Slurm, and what interests me here is that Slurm has a -A parameter, the account option, and here it is specified as none.

If you remember my config file, we had a clusterOptions setting, which is an option passed through to Slurm, and it contained the parameter params.project. So, looking back at .command.run, I can see that the -A here is none. What I can do is specify a project ID when launching the pipeline; via the config file, it then ends up in the sbatch file as the proper -A parameter. If I do that, then it works. If I go and inspect .command.run again, which you can also do just because it's an interesting file, you can have a look; this particular process failed in my case because I obviously did some other crazy stuff, but otherwise it would work, and here you would see your real project ID. This clusterOptions setting will also help you specify any other particularities that you need for your own cluster. You can likewise specify the memory, the time, the CPUs and any other requirement that you need. All the Slurm specifics that are really needed are populated by Nextflow, and you don't have to take care of them.
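Putting those pieces together, the relevant configuration could be sketched like this. It is modelled on the pattern of the nf-core UPPMAX institutional config but heavily simplified, so take the names and values as assumptions:

```groovy
// Sketch of a Slurm profile with a per-user project account
params.project = null                          // supplied at launch, e.g. --project <your-allocation>

singularity.enabled = true                     // run tools in Singularity containers

process {
    executor       = 'slurm'
    clusterOptions = { "-A $params.project" }  // ends up as the -A flag in .command.run
}
```

Launching with --project and a valid allocation then replaces the "-A none" seen in the failing .command.run with a real account, which is exactly the fix described here.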
And for me, that's the whole beauty of Nextflow: Nextflow takes care of all of that for you, and you don't have to think about it.

So basically, that was more or less all from me. If you have an issue on Slurm, I would really recommend... oh, that's the wrong link to Slack, sorry; I just copy-pasted it from a previous bytesize, so sorry about that. Basically, if you have a specific problem with Slurm or with any other executor, I would first suggest going to your local admins or IT and talking with them; if there is something they can help you with, they will. Otherwise, I recommend just going on Slack: we have the no-stupid-questions channel, we have channels dedicated to this, and there is definitely a way to sort it out there. So I guess we can move on to the questions.

Do you want to read them out for me? It doesn't matter. We already have a question, I guess: is there any way to avoid adding the --mem option in the .command.run? Okay... [the questioner's elaboration is partly inaudible] ...I will say that, yes, I think... yes, Phil answered.

Sorry, I didn't mean to jump in, Maxime. Oh, no, no, the more the better. So generally we set it by default, and of course pipeline defaults are quite difficult to get rid of. Many, if not most, clusters require you to specify memory, but in some cases it's the other way around: on some clusters the administrators don't set it, because it's derived automatically from the number of CPUs, and in a couple of rare cases we've had clusters that will reject the job if any memory is specified.
In those cases, it is possible to override that with a custom config that basically sets the process memory to null, and then it won't set the memory in the sbatch command file at all. There's an example in nf-core/configs, I mean the shared institutional configs, for the Hebbe cluster in Gothenburg. I think that was the first one I did that for, so that's the one I usually point to. I'll share the link in the Zoom chat.

Okay, second question: on Tower, so the Platform, after a successful run we have an optimization button, but what about on-prem Slurm? Phil, sorry again; I have to admit that I don't remember where we are on that side of things. So, the optimization works for runs which are done through Tower, or Seqera Platform as it's now called. You can run pipelines on Slurm through Seqera Platform, so if you run via Seqera Platform, you should be able to optimize, as far as I remember. I don't think it's cloud-only at this point; I might be wrong about that. But obviously, if you run Nextflow by yourself, it's an extra Tower/Seqera Platform feature, so it only works if you're using Seqera Platform. But I guess if you use Seqera Platform, you can get the optimized config from there, right? Yes, that should work. But then you would just apply it yourself. Exactly: at the moment it gives you a config which you can copy and paste into a file, so you only need to run it once and then you can save that somewhere.

Okay, people can also unmute themselves and ask their questions in person if they want to. But we do have... okay, we have two more questions. First: oh, Samuel, hello again. He mentioned that a useful tip is to read up on debugging the run folder in general, and he shared a link to training.nextflow.io; there is a debugging page in the basic training which is super useful.
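The override Phil describes can be sketched as a small custom config. The pattern follows the idea behind the Hebbe config in nf-core/configs, but this snippet is a simplified assumption rather than the actual file:

```groovy
// custom.config: drop the memory directive so no --mem line is emitted
process {
    memory = null   // with no memory set, Nextflow omits --mem from the sbatch header
}
```

You would layer it on top of your normal profile with -c custom.config on the command line.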
I really recommend reading that when you need to debug; usually that's more or less what I follow when I need to debug a pipeline. And Phil has already answered the next question from the chat, which was what I was about to say: the Agent was developed exactly for this use case, and he linked the docs for that as well. To reiterate the question in case anyone's watching on YouTube afterwards, where you can't see the comments: it's about using Seqera Platform to launch jobs on an HPC where the HPC has two-factor authentication. Seqera Platform can authenticate either using SSH (you give it an SSH key and it just logs in as you would do normally) or, if that doesn't work because you have two-factor authentication or something like that, you can use Tower Agent, which is basically a daemon that you set up yourself on your cluster. It sits there running and reaches out to Seqera Platform, rather than the other way around. They use that here at NGI, for example in secure cluster settings, and that works well.

Do we have any more questions from the audience? I see nothing else in the chat. Well, then I would like to thank you, Maxime, for giving this interesting talk. As usual, I would also like to thank the Chan Zuckerberg Initiative for funding our bytesize talks, and all of you for listening. I hope you'll all be back. Bye bye.