Hi everyone, thank you for joining today's bytesize talk. First of all, I'd like to thank the Chan Zuckerberg Initiative for funding all nf-core events. As always, the talk will be recorded and shared on our YouTube channel and on Slack as well, so if you're not able to catch all of it now, you can catch up later in those places. Today we're glad to have Phil, who is a bioinformatician at SciLifeLab in Sweden and also the author of MultiQC, and who will be presenting on troubleshooting a failed nf-core pipeline. It's going to be roughly a 15 minute talk, and then we'll have a Q&A session and discussion at the end, so feel free to use the chat box or unmute yourself and post any question or comment then. Over to you, Phil.

Thank you very much, and thanks for the introduction. It's nice to be back giving another bytesize talk; it's been a few months since I've done one. And it's quite a nice topic today, I think. I'm hoping that this will be a good resource, especially for people new to nf-core who are just picking up Nextflow and nf-core pipelines and might be running into trouble. The idea today is to run through some of the common questions and queries that we see on Slack when people try to run pipelines and hit difficulties, and to walk you through my personal, typical steps of what I do when something goes wrong. I'd like to point out that this talk is aimed at end users, people running pipelines. Originally the title was "debugging a failed pipeline", but it's not really debugging: I'm not going to go into the code of the pipelines themselves. I think that would make a good follow-on talk. And like many of these talks, this is really my personal take on it. I'm looking forward to hearing in the chat and in the discussion afterwards what kind of things you do if you're a bit more experienced, and whether you have any suggestions. Hopefully that part of today's talk will be as good as my slides.

Right, so I'll kick off; let's see if I can get this working. At some point things will go wrong. Whether you're an experienced bioinformatician or not, and however many years you've been using Nextflow, stuff can and will go wrong if you run enough pipelines. So this is your lifeline: take a step back, take a deep breath, try not to send your keyboard through your computer monitor, and we'll walk through how to get things up and running again. I've broken the talk up into five sections, which are the steps I take.

The first one is simple: start small, start simple. We say this over and over again, but you really can't repeat it too many times. We use -profile test for all of our automated testing, and basically that should pretty much always work. So if you're starting to use Nextflow, or you're running a new pipeline for the first time, it's always a good idea to run it first with -profile test: keep everything as small and minimal as possible and check that it works, because it should work, because it's passing the automated tests. If it doesn't, it means there's something wrong at your end, with the way you're running Nextflow, with your config, with something that's outside of the pipeline itself.
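To make that concrete, a first run with the test profile might look roughly like this; the pipeline name and output directory are just placeholder examples:

```bash
# Run the pipeline's bundled minimal test dataset, using Docker for the software
nextflow run nf-core/rnaseq -profile test,docker --outdir test-results
```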
That narrows down where the problem is coming from, which is what this is all about. So start small, start simple: don't use some massive dataset that's going to take days just to check that the pipeline runs as you expect; use a minimal test dataset.

Also, if you've hit a problem, check the basics. You don't have to go far in the Slack history to see that lots and lots of people's problems have been resolved simply by updating Nextflow. New releases come out fairly frequently, and within nf-core we tend to use many of the latest features of Nextflow; many of the nice features of Nextflow come as a result of us requesting them, so that's not really a surprise. So the first thing I always do if something goes wrong is check that I'm running the latest stable Nextflow release. If you're running the latest edge version, that's also interesting: maybe try with the stable version, because that could be important for the pipeline developers to know. Then there's all the other really simple stuff: have you got enough disk space? Pipelines will fail in weird and unpredictable ways if you run out of disk space. If you're using Docker, do you have a Docker daemon running in the background? Did you remember to start it? Just run through these basic things and often that will get you up and running. And wherever we see common things coming up within nf-core, we try to add them to the troubleshooting documentation page on the website, so if you haven't already, have a scan through that and see if what's happening to you is mentioned there.

Okay, most people will do that stuff without really thinking about it, but next, try to categorise what kind of error it is that you're seeing. Just because Nextflow fails on a certain step of a pipeline doesn't necessarily mean that it was that step, that software tool, which was responsible for the failure. Different types of failures happen at different times in the execution of a pipeline, so we're going to go through that now. Errors can happen before the first process kicks off, so right when you first run Nextflow. They can happen during the first process, when Nextflow tries to run something and it fails. They can happen somewhere else during the run, or something can go wrong at the very end. These are the different stages, and one of the most common is before the first process: you try to run Nextflow and it just dies immediately. All of these examples I have mostly taken either from myself or from searching the Slack history, so apologies in advance if you see one of your queries from Slack coming up as an example; I'm not picking on anyone, these are just typical examples.

So here, this is a very obvious one: "Unknown config attribute". Nextflow has found something in a config which it doesn't recognise, and the reason is that this particular attribute is only available in more recent versions of Nextflow. You can see at the top that this is running 0.27.3, which is years old now, so not very surprising. Update Nextflow to a current version and it should work. This one is very obvious: it happens right away and you've only got a couple of lines of output.
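A rough checklist of those basics as shell commands; the Docker check only applies if you are using -profile docker:

```bash
# Which Nextflow am I running, and is there a newer stable release?
nextflow -version
nextflow self-update

# Do I have enough free disk space where the pipeline is running?
df -h .

# Is the Docker daemon actually up?
docker info
```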
But it's not always that obvious. You could be running the rnaseq pipeline, like here: you get tons of output, it all looks good, it's nicely coloured, everything's fancy, but then a really obscure error gets spat out. Just take a step back and go back up to the top. This version is not very out of date, just a little bit out of date, but that's enough in this case to make the pipeline fail. So: Nextflow version, always check that first.

And remember, if you're new to Nextflow and nf-core, you need to tell Nextflow how to handle software dependencies. Out of the box, if you just run a pipeline without any arguments, Nextflow will expect all of that software to already be installed on your machine, which is almost certainly not going to be the case. So you need to tell it: I want to use Conda, I want to use Docker, I want to use Singularity; there are about eight different kinds of engines which can handle software dependencies automatically for you, but you need to tell Nextflow which one to use. Typically we do this with a config profile, so here I've got -profile test,docker, which says: run the test profile and use Docker to do it. Of course you might want to use a different tool here, or you might have your own config which defines which software tool to use, so it might be the name of your institutional config here or something. But it must not have any spaces: if you put a space in there, then it will just run the test profile and ignore Docker, and your pipeline won't have any software to use. A small thing that catches a lot of people out, including myself; I've done it lots of times.

Okay. What you'll very often get when something goes wrong, especially if the pipeline fails within the first process or during the actual execution of the pipeline, is a lot of output, and this can be quite intimidating. Nextflow really tries to help you figure out what's gone wrong, and to do that it tells you everything it possibly knows about the step that was running when it failed. And this isn't even all of it here. But let's pause and try to work through it; once you get used to looking at these kinds of errors and breaking down the different sections, they're quite quick to skim through. What we're really after is the relevant part of the log: which bit is telling me what's wrong? Here, the bit that pops out to me is "command not found". This was the first step in the pipeline, and it might look like something was wrong with the tool itself, but when I see "command not found" it's almost certainly a software packaging problem. This run was launched without Docker, so Nextflow doesn't know where to find the tool it's trying to run, and it exits with an error saying the command is not found. Add -profile docker or something similar and this will fix itself.

Other typical errors within this first process can be to do with actually submitting the job to your compute environment. Here someone was trying to submit a job to a Slurm HPC cluster using sbatch, and the error at the top says it failed to submit the process to the grid scheduler for execution, so there's an sbatch error, and the command output underneath actually tells you what was wrong. So again, this is not a problem with the pipeline, this is a problem with your config. Okay.
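For illustration, here is the difference a single space makes in the profile argument; the pipeline name is a placeholder:

```bash
# Correct: comma-separated, no spaces - both the test and docker profiles are used
nextflow run nf-core/rnaseq -profile test,docker --outdir results

# Broken: the space means only 'test' is applied and 'docker' is ignored,
# so no container engine is configured and you'll hit 'command not found' errors
nextflow run nf-core/rnaseq -profile test, docker --outdir results
```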
So I've touched on this already, but let's break down that log and get used to what it's telling us, because there's a lot of text to look at. The structure is always the same. At the top there's information about where you were in the pipeline and what kind of error there was. The top line gives the process: every pipeline is built up of lots of different processes that run in order, and this is the name of the process that went wrong, and in brackets you've got the tag, which in this case is the name of the file it broke on. Then it says "Caused by", and that's a summary headline of what went wrong. Here, Nextflow was expecting some output and didn't get it: it was expecting a zip file and that wasn't generated. Then it shows what was being run; this is the actual resolved bash command, which, to be honest, for nf-core pipelines is rarely interesting; most of the time you can trust it. Then you've got the exit status, which is the status code the command returned when it finished. Usually non-zero means error and zero means success, but in this case we got zero even though it was an error.

Next up in the log we have the actual output from the tool. Command line tools can generate two types of output in a terminal, standard out and standard error, but for the purposes of this talk they're basically one and the same thing. These are just the two kinds of output we got from FastQC at this point: there wasn't anything on standard out, but standard error gives us a big blob of text, and if you ran FastQC yourself manually in a terminal, this is what would be printed out. There's a bit of a red herring at the top: that warning message is actually not related to the error in this case. But if you keep reading, it gets interesting here: FastQC is telling us what went wrong, it's just buried. It's saying your file is probably truncated. So this error is almost certainly due to a corrupted file, a download that didn't finish. This again is very common, and you just need to work your way through the log and the output and spot that little nugget of interesting information. This is another example, this time running samtools, and again, same thing: buried in there is a samtools sort truncated file error. These are all examples I pulled out of Slack.

If you need to dig in a bit more, though, you can. You don't have to rely only on the main output that Nextflow prints to the terminal when you run the pipeline; you can start to dig into the specific process, and that's where the next bit of log is useful. It tells you where that process was running. Every process generates one or more tasks, each task runs in its own work directory in an isolated file system, and this is the path to that work directory. We can go in there and start to dig around in those files and see if we can spot anything that wasn't immediately obvious in the summary log output. So what's in a typical work directory? You'll have the input files and any output files that were generated by the task, but you'll also normally have a core set of files which Nextflow itself generates; a quick sketch of digging around in one of these directories follows below.
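As a rough sketch of that digging around, assuming the failing input was a gzipped FASTQ file as in the truncated-file example above; the work directory hash and the file name are placeholders, and the individual hidden files are described just below:

```bash
# Move into the work directory printed in the Nextflow error summary
cd work/4f/9a1b2c...          # placeholder - copy the real path from your error

# The task's logs and scripts are hidden files, so list everything
ls -la

# The tool's own error output is usually the first thing to read
cat .command.err

# If the error suggests truncation, test the compressed input directly;
# gzip -t reports an 'unexpected end of file' style error for incomplete downloads
gzip -t sample_R1.fastq.gz

# Re-run the task exactly as Nextflow did, including the container engine
bash .command.run
```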
Among those files, you have a bunch which just capture the output from the tool. I've mentioned already that you've got standard out and standard error; there is a file for each, and .command.log captures both in one file, which might be useful if you want to know what order different output came out in. Then there are files which Nextflow uses to track and run the job itself. The .exitcode file just captures that zero or non-zero value, you've got the trace file, and there's .command.begin, which to be honest I'm not sure I've ever looked at. What's usually most interesting, after the output from the tool, is .command.run and .command.sh. The latter is the bash script, just the resolved command which is run, and you can try running that yourself on the command line, but it won't use any of the software handling like Docker. .command.run is what Nextflow itself actually launches; that will use Docker and everything, and it should give you an identical error message. It's also useful to look at if you're using sbatch or another HPC job scheduler, because at the top of that file you'll have the resource requests that were actually sent to the cluster. So if your cluster is rejecting your jobs because of weird memory or CPU requirements, you can check those headers and debug it manually.

Okay, you've looked through all of this and you still don't really know what's going on. Maybe you've found a little nugget of text which you think is the smoking gun, but you don't really understand what it means. Now is the time to start searching, and the first place I always start is, of course, the nf-core Slack. We've been using it for a few years, we have two and a half thousand users, and there are, I don't know, tens or hundreds of thousands of messages in there. There's a pretty decent chance that someone has come across this before and asked for help. The key is to search for the right thing, but once you've got that little nugget, stick it into the Slack search bar and have a look. Many tools and errors span multiple different pipelines, and you're probably not a member of every single pipeline channel, so it's really worth searching everywhere: maybe you hit the samtools sort error in one pipeline and someone has hit the same thing in a different pipeline. Searching all of Slack is really, really powerful. Then of course there's also Google, and finally you can ask for help. These are just a few screenshots: if I stick that truncated file error into the nf-core Slack, you can see people talking about it in a couple of different pipeline channels, so having a dig through those might be helpful. And searching Google once you have the correct bit of text is obviously helpful too.

Okay, you're still stuck. Now it's time to ask for help. There are good ways and bad ways to do this, so what I'm going to take you through is, from the perspective of someone responding to help requests, what makes my life easy, which gives you the best chance of getting a quick and useful answer. Firstly, if you can, pick the correct Slack channel to post in; we have lots of Slack channels. If your question is specific to a given pipeline, please ask in the channel for that pipeline, because the people in there will know the most about it.
If you think it's to do with configuration, post in the configs channel, and so on. If in doubt, you can always post in the help channel and someone will either answer you there or redirect you. Provide as much information as you can straight away. This is really important: more experienced people tend to be used to this, but especially if you're new to the community or new to bioinformatics, you might post only the bit that you think is wrong, and out of context it's almost impossible to help. As a minimum we'll usually need the full command that you used to launch the pipeline and any Nextflow configs you used, because together they tell us what environment your error came from. Sometimes this can be quite a lot of output, so if in doubt, post a short question or summary, then create a thread in Slack and dump those outputs in there, so it doesn't flood the whole channel. Use markdown code blocks: don't just paste raw text from the terminal, wrap it in triple backticks to make a markdown code block. This is purely formatting, but it makes your message much easier to read for anyone on Slack, and it's very easy to do once you're used to it. And try to narrow down the issue as much as possible before you ask: go through the steps we've talked about, come up with the best question you can, and tell other people how they can reproduce the error, because that's how bug fixing works. The first thing I do when an error is reported to me which I think comes from a pipeline is to see if I can get the same error; once I can, I can work on it, dig into it and make sure I've fixed it. But if I can't reproduce the error on my end, it's very, very difficult to actually fix anything.

If you fall foul of these requests, you might start to see some standard replies. These are things we've written so many times that we now have little helpers within Slack, so every now and then I'll type "more info" and you'll get a little Slack bot message which says basically what I've just been saying: please tell us a bit more about how you ran Nextflow. Please don't feel offended if you get this, I send it to everyone; it's just a little reminder that we're probably going to need more information to be able to help you. There's also one about posting in the correct channel, and if you don't format your code blocks nicely, there's a risk that me or someone else might ask for better formatted ones in future. It's not complaining, it's just trying to help you out: here are a couple of help pages.

So, you've gone through all of that and the people on Slack can't help you, or maybe you're pretty sure straight away that you've encountered a bug in the pipeline code. Now is the time to move away from Slack and make an issue on the pipeline repository on GitHub. This is where we track problems, feature requests and bug reports so they don't get lost, because it's quite easy to lose things in Slack; they just disappear out of sight when you're a maintainer and you forget about them. If you make an issue on the repository, it stays there and it keeps all the discussion together.
So please do hit "Bug report" and "Get started", and it gives you a template to fill in to provide all the information that we typically need to be able to help: title, description and your terminal output, all the same stuff I've been hammering on about. And it's the same story, really: give all the information that we're asking for, try to narrow it down as much as possible, and tell us how we can reproduce the error. If you think you know the solution, don't be shy; say "I think it's this bit here", and if you think you know how to fix the problem, even better: make a pull request, or make an issue followed by a pull request, and submit the fix yourself, because that's the quickest way. That really helps to relieve the burden on the maintainers as much as possible, and it's the way that most of us who write code within nf-core got into the community, so we're always very open to pull requests.

Right, this was meant to be a short talk and I've already gone over, so I'm going to wrap up at this point. Let's see if anyone has any questions, suggestions or thoughts about how they do this, and any cool ideas, basically. Thanks for listening.

Yeah, feel free to ask or comment on anything in the chat box, or unmute yourself. There's a comment in the chat that you can provide Nextflow logs along with the command and configs, and also error traces. Yes: what I was just talking about is what's printed to your terminal, and that's a really good start and much easier to read. Nextflow also generates a log file called .nextflow.log, a hidden file, and that's the verbose version of that log. It is massive and fairly difficult to dig through, but it has all the information, so if you can drop that file into Slack, that really helps with debugging.

James is asking where that file I mentioned, .nextflow.log, is generated: it's created where you launch Nextflow, so in the launch directory.

There's also a question about Nextflow versions. You can prefix a Nextflow version onto your command: if you want to run your pipeline with a specific version of Nextflow, you don't have to go and download a new binary or reinstall anything (assuming you're running somewhere with an internet connection). You can just set the environment variable NXF_VER to whatever version you want in front of your normal nextflow run command, and Nextflow will automatically fetch that version and run with it. So if you want to check whether a problem is down to the Nextflow version you're running, it's quick to do with this technique. Do you put it before the nextflow run command, or in your config? Put it before the command. If you prefer, since it's just a regular bash environment variable, you can also export NXF_VER and then it will stick around for the whole of your terminal session. If you just want it for a single run, prepend it to your nextflow run command and it will be used there. I'm not sure whether Nextflow carries on using that version afterwards or not; you might need to run nextflow self-update again afterwards if you've gone back.

Thanks. Looks like there's no one else with a comment or a question. Thanks everyone, and we'll see you next week for another bytesize talk. Thanks very much.
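For reference, the version-pinning tip from the Q&A looks roughly like this in practice; the version number and pipeline name are just example placeholders:

```bash
# Use a specific Nextflow version for one run - Nextflow fetches it automatically
NXF_VER=23.10.1 nextflow run nf-core/rnaseq -profile test,docker --outdir results

# Or export it so it applies to every run in this terminal session
export NXF_VER=23.10.1
nextflow run nf-core/rnaseq -profile test,docker --outdir results
```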