Hello everyone, welcome to the second day of our nf-core training, September 2023 edition. Yesterday, on the first day, you were introduced to a few sections of the official community training material. We saw a slide deck presenting Nextflow and the background: why it was created, why it's useful, and some features of the technology. Then in the training material we had a look at the introduction, setting up your environment, getting started with Nextflow, configuration, and managing dependencies and deployment scenarios, including Nextflow Tower. So the idea is that if you found a Nextflow pipeline on the internet and you want to use it, have a somewhat superficial understanding of what it's doing and how, and change the configuration to adapt it to your needs, you now have the basic knowledge for that.

Today we have a different purpose: the idea is to help you understand the concepts required to write your own Nextflow pipeline, and by the end of the day you will be able to do exactly that. The last section will be writing a proof-of-concept RNA-seq Nextflow pipeline, so that's going to be very exciting. We will have a look at Nextflow channels, Nextflow processes, channel operators, cache and resume, and then the proof-of-concept RNA-seq pipeline. The third day, tomorrow, will be with Chris Hakkaart, introducing you to the nf-core project: the nf-core knowledge required for you as a user, and as a developer, a pipeline developer, a module developer and so on. So keep watching these sessions, because tomorrow is going to be very interesting if you want to take advantage of the enormous amount of content, including pipelines, modules and subworkflows, that the nf-core project has available for you.

Having said that, let's go back to the training material. I'd like to emphasize that on the website there is an "Open in Gitpod" button, and by clicking on it you will be taken to a Gitpod workspace like the one you are seeing now, where you can see the content and interact with the workspace through the terminal. It's what we did yesterday, the same thing.

So let's go to the channels section and start. If you watched the session yesterday, you've probably already seen a figure like this one a few times. We have a task alpha, for example; there's an input channel that is consumed by this process, and every element is going to instantiate this process into the tasks alpha 1, alpha 2, alpha 3 and so on. After the processing of each task, whatever it's supposed to do, it will emit an element, and if you have a set of tasks related to the same process you will have a channel of outputs. In this example we have a channel with three elements, files x, y and z, and this output channel will be the input channel to the tasks of another process, here task beta, and then this happens multiple times. So we could say that these elements together form the workflow: you have processes, you have channels. But in this section we're going to focus on the middle part, the channel.

Okay, so the first thing to keep in mind is that Nextflow has two types of channels: queue channels and value channels. So far, every time I referred to a channel, both today and yesterday,
I was referring to queue channels, but today we're also going to see value channels. The definition of a queue channel is pretty much what I told you yesterday: it's an asynchronous, unidirectional, first-in-first-out (FIFO) queue data structure that connects two processes, two operators, a process and an operator, or an operator and a process. As I said before, processes and operators are like functions in regular programming languages, but here they are special: they are Nextflow processes and Nextflow operators, and they interact with channels, which are a specific type of variable.

Asynchronous here means that the operations are non-blocking: if you have enough resources and your configuration is not preventing processes from instantiating, Nextflow will keep consuming the channel and spawning tasks. It's unidirectional because data flows from a producer to a consumer: a process alpha receives some input channel and emits an output channel, and that one will be consumed by the next process, so we always have a producer and a consumer. And FIFO means the data is guaranteed to be delivered in the same order as it's produced: first in, first out. I briefly mentioned yesterday that it's actually a bit trickier than that, because if you have a channel of elements, it's guaranteed that the first element in the channel will be the first one consumed, but maybe the first one takes much longer than the second one to be processed, which means that in the next channel the second may arrive before the first. There's a process directive called `fair` that you can use to guarantee the ordering after parallelization. What you have to keep in mind is that FIFO means the first element consumed from the channel is the first one, then the second one, and so on; if they all take the same time, you will have the same ordering at the end.

Whenever you use a channel factory, which is a function that creates channels, you will create a queue channel by default. So let's create a new file, example.nf, and copy-paste this snippet, which uses `Channel.of`, a channel factory that creates a channel, and I'm providing three elements: 1, 2 and 3. So it will be a channel with three elements. The `println` is a Groovy function to print the content of a variable, but here it will look weird, because this is not a regular variable, it's a channel, a Nextflow channel. Here we would use `view`, which is a channel operator, to view the content of a channel; that's the right way to look at it. And you'll see that `println` is not very useful, because you cannot really see the content of the channel with it. So let's run `nextflow run example.nf`, and you see that what `println` prints to the screen is some not really understandable content.
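For reference, here's roughly that snippet (a sketch; the file and variable names are just what we used on screen):

```nextflow
// example.nf: roughly the snippet on screen
ch = Channel.of(1, 2, 3)   // a queue channel with three elements

println(ch)   // prints the channel object itself, not its contents
ch.view()     // prints 1, 2 and 3, one element per line
```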
That output matters to Nextflow internally, things like DataflowBroadcast and dataflow streams, but for us it doesn't matter; we want to see the content of the channel, and that's what the `view` channel operator showed us: 1, 2 and 3. They are on different lines because they are three different elements. If we used the `collect` channel operator, like we did yesterday in one of the examples, you would see them between brackets on one line, because we would have one element containing three items. Okay, so that's the correct way to do it, and it's actually what we just did: we created this example.nf file, put the snippet inside, and ran this Nextflow script. We don't really have to spend much time on queue channels, because that's what we have been using the whole time.

The new thing here is the value channel, also called a singleton channel. The idea of a value channel is that, first, it holds a single element; value channels are always single-element channels, which is why they are also called singleton channels. And they have one very interesting characteristic: they are implicitly created whenever you use a channel operator that returns a single element. You have `first`, `last`, `collect`, `count` and more; all these channel operators generate a single element. So if I have a queue channel with ten numbers and I apply `first` to it, I get the first one, which is a single element, and therefore this new channel will be a value channel.

But there is a very important difference between queue channels and value channels, one that you always have to pay attention to, and this example is very good at showing it. So let's copy this and modify the example.nf script we had written before. We are creating a queue channel here, because we are using a channel factory, with 1, 2 and 3, and this other channel has only one element, 1. So three elements and one element. Then we have a process called SUM that receives two values, x and y, and the output is just printed to the screen. What's inside the script block is actually a bash expression, not Nextflow; it's what you would do on the command line. Then in the workflow block we say: I want to run the SUM process with these two channels as inputs, channel 1 and channel 2, and then I want to view the content of the output channel, which holds the sum of the two elements. If you run this, something very unexpected happens: we get 2 as the only output, the sum of 1 plus 1, and that's it. What happens here is that elements in queue channels can only be consumed once. The 1 from channel 2 was consumed to be summed with the 1 from channel 1, but the process didn't create tasks for 2 and 3, because there are no more elements in channel 2. This is a very common thing: sometimes we have ten samples, we run a process, and we only see one task, and we ask ourselves, why don't I have ten tasks if I have ten samples?
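Here's the whole example in one place, close to what's in the training material, so you can see the cardinality problem at a glance:

```nextflow
ch1 = Channel.of(1, 2, 3)   // queue channel, three elements
ch2 = Channel.of(1)         // queue channel, one element

process SUM {
    input:
    val x
    val y

    output:
    stdout

    script:
    """
    echo \$(($x + $y))
    """
}

workflow {
    // only one task runs: queue channel elements are consumed once
    SUM(ch1, ch2).view()
}
```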
Probably one of the channels you passed has a different number of elements, and because elements in queue channels can only be consumed once, tasks will not be generated for the remaining elements of the other channels. One way to fix that is to turn the second channel into a value channel. There's actually a channel factory you can use for that, `Channel.value`, but here we're going to do it differently: we're going to use `first`, which is a channel operator that returns, well, the first element in a channel, which here is the only one we have, 1. Now we have a value channel, and by running the same pipeline with this small change you see that we get 1 plus 1, 2 plus 1, 3 plus 1, so we have what's expected. This is a very nice tip: whenever you see a different number of tasks than you expected, be aware that maybe you have multiple queue channels whose numbers of elements diverge, and you may have to convert one of them into a value channel. That's what was explained here.

We have mentioned channel factories a few times, so let's look at them in more detail now. They are functions, just like any function, but special ones that create channels from non-channel values. That's important to say, because sometimes people have a channel and want to create a channel out of that channel, and that won't work. You have lists, files, strings, regular variables, and you want to convert them into a channel containing them; channel factories are the way to go. The `value` factory creates a value channel, so you need one value, and one only.

One of the most common channel factories is `of`, because it can literally take anything: you can have a channel of numbers, files, lists, maps, strings, anything. So here we create a channel with four elements, which are numbers, and we use the `view` operator to view the content of this channel.

This is actually related to a question I answered today in the channel, and I'd like to take this opportunity to remind you that you should ask your questions in the #sept23-training-foundational Slack channel, which you can find in the nf-core Slack. If you don't know how to join the nf-core Slack, there's a link here on the page of the community foundational training. And if you can't find this page, you can just go to the nf-core website, click "Join nf-core" at the top, then choose Slack, and you will be able to join the channel and ask your questions. There are lots of people there ready and happy to help.

So, coming back: the question I answered today was, what's this thing with the curly braces? This is what we call a closure. I'm going to change this script to contain the regular usage, where I use the `view` channel operator to see the content of the channel ch, but then I will also use `view` with the curly braces, which takes a closure. A closure is when you pass a block of code as an argument to a function.
So here I'm passing a string, and whatever the element is, it's referred to as `it`. If you want to refer to it by another name, for readability purposes, you can do something like `number ->` and then use `number` in the string, and it will work. So when we run this, after clearing the screen, we have just the numbers, which is line two, and "value: " plus the number, which is line three. Oops, sorry, it should be without the dollar sign on the left side of the arrow: I'm saying that whatever the element is, I'm going to refer to it by the placeholder `number`, and inside the string I use `$number`. If we don't declare anything, it's `it`. So we have the view with `number`, and just the number.

Maybe we want to double it. We could erase this and do `number * 2`; I don't know if this is going to work, let's try. Okay, it worked: 1 times 2, 3 times 2, 5 times 2, 7 times 2. I'm using double quotes here because I want a string with interpolation, and the reason I'm using the dollar sign is that outside strings I can simply use the name of the variable, but inside strings, to differentiate the variable from any other text, you have to use the dollar sign. And it's safer to use the curly braces, `${}`, when you want to do some operation or apply some function to the number. Let's say I write the expression without the braces: then I just get the number followed by the literal text "* 2", because it's all treated as a string; this is a nice example. But if I put it inside `${}`, then Nextflow knows we are doing an operation with this variable.

Closures are, I don't want to say advanced, but a bit difficult to digest at the beginning; they're not so obvious for people who are starting to program. But they are so useful. On your path to learning Nextflow and writing Nextflow pipelines, channel operators and closures are extremely powerful and very important. You can learn more about them in the Groovy introduction section, which we will not cover in this training, but you're free to check the material; it's very nice, and there's a part on closures. Also in the official documentation, docs.nextflow.io, under Nextflow scripting, there's a section on closures. So again, we won't cover that here, but if you want to write Nextflow pipelines and really feel the power of the language, channel operators and closures are extremely important.
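A minimal sketch of the closure variations we just typed (the channel contents are the odd numbers from the example):

```nextflow
ch = Channel.of(1, 3, 5, 7)

ch.view()                                   // just the numbers: 1, 3, 5, 7
ch.view { num -> "value: $num" }            // a named parameter instead of the implicit `it`
ch.view { num -> "doubled: ${num * 2}" }    // ${} is needed to evaluate an expression inside the string
```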
Okay, so the next channel factory is `fromList`, and the name is quite obvious: you're going to create a channel from a list. Let's copy this, replace what's inside example.nf, and see what happens. I have a list here; in Groovy, in Nextflow, you separate the elements of a list with commas and mark it as a list with brackets. Let's do `nextflow run example.nf` to see what happens. So we have hello and world, two elements. This is a nice example, because if you think for a second you could ask: why don't I just do `Channel.of` with the list? Well, that creates a channel with one element, the list itself. If printing one element is what you want, great, do it that way. But usually people have a list of many elements and want every element to be a single element in the channel, and the `fromList` channel factory is exactly for that, as you can see here.

Let's go to the next channel factory, which is `fromPath`. `fromPath` is very interesting and very important if you're handling files. The idea is to provide a string with a glob pattern or the path to a specific file. In this one here, it's looking inside the data folder in the current directory, inside the meta subfolder, for CSV files: any file ending with .csv. If we open the file explorer and go to data/meta, we see two CSV files inside, patients_1.csv and patients_2.csv. Let's close the file explorer and copy this snippet into example.nf. I particularly like to use spacing like this, with each chained operator on its own line, to make the expressions easier to read, but it's just spacing; it's up to you, technically nothing changes, it's about readability and a personal preference of mine. When we run this, we see the two CSV files that we have in this path, patients_1 and patients_2.

If you instead want a single element providing all the CSV files to your next process, you can use the `collect` channel operator, which we've discussed a bit and will get into in more detail today. For now, with a basic understanding of it: as you can see here, we have one element with two items, patients_1.csv and patients_2.csv.

Okay, so the glob pattern that the description refers to here is the star, meaning anything ending with .csv. You can have more complex shell globs: you could do something like `*_[12].csv`, which means anything ending with _1 or _2 before .csv. When we run that, we see the same thing, because we only have patients 1 and 2 anyway. But if I put `[13]` instead, for example, then we would only get the first CSV file, patients_1.
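Roughly what was typed on screen; the exact bracket expression may differ from the material, so treat the globs as illustrative:

```nextflow
// any file ending in .csv inside data/meta
Channel
    .fromPath('./data/meta/*.csv')
    .view()

// a slightly fancier glob: only names ending in _1.csv or _2.csv
Channel
    .fromPath('./data/meta/*_[12].csv')
    .view()
```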
There's some information here, some options for the `fromPath` channel factory; I won't go into much detail, but it's here for you to read if you want more knowledge about this specific channel factory. And if you want to learn more about glob patterns, you can just click this link; there's also some information there.

So there's an exercise here; if you want to pause to try it, do it now, because I'm going to show the solution. The task is to use the `fromPath` channel factory to create a channel emitting all files with the suffix .fq in the data/ggal directory and any subdirectory, in addition to hidden files. As you see, there are some particular requests here, and to understand how to do it you can check the options above, like `hidden` and so on. So let's open the solution: three, two, one. You see it's mostly what we did before with `fromPath`; the difference is that we have two stars, and if you scroll up a bit you see that two stars cross directory boundaries, and we also have `hidden: true` to show hidden files. Let's copy this into example.nf, type clear to clear the screen, and run our Nextflow script again: gut_1, gut_2, liver_1, liver_2, lung_1, lung_2. Cool.

The next channel factory is very commonly used in bioinformatics: it's called `fromFilePairs`. As you can see in the list of files we just saw, we have pairs of samples: gut_1 and gut_2, liver_1 and liver_2, lung_1 and lung_2. We have pairs of files, and this `fromFilePairs` channel factory is very useful to handle that, and you will see why. Here we have six elements, but we know they should really be three, with some information about where the files are and some ID relating each pair of files to the same sample. So let's copy this snippet into example.nf and run it again. By doing that, you see that instead of six we now have three elements, where every element is a tuple, a structure with multiple items. This element has "gut", which is an ID, and a list of paths for this gut sample, gut_1 and gut_2; same thing for liver, with a list of paths liver_1 and liver_2, and for lung, with lung_1 and lung_2. So if you have pairs of files, as in paired-end sequencing, this is very nice: you can use the `fromFilePairs` channel factory to handle that for you. Again, you have a pattern and these options here, and you should read them if you want more knowledge about it.

Again, a new exercise: use the `fromFilePairs` channel factory to create a channel emitting all pairs of read files in the data/ggal directory and print them; then use the `flat: true` option and compare the output with the previous execution. I'm going to show the solution, so if you want, try it first; you should read the options above, because they will be needed for the solution. Three, two, one. Again, very similar to what we did before: the channel factory, the string referring to the path, and now we have `flat: true`. In the output, we still have one element per sample, but instead of having the ID and a list of reads, the element is flattened: we have three items at the same level, the sample ID and the two files, or more if there are more files.
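A sketch of the two runs, with and without flat: true, assuming the data/ggal paired files from the training repo:

```nextflow
// default shape: [ id, [ file_1, file_2 ] ]
Channel
    .fromFilePairs('data/ggal/*_{1,2}.fq')
    .view()

// flat: true puts the files at the same level as the id
Channel
    .fromFilePairs('data/ggal/*_{1,2}.fq', flat: true)
    .view()
```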
Another interesting channel factory is `fromSRA`. This one makes it possible to query the NCBI SRA archive through their API and returns a channel emitting the FASTQ files matching the specified selection criteria, which means we're going to get an ID and a list of reads for the SRA accession you provided. Here are some instructions on how to get your API key, how to set it, and how to use `fromSRA` by providing an accession; with your API key it will download everything for you. This takes a while, because these files are huge, so we won't do it here, but in the end you have something like this: the ID and the list of reads, the _1 and _2 for every sample. You can also provide multiple accession IDs; here we have three instead of only the one I used before.

Here we have a simple Nextflow pipeline running FastQC, but obtaining the reads with the `fromSRA` channel factory. Again, I won't run this, because it's going to take forever. If you don't have FastQC installed on your machine, as you saw yesterday, you can use the BioContainers FastQC container image by putting `container` here or in nextflow.config and running with Docker, or by setting `docker.enabled = true` in your configuration file. All of this we saw yesterday in the managing dependencies and containers section.

You can also work with text files. You use the `fromPath` channel factory to get a file; this is a nice example, because we are not using any globbing here, we don't want to get multiple files from a path, we want one specific file. And once we have it, we want to use a channel operator, which again we will see more of today in the operators section, called `splitText`, to split the text in this file. If we open the file explorer and go to data/meta/random.txt, you see we have some random text with seven lines: one text file, seven lines.

Let's see what happens with this first example. Basically, `fromPath` adds this file to a channel; then I use the `splitText` channel operator with the option `by: 10`, and I use a closure so that whatever the element is, `it`, I apply the function `toUpperCase` to it, meaning I want it uppercase. Then I use the `view` channel operator, oops, to see what's inside this channel. You already saw the content of random.txt, seven lines of text; here we have the lines coming through, everything uppercase. Maybe we could do `by: 2` instead and see that we now have chunks of two lines per element.

You can always go to the official documentation, go to operators, and choose one; let's look at `splitText`, and you see these options. The `by` defines the number of lines in each chunk, so the default is one line per element, or you can group lines into chunks. We've explored this example a lot already, so let's move on: you can also use the `subscribe` channel operator, which executes something for every element emitted by a channel, and it's good for showing how `splitText` with the `by` option works. I'm going to do `by: 2`, and I'm going to print "end of chunk" every time an element comes through. So let's do it: we have two lines, end of chunk, two lines, end of chunk, two lines, end of chunk, and then whatever was left. I'll keep showing examples like this; I won't really repeat all of them, but they show different things you can do when getting data from inside a text file.
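Here are the two splitText variations in one place, close to what's on screen:

```nextflow
// chunks of up to 10 lines, uppercased by the closure
Channel
    .fromPath('data/meta/random.txt')
    .splitText(by: 10) { it.toUpperCase() }
    .view()

// chunks of 2 lines, with a marker printed after each element
Channel
    .fromPath('data/meta/random.txt')
    .splitText(by: 2)
    .subscribe {
        print it
        print "--- end of chunk ---\n"
    }
```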
You can also work with CSV files, comma-separated values. Here we have the patients_1.csv that we played with before; we're going to use the `splitCsv` channel operator, and in the end we'll have a look at every row in our CSV. If we open patients_1.csv, we have a patient ID, a couple of other IDs, the number of samples, and so on. Let's take this snippet and try it out. Oops. So if we run it, we get the CSV split according to the commas. I'm calling whatever comes in each element `row`, and I want the first column, `row[0]`, because it starts at zero, and the fourth column, `row[3]`. The first column is the patient ID, and the fourth, I closed the file, I think it's the number of samples. Yeah, so the ID and 3, which is the number of samples: one, two, three samples. So we could take all of the columns or just a few of them; here we took the first and the fourth.

You can also include the header with the `header: true` option, and when you do that you can refer to columns by name instead of by position. And there are many other things you can do here with closures, as I've shown a few times; you can do a lot of magic. In this one we are using a backslash-t, `\t`, to add a tab between the two values, and there's so much you can do with that. These examples are very nice; you should look carefully at every one of them to make sure you understand what's going on, and if something is not clear, again, go to the channel we created on the nf-core Slack to ask your questions.

We haven't seen the RNA-seq workflow yet; it's the last thing we're going to see today, so this exercise is a bit out of place, but if you want to try it, go ahead. I'm going to open the solution now so you can have a look at how it changes things to make them work. This is a process from the RNA-seq workflow, and these are the things that changed. Okay.

You can also work with TSV files. You're still going to use the `splitCsv` channel operator, but you're going to provide a different separator, which is `\t`. So here it asks you to use tab separation on the regions.tsv file, but print just the first column, and remove the header. With what we've shown so far, you can already do this by yourself, so if you want to pause the video to try it, do it, but I'm going to open the solution now. As you can see: `fromPath` to load the file, and we're using `checkIfExists: true`, which throws an error if the file doesn't exist instead of silently failing; it's just an option you can check in the documentation of the `fromPath` channel factory. Then we use the `splitCsv` channel operator, providing the separator `\t` because it's a TSV; I want the header, and I want the first column, which, because we are using the header, we can refer to by the name patient_id.
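A sketch of both access styles; the column names in the second block are illustrative, so check the actual header of patients_1.csv:

```nextflow
// positional access: row[0] is the first column, row[3] the fourth
Channel
    .fromPath('data/meta/patients_1.csv')
    .splitCsv()
    .view { row -> "${row[0]}: ${row[3]}" }

// with header: true you can use column names instead
// (patient_id and num_samples are hypothetical names here)
Channel
    .fromPath('data/meta/patients_1.csv')
    .splitCsv(header: true)
    .view { row -> "${row.patient_id}\t${row.num_samples}" }
```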
There are also more complex file formats, like JSON. We do have a channel operator for JSON, which is `splitJson`, and there are some examples here with the output. Again, I don't want to get into much detail here, because I think it can be slightly advanced for people who are just starting with Nextflow, but you can calmly have a look at these examples: a JSON file and the output you get when Nextflow splits it. With YAML we have to use a bit of Groovy, but it can still be done.

And one thing you can do, which is a very nice feature of Nextflow, is modularity, which you will see tomorrow with Chris Hakkaart in more detail, using nf-core modules and nf-core tools to help you with that. The idea is that you can take some code, like the YAML-parsing Groovy code here, and write it to a separate file; instead of having it written in your main workflow script, you just include a function from this file and call it in your workflow script. It's going to be much cleaner to read. So you can organize content, configuration, script files, everything, into different files and just include them in the main configuration file or the main script file and so on.

With that, we finished the first section of today, which is channels, and it's time to go to the next one, which is processes. So now we're going to talk about processes, which, as we've seen a few times already, are basically functions, right? They're special functions: Nextflow processes. A process is the basic computing primitive to execute foreign functions in Nextflow. If you think of your pipeline as a set of steps, every step can be represented as a process. Here we have a very simple one, which we call SAYHELLO. There's a preference, usually, for uppercase names for processes, but it's not mandatory; it doesn't have any technical effect, it's just like indentation, but usually people use uppercase letters for process names. This very simple process just has a script block with an echo "Hello world". Again, without the workflow block nothing happens, because we have to tell Nextflow in the workflow block which processes should be called.

In reality, processes can be much more complex than that. A process can have directives at the very beginning, which are declarations that define optional settings for how the process should behave: using a container or Conda, requesting a specific number of CPUs or amount of memory, having limits for time and disk, among many other things. We have the input block, which declares the expected inputs for this process: I may have a thousand files in my current directory, but maybe I just want two of them to be the inputs to the next process, so you specify them in the input block. Same thing for the output block: maybe the command I'm going to run in this process generates a thousand files, but I just care about one of them, the results of my analysis, and I only want to pass that results file to the next process; I don't care about the other ones. So the output block is not about everything that will be generated, but about what will be generated that I want in the output channel. The when block is a condition you can create to specify when the process will generate a task: maybe you say that if a specific input element is a number, fine, run a task, and if it's not a number, don't do anything; you can think of it as a starting condition for a task. And then we have the script block, which can actually be introduced by different keywords, not only `script`, as we will see soon; for now, think of this part as what will be done in this step of your pipeline.
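To recap the anatomy in one place, here's a minimal, hypothetical process using all the blocks just described (the when condition is my own illustration of the "is it a number" idea from above):

```nextflow
process EXAMPLE {
    // directives: optional settings for how the task should run
    cpus 2
    memory '1 GB'

    input:
    val x                  // what this step consumes

    output:
    path 'results.txt'     // only this file goes into the output channel

    when:
    x instanceof Number    // a task is created only when this holds

    script:
    """
    echo $x > results.txt
    """
}

workflow {
    EXAMPLE(Channel.of(1, 'two', 3))   // 'two' is filtered out by the when block
    EXAMPLE.out.view()
}
```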
Okay, so here we have another example. Now we have the workflow block calling the EXAMPLE process, and inside the script we are echoing a string with some `\n` newlines in it, all of this being saved into a file named `file`. As I've said many times, what's inside this script block is not Nextflow: it can be R code, Python code, MATLAB, or a program you're calling. In this case, as in many other examples we've seen so far, it's shell commands and expressions. After saving the text into `file`, I get the content, take the first line, take the first five characters, if I recall correctly, and store that in the chunk_1.txt file. Then I compress that into a file named chunk_archive.gz. Okay, that's what my process does. Let's copy this, paste it in, and run this Nextflow script. Nothing will be shown, because we're not asking for anything to be shown, but we can take this hash here, which identifies the task directory, and use `tree work` to see the tree structure. And you see that we indeed have a file named `file` that was created here, we have chunk_1.txt, and we have chunk_archive.gz.

There is no input block, which is fine, but we don't have an output block either. So if we requested the output channel of this process, we would have nothing, because we didn't specify anything. Let's even try something like this: I want the output channel of this process, and I want to apply the `view` channel operator to it to see its content. If we run this, there's actually nothing to view. But if you add an output block with `path 'chunk_archive.gz'`, a path from this example, then it works. Actually, let's use the pipe here: the same thing as the dot notation, but it reads better this way. By using `view` with the pipe, we now see the element in the output channel, which is chunk_archive.gz. But if we don't have this output block and we run this, then `view` can't be applied, because it's an undefined output; we didn't specify an output here.

Here we have a different example, and now instead of shell script we have Python code, and we use the Python shebang to let Nextflow know that it has to use Python to interpret this script. So some Python code here, and it's the same thing; we're just going to run it for the sake of showing the output. In the end, you could have R, Python, MATLAB, shell script, anything here, and it would work the same way.

You can also use parameters, as we saw before, to change some content, some variables, inside the script block.
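Here's roughly that parameter example; running `nextflow run example.nf --data Marcel` would print "Hello Marcel":

```nextflow
params.data = 'World'   // default, can be overridden with --data on the command line

process FOO {
    debug true          // no output block, so print the task's stdout to the terminal

    script:
    """
    echo Hello $params.data
    """
}

workflow {
    FOO()
}
```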
So here we have the default value of the `data` parameter set to 'World', and we're going to echo Hello plus whatever is inside it. If we run this, we get "Hello World". To see it, either I use `view` on an output, or, since there is no output block, I put `debug true` here as a directive, and we can see "Hello World". If I run it again with `--data Marcel`, we're going to have "Hello Marcel" instead of "Hello World", as we saw yesterday. Good, let's go to the next part.

Ah, okay, one interesting thing: remember that when you have the dollar sign, it's a variable, right? Nextflow will try to resolve this variable, just like it did here: `params.data` is a variable, and its content is World, okay, fine. But what if it's not a Nextflow variable? What if it's a bash variable? Even here in the terminal I can run `echo $PWD` to print the current working directory. So if instead I put `echo $PWD` in the script, what's going to happen? Let's see; there's no need to pass `--data` this time. It showed the current folder, where I am right now. But I didn't want that: I want the PWD of the task folder, where the script is actually run; I want to know where the task is. So I have to use a backslash here, `\$PWD`, and by doing that Nextflow will not resolve this variable; we escape the dollar sign, and it will be interpreted and resolved when the task is run, and here we have the task directory. It's the same for other variables: escaping the dollar sign in your script block may be required so that Nextflow knows whether something is a Nextflow variable or a variable in the language of your script block. Another way is to use single quotes around the script, and then Nextflow won't try to resolve these variables at all. Yet another way is to use `shell` instead of `script`: then plain dollar signs belong to bash, or whatever language you're using, and an exclamation mark with curly braces, `!{}`, is used for the Nextflow variables. Some people prefer `shell` so that there's an obvious difference between Nextflow variables and the other variables.

You can also have a conditional script, which is very useful. Say I have a file that I want to compress, and I choose the compression method; depending on this parameter, I'm going to use a different program. I have the workflow block calling the FOO process with the file to compress as input. It's a plain file, so it will be converted into a value channel automatically when you call the process. Then: if the compress parameter is gzip, use this script block for gzip; if it's bzip2, use this one; if it's neither, throw an exception saying unknown compressor, with the name of the compressor the person asked for. So if you want your process to behave differently based on a variable, that's what you do: conditional scripts.
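A sketch of the conditional script, close to the training snippet:

```nextflow
params.compress = 'gzip'
params.file2compress = 'data/meta/random.txt'

process FOO {
    input:
    path file

    script:
    if (params.compress == 'gzip')
        """
        gzip -c $file > ${file}.gz
        """
    else if (params.compress == 'bzip2')
        """
        bzip2 -c $file > ${file}.bz2
        """
    else
        throw new IllegalArgumentException("Unknown compressor: ${params.compress}")
}

workflow {
    FOO(params.file2compress)   // a plain file becomes a value channel automatically
}
```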
Now let's have a look at the input block of our process; we've discussed this a lot of times. You have an input qualifier and the name of the input. Here I say it's a `val`, and whatever gets inside this process I'm going to refer to as x, so the output here is "process job 1", "process job 2" and "process job 3", right?

If the inputs are files or paths, you use the `path` qualifier, like we are doing here. So with `Channel.fromPath` I have a channel where every element is a file, and the process knows it's a file through this `path` qualifier. Here, whatever the name of the incoming file is, it will be staged in the task folder under this name. Let's try this, because it's not so straightforward to understand what's going on. Let's run `nextflow run example.nf`. We know this data/ggal folder has these six files, right? And, to make it easier, let's just take one of them, gut_1.fq, and let's find out where the task runs; and just because we learned how to escape the dollar sign, let's play with it. So we know that the input is gut_1.fq, it's here, and here we have the task path, so let's open it. And we see that the input file, whatever its original name, has been staged under the name sample.fastq. If we do a `head -n 1` to get the first line of this file, that's the first line, and if we do the same with data/ggal/gut_1.fq, we get the same thing. So it's the same file: indeed we are feeding this file into the process, but we decided to stage it under the name sample.fastq. You can also use a variable name as the placeholder to refer to it, like we did many other times.

There's an exercise here; again, you can pause the video now to try it, otherwise I'm going to open, three, two, one, the solution. The idea is to write a script that creates a channel containing all read files matching the pattern *_1.fq, followed by a process that concatenates them into a single file and prints the first 10 lines. So I create a channel with `fromPath` that gets all the first reads of these files, gut_1, liver_1, lung_1, and I have this CONCATENATE process that receives lots of files; everything in my input I want staged there, and the output is a file called top_10_lines. I use `cat` on all the files in the current task directory, which are the inputs, and put them into a concatenated.txt file; then I use `head` to take the first 10 lines and save them to the output file. Because I have many files and I don't want CONCATENATE to be called multiple times with one file each, since then there would be nothing to concatenate, I'm using the `collect` channel operator to send all the files at once to one instance of the CONCATENATE process, a single task, and then, because all the files are there, I can concatenate them.
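Here's roughly the solution just described:

```nextflow
reads_ch = Channel.fromPath('data/ggal/*_1.fq')

process CONCATENATE {
    input:
    path '*'                // stage every incoming file in the task directory

    output:
    path 'top_10_lines'

    script:
    """
    cat * > concatenated.txt
    head -n 10 concatenated.txt > top_10_lines
    """
}

workflow {
    // collect() sends all the files to a single task
    CONCATENATE(reads_ch.collect()) | view
}
```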
So let's run this, and I want to show you what's inside the task directory, which is all these input files. Let's do a `tree` here, and you see gut_1, liver_1, lung_1 in the same task folder, because it was a single task with all these input files.

Here we have two input channels, 1, 2, 3 and a, b, c. See that we're calling this FOO process with these two channels, and indeed you can pass many channels to the same process. Here we say the first is a value I'll refer to as x, and the same for the second one, which I'll refer to as y, and the script basically prints x and y to the screen: we get "1 and a", "2 and b", "3 and c". The output order is not exactly as written, but it's what you can guess from the script.

We already saw what happens when channels do not have the same cardinality: two numbers but four letters. We saw this already: "1 and a", "2 and b", and it stops. But if we make the first one a value channel, then it's going to be used multiple times: "1 and a", "1 and b", "1 and c". Good, just a reminder of something we learned at the beginning of the channels section today.

So here's another exercise. It asks you to write a process that is executed for each read file matching the pattern, using the same transcriptome.fa in each execution. It's a very simple problem, just to remind you that you should use a value channel, so that this transcriptome file can be used multiple times, once for each of the different read files. I'm going to open the solution in three, two, one. The secret here is passing the transcriptome as a plain file, which will be automatically converted into a value channel, or you could create a channel with `Channel.value`, or use the `first` channel operator, which returns one value, making it a value channel. The goal is to understand that you need value channels when you want to use the same file multiple times against the different elements of another channel, which here is the reads_ch channel we created using the `fromPath` channel factory. One second for some water.

One very interesting thing is the `each` qualifier, for when you want to do some repetition. This example is very nice: you have these .tfa files, and let's say you have three alignment modes, regular, espresso and psicoffee. You could have an align-sequences process that gets a sequence file, and you want to process this same file three times with the different modes: I'm going to run this command with `-mode` set to regular, espresso or psicoffee for the same file. So you want to repeat this process three times, just changing the mode; that's why you have `each mode`. This is the repeat qualifier.
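A sketch of the each example; the .tfa path is a guess at the training repo layout, and the command is echoed rather than executed:

```nextflow
sequences = Channel.fromPath('data/prots/*.tfa')   // hypothetical location of the .tfa files
methods   = ['regular', 'espresso', 'psicoffee']

process ALIGNSEQUENCES {
    debug true

    input:
    path seq
    each mode   // repeats the task once per value in the list

    script:
    """
    echo t_coffee -in $seq -mode $mode
    """
}

workflow {
    ALIGNSEQUENCES(sequences, methods)
}
```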
It can be very useful in some situations. Here you have a nice exercise to practice it. You probably know Salmon and Kallisto; they are very famous pseudo-aligners. The exercise asks you to write a process that is executed for each read file matching the pattern, which gives gut_1, liver_1 and lung_1, but repeats the same task with both Salmon and Kallisto. I'm going to open the solution in three, two, one. So the reads are here, `each mode`, and we have salmon and kallisto: "salmon reads", and so on. This exact command is not going to work, because the actual command lines of Salmon and Kallisto are different, but you could have a conditional script here, which you already saw, saying if the mode is kallisto do this, if the mode is salmon do this, otherwise throw an exception.

Now let's have a look at the output block. We already saw that we need an output qualifier, val, path, tuple and so on, and a name for the output; it could be the name of the file we want to keep, for example. One thing we haven't mentioned yet, but which is very useful, is `emit`, to give a nice name to an output. Sometimes you have multiple output channels, and instead of saying "I want the first one, the second one", it's better to have names for them; with `emit` you use `.out` plus the name you chose. Soon we'll have some examples, so keep in mind for now that `emit` is very useful.

Here, for example, you have a `val x` received as input and a `val x` as the output, and in the script you store this x value into a file. The thing is, the file itself is not what goes into the output channel; the value is. If the outputs are files, then great, you can use `path` with the name of the file. You can also have multiple output files: here, for example, I say the output name is chunk_ followed by something. In the first example we had today, we had multiple different output files sharing this chunk_ prefix, and I want all of them in the output channel of this process. Here we are also using the `flatMap` operator; I won't go into details about it now, because soon we'll be in the operators section and I'll talk more about it. There's an exercise here, which is to remove the `flatMap` operator and see how the output changes; it's not so much about doing something as about observing what happens.

One interesting thing is that we can also have dynamic output file names. Here, for example, we have a process called ALIGN which receives a value I refer to as x and another input I refer to as seq, and the output of this process is named after x, probably the name of a sample or species, for example gut.aln. So you can have dynamic output file names, but pay attention that we are using double quotes here to resolve the dollar sign: we're saying this is a variable, right? Let's do a test so it's clear. Let's get inside the task folder: we have cat.aln, sloth.aln, dog.aln, which are the species here, right? We passed the species channel as the first input channel, which is x, so we have cat, dog and sloth .aln files. That's how you define, from the inputs, which output files to watch for and add to the output channel of this process.
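A sketch of the dynamic-output example; the sequences path is illustrative and the aligner is just echoed:

```nextflow
species   = Channel.of('cat', 'dog', 'sloth')
sequences = Channel.fromPath('data/prots/*.tfa')   // hypothetical input files

process ALIGN {
    input:
    val x
    path seq

    output:
    path "${x}.aln"   // double quotes so $x is resolved by Nextflow

    script:
    """
    echo aligning $seq > ${x}.aln
    """
}

workflow {
    ALIGN(species, sequences)
}
```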
You can also have tuples, which are composite inputs and outputs. Here this FOO process is expecting a single input channel, as you can see: there is only one argument, one input channel. But this input channel consists of a value, a sample ID, and a path, the reads. This may sound a bit weird, but we've already seen, with the `fromFilePairs` channel factory, that this is exactly the structure it creates: an element which has a sample ID and a list of files. And that's what we have here.

The exercise asks you to modify the script of the previous example so that the BAM file is named after the given sample ID. I'm going to show the solution in three, two, one: and you have here the sample ID as part of the output file name. Before, we had sample.bam for everyone. Usually this is not a problem, because every task is isolated in its own task directory, so it doesn't matter if files have the same name. But if you're passing these files around, you can get into a situation where all of them end up in the same folder, and then you have a conflict, because they all have the same name. So it's not very smart to use names like sample.bam; you can run into trouble very quickly.

The when block, we briefly mentioned it already, but here you can see it: the task will only run when this condition is verified, so here, if the FASTA name matches this expression and the db type is 'nr'. We can copy this, paste it, run the pipeline, and we get the result, because the type is 'nr' and the FASTA name respects the pattern. But if we pass a different db type, the condition fails and this process won't create a task. So here we saw there was a task, with its task folder, right? If we run it again, the db type now is not 'nr', and there is no task; no task was issued from this process, because the when condition was not met.
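Here's roughly the when example from the material; the paths are illustrative and blastp is only echoed:

```nextflow
params.dbtype = 'nr'
params.prot   = 'data/prots/*.tfa'   // hypothetical path
proteins = Channel.fromPath(params.prot)

process FIND {
    debug true

    input:
    path fasta
    val type

    when:
    fasta.name =~ /^BB11.*/ && type == 'nr'   // a task is created only if both hold

    script:
    """
    echo blastp -query $fasta -db nr
    """
}

workflow {
    FIND(proteins, params.dbtype)
}
```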
When it comes to directives, they're the first thing at the beginning of the process body. We saw this yesterday in multiple sections: here we are requesting two CPUs, one gigabyte of RAM, and this container image for this process, so nothing really new. Here we have a few process directives, and we saw many yesterday; if you go to the Nextflow documentation, under processes, directives, you have all of them. I showed you a few yesterday, but as you can see there are many, and you should definitely have a look, including the part about dynamic directives, dynamic computing resources and dynamic retry with backoff. That is very, very nice.

You can also organize your outputs, which is great, using a directive called `publishDir`. When you run your pipeline with its ten steps, for example, it's pretty clear by now that every task runs in a task folder, isolated from everything else, and its files end up stored in a crazy path like this one; there's a cat.aln kind of hidden in this path, right? You don't want that. Some of these output files you don't care about, because they're just intermediate steps, intermediate files for your analysis, but at some point you want some results from your analysis to be organized in one place. The `publishDir` directive allows you to choose a place where your outputs are organized, so you can look at your results more easily. Here we have one example: I'm using `publishDir` for this process, and I'm saying I want the outputs stored in this folder, which depends on the output parameter I provided, or the default one, which is my-results. The mode is copy, because you can either create links, shortcuts from the task folders to the directory where you publish your results, or copy the files, so that you can delete your work directory and still have your files.

This can actually get very complex. It's important to say that the publish directory can be local or remote: you can use S3 buckets, for example, to store your files. Here we have a slightly more complex usage of `publishDir`: in this FOO process, all the files ending in .fq I want stored in this path; all the files ending in _counts.txt in this path; and all the files ending in _outlook.txt in this path. So you can have multiple `publishDir` directives for the same process. And it's per-process: if you have one pipeline with ten steps and you want the results of the third and fifth steps to be stored somewhere, you put your `publishDir` directive on those processes. You can put it in the process block, like here, or in the configuration files, as we saw yesterday, using the process scope with the withName or withLabel selectors.
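A runnable sketch of the multiple-publishDir idea; the touch command just fabricates matching files for illustration:

```nextflow
params.outdir = 'my-results'

process FOO {
    // one publishDir per destination, selecting files by pattern
    publishDir "$params.outdir/fastq",    pattern: '*.fq',          mode: 'copy'
    publishDir "$params.outdir/counts",   pattern: '*_counts.txt',  mode: 'copy'
    publishDir "$params.outdir/outlooks", pattern: '*_outlook.txt', mode: 'copy'

    input:
    val sampleId

    output:
    path '*'

    script:
    """
    touch ${sampleId}.fq ${sampleId}_counts.txt ${sampleId}_outlook.txt
    """
}

workflow {
    FOO('sample_a')
}
```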
And with that we get to the end of our processes section. I insist that you check the official documentation, because there are some directives that are very interesting, including the dynamic uses at the end; they're relatively complex, but you can do amazing things with dynamic directives. There are also other things you can do with processes, like stubs, which are very nice, but again, this is a foundational training and we cannot cover everything; in the end a few things have to be left out so you can focus on what allows you to start using Nextflow as soon as possible. There's also a part about managing the semantics of output directories; that's what I just showed, all these patterns and such. Okay, with that we go to the next section, which is Nextflow channel operators.

These operators, just like processes and channel factories, are functions, but special ones: special functions that apply to Nextflow channels. If you go to the official documentation and open the operators page, you're going to see a very long list, and again, I recommend you have a look at each of these operators so you're aware of what you can do with them. Closures with operators are one of the best things about Nextflow; they allow you to do amazing things with your processes and with your channels. They can be organized into categories, like filtering operators, transforming operators, splitting operators, among others. We won't show all of them here, but we will show a few that are very interesting.

I like this example because it lets you see how closures and the `map` channel operator work; `map` is one of the most used channel operators, very useful, and I'm going to show you a few examples. The first thing we do, remembering that you can click on the plus icons to get a description of what's happening in the source code, is create a channel of four numbers, so every number is one element. Then I take this channel, which I call nums, and apply the `map` channel operator to do something: I'm going to multiply every number by itself, square them, and store the result in a channel named squares. You can ask, what's the `it`? It's the implicit name for each element, and the `it ->` arrow makes it explicit. Earlier I replied to a question saying that `view()` is the same thing as `view { it }`, and it is; this arrow format is for when you want to use a different name, but they are the same thing, and you can simply omit that part. So let's copy this to our example.nf, and I'm going to remove this `it ->` here because it's not required; it's the same result. I get 1, 4, 9 and 16, because we are squaring every element: 1 times 1, 2 times 2, 3 times 3, 4 times 4. And this figure shows what happens: I have this channel with the elements, and every element is going to be multiplied by itself, squared.

Instead of creating all these intermediate variables, we can also use the dot to chain all the operators together; it's more beautiful. And as I said earlier, I like to use spacing: I create the channel with the elements, then map them, then view them, with each step on its own line, so it's clear as soon as you put your eyes on the code what's happening. As opposed to writing everything on one line, which some people do sometimes, and in my opinion it's so bad: you look at the code and you have to be very careful to understand what's going on, what happens first, and so on.
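The chained form we ended up with:

```nextflow
// chained form, one operator per line
Channel
    .of(1, 2, 3, 4)
    .map { it * it }    // or, with an explicit name: .map { num -> num * num }
    .view()             // prints 1, 4, 9, 16
```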
So let's look at the basic operators. The view operator we've seen a few times already, actually a lot of times: it prints every element in the channel, and we can use a closure to change how each element is represented. The map operator, which we also saw, lets us do something to every element, and it's actually very nice because you can even change the structure of the elements. If I do this, I get 1, 2, 3, 4; but what if I want to repeat the number? I could emit something like the number and the number again, and now I still have four elements, but every element has two items, the number repeated. Or maybe I could multiply by two in one item and by itself in the other. So whenever you want to reorganize the way your items sit inside the channel elements, the map operator is very useful.

Sometimes, for example, you have elements like lung with some info and some other info: one element which is a list whose first item is lung and whose second item is a list of information. I can say that I have a sample ID at the beginning; again, I can give any names I want before the `->` arrow, say sampleId and listOfInfo, and then return just listOfInfo. By doing that I'm not only changing the structure, I'm getting rid of part of the items in the element: I don't want the sample ID. Something was missing here, sorry, another bracket; closing this one and the other one. Now I have some info and some other info, and the lung disappeared. The original structure was something comma something, and I just want the second part. Or maybe I just want the first item of the second part; let's run this again. Or maybe I want to change the order: instead of sample ID then list of info, I want list of info then sample ID. As you can see, the map operator is very, very powerful.

If you want to make things uppercase, you can: sampleId dot, I think the function name is toUpperCase. Maybe let's use `reverse` instead, which is easier: I want to reverse the first item and keep the list of info as is. Hmm, nothing happened; actually, what happened is that because the second item is a list, `reverse` on it reversed the order of the list items: instead of some info then some other info, it gave me some other info then some info, and that's not what I wanted. So now I'm applying `reverse` to the specific items inside the list, and when I run it again I have lung reversed, some info reversed, some other info reversed. Something was missing here, yes, a bracket. So we have some other info reversed, some info reversed, and lung reversed. The map operator, I can't stop repeating this, is a very important channel operator; if you master how it works, it's going to do magic for you.

So here, for example, I have a channel with two elements, hello and world. Just for the sake of clarity, maybe it's not obvious why this one has two elements while in my earlier example the list was one element. Let's copy this and compare line by line, and you see the difference: we have a bracket in one case, making everything one element, and if we remove the bracket, it becomes two elements.
OK, so let's go back. We have two elements here, 'hello' and 'world', and I want to use map so that every element, which I'll refer to as `word`, becomes a list with the word and the number of characters of that string. Then I use view with another closure, which receives every element as `word` and `len` and prints "word contains len letters". This is a very nice usage of closures: 'hello contains 5 letters' and 'world contains 5 letters'. I would advise you to pause the video for a few seconds now and read those four lines until it's clear to you what's happening.

At the end there's an exercise asking you to use fromPath to create a channel emitting the FASTQ files matching the pattern `data/ggal/*.fq`, then use map to return a pair containing the file name and the path itself, and finally use view to print the resulting channel. This is not completely straightforward, because you need something we haven't shown before; if you want to try, pause the video, and I'll show the solution in three, two, one. So: `.name` hasn't been shown before, but you could guess it. If you have a file, and we know it is a file because it comes from the fromPath channel factory, `file.name` gives you the name of the file as a string. You could also call `file.toString()`; that's another way to do it.

Another nice channel operator is mix. The mix operator combines the items emitted by two or more channels into a single channel. Here we have a channel with 1, 2 and 3, another channel with 'a' and 'b', and a third channel with 'z'. By writing `ch1.mix(ch2, ch3)`, and so on until all the channels you want to mix are listed, everything gets combined, and with view you see the content. The mix operator is not used so often, because usually your elements are not this simple, you have pairs of reads, metadata and so on; but if you have simple channels like these and you want to bring them together into one channel, mix is the operator. One warning: the order is not guaranteed. We put in 1, 2, 3, a, b, z and we actually got 1, 2, a, 3, b, z.

The flatten operator we saw yesterday. Here we have a channel of two elements, because there are two lists, and every element has three items; flatten turns everything into six separate elements, which is what we see with view.

The collect operator we've also seen a lot already. Here we have four elements, 1, 2, 3 and 4, and collect turns them into a value channel with a single element containing the four items; that's what we see printed.

The groupTuple operator is used reasonably often, I would say. Here we have a channel with seven elements, each with two items: [1, 'A'], [1, 'B'], [2, 'C'] and so on. You can think of the first item as a key, and groupTuple groups the elements according to that key (if you don't say anything, the key is the first item). So instead of the seven elements we now have only three, grouped by the keys 1, 2 and 3: for key 1 we have ['A', 'B', 'C'], for key 2 we have ['C', 'A'], and for key 3 we have ['B', 'D'].
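In code, the groupTuple example looks like this (the same toy elements as in the material):

```groovy
Channel
    .of([1, 'A'], [1, 'B'], [2, 'C'], [3, 'B'], [1, 'C'], [2, 'A'], [3, 'D'])
    .groupTuple()
    .view()
// [1, [A, B, C]]
// [2, [C, A]]
// [3, [B, D]]
```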
There's a nice exercise here: use fromPath to create a channel emitting all the files in the folder `data/meta/`, then use map to associate the baseName prefix to each file, which can be seen as a key, and in the end group all the files that share the same prefix. You can pause the video now; I'm going to show the solution in three, two, one. So: `Channel.fromPath` with this path creates a channel where every element is one of the files in the folder. We already got the idea of `.name` and `.baseName`, so I use map to create a tuple, with brackets, containing the base name and the path, and then groupTuple to group on that base name. Let's copy this and run it, because I think it's a nice example of how it works. The path is `data/meta`, and there are lots of files in there: some prefixes have just a single file each, but the 'regions' prefix has a .tsv, a .json and a .yml, and those three get grouped together; for 'regions2', reading the output carefully, we end up with just the .json. In the view at the end there's a closure to format how each group is shown.

The join operator is very interesting too: it creates a channel that joins together the items emitted by two channels based on a matching key. In the example, the left channel has X with 1, Y with 2, Z with 3 and P with 7, and the right channel has X with 4, Y with 5 and Z with 6. When you join left and right, you get X with 1 and 4, Y with 2 and 5, and Z with 3 and 6; P is missing from the final result because there is no P in the second channel. This join operator can be tricky if it's not really clear to you what it is doing, so be careful when you use it.

The branch operator is very, very nice; I really like it, to be honest. The idea is that you have one channel and you want to create other channels based on its elements, taking some conditions into consideration. This example makes it clear: I have a channel with the numbers 1, 2, 3, 40 and 50, and I'm going to create two channels from it. One of them is called small: every element (the `it` here) which is smaller than 10 goes into this new channel small, and everything that is 10 or bigger goes into the new channel called large. Then I use the set operator to give the result a name. I could either write `result = ...` at the front or put `.set { result }` at the end; I personally like this format because it follows the way you think: I have a channel with these values, I separate it this way, and I store it in this variable. So now result is a multi-channel variable, not a single channel.
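Here's that branch example in full, with the same numbers as the material:

```groovy
Channel
    .of(1, 2, 3, 40, 50)
    .branch {
        small: it < 10
        large: it >= 10
    }
    .set { result }

result.small.view { "$it is small" }
result.large.view { "$it is large" }
```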
To pick the channel you want to use, you use the dot: `result.small` or `result.large`, and then you can apply your usual channel operators like view. The output says "1 is small", "2 is small", "3 is small", "40 is large", "50 is large" when you run this Nextflow script; not necessarily in that order, but that's the content.

One nice real example of the branch operator: I was using it a few days ago with a colleague. We had a channel with paired-end and single-end sequences, and we wanted to do something only with the paired-end ones. So we used branch to separate our channel into two channels, one with the paired-end reads and the other with the single-end reads, did something with the paired ones, and afterwards mixed them back together with the single-end ones (there's a small sketch of this pattern at the end of this part).

If you go to the Nextflow documentation, you'll find a huge amount of detail about all the operators I showed here. New operators appear, I would say, every year; splitJson, for example, was a recent contribution by a community member. So it's worth keeping an eye on the docs: you have buffer, you have take, many, many channel operators, and it's very useful to have a look.

With that we end the third section of our training today. Actually, my bad: the next section is cache and resume, and then after that comes the RNA-seq one.

We already played a bit with the cache and resume feature, by adding `-resume` to the command line, like `nextflow run example.nf -resume`. For our current example there's no file being created, so there's no cache to show; this is not a good example. But we can take the hello example from the introduction, the one we had yesterday. The hello file is already there, my bad. So let's run `nextflow run hello` and open it: it's the one from yesterday, splitting letters, converting to uppercase and so on, producing HELLO WORLD. If you run it again with `-resume`, it takes advantage of the cache, since these tasks have already been computed: as you can see, it cached one task of the first process and two tasks of the second process. Great, nothing was recomputed in the end. However, it's important to know how this mechanism behaves.
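As promised, here's that branch-and-mix pattern as a minimal sketch. The channel structure, the size test and the TRIM_PAIRED process are all assumptions for illustration, not the exact code we wrote:

```groovy
// reads_ch is assumed to emit [sample_id, [r1, r2]] for paired-end reads
// and [sample_id, [r1]] for single-end reads
reads_ch
    .branch {
        paired: it[1].size() == 2
        single: it[1].size() == 1
    }
    .set { reads }

trimmed_ch = TRIM_PAIRED(reads.paired)        // hypothetical paired-end-only step
all_reads_ch = trimmed_ch.mix(reads.single)   // bring them back together
```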
So how does it work? There is a hash for every task: a 128-bit hash value used as a unique ID, taking into consideration the task's input values, the input files, and the command string defined by the script block we wrote, that is, what the task does. We have the example here with the file structures we saw before: some files, and this is what is used in the end to decide whether the inputs we're providing now are the same ones that were used before to produce the cached answer.

It's relatively complex under the hood, but among other things it checks the exit code. Yesterday we saw `.command.sh` and `.command.run`, but a task directory also has other hidden files, like `.exitcode`: if you cat it and get 0, it means the task ran successfully. So Nextflow checks that the task was OK, compares the inputs and the script block and so on, and then it knows there's a cache hit and the task doesn't have to be recomputed.

One interesting thing: all this time when we run these Nextflow scripts I keep saying "let's look inside the work directory", because the default is to name it `work`, but actually you can name it anything with the `-w` option. You can run `nextflow -h` and see a huge number of options; note it's a single dash, because two dashes are for pipeline parameters and one dash is for Nextflow itself. You can also do `nextflow run -h` to see all the options of the run command, and one of them is `-w`, the work directory. If you're running on a cluster, for example, you may have a scratch directory where you want the intermediate files to be written; if you're on the cloud, maybe there's a specific bucket where you want to store them. You decide that with the `-w` option. The last-modified timestamps of the files also matter, by the way.

If you really want to understand how the caching works, I advise you to go to the Nextflow website, to the blog, and search for "resume". There's a great blog post by Abhinav analyzing the caching behavior of pipelines, and others like "Troubleshooting Nextflow resume" and "Demystifying Nextflow resume". Those posts go into much more detail about how the resume feature works than is our purpose here.
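To recap those command-line bits in one place (the work-directory path and the task hash below are hypothetical):

```bash
nextflow -h                    # options of the nextflow command itself (single dash)
nextflow run -h                # options of the run command, including -w
nextflow run example.nf -w /scratch/my_work   # hypothetical custom work directory

# hidden files inside a task directory (the hash path is made up):
cat work/1a/2b3c4d/.exitcode     # 0 means the task finished successfully
cat work/1a/2b3c4d/.command.sh   # the exact command that was run
```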
One nice thing is that there's a command called `nextflow log`. I'm going to enlarge the terminal so we can see more. `nextflow log` shows you all the runs, and as you can see we played a lot already, we ran example.nf many times. For each run we have a timestamp, the duration of the pipeline, the run name (which is always a random adjective plus a random scientist's last name, unless you provide a run name with the command-line option), whether the pipeline ended successfully or with an error, the revision ID that we saw at the beginning (like an ID of the pipeline script), the session ID, and the command that you ran. You can use the run name to resume a specific run.

One interesting thing: let's take this 'scruffy_jones' run and do `nextflow log scruffy_jones`. That shows me all the work-directory paths of the tasks of that run; in this case there are three, you may have more in your example. You can also choose columns with `-f`, from the list of available fields, and see for each task the process, the exit code, the hash and the duration: this hash is for the splitLetters process, and these for the convertToUpper process, and the exit codes were all zero. You can run `nextflow log -l` to get a list of all the possible fields you can choose; as you can see, there are a lot. You can ask for the container, for example; and since our nextflow.config specified a container, we see which container each task ran with. Cool.

You can also specify a filtering criterion with `-F`: say I want to see all the task folders of the run 'tiny_fermat', but only those where the process was fastqc. If we type this with 'scruffy_jones' instead, there will be nothing, because we didn't have any fastqc process in that run. So: nothing, right.

And the `-t` option lets you provide a template for a provenance report, which is very nice. Here we have some HTML with placeholders for the variables; they are the fields we listed with `-l`, things like workdir, status, exit and container. I save this as template.html, and then I run `nextflow log` with the run name, `-t` and the template, redirecting to the HTML file I want in the end. Of course we want 'scruffy_jones' here, and first I actually have to create the template file, so let me copy and save the HTML template. Now I run my command again, download the result to my computer and open it: we have the script block, the exit code, the status, the work directory and the container for the splitLetters process, and the same for each of the convertToUpper tasks.
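Putting the whole `nextflow log` tour in one place (the run names and fields are the examples from above; the template fields are real trace fields, but the exact HTML is just a sketch):

```bash
nextflow log                         # list all previous runs
nextflow log scruffy_jones           # work directories of that run's tasks
nextflow log scruffy_jones -f 'process,exit,hash,duration'
nextflow log -l                      # list every field you can ask for
nextflow log tiny_fermat -F 'process =~ /fastqc/'    # only the fastqc tasks
nextflow log scruffy_jones -t template.html > provenance.html
```

And a minimal template, where each `${...}` placeholder is one of those fields:

```html
<div>
  <h2>${name}</h2>
  <div>script: <pre>${script}</pre></div>
  <ul>
    <li>exit: ${exit}</li>
    <li>status: ${status}</li>
    <li>workdir: ${workdir}</li>
    <li>container: ${container}</li>
  </ul>
</div>
```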
You'll remember the chunk_aa and chunk_ab files we had yesterday; they show up nicely in there too. So you can build very nice provenance reports using a template you created, based on the fields you got from `nextflow log -l`: the process name, the amount of memory, the peak memory and so on.

For resume troubleshooting, the links I showed before on the Nextflow blog are very, very good, but there are some common situations where you will run into trouble, where resume won't kick in even though it feels like it should.

First, if you change your input files. Let's say you have a task that receives FASTQ files and, during the execution of the task, you rename, delete or overwrite those files. When you try to rerun your pipeline with resume, Nextflow will see that the input files are different, it won't be able to find the right cache entry, and it will recompute everything in that task. So you should never modify your input files in place: if you change something, save it to another file, so that the cache system understands what's going on.

Second, inconsistent file attributes: the timestamps matter, and some file systems, such as NFS, may report changing timestamps. For that there's an option, the 'lenient' cache strategy, which doesn't take the timestamp into account.

Race conditions are a more complex one; we still mention them here, but be aware they're not so straightforward to understand. The idea is that if you have a global variable that is changed by two different closures at the same time, when one wants to read the value, it may already have been changed by the other. A race condition on a global variable can break your results and your resume, and the right fix is to use `def`, making the variable local, so this map won't bother that other map and it won't break your resume (there's a small sketch of this right after this part).

Non-deterministic input channels are also a complicated one: if the input file names change randomly for some reason, how is Nextflow supposed to know that the input is the same one that was there before? Everything is the same, but the name is different. So be careful with non-deterministic inputs, for example randomness in names, or an output whose name depends on the order in which elements happen to arrive.

This could be quite a heavy and complex section, because resume is not so simple. If I remember correctly, the advanced training with Rob Syme at the end of the month covers a lot about input file names and the resume feature, so that's a better place to get into more detail. What you have to know here is that the resume feature is very useful to start from where your pipeline stopped, for whatever reason: maybe you killed it, maybe there was a power outage, maybe there was an error. You fix the error, and you don't want your pipeline to start from scratch; you want it to continue from where it stopped.
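Here's that race-condition sketch; the values are illustrative:

```groovy
// Unsafe: X is not declared, so it's global to the script and shared by
// both closures; the two maps can race on it.
Channel.of(1, 2, 3).map { it -> X = it; "$X-A" }.view()
Channel.of(1, 2, 3).map { it -> X = it; "$X-B" }.view()

// Safe: def makes X local to each closure, so there's nothing to race on.
Channel.of(1, 2, 3).map { it -> def X = it; "$X-A" }.view()
Channel.of(1, 2, 3).map { it -> def X = it; "$X-B" }.view()
```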
Also, there's something I like a lot about resume when I'm developing a new pipeline. Let's say I create a process foo that receives as input a value I'll call name. The output is going to be a file called `${name}.txt` (we use double quotes so the variable is resolved), and the script block just echoes the incoming name into that file. Then I write my workflow block: I create a channel with all the letters from 'a' to 'f' and call foo on it. An easy, simple, one-step pipeline. I run it and check that it worked by looking at the files: a, b, c, d, e, f, six letters, six tasks, seems to be OK. Let's check one of the work directories: the `.command.sh` contains `echo c > c.txt`, which makes sense.

Now I close the file explorer to get more space and write the next process of my pipeline, bar. I want to print the content to the screen, so the input is a path I'll call my_file, the output is the standard output, and the script block uses cat, the command-line program, to print whatever is inside that file. Then I forward the output of the foo process into the bar process. If I run this with `nextflow run example.nf` it runs everything again; but if I already ran the first step, why should I do it again? So I use `-resume`, and as you can see, the first six tasks were already cached.

It worked, but I can't see anything. Why? Because I didn't add `.view()` at the end to see everything that's echoed. So there was an error in my pipeline; I fixed it, and now I run with resume again: I don't want everything to be computed again, because I already ran it, so it's cached. That's great: I fixed my error and I didn't have to wait for anything. Then say I want to do more, like printing "content: ..." around each value; I change it and run again, and what's already computed is not computed again.

Now imagine that instead of taking a few seconds, the foo step for these six samples takes, I don't know, an hour; I just saved an hour with resume. And the next step, bar, takes three hours; now I've saved four hours. When you're writing a new pipeline you keep doing this: you change something in a task, you add a new process, you make something wrong, you fix it, and you run your pipeline sometimes hundreds of times during development. Without the resume feature it would be hell; you'd waste so much time, small change, wait an hour, small change, wait an hour. You can't work like that. So the resume feature is not only useful for people running existing Nextflow pipelines; for people writing Nextflow pipelines it's also extremely useful.
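For reference, here's the whole toy example from this part in one piece (a sketch using the names above):

```groovy
process foo {
    input:
    val name

    output:
    path "${name}.txt"

    script:
    """
    echo $name > ${name}.txt
    """
}

process bar {
    input:
    path my_file

    output:
    stdout

    script:
    """
    cat $my_file
    """
}

workflow {
    letters_ch = Channel.of('a'..'f')   // the range expands to a, b, ..., f
    bar(foo(letters_ch)).view()
}
```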
It's very, very nice. With that, I think we end cache and resume. One last note: we already mentioned the fair process directive a few times. It guarantees that the order of the elements in the output channel follows the order of the input channel, regardless of how long each task took. Because that constrains the parallelization, there can be a decrease in performance, so use it only when you need the ordering.

So let's go now to our final session of today, the simple RNA-seq workflow. This is probably one of the most important sections in this training, because finally we will write something close to a real-life pipeline with Nextflow, to perform an analysis of RNA-seq data. To demonstrate this real-world scenario, our pipeline will have a few steps: it will create an index for a reference transcriptome file, it will perform some quality control on the samples, it will perform the expression quantification, and then it will create a MultiQC report. We will build this pipeline step by step together, and those multiple script files you've seen around this workspace since the very beginning, we're finally going to use them.

If you are a bioinformatician you probably know these tools, but in case you don't: Salmon is a tool for quantifying molecules known as transcripts from a type of data called RNA-seq; FastQC performs quality control on the reads; and MultiQC organizes the output of many programs into a nice report for you. When we start playing with them it will become more obvious; this is just a very short overview for people who have never heard of them. These are the programs we'll use, on the example data in this workspace, to practice our recently acquired Nextflow knowledge.

The first thing to do, of course, is to organize in your mind what you want your pipeline to do, its inputs and outputs. Even though it's difficult to guess the input of a process in the middle of the pipeline, at the beginning it's kind of obvious: we need to say where the read files are, where the transcriptome file is (the one we want to create an index from), and a place to store the MultiQC report. So we create these three default parameters. We can override them with `--reads`, `--transcriptome` and `--multiqc`, but if we run Nextflow without providing any of these options, these are the default values. The `$projectDir` variable here points to the directory where the pipeline script is located.

This file is already created for us, it's script1.nf, and it has exactly what we've been discussing. Let's just run this Nextflow script and see what happens; you can probably guess: it's just going to print the reads parameter to the screen. Because we have double quotes, the variable is resolved, and we get /workspace/gitpod/nf-training/data/ggal/ and so on, with 'gut'. But what if I provide `--reads` to override the default, putting 'lung' instead of 'gut'? Then that's what we get: the lung files. So this is nothing new.
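A sketch of what script1.nf contains at this point (the paths are the ones from the training workspace, so treat them as examples):

```groovy
// default pipeline parameters; override with --reads, --transcriptome, --multiqc
params.reads = "$projectDir/data/ggal/gut_{1,2}.fq"
params.transcriptome = "$projectDir/data/ggal/transcriptome.fa"
params.multiqc = "$projectDir/multiqc"

println "reads: $params.reads"
```

Running `nextflow run script1.nf --reads '/workspace/gitpod/nf-training/data/ggal/lung_{1,2}.fq'` would then print the lung pattern instead of the gut default.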
We've learned how to play with params: how to set default values and how to override them on the command line with the double dash. OK, nothing new. The first exercise is to modify script1.nf by adding a fourth parameter, because we have three so far (reads, transcriptome and multiqc): a parameter named `outdir`, set to a default path that will be used as the workflow output directory. We want the important results from our pipeline organized in this outdir, and when I talk about organizing outputs you probably remember the publishDir process directive we learned about today. Let me open the answer for this exercise in three, two, one... and that's it, it's actually quite obvious: we just create `params.outdir = "results"`, meaning that inside the current folder I want a folder called results, holding my published results.

The next exercise is to modify script1.nf to print all the workflow parameters using a single log.info command as a multi-line string statement. We've now defined four parameters, but only the first one is being printed, and I don't want to print with println, I want log.info. You haven't seen this command yet, but whenever you run Nextflow, a `.nextflow.log` file is created with the log of the execution of your pipeline, so a lot of information ends up there. When you use log.info, the message is not only printed to your screen but also written to that log file. So if there's information you want the user to see and to be recorded in the log, you use `log.info`.

As for multi-line strings, we've learned that with three single quotes or three double quotes you can create one; that's what we do in every script block. In the solution we use log.info with three double quotes, the parameter variables to be replaced inside, and at the end a function called `stripIndent()`. The indentation in the source makes the code easy to read, but we don't want it in the output, so stripIndent removes it and the text appears flush against the left of the terminal.
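The solution looks roughly like this (the banner text is the one used in the material, more or less):

```groovy
log.info """\
         R N A S E Q - N F   P I P E L I N E
         ===================================
         transcriptome: ${params.transcriptome}
         reads        : ${params.reads}
         outdir       : ${params.outdir}
         """
         .stripIndent()
```

The trailing backslash after the opening quotes suppresses the first newline, and stripIndent() strips the leading whitespace from every line before the string is printed and logged.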
I'm going to show the result: we run script1.nf and it prints reads, transcriptome, multiqc and outdir. Good.

So in this first part we learned how to define parameters in our workflow script, both the default values and how to override them from the command line; the use of the `$var` and `${var}` placeholders, which we've played with in the past (if you want to apply a function to the variable you need the curly-brace form, otherwise the simple form is just fine); how to use multi-line strings, which we've actually been using for a while in the script blocks; and how to use log.info to print information and save it in the execution log file.

One thing worth doing is playing with this so you see what's actually going on: I add `outdir : ${params.outdir}` to the banner and run script1.nf again. We know what the values are, they're the default ones, and a very nice description of what the pipeline is using gets printed. This is very useful, because sometimes you'll use a pipeline written by someone else, or someone else will use a pipeline written by you, and it's important to make explicit what the default values are and which files are being used, so nobody has to open the source code to find out: you run the pipeline and it's right there. And if we look at `.nextflow.log`, the same information is there, in the INFO category (there are others, like DEBUG). That's what log.info does for you.

Now let's go to the next part of our pipeline: writing the processes, the actual steps. I'll call the first one index, because I'm going to use the Salmon program with its index command to create an index from the transcriptome reference file. So I create a process and call it index. The input is a path, which I call transcriptome, and the output is also a path, in this case a folder called salmon_index; that's what the `-i salmon_index` in the command is doing, telling Salmon that this is the folder that will contain the index files it creates. One nice thing here is that you can tell Salmon how many threads, how many CPU cores, it has available, and for that we use `$task.cpus`, which picks up the value of the cpus process directive. By default that value is 1, but if you had requested four CPU cores, this is how you'd tell Salmon to use four cores.
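Here's the index process, roughly as it appears in script2.nf:

```groovy
process index {
    input:
    path transcriptome

    output:
    path 'salmon_index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i salmon_index
    """
}
```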
We also need our workflow block: we call the index process with one input, `params.transcriptome`, the path to the transcriptome reference file, and we store the output channel of this process in a channel named index_ch. Good. script2.nf has this content, everything we've done so far plus this process, so let's copy it and run script number two. There's an error here: "salmon: command not found". And it's clear why: we don't have Salmon installed. But if we look at nextflow.config, we have `process.container` set, which means we can run with `-with-docker` to tell Nextflow to use Docker with the container image declared in the config, nextflow/rnaseq-nf. Good, now it worked.

But I want to show you something. Let's go into the task's work directory and look at `.command.sh`: the `--threads` value is 1, because by default, as I told you, `task.cpus` is 1. What if I set `cpus 3` in the process and run again? I'll add `-resume`, although because I changed the requested CPUs, the cache won't kick in here. If I look at the new `.command.sh`, you'll see 3 now instead of 1. OK, let's go back to the regular one; and if we want to run it again for whatever reason, with `-resume` the cache is supposed to kick in and we won't recompute anything. Good.

If `docker.enabled = true` were set in the config, we wouldn't need `-with-docker`. So let's edit nextflow.config to enable Docker by default, because it's very annoying to have to write `-with-docker` every time you want to run your pipeline. There's an exercise exactly about that, enabling Docker execution by default by adding that line to nextflow.config, which we just did.

The next exercise is to print the output of the index_ch channel using the view operator. Based on what we've done so far, this should be fairly simple: just `index_ch.view()`. Let's run it; I'll erase the `-with-docker`, since we don't need it anymore. Good, we get the path to the salmon_index folder, and if we use `tree` on it we see the many files Salmon created inside. The other suggestion in the material we already did too: if you have more CPUs available, change the script to request more resources for this process, like we did with `cpus 3` (see the directives docs), and check `.command.sh` for the different value.

So the summary of this part: we defined a process executing a custom command with the Salmon program, we saw how process inputs are declared, how process outputs are declared, how to print the content of a channel with the view operator, and how to access and change the number of available CPUs, or, to say it the right way, the number of CPUs you requested.
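The relevant lines of nextflow.config now look like this:

```groovy
// nextflow.config
process.container = 'nextflow/rnaseq-nf'
docker.enabled = true   // no need for -with-docker on every run
```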
The next part is to collect the read files by pairs, and we've already seen this too. Let's open script3.nf, which is a much simpler version: it basically uses the fromFilePairs channel factory with the params.reads pattern we saw earlier. In data/ggal we have gut_1, gut_2, liver_1, liver_2 and so on, so we get elements like 'gut' plus its list of files, 'liver' plus its list of files, and so on; that's what the fromFilePairs channel factory does. We can add a view to see what happens when we run script3.nf: good, we have 'gut' and its list of reads. We're only passing the gut pair as the default, but we could override the parameter with a pattern matching gut, liver and lung together, all the samples with the _{1,2} suffix. It's cheap in terms of computation, we're just listing files, not aligning anything, so let's try it: gut, liver and lung. Good. Then let's keep only gut for now, so the first runs are quicker; at the very end of this session we'll use all the samples.

The exercise here is: instead of writing `read_pairs_ch = Channel.fromFilePairs(...)` as we've been doing so far, use the set operator, which we saw an example of a few minutes ago. So: create the channel with fromFilePairs and add `.set { read_pairs_ch }`.

Another interesting thing is the `checkIfExists` option of the fromFilePairs channel factory; this option also exists for other channel factories like fromPath, which we've seen. The exercise is very simple, pause if you want to try, but here's the result: you just add `checkIfExists: true`. This means that if the path doesn't match anything, Nextflow won't fail silently: it will complain explicitly that the file or path doesn't exist.

So the summary for this very short part: how to use fromFilePairs to handle read-pair files, how to use the checkIfExists option to check for the existence of input files, and how to use the set operator to define a new channel variable. After everything we saw yesterday and today, you've probably noticed the rhythm is much slower now: every step is done with a lot of care. We already saw the channel factories, the channel operators, the options, how to create channels, all of it, and now we are very slowly building our pipeline block by block.
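Putting those two exercises together, script3.nf ends up roughly like this:

```groovy
params.reads = "$projectDir/data/ggal/gut_{1,2}.fq"

Channel
    .fromFilePairs(params.reads, checkIfExists: true)
    .set { read_pairs_ch }

read_pairs_ch.view()   // e.g. [gut, [/.../gut_1.fq, /.../gut_2.fq]]
```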
Now, the next process is the quantification step, which performs the expression quantification. Let's open script4.nf, which has everything we've done so far, the definition of the parameters, the log.info, the index process, the workflow block, plus a new process: quantification. As input it takes the path to the Salmon index built from the reference transcriptome, and a tuple with the sample ID and the list of reads, the one we built with the fromFilePairs channel factory. The output is a path in double quotes, so the variable is resolved: a folder named after the sample. The command is the Salmon program with the quant command, the same `$task.cpus` trick for the threads, and some parameters specific to this program; the same pattern would apply if this were an R script, a Python script, anything. We pass the index path, and the output is a folder named with the sample ID. Since reads is a list, we use `${reads[0]}` and `${reads[1]}` to get the first and second items, the two files of the pair.

In the workflow block we call the quantification process, providing the output of the previous process, the index, plus the new input channel read_pairs_ch that we built with fromFilePairs, and we store the output of this process in a channel called quant_ch. Let's run this with `-resume`, because we don't want to recompute everything; we changed the script and the input files, so the cache didn't kick in everywhere this time. But if you run it again, now with the pattern covering gut, liver and lung, you'll see that gut is cached: one task cached for index and one cached for the quantification we already did before, so they're not recomputed.

One exercise asks you to add a tag directive to the quantification process, for a more readable execution log. Right now we just see "quantification", and we don't know who is who: even running with `-ansi-log false`, so that there's one task per line, we don't know which sample is task 1, 2 or 3. The slow but right way to find out how tag works is to go to the Nextflow documentation, under process directives, and look for tag; all the information is there. Basically, we add `tag "$sample_id"` (usually with an empty line separating the directive from the input block). The nice thing is that when we run again, there's a string next to the process name saying which sample it is: lung, gut, liver. So if I want to check the lung files now, I know exactly which task to inspect. The official solution says "Salmon on gut" and so on; not much different.

The other exercise is to add a publishDir directive to the quantification process, to store the process results in a directory of your choice. You can pause the video if you want; I'll show the result in three, two... You just add publishDir at the top of the process block, with the path where you want to store things and the mode, which is 'copy': I want the files copied. So in script4.nf I add `publishDir params.outdir, mode: 'copy'` to quantification; it's a process directive, so it goes up there, and params.outdir is 'results'. As you can see, there's no results folder yet, but when we run, one is created, with a folder for gut, a folder for liver and a folder for lung, and the files inside are the outputs of the quantification process. You may ask: where is the index? We don't care; we didn't put publishDir there. During the execution of a pipeline, sometimes thousands and thousands of files are created, and you don't really care about most of them, because a lot of them are intermediate files.
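With both exercises applied, the quantification process looks roughly like this (the Salmon options are the ones used in the training material):

```groovy
process quantification {
    tag "$sample_id"
    publishDir params.outdir, mode: 'copy'

    input:
    path salmon_index
    tuple val(sample_id), path(reads)

    output:
    path sample_id

    script:
    """
    salmon quant --threads $task.cpus --libType=U -i $salmon_index -1 ${reads[0]} -2 ${reads[1]} -o $sample_id
    """
}
```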
You only care about the end results of your analysis, and here those are the quantification outputs, so that's where we put the publishDir directive: the output of this process is stored in results, while all the files from index just stay in the work directory; they won't go to our results folder.

The summary of this part: how to connect two processes together using channel declarations. We created the index process and stored its output channel in index_ch; then we called quantification using it and stored its output in quant_ch. So we learned how to connect processes using channels, and how to resume the script execution and skip cached steps. The whole time we were playing with one pair of samples, and at some point we switched to all of them; the ones we had run before stayed cached. What I'm trying to say is that you can have a partial cache for the same process: cached for the samples that have been run before, fresh for the new ones. We also learned how to use the tag directive to make the logs easier to read, and how to use publishDir to store a process's results in a path of your choice. You almost always want publishDir, maybe for just a few of the files, because having to find your outputs inside the hashed work directories is not really practical.

The next part is quality control, so let's open script5.nf. The beginning is the same, the index process and the quantification process, and now we have the fastqc process. You'll see it already uses the tag directive, so the log shows us which sample each task is working on: say there's an issue with a sample and a task errors out, with the tag it's easy to see which sample was being processed when the error occurred. The process receives just the tuple with the sample ID and the reads, which is what comes from the fromFilePairs channel factory, and the output is a folder named `fastqc_${sample_id}_logs`. We have double quotes, so the variable is resolved; with single quotes it would be literally that name, and we'd probably have a problem, which is not what we want. In the script block we create that folder and run the FastQC program to do the quality control, storing the results into it. In the workflow block we have a new line calling the fastqc process with the read-pairs channel and storing the result in fastqc_ch.

So let's run this fifth script, with resume so we don't recompute everything we've already computed: cached, cached... fastqc is a new thing, so there's no cache for it. Good.
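The fastqc process, roughly as in script5.nf:

```groovy
process fastqc {
    tag "FASTQC on $sample_id"

    input:
    tuple val(sample_id), path(reads)

    output:
    path "fastqc_${sample_id}_logs"

    script:
    """
    mkdir fastqc_${sample_id}_logs
    fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads}
    """
}
```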
A note about the data: the files we have here are small pieces of the original files. If you look at the results of this analysis, they won't really make sense, because it's more or less fake, manipulated data, made so that everything runs very quickly. On real files the fastqc step would take much longer, the quantification would take much longer, the index would take longer. The goal here is not to teach bioinformatics but Nextflow and how to write pipelines, so we changed the data so it's quick to run and quick to teach with.

Now let's go to the sixth script. We have a new process called multiqc, with publishDir, because we want the report to be published. It takes everything in the task folder as input, the `'*'`, and the output is a file named multiqc_report.html. The command is just `multiqc .`: it takes all the files in the current directory, identifies which tools generated them, and writes a report for you.

The change in the workflow is this multiqc call, and look at what we pass to it: we take quant_ch, the output channel of the quantification process, and mix it with the output of the fastqc process. Then, from this channel with many elements, we use collect to turn it into a value channel: a single element containing all the files. That means multiqc is called once, all those files get staged into one multiqc task folder, and there the multiqc command generates our report.

Let's run this, and let's also override the reads parameter to use all six read files, the three pairs. We had some cache for fastqc for one of the samples, but not all. And it's done; we find the MultiQC report in results, so let's download and open it. There it is: the percentage of reads aligned, duplicates, percent GC, the fragment length distribution, lots of things. We used just a few programs and still get plenty of charts about our analysis. If you go to the MultiQC website, you'll see it supports literally over a hundred tools, which means that if you use any of them, you'll get a nice report built by MultiQC. And if we go into the multiqc task directory under work, you see links to all the input files: the quantification folders for gut, liver and lung, and the fastqc folders for gut, liver and lung; the report is built from those.

The summary for this section: how to collect many outputs into a single input with the collect operator, how to mix channels together, and how to chain two or more operators, because here we use mix, then collect, and pass all of that as a single channel to multiqc as input.
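The multiqc process and the workflow wiring, roughly as in script6.nf:

```groovy
process multiqc {
    publishDir params.outdir, mode: 'copy'

    input:
    path '*'    // every incoming file is staged into the task directory

    output:
    path 'multiqc_report.html'

    script:
    """
    multiqc .
    """
}

workflow {
    // index, quantification and fastqc are wired up as before
    multiqc(quant_ch.mix(fastqc_ch).collect())
}
```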
One nice thing with Nextflow is that you can also handle events. Let's look at script7.nf: it has everything we've done so far, plus a new block called `workflow.onComplete`. Whenever the workflow completes, Nextflow runs what's inside this block, and here it uses log.info with a check on `workflow.success`: if the workflow ended successfully, it prints "Done! Open the following report in your browser" followed by the path to the MultiQC report, with a newline; otherwise it prints "Oops .. something went wrong". It's a ternary expression: if the condition is true, use the first value, otherwise the one after the colon. If we run script7.nf for all the samples, I believe everything will be cached, so it should just tell us everything went according to plan and point us at the report. Indeed: everything cached, and multiqc has only one task even though it receives six samples, because we used collect to turn everything into one element with multiple items.

You can also get email notifications: you can configure an SMTP server to send an email when your pipeline finishes, saying it completed successfully, or that there was a failure, or something like that.
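The handler looks roughly like this:

```groovy
workflow.onComplete {
    log.info ( workflow.success
        ? "\nDone! Open the following report in your browser --> $params.outdir/multiqc_report.html\n"
        : "Oops .. something went wrong" )
}
```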
You can also play with custom scripts. At some point we were using fastqc with basically just a mkdir and the fastqc command, two lines; but sometimes your script block can get very, very long, and you don't want a lot of content in your workflow script that isn't really the logic of the pipeline. So one thing you can do is move it into a script. Let's create fastqc.sh, a shell script, and put that content inside: the first argument is the sample ID, the second argument is the reads; we create the folder and run the FastQC program just like before. Then in the script7 file I replace those two lines with a single call to this fastqc.sh script, passing those two pieces of information.

Of course, there are a few things we have to change. First, you have to make the file executable, with chmod plus x. Second, you create a folder called bin and move your script inside. It's the same if you have an R script, a Python script or any other small program of your own: if it's not a heavyweight tool you'd install as a package, you put it into the bin folder, because by doing so, when Nextflow runs your tasks, it automatically stages the scripts that are there so every task can use them. And that's what we're doing here: the task knows where to find fastqc.sh because Nextflow staged it, so we can call it like this and it works. One detail: because we changed the script block, there's no cache anymore for fastqc, so that part gets recomputed; and multiqc too, because it depends on fastqc and fastqc changed. Whenever a process changes, the ones that depend on it also have to be recomputed, because what it computes changed.

So the summary for this part: how to write and use custom scripts in your Nextflow workflow, and how to avoid absolute paths by keeping your scripts in the bin folder. You might ask: why do I have to use Conda or a container for Salmon, but for this script I can just drop it in bin? Well, programs like Salmon have many versions, they're constantly updated, they have complex configurations and dependencies, so they're better managed with Conda or a container. For custom scripts that you wrote by hand, without a lot of dependencies, the bin folder is much easier. You could also keep such a script anywhere else on your computer and use an absolute path, but if you move the script for any reason, your pipeline breaks; and if I try to use your pipeline, it won't work either, because maybe your path is /home/john/something, and in my case it's not john, it's marcel. So using absolute paths is a very bad thing, you shouldn't do it; putting everything into the bin folder inside the project directory of your pipeline makes it work on any computer. That's the preferred way.
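A sketch of the helper script, with the argument handling described above:

```bash
#!/bin/bash
# bin/fastqc.sh: first argument is the sample id, second is the read files
SAMPLE_ID=$1
READS=$2

mkdir fastqc_${SAMPLE_ID}_logs
fastqc -o fastqc_${SAMPLE_ID}_logs -f fastq -q ${READS}
```

After `chmod +x bin/fastqc.sh`, the process script block shrinks to a single call:

```groovy
script:
"""
fastqc.sh "$sample_id" "$reads"
"""
```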
There's one more thing we should try: the `-with-report`, `-with-trace`, `-with-timeline` and `-with-dag` options, which make the nextflow run command generate a lot of nice things for us. This time, instead of our local script, let's use the rnaseq-nf Nextflow pipeline, which is similar to the one we built but with a few extras. It worked; we have the tags, so you can see which task is the index and which are the fastqc ones, and we get a few extra files.

The trace file has information about every task: the hash, the name, the exit code, how much resources it used; it's quite a raw text file. The DAG shows every node and how the channels flow between the processes; you can also generate nicer, interactive DAGs with Mermaid and lots of different things, but let's stop here. The timeline shows, for every task, how long it took: the gray part is the staging of the files, and the blue part is where the processing was really happening. You can see some parallelization between fastqc and index, and also between quant and fastqc; multiqc, at the end, had to wait until everything else stopped, and took a while to stage its input files. And the report is an HTML page with some information about the pipeline and plots on CPU usage, percent allocated memory, job duration, input and output reads and writes, plus a table of all the tasks with their peak memory, duration and so on. Because our files were made to run quickly we don't get impressive plots, but it's a nice report you can generate for every Nextflow pipeline you run.

Another thing we can do is run a Nextflow pipeline straight from GitHub. So far we've been doing `nextflow run` with some local .nf file, but you can write `nextflow run nextflow-io/hello`: because I didn't say where it is, by default Nextflow checks GitHub, and indeed the pipeline is there, under that namespace and repository name. If I had configured GitLab or Bitbucket it would look there instead, the same way that when we don't say anything it looks for containers in the Docker Hub registry, but you can also point at quay.io. Nextflow pulls the repository, organizes all the files, runs the script for you, and you see everything as usual.

You can also get information about a pipeline by typing `nextflow info` with its name, say nextflow-io/rnaseq-nf: it shows the available revisions and which is the default (master), the author, the description of the pipeline, the main script file, and the local path where it's stored. When you do nextflow pull, or when you run a remote pipeline, it's kept in `.nextflow/assets` in your home directory. You can also choose a specific revision of the pipeline with `-r`: instead of master we could have chosen v2.1, for example, a branch or a tag, and it would work just fine. I'll recap these commands just before we wrap up.

There are other resources you can use to learn more about Nextflow. The Nextflow documentation is always a great place to go. Nextflow Patterns is very nice: it's a repository, with a website, collecting common questions that are not so straightforward, each with the problem, the solution and some code. What if I want to process all outputs together? Click on it and you have the answer. How to get the process work directory? It's there (escaping the dollar sign, as we saw before). How to make additional inputs optional? It shows you how. I also created a repository on my namespace called nextflow-snippets, with a similar purpose; it's only a GitHub repository, there's no website, but because GitHub renders Markdown it's almost the same thing: how to get the first n elements from a channel, how to sort paths lexicographically in a list, each with "what I have", "what I want" and some code using map and so on, ending with the output.

So there are multiple places to find information about Nextflow. Definitely a great one is Slack: there's the Nextflow Slack, which you can join from the nextflow.io community page to get the invite, and the nf-core Slack; both are great places to ask questions. The Nextflow blog is a great place to read about Nextflow, and there's also the Channels podcast, talking about the news in the Nextflow world, with some technical discussions sometimes; it's a very nice podcast. And tomorrow you'll have the third and last day of this training, with Chris Hakkaart, focused on nf-core: you'll see the nf-core best practices, pipelines, modules, subworkflows and tools, for users and for developers. So it's a great day.
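Before we wrap up, here are the command-line bits from this last part in one place (the revision v2.1 is just the example mentioned above):

```bash
# execution reports for any run
nextflow run script7.nf -with-report -with-trace -with-timeline -with-dag dag.png

# run a pipeline straight from GitHub
nextflow run nextflow-io/hello

# inspect a remote pipeline: revisions, author, local path under ~/.nextflow/assets
nextflow info nextflow-io/rnaseq-nf

# pin a specific revision (a branch or a tag)
nextflow run nextflow-io/rnaseq-nf -r v2.1
```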
It's a very important day, and I hope I was able to teach you the basics, the foundational concepts of Nextflow, so that tomorrow you can take the best from the training. Don't forget, you're more than welcome to ask any question in the channel on the nf-core Slack, and not only during the streaming of the videos: you can ask at any time. We'll leave the channel open for a few days, maybe a few weeks, after the training is over, so ask all the questions you want there. I hope you enjoyed the session today. See you in another opportunity. Bye bye!