This is day two of the training sessions we are running. My name is Abhinav, and I work at Seqera Labs in Customer Solutions. I think we can get started. Before we jump into the main content today, I would like to quickly thank, first of all, you, for making the Slack channel very active, for engaging with us and asking all those interesting questions. I would also like to thank the wonderful team which is working hard to answer them: Maxime, Chris Wyatt, Chris Hakkaart, Phil Ewels, and Marcel Ribeiro-Dantas. Thank you so much for answering all those questions. About today's session: we will start with a quick recap of what we did yesterday, and then dig deeper into the concepts we touched and practiced yesterday, namely channels, processes, and operators, with a quick 15-minute break in between. The content we are going to cover is available on training.seqera.io, and anyone who has their Gitpod environment up and running should already have the training material open in a tab.

So let me jump to the Gitpod environment. Yesterday we covered a series of scripts, and the end result of those scripts is a simple proof-of-concept RNA-seq Nextflow pipeline. In that pipeline we analyzed chicken samples from gut, liver, and lung; we indexed the transcriptome for the chicken genome; and we ran quantification against that index. In the end we merged all of the steps together, let me quickly highlight this here, and created a MultiQC report. That was the result of script seven. I understand there is a lot of content, and it is not a big problem if you miss something in the middle or if some concept is not clear yet. The training material is open and available on GitHub as well, so you can always fork it and contribute to it, which we highly encourage, and you can always use that material to open a Gitpod environment and practice everything in it.

So let's focus on the quick recap from yesterday. Yesterday we discussed pipeline parameters. Nextflow gives a special status to params, and all of these parameters are modifiable when you run a pipeline. These are not your typical variables in a programming language; they are special variables. For example, if I run this pipeline, nextflow run script7.nf, and I have to use -with-docker because this is a fresh Gitpod environment, so I don't have that configuration in nextflow.config yet, but we will put it there; if I run it now, we get the default behavior, which includes printing out all of the parameters. We will cancel the pipeline as soon as we have access to the screen. Over here we can see the parameters are transcriptome, reads, and outdir, the output directory. What I will do next is quickly override one of the parameters and say --outdir my_custom_outdir, and then the pipeline runs again and logs out all of the parameter information. When we override one of the parameters, that value is picked up by Nextflow, but for the others the default values are used.
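To make that recap concrete, here is a minimal sketch of overridable pipeline parameters. The parameter names mirror the ones used in script7.nf, but the default values and paths shown are illustrative assumptions:

```nextflow
// params_demo.nf -- a minimal sketch of pipeline parameters (defaults are illustrative)
params.reads         = "$projectDir/data/ggal/gut_{1,2}.fq"
params.transcriptome = "$projectDir/data/ggal/transcriptome.fa"
params.outdir        = 'results'

log.info """
    reads         : ${params.reads}
    transcriptome : ${params.transcriptome}
    outdir        : ${params.outdir}
    """
```

Running `nextflow run params_demo.nf --outdir my_custom_outdir` overrides only outdir; the other two parameters keep their default values.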
So this is a concept that we will touch upon again when we talk about configuration of Nextflow pipelines, but for now this much is enough. Then we have a quick comment. This is a special comment; it is pretty much standard in languages inspired by C, such as C++ and Java, and since Nextflow is based on a language called Groovy, it follows the same pattern. So this is a single-line comment, and a block comment is represented as shown here. The next thing is a process. Let's quickly recall the analogy we created yesterday: a process is really like a blueprint that is used to create a task which runs on your infrastructure, and this blueprint of a task needs to fulfill certain contracts. The most important section is script. In this process, in the script section, we run Salmon, and this Salmon command string relies on a couple of variables which are provided to the script section by the Nextflow layer. You can think in terms of layers: there is a Nextflow layer, and in the end all of that gets converted into the commands you actually want to run. So transcriptome over here comes from the input of this process. The contract between input and script is that they have to follow the same semantics and the same names; if I change this to, say, file, then I need to make the same change in the script as well, otherwise the value would turn out to be null and it would not generate what we want.

Then there is task, which is a special variable within any process. The script here refers to task.cpus, and if I add the directive cpus 2 to the process, specifying the number of CPUs to be two, it means that when the pipeline is run, this process generates a task in which task.cpus is replaced by two, because I have customized the number of CPUs for this task. If I don't provide the directive, which was the default behavior, the task runs with only one thread. So this is how you can control the number of threads, or any other setting you want to pass to the underlying script, through the Nextflow layer. I will undo this change and move forward.

In the quantification step, we see a couple of new things. The first is tag. A tag is, you could say, an annotation, a way to annotate a specific execution of a process. When I run the pipeline, you can see "Salmon on gut"; the "gut" came from the input. Whatever input is given to quantification, a part of it is called sample_id, and that sample_id is used to annotate the specific execution of quantification. At the same time, we have the publishDir directive. This directive tells Nextflow that whatever output this particular process or task generates needs to be copied into a specific directory. In our case, params.outdir is results by default.
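As a hedged sketch of those pieces working together, here is a quantification-style process with tag, cpus, task.cpus, and publishDir; the tool options are simplified compared to the real script7.nf:

```nextflow
params.outdir = 'results'

process QUANTIFICATION {
    tag "Salmon on $sample_id"               // annotates each task, e.g. "Salmon on gut"
    cpus 2                                   // directive: this task is given two CPUs
    publishDir params.outdir, mode: 'copy'   // copy the outputs into the results directory

    input:
    path salmon_index
    tuple val(sample_id), path(reads)

    output:
    path "$sample_id"

    script:
    """
    salmon quant --threads $task.cpus --libType=U \\
        -i $salmon_index -1 ${reads[0]} -2 ${reads[1]} -o $sample_id
    """
}
```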
So if we run the pipeline, we will see the results being stored in the value of outdir, which in this case is results, but we can override it; we could say, for example, "$projectDir/results". And if we run the pipeline, it publishes those files into the results directory. If we open the contents of Gitpod on the left using the Explorer, we will see the results folder. First there is my_custom_outdir, which we used last time, but this time we have also published into the results directory: these are the results of quantification and then the MultiQC report. This is how we can customize a lot of the behavior of the pipeline using directives, and this becomes important again in terms of configuration, which we will cover later on.

Moving on, we see the FASTQC process and the MULTIQC process, and in this section we also see workflows. The workflow keyword has a special meaning for Nextflow: within a workflow, many processes are combined together, and the way you combine them depends on the logic of your pipeline, but as far as Nextflow is concerned, the workflow block is used to combine various processes together. This is an improvement over the previous version of the Nextflow language, not the runtime itself, but the language. We now use DSL2 by default, and this is an improvement over the previous version, DSL1, where the declaration of a process was also its invocation. With DSL2 we can declare a process in one place and call it in a separate location, which makes the overall workflow quite concise.

We also saw channel manipulation. quant_ch was manipulated using mix first and then collect, so let's dig deeper into this block. The content we want to send to MULTIQC is a combination of the output of quantification as well as of the FASTQC process. So we capture the output of these two processes, combine them together using an operator called mix, and then collect everything together. If you want to quickly have a look at the output, we can run the pipeline again. This time I'm just going to rely on -resume, because we don't actually want to re-run the pipeline; we just want to view the contents of the channel, and we have changed nothing in the pipeline, so let's make use of the resumability of the pipeline. Yes, these are the contents when the quantification output channel is mixed with the FASTQC channel. But these items are still separate, and what we want to do next is collect them, because MULTIQC is expecting a single input, which could itself be a collection. So I can say collect here, run the pipeline again, and now, instead of printing these paths separately, Nextflow has combined them together, and this is what we want to send into MULTIQC. What we can also do is set this content, the result of this chain of operations, into a channel which we are creating right now; we can call it multiqc_ch, replace the whole expression with it, and maybe make the code more readable. This again is a stylistic choice: you can create channels and apply operations on the fly without saving them under a separate name, or you can choose to do so. And then we reach workflow.onComplete.
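As a small, self-contained sketch of that mix-then-collect pattern; the string values below stand in for the file paths the real processes would emit:

```nextflow
// stand-ins for the outputs of QUANTIFICATION and FASTQC
quant_ch  = Channel.of('quant_gut', 'quant_liver', 'quant_lung')
fastqc_ch = Channel.of('fastqc_gut', 'fastqc_liver', 'fastqc_lung')

quant_ch
    .mix(fastqc_ch)      // interleave the items of both channels into a single channel
    .collect()           // gather all items into one single list emission
    .set { multiqc_ch }  // give the result a name, purely for readability

multiqc_ch.view()        // a single emission containing all six values
```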
This is a special mechanism you can use to give feedback to your users, because many things can go wrong on the infrastructure side, on the pipeline-logic side, or on the data-validation side. A good practice is to inform the user when a workflow has completed. In this case, when the pipeline finishes successfully, we print a message pointing the user to where the MultiQC report has been published; if the workflow does not exit successfully, we print this other message instead. You could of course mention other things; this again depends on your design decisions, but Nextflow allows you to customize and tweak things a lot. So this was a quick review of script number seven and the progress we made yesterday.

Before we move on to the next section, I want to mention something quickly. Let's open the training material and split the view; let me rearrange the screen a bit. The sections we are going to cover today, as mentioned in the agenda, are basically channels, processes, and operators, so I will quickly jump to that section. It's worth calling out that yesterday, since we did a little bit of work with Docker containers, we also mentioned Conda. If you want a simple takeaway, you can think of Conda as a reproducible way to install dependencies and Docker as a reproducible way to share or run those dependencies. In the training material we have shared how you can create your custom containers and how you can use them, but the most important part is over here, especially in the field of bioinformatics, which has projects like BioContainers. Let's scroll a bit further. This is a Conda recipe; it's a YAML file, and you can list whatever software you want installed inside a container in this single file. Then you can rely on the setup we have shared here: this is a Dockerfile, and it comes with a pre-built Conda implementation. What we have delivered here is a Dockerfile which relies on Micromamba, a compatible but faster version of Conda. It creates the environment and installs into it everything you listed in the file above, and then it cleans up all of the extra dependencies which were downloaded or were already present; you want your Docker containers to be as slim as possible, so this is an optimization. Then we update the PATH, and after this we are able to use all of the dependencies inside a container in conjunction with Nextflow; we just have to provide -with-docker on the command line or update the config file.

With this, I think we are ready for the main content today: channels. So what is a channel? The most straightforward and technical definition would be that channels are a special type of data structure which forms the foundation of Nextflow and its dataflow programming model. You might have used other programming languages like Java, R, Python, or Perl, and all of those languages have a standard set of data structures: arrays, hash maps, associative arrays, and things like that. But since Nextflow is a specialized language built on top of a general language, Nextflow chose channels as its foundation. Channels are the key data structure; you can think of channels as highways for your data.
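Circling back to the workflow.onComplete block from the recap, here is a hedged sketch of what it looks like; the trivial workflow body and the report path are assumptions standing in for script7.nf:

```nextflow
params.outdir = 'results'

workflow {
    Channel.of('gut', 'liver', 'lung').view()
}

// runs once the whole workflow has finished, successfully or not
workflow.onComplete {
    log.info ( workflow.success
        ? "\nDone! Open the following report in your browser --> $params.outdir/multiqc_report.html\n"
        : "Oops .. something went wrong" )
}
```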
So if you want to send your data from one place to another place in the Nextflow language, you use a channel. Now, this is different from the data structures you see in other languages, so the behavior is slightly different as well. Let's understand what kinds of channels we have within Nextflow. And keep in mind, I will circle back to this figure after I have covered section 5.1, but it is a very interesting and important figure. A quick word about it: task alpha produced three separate files, X, Y, and Z, and those are being sent to task beta. So task alpha is technically the producer of these files and task beta is the consumer.

With that said, let's move on and study the channel types we have within Nextflow. Nextflow has two channel types: the first one is a queue channel, the other one is a value channel. A queue channel is basically what the name says. Let me make this bigger; oh, this was better. A queue channel provides us a data structure in which we can put things in at one end and take things out at the other end, so it's a unidirectional flow of information. Technically, you can call it a unidirectional FIFO queue that connects two processes or two operators, as we have seen previously; operators like mix and collect connect different channels together. If we move on, we see a couple of value channels and a couple of queue channels that we have already seen, and over here we see Channel.of. Technically, anything that you see written directly as Channel.something is called a channel factory. A channel factory creates a channel which we can then use to move data and values around within the pipeline.

If we just run this section as a snippet... okay, so this is already here; I can come to the terminal and run it with snippet.nf and we will see what it prints. We have seen the view operator earlier as well, but what happens if we just print a channel, if we apply the usual semantics of println to a channel? Below we can see that channel.view gives us the results we expect to see, but applying println to a channel gives us this "DataflowBroadcast around DataflowStream" output. That is because a dataflow channel is not a default data structure of the underlying Groovy language, so to see the content of a channel you have to follow the rules of interacting with a channel, and if you want to see the content, view is your friend. Instead of using parentheses, you can use braces as well, and you can do more advanced things: you can say "contents of ch" followed by $it. You can think of it as an implicit variable which Nextflow and Groovy create for you, referring to the current element itself; if anyone has done object-oriented programming, the keyword this gives you similar semantics. If we run the snippet again, we will see similar results, but this time formatted differently: you see "contents of ch" followed by value one, value two, value three. So three values were put into the channel, and using view we were able to see the exact values of its contents.
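A short sketch of that println-versus-view contrast, including the closure form with the implicit it variable:

```nextflow
ch = Channel.of(1, 2, 3)

println ch                           // prints the channel object itself, e.g. DataflowBroadcast around DataflowStream[?]

ch.view()                            // prints each value as it flows through: 1, 2, 3
ch.view { "contents of ch: $it" }    // a closure using the implicit 'it' variable formats each value
```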
So Channel.of is a channel factory to which you provide individual values, and it creates a channel and puts those values into it one by one. I will move on to the next type of channel, which is the value channel. So the first type is the queue channel, and the second type is the value channel. The value channel is unique because it can only hold one value, and its value does not get depleted or consumed by any process you connect it to. So let's talk about Channel.value, and maybe replace the entire content here. If we run the pipeline again, it prints "hello" three times, because this is a single-value channel, technically called a singleton channel, and it is bound to a single value. If you try to put multiple values here, it throws an error; we can say "hello", "world" and run it. Here's the error message: it says invalid method invocation, value, with arguments "hello", which is a java.lang.String, and "world", which is again a java.lang.String. So we can't provide multiple values, but we can provide a list of values, which is a single value, a single collection with its own individual members; this works, perfect. So Channel.value is a channel factory which only works with a single value, and the channel it creates cannot be depleted no matter how many times you read it. It's as if the value you put into this kind of channel gets multiplied by, you could say, infinity: you can't take anything out of or deplete a value channel, and that's why its content is fixed. Nextflow also implicitly creates value channels in some situations: if you apply certain operators like first, last, collect, or count, the result is automatically a value channel. So again, operators are applied on channels, and certain operators like first, last, and collect result in a value channel.

This was an overview of the channel types we have within Nextflow; let's go back to the figure we discussed earlier and try to guess which kind of channel it shows. We have task alpha, it produces three files, X, Y, and Z, and they reach task beta. Without waiting long, I'll just highlight what this is: it is a queue channel, because it is putting three different files into a channel. There is a producer and a consumer, and as soon as task beta takes X, only Y and Z remain in the channel; as soon as another task takes Y, only Z remains. So this follows the first type of channel we studied, the queue channel, a first-in, first-out queue. With that said, let's move on to other channel factories. We have already talked about the value channel, and we have seen that it can only store a single value, but that value itself could be a composition of other values. For example, a string is made up of individual characters, but since it is conceptually a single string, it works with a value channel. The same is true for a list: even though it consists of five different values, those five values are contained within a single collection called a list, so it's a perfectly valid input for a value channel.
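A quick sketch of that singleton behavior:

```nextflow
// a value channel holds exactly one value and can be read any number of times
ch = Channel.value('hello')
ch.view()
ch.view()
ch.view()    // prints 'hello' three times; the value is never consumed

// Channel.value('hello', 'world') would throw an error, but a single list is fine:
list_ch = Channel.value(['hello', 'world'])
list_ch.view()
```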
So we have already seen how to use the implicit it value, and now we can move on to fromList. This is another channel factory of interest. Let's say we have a list of two strings, "hello" and "world"; what happens if I run this? The fromList channel factory reads each of the contents of this list and puts them into a channel. I can apply view first, which means I can see the contents of the channel, and then I can also apply set and say list_ch. So I am converting a list data structure into a channel data structure. I can run the snippet again, and we see the results and save them in list_ch. These are individual values, created because we have two different members of the list. And again, fromList does not create a value channel, because the channel it produces carries two different values; it does not return a single value here. A good heuristic is that anything which isn't created with Channel.value and isn't produced by applying one of those operators is a queue-type channel.

The next channel factory we will quickly cover is Channel.fromPath. We already have some data available if you are also using the Gitpod environment, within the data folder and its meta subfolder: a couple of CSV files, but other files as well. The Channel.fromPath channel factory reads individual files; it does not read files in pairs, and this is different from the channel factory we saw yesterday, fromFilePairs, which I'll cover again later today. For now, let's focus on fromPath. Over here we can specify a pattern like this, then apply view and set, and save the results in, say, csv_ch, a channel for storing the CSV files. If we run the snippet again, we see the contents created by the Channel.fromPath channel factory: it creates two values, and those values are captured because of this globbing pattern. If, instead of this globbing pattern, I had specified the full name of a single file, let's say the patients_1 file, then the channel would be created from that one file only, so the contents of the channel would be a single file rather than a list of files. Globbing patterns are very useful when you want to work with files, when you want to source files or publish files, so this is something we highly recommend you look into, but most of the general patterns we are going to cover here anyway.

A quick tip about globbing patterns: if you have nested directories, for example data/meta, and you don't want to specify the nesting structure of the data here, you can just replace this with a double star. If I run the snippet again, I see the same results: two files which are stored in a nested directory within data. The double-star globbing pattern basically instructs Nextflow to ignore the nesting level of the folder and find all of the CSV files anywhere under that top-level folder. And as we saw yesterday with fromFilePairs, there are options available for most of the channel factories: in this case, we have the globbing pattern we already mentioned, we have followLinks, and we have checkIfExists.
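A hedged sketch of both factories; the data paths are assumptions based on the Gitpod training layout:

```nextflow
// fromList: each element of the list becomes one emission of a queue channel
Channel
    .fromList(['hello', 'world'])
    .view()
    .set { list_ch }

// fromPath: the '**' glob tells Nextflow to ignore the nesting level under 'data'
Channel
    .fromPath('./data/**/*.csv')
    .view()
    .set { csv_ch }
```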
So if I put here, let's say, a file that does not exist, and I try to just source this file, what goes into the channel? Let's run it and see. This file of course does not exist. Yeah, Nextflow still takes it and puts it in as the value of the channel, because it is not validating whether the file exists or not. You can tell the fromPath channel factory to confirm: checkIfExists, and you can say true, and this time Nextflow throws an error. So now you are telling Nextflow to actually go and check whether the file exists rather than just taking the file path and putting it into a channel. This is how you can tweak the behavior of the various channel factories.

I will cover another example next: fromFilePairs, which we saw yesterday. It was used to create a channel with a very peculiar shape. If we run the snippet here, it creates a channel with the following shape: a sample name, and then another collection containing the path of read one and the path of read two. This is what we call a tuple within Nextflow, and the shape of the data within a channel created using fromFilePairs follows this order: it gives you a sample name, and then it gives you the paths of the two samples, the forward read and the reverse read in this case. But you are not limited to just using _1 and _2; you can use any globbing pattern and accommodate whatever naming scheme you're using for your samples. With this, you can again use a similar set of options; you can specify, for example, that hidden files should also be included, which is not the default behavior, but if for some reason you have hidden files in a directory which you still want to analyze, you can include them using hidden: true, and then the analysis continues as you expect.

With this, we can move on to the next section, which is fromSRA. SRA is a bit different because it relies, first of all, on acquiring a key: you have to go to your NCBI account, get an API key, and provide that key to this channel factory, and only then can you view the contents of the channel. It is quite often the case that you actually have to supply the key, because the API is very active and very much under stress. But the shape of the channel that the fromSRA channel factory produces is similar to what we see with fromFilePairs: again, a sample name, the path of the first read, and the path of the second read. You are highly encouraged to experiment with this, use your NCBI API key, and explore what kind of data you want to inspect from NCBI, but this again depends on your API key. This example is worth covering because in it you are connecting a channel factory to a process, which in this case is FASTQC. What you can do is provide your key, of course, and then provide accession numbers of the samples you want to run quality checks on, and then we define the process here; the process is similar to what we have seen before. But notice the shape of the input channel: the input for FASTQC needs to have a sample ID and then a path, or a couple of paths in this case. So you create a channel called reads using the fromSRA channel factory and then provide that channel to FASTQC. All of the samples that fromSRA reads from NCBI are downloaded and provided to FASTQC for the analysis.
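A hedged sketch of checkIfExists and of the tuple shape produced by fromFilePairs; the file paths below are assumptions based on the training data layout, not the exact snippet:

```nextflow
// checkIfExists makes Nextflow fail immediately if nothing matches the given path
Channel
    .fromPath('data/meta/no_such_file.csv', checkIfExists: true)
    .view()

// fromFilePairs emits tuples shaped like: [ sample_id, [ path_to_read_1, path_to_read_2 ] ]
Channel
    .fromFilePairs('data/ggal/*_{1,2}.fq', hidden: true)
    .view()
```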
And this is how you can create pipelines which integrate external data sources, not only the data you have on your file system, which is what we are using currently. I will mention a couple of pipelines which rely on external data sets; they are very good pipelines to study to understand how these things are used in production. The next thing I want to quickly cover is reading a text file. You can read any TXT file and then apply splitText on it. splitText is an operator, so we are creating a channel using the fromPath factory, applying splitText on the contents of that channel, and then applying view, because we want to see the contents of the file we just split. Let's run this in our terminal, and I can open the file here for reference; it's a TXT file under data/meta. It has some Lorem Ipsum content, and what we told Nextflow to do is read this file and split it on new lines, so all of these lines are now printed separately. Again, Nextflow has a lot of built-in channel factories; you are highly encouraged to explore all of them, and I'm going to move on to other sections.

As I mentioned before, you can do very nice things with view, especially when you use it with, sorry, not parentheses but curly braces. You can think of curly braces in the context of Nextflow as anonymous functions; I will cover them again briefly, but just think of them as quick-and-dirty functions you define and apply in the context of a channel. Whenever you see view being used with curly braces, we are probably going to use the implicit it variable inside them, and the proper word for this structure is a closure. Closures are mentioned briefly later in the documentation, but we might not cover them fully today. You can also define functions and variables like we did yesterday: we define a variable, say my_var, and you add def; by using def you are telling Nextflow and Groovy, hey, please create a standard variable for me. Here we create a couple of variables and then apply a for loop. The for loop is the synchronous behavior, the traditional programming model you might be used to, where things are processed one by one in sequence; that is something Nextflow inherits from the Groovy model, and you can always combine the two. For example, this set of lines could be converted into a Groovy function which can then be used in your Nextflow pipeline.

I'll move on to CSV next. We have covered TXT files; CSV files are pretty much the same. You can read those files, you can split them based on columns, and you can modify this behavior as well. If you run the snippet again, you will see that this time we only get the first and the fourth column of the CSV, and I can open the CSV here quickly for reference: the first column is the patient ID and the fourth column is the number of samples. So when we ran our snippet and split the CSV, we saw that we can pull out the first and the fourth column of each row and only show those when we view it. view is a great way to get a glimpse into the contents of any channel you want to inspect, and you can of course modify the behavior of splitCsv using various options.
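A hedged sketch of both splitting operators; the file names, glob, and column positions are assumptions based on the description above, not the exact training snippet:

```nextflow
// split a text file into one emission per line
Channel
    .fromPath('data/meta/random.txt')          // file name is an assumption
    .splitText()
    .view()

// split a CSV into rows and pull out selected columns inside a view closure
Channel
    .fromPath('data/meta/patients_*.csv')      // glob assumed from the training data folder
    .splitCsv()
    .view { row -> "patient: ${row[0]}  samples: ${row[3]}" }   // first and fourth column
```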
I can mention here, let me move here: Operators. You can go to splitCsv, and there are other operators like splitFasta and splitFastq, and all of these operators have options similar to channel factories. You can split by a number of rows, you can set the separator; you can specify, for example, that you have a non-standard CSV separator like a colon or something like that. So you can modify the behavior of the splitCsv operator as well, and you are free to explore its other options; the documentation is very thorough and you can experiment. Notice that in this whole set of snippets we have never even created a process. You don't really need a process to experiment with channels and operators, so you can understand and apply processes, channels, and operators in different contexts as well. Later on, of course, they have to work together, because channels are the way you move data around within a Nextflow pipeline.

I'm going to skip the TSV section and focus a little bit on the JSON files. Here we rely on Groovy's built-in support for reading JSON data, and this part, as you can probably tell because it uses a for loop, is synchronous in nature. We have bundled all of these parsers within this training environment and you can refer to them; they are all available within the nf-training parsing folder. You can click on, say, the JSON parser and you will see that this parsing logic has already been shared, and later on we will see how we can import it and use it in a pipeline. Nextflow does not have a native JSON operator yet, but it's simple enough to use anything available within Groovy or Java from Nextflow. Let's go back here quickly. This is the one we are going to cover: you can keep the parsers in a separate file. Within the same nf-training/parsing folder, we have a folder called modules, and within modules we have the parsers. The definitions are pretty much the same: we import a library for YAML and the same for JSON, and we define functions. These are vanilla Groovy functions, but the beauty is that we can use these functions within Nextflow and import them. We could even define them in the same file, but later on we will see the benefits of modularization.

So let's focus on importing and making use of this function, which lives in a separate module. We include a function using this syntax: include, the name of the function, and then from, the location of the file. Notice the file is a .nf file, so you can define your Groovy functions or your variables within a Nextflow file and it works natively. Over here we can run the snippet, and we will see that all of the JSON files within this data/meta folder are read and parsed, and then we transform their structure into a different, sorry, it says it can't find the script file; maybe the location is wrong, it should be parsing. Yeah, so you just saw: if Nextflow can't find the module in the specified location, it will tell you, no such file. This time it found it and it is processing everything, and the result of this is basically foo. What is foo doing? It echoes everything that we pass to this function; let me make this side bigger, that's better. So we parsed the contents of the JSON file.
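A hedged sketch of that pattern, importing a plain Groovy function from a module file and applying it to every JSON file in a channel; the function name, module path, and data glob are assumptions, not the exact training code:

```nextflow
// --- modules/parsers.nf -------------------------------------------------
import groovy.json.JsonSlurper

def parseJsonFile(json_file) {
    // read the file text and turn it into nested Groovy maps and lists
    return new JsonSlurper().parseText(json_file.text)
}

// --- main.nf ------------------------------------------------------------
include { parseJsonFile } from './modules/parsers.nf'

workflow {
    Channel
        .fromPath('data/meta/*.json')
        .map { f -> parseJsonFile(f) }
        .view()
}
```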
We applied a map. map is similar to a function; technically it's an operator, but you can think of it as being applied to every element of a channel. We can also do a quick view on this and see what happens after map has been applied. Again, as you can see, these are curly braces, which means we are using anonymous functions, and that's what is used here; instead of entry, we could have just said it, but sometimes it's better to have specific names. Let me tweak this a bit: we don't have an operator called view on this object, true; flatMap is an operator, true. Then let's convert this into a proper channel first, call it ch; I'm just doing some tweaking here, and we run again. Ah, yeah, so this is not your typical operator, which is why it doesn't have set. Let me just save the results in parsed_json_ch, and this is the last attempt at running this one; maybe I'm missing something here. Yeah, this time we made it work: we saved the contents in a separate channel and then applied the operators. What we were trying to do was pull a specific piece of information out of the JSON file and transform it into a different structure, a tuple of the entry and a path, or some data such as the patient ID.

The next thing we are going to quickly cover is processes. We don't have a lot of time right now, so we'll have a quick break and then jump into processes. But let me tell you in advance that since we have covered channels in depth, and in the operators section we have already seen most of the operators, we are going to focus mostly on processes and then the modularization part. So let's have a quick break of five minutes and then we'll be back.

Welcome back, everyone; I hope you all had a good bio break. Let me highlight a couple of things here quickly. With DSL2 of Nextflow, there is a specific syntax you can use for connecting one process to another process, or even connecting operators, using piping. This is equivalent to what you would do on the command line: you can cat a file and pipe it into grep docker. Nextflow DSL2 embraces this paradigm of connecting processes and operators together using a pipe operator. But again, this is very much stylistic: you can use pipes, but you can also use dots wherever you want to invoke an operator, and if you want to pass output or channel contents to a process, you can also use a pipe. It's very similar to what you would do on the command line.

With this, I think we can move on to the processes section, which is built upon most of the things we have already discussed. Within a pipeline, the way you move data around is through channels, but moving data around by itself is not very useful. What we want to do is take the blueprint of a task we designed earlier and connect the channel to that task, and this is where processes come in. Processes are the unit of work; they are not about the movement of data, they are the unit of work. You can define a process within Nextflow; this is the simplest process: you can call it hello and just define the script.
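A small sketch of that pipe syntax, chaining a channel, a process, and the view operator; the process name and echo command are illustrative:

```nextflow
process SAYHELLO {
    input:
    val x

    output:
    stdout

    script:
    """
    echo -n "hello $x"
    """
}

workflow {
    // the pipe operator chains channels, operators and processes left to right,
    // much like pipes on the command line
    Channel.of('world', 'Nextflow') | SAYHELLO | view
}
```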
So there is no input and no output for this process, so it doesn't really do anything that would be a real-world use case, but a process consists of directives. Directives we have seen include publishDir, tag, and also cpus. These directives are described in the Nextflow documentation and the tutorial material, and we covered them briefly yesterday. Directives include container, containerOptions, cpus, label, memory, publishDir; there are a lot of them, around 30, which you can use within a process definition. And then we have input. The design of the process specifies the shape and number of inputs it needs, and these should be declared within this input section. The same is true for output: if a process has an output, it needs to be specified there. when is more of a design choice, but what it does is let you specify a condition, and the process executes only when that condition is satisfied. We will see a quick use case later on, but keep in mind that even after you have fed in all of the data using input, you might still want a pre-flight check, and that is where when is useful. And then we have already seen script, but there are a couple of alternatives: the first is shell, the other is exec, and we will see those quickly as well.

The next section is about working with script. We can create an example process and run it quickly; we can run the snippet file again. What this process does is echo a few things into a file, pipe it through head, count a few characters, write the result to a file called chunk_1, and then run gzip to compress that file. Since it has run, the results are inside a work directory, because we didn't publish anything from this process: it has no input, no output, and no publishDir. It just creates a file, and that file stays within the work directory; not very useful by itself. But if we comment this out, basically nothing happens, because this part is only the declaration of a process, not the invocation. You see, Nextflow tries to help people understand that this is DSL2, and that if you want the declaration of a process to also be its invocation, that's the DSL1 syntax; to invoke a process in DSL2, you have to use the workflow block and then the parentheses syntax for the process.

Let's move on. We saw yesterday that Nextflow integrates natively with many scripting languages, and if you don't want to use the default shell script, you can provide a Python script over here and then run that process. If you run it, it automatically invokes the Python available in the environment and runs this code; you would expect it to print Hello World as well. I'll move on to script parameters. This is something we already covered yesterday when we were talking about the transcriptome file, the FASTQ files, the reads, and the outdir. You can use the values of these parameters directly within a process as well: when you invoke the process, it looks up the value of params.data, which is "world", and therefore echo Hello world is what the task executes. I will move on to a different section where we can cover shell.
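A hedged sketch combining those last two points: a script block written in Python (via the shebang) that also interpolates a pipeline parameter. The process name and default value are illustrative:

```nextflow
params.data = 'World'

process PYSTUFF {
    debug true          // print the task's stdout on the Nextflow console

    script:
    """
    #!/usr/bin/env python
    print('Hello ${params.data}')
    """
}

workflow {
    PYSTUFF()
}
```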
Yeah, so the script section is pretty much the standard, but if you want to embed a specialized or sophisticated shell script, then maybe you are already using the dollar sign for shell variables rather than Nextflow variables, and in that case there is a conflict. If within a script block you write $shell_variable, Nextflow treats it as a Nextflow variable; it will not take the value from Bash. If you change script to shell, and keep in mind you also have to change the quotation to triple single quotes, that is the syntax for using shell, then at that point $shell_variable is a proper shell variable, and to use a Nextflow variable you use a different syntax: a bang, an exclamation mark, followed by curly braces. So this is how you can drop a big shell script that you already have into the shell section of a Nextflow process, with only minimal changes to accommodate it within a Nextflow script.

Conditional scripts. This is again very useful when you have sophisticated logic: for example, some samples have only a single read, other samples have paired reads. In those cases, depending on, for example, the length or size of the input, you can use an if/else and choose to run a tool with a single FASTQ or to run the tool with both of the reads. Nextflow provides a lot of flexibility even within a process; this depends on your design choices.

Next, inputs. We see a diagram similar to the one at the beginning of the channels section, and we can see processes: this process is the blueprint for task one, task two, and task three. As soon as the queue channel provides data X to the blueprint, the blueprint is specialized into task one and that is executed. The same thing happens with data Y, which, when provided to the process blueprint, is converted into and consumed by task two, and the same happens for task three and data Z. So this is how a queue channel is consumed by a process: the process relies on a queue channel to turn the contents of the queue into specialized tasks. We have various kinds of input qualifiers; so far we have seen val, as in Hello World, and path, but there are a couple of others, for example env. Let's stick with val for now.

I want to cover the point of debugging. In this script we have added the word debug. If you run it with this commented out, we don't really see anything printed in the console output here, but if we enable debug, then whatever is produced on standard output by the script section is printed on the Nextflow console. Yeah, so this time we are able to see "process job 2", "process job 1", "process job 3". This is really just for debugging; it's not meant to be used within the pipeline itself, so it is a directive you can rely on while developing pipelines. We'll move on to input files. This section we have already covered quite a few times: this is the channel factory fromPath, we store the channel contents in reads, we define a process, and then we invoke that process with the contents of reads. This is something standard that we have come to understand with the previous scripts. We'll move on to combining input channels. This is a great example.
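A hedged sketch of the shell block and the bang syntax described above; the variable names are illustrative:

```nextflow
params.data = 'world'

process SHELLEXAMPLE {
    debug true

    // note the triple single quotes: that is the shell-block syntax
    shell:
    '''
    X='Hello'
    echo "$X !{params.data}"    # $X is a genuine Bash variable; !{...} references a Nextflow variable
    '''
}

workflow {
    SHELLEXAMPLE()
}
```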
If you want to use multiple channels to provide input to a process, like we did for FASTQC earlier, you have to specify them in positional order: the first channel, carrying value X, is the first argument to foo; the value Y from the second channel is the second argument to foo when it's invoked. So channel one and channel two. Again, this relates to how you want to push data into a process. But what happens if we have three values in one channel and two values in another channel, while the process requires a combination of values from both? If you guessed that Nextflow executes this process only twice, you're correct, because the pairs 1 with A and 2 with B satisfy this input contract, but 3 has no counterpart in channel two, so the process does not run a third time.

What next? Input repeaters. each is another useful qualifier. Say you want to optimize the parameters of a tool, in this case t_coffee, which can be used in three different modes, regular, espresso, and psicoffee, and you want to see the result of each of them. You can run the snippet here, and since we are only doing an echo, we don't actually need to have t_coffee on our system, but this is the command that would be run; notice we have enabled debug, which is why we can see the commands on the screen. You can see that we have a total of 18 tasks, and that is because we have six files in the prots folder; let me quickly check, data, data/prots, yes, six files. So the process is sure to run at least six times, but since we also told the process to run with each of the methods here, six times three gives 18 different tasks. Nextflow automatically ran each file with each of these modes: the first file was run with regular, espresso, and psicoffee, and the same is true for all the other files that were input.

Yeah, moving on to output. We covered this yesterday, but I'll quickly mention it again. If we want to send something out of a process, we need to declare it in the output block, and the content we want to send out needs to be generated by the process itself; again, we can use view to see the contents of the output channel. I'll move on to the next section. Globbing patterns are very useful for creating channels, for providing inputs to a process, and also for capturing the outputs of a process: you can rely on a globbing pattern, and Nextflow will apply it to the files in the task directory and stage the matching files out of the process. Dynamic output names. Yeah, we can quickly go through this example as well. This file does not exist; yeah, these output files are not produced by the process, which is why Nextflow warned us that, hey, these output files are not there, so maybe we need to check the command which is supposed to generate them. We could say sequences here, and instead of sequences maybe we need to apply... I think in the interest of time I will not dive deeper into this script or debug it right now, but I will move on to the next step, which is the use of when. As I briefly mentioned earlier, when is used to conditionally run a process, even when it could be run.
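A sketch of the input repeater described above, modelled on the t_coffee example; the data path is an assumption based on the training environment, and the echo stands in for the real tool:

```nextflow
process ALIGN {
    debug true

    input:
    path seq
    each mode      // 'each' repeats the task once for every value in this input

    script:
    """
    echo t_coffee -in $seq -mode $mode
    """
}

workflow {
    sequences = Channel.fromPath('data/prots/*.tfa')    // six files -> six tasks per mode
    methods   = ['regular', 'espresso', 'psicoffee']

    ALIGN(sequences, methods)                            // 6 x 3 = 18 tasks in total
}
```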
So in this process, for example FIND, we take as input a FASTA file and a value called type, and only if the inputs satisfy these two conditions, the FASTA file name should match BB11 and the type should be 'nr', does the script run; otherwise it does not. So even after you have defined a process, you have checks and balances with which you can control its execution.

Directives: we have covered quite a few directives already, but I want to mention cpus, memory, and container here, because these are probably the most relevant ones when you want to make good use of the infrastructure you have. At this point I would say you can choose to specify these directives in the process definition, but that is not always the most optimal option, because we have the configuration section coming up, and the best practice is to specify all of these settings in the configuration rather than in the process definition itself.

In this process we have already seen how to publish contents into a specific directory. You can choose the mode: instead of duplicating the file you can publish just a symlink, a symbolic link, or you can specifically opt to copy the results and then delete the rest of the work directory, all of the intermediate files. The cool thing about publishDir is that you can specify it multiple times in a process. Let's say a process creates three kinds of files: a FASTQ, a counts TXT, and an outlook TXT, and you want to publish these files into different directories. For the first publishDir you can specify the pattern, again a globbing pattern, to be *.fq; the second pattern matches the counts files and they get published into a different location; and the same is true for the third publishDir. So you are not limited to publishing all of the results of a process into a single publish directory.

I will quickly cover the operators. We have seen operators already, and we have seen the usage of map. map is an operator which you can use to transform each element of a channel. In this case we are just squaring each value: if we have 1, 2, 3, 4, we want to create another channel which holds the squares of these values, 1, 4, 9, 16, and to achieve this you apply a map. Here's a quick snippet you can use to see the result. view we have seen more than a few times, and closures we mentioned earlier: every time you see curly braces in the context of an operator, it is an anonymous function, a closure, so you can do anything you would typically use a function for. You can apply various operations within a map: you can do a reverse, so instead of hello world we replace it with world hello, and this would be the result. Over here we create a channel with hello and world, map each word to the word together with its size, and then view it with another closure. So this is the first anonymous function we applied and this is the second anonymous function we applied on the initial channel created by Channel.of. With this, I will move on to groupTuple, because it is an interesting one and you will often use it.
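A hedged sketch of a process publishing different output patterns to different directories; the file names and directory layout are illustrative:

```nextflow
params.outdir = 'results'

process FOO {
    // the same process can declare publishDir several times, each with its own pattern
    publishDir "$params.outdir/fastq",   pattern: '*.fq'
    publishDir "$params.outdir/counts",  pattern: '*_counts.txt'
    publishDir "$params.outdir/outlook", pattern: '*_outlook.txt'

    input:
    val sample_id

    output:
    path '*'

    script:
    """
    touch ${sample_id}.fq ${sample_id}_counts.txt ${sample_id}_outlook.txt
    """
}

workflow {
    FOO(Channel.of('alpha', 'beta'))
}
```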
So imagine you have a set of values which are somehow related to other values, let's say [1, 'A'] and [1, 'B'], where you can think of the second element as a file path, like read one and read two, and the first element as a sample name. When you apply groupTuple, it combines the values that share the same key into a group; that's why it's called groupTuple. It combines [1, 'A'], [1, 'B'], and later on [1, 'C'] into a single group, [1, ['A', 'B', 'C']]. This is pretty useful and I highly recommend you experiment with it in your snippet.nf file. Another important operator is join; anyone familiar with the SQL language will find it similar in concept. You can join the values of two channels using the join operator and then view the result. Over here we have Z in the second channel, and the first channel also has Z with the associated value 3, so when we apply join on these we get [Z, 3, 6]. So join works on two channels, whereas groupTuple works on a single channel.

Further on, I'll quickly mention Groovy basics. Well, actually, we have covered a lot of Groovy already, we just didn't call it Groovy as such. We have seen how to print values in Groovy, we have seen how to use comments, multi-line and single-line, and we have declared variables. At the same time, you can use anything available within Java as well; that's what the Groovy layer provides you out of the box. Again, we have seen lists, and we have even seen the fromList channel factory, which you can use to create channels from lists. And like in any other language, you would expect methods or functions which are applicable to a data structure; a list has a method called size, and all of these are available too. I'll skip a couple of things and jump to maps, because in a real-world pipeline, something like nf-core/fetchngs, which you can use to download all of the sequences you want from public and private databases, the concept is to use a map as an input for processes as well, to carry all of the metadata. You can imagine a tuple getting longer and longer, path this and path that, but by using a meta map you can provide metadata about a sample to any kind of process. I will show an example later on, but I would like to quickly go to the next sections. We also have a section on closures, and this is something I recommend you play around with, especially in the context of map, flatMap, and view; you can use this curly-brace technique to understand how to change or modify the shape of any kind of channel.

Now I will quickly cover modularization, and actually we have seen modularization already within this session, in the context of importing a function from the parsing files: we imported a JSON parser function from a different file. But let me open modules, hello world. I'll make this side bigger and this one smaller. In the hello world script, which is the first script we covered in the entire set of videos, we had a splitLetters process and a convertToUpper process, but both of these processes are defined in the same file. You can easily imagine that a real-world use case has hundreds of processes, 200 processes, or more than one workflow, and that doesn't scale.
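A short sketch of both operators with illustrative values:

```nextflow
// groupTuple works on a single channel: items sharing the same key are grouped together
Channel
    .of([1, 'A'], [1, 'B'], [2, 'C'], [3, 'B'], [1, 'C'], [2, 'A'], [3, 'D'])
    .groupTuple()
    .view()
// -> [1, [A, B, C]], [2, [C, A]], [3, [B, D]]

// join works on two channels: items whose keys match are paired up
left  = Channel.of(['X', 1], ['Y', 2], ['Z', 3], ['P', 7])
right = Channel.of(['Z', 6], ['Y', 5], ['X', 4])
left.join(right).view()
// -> [Z, 3, 6], [Y, 2, 5], [X, 1, 4]
```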
We don't want to read through 2,000 lines of workflow, right? That's why, with DSL2, Nextflow allows us to move this section out and save it into a different file. We can save it into a separate module file and then import it into the hello world script using the include syntax. Over here, the module name is splitLetters; it has to match what we are importing from the module. Now we can focus only on the higher-level abstraction of connecting these individual processes together. So again, just to highlight: we can import any kind of module or function from other Nextflow files, and then you can concentrate on the workflow and on combining these processes together.

Another important thing I would like to cover about workflows is, oh, sorry, before I move on to workflows, I want to highlight that you can include multiple processes from the same file. Let's say you want to keep all of your FASTQ-processing processes in a single file; you put them all in a modules file, and you can include them together using this syntax: splitLetters, and then another one, say splitLetters2. This is quite useful when you want to modularize your code, and it makes it easier to read and maintain; you could use a v1 or a v2 naming, depending on whatever best practices you have established within your team.

Module aliases. If you want to use the same module within a workflow twice, that's not directly possible as of now, so what we can do is give that module an alias. We import splitLetters; let's say in this workflow we want to invoke splitLetters twice, so we include it with aliases, splitLetters_one and splitLetters_two, and the same for convertToUpper_one and convertToUpper_two, and then we can use each of the aliases in a part of the workflow. It's a reasonable analogy to think of a process as something similar to a function, but it's not quite that; it's based on a different foundation of processing things in an asynchronous manner. This we have already seen; we covered it in the RNA-seq script. I'll move over to the next section here.

Yeah, this is something we saw earlier as well, the pipe. Instead of invoking processes like this, let me clean this up, you can use the familiar piping mechanism: you can pipe this into splitLetters, and if you want to invoke another process, you can do the same; you can take the letters, flatten them, and then pipe them into convertToUpper. Moving on to the workflow level. So workflow, as I mentioned earlier, is a special construct within the Nextflow language, and it has its own, you can think of them as directives, but let's call them blocks for now: you have a take block, you have a main block, and then you have an emit block. DSL2 of Nextflow is all about composing smaller building blocks into larger building blocks: a process is the smallest unit of work, and a workflow combines multiple processes together, but we don't have to stop at a single workflow. We can give a workflow a name, say workflow ABC, then write our main workflow and invoke workflow ABC directly, just like any other process, with input_ch.
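A hedged sketch of the alias mechanism, loosely following the hello-world example; the module path, process contents, and params.greeting default are assumptions:

```nextflow
params.greeting = 'Hello world!'

include { splitLetters   as splitLetters_one   } from './modules.nf'
include { splitLetters   as splitLetters_two   } from './modules.nf'
include { convertToUpper as convertToUpper_one } from './modules.nf'
include { convertToUpper as convertToUpper_two } from './modules.nf'

workflow {
    greeting_ch = Channel.of(params.greeting)

    // the same processes, invoked twice under different aliases
    splitLetters_one(greeting_ch)
    convertToUpper_one(splitLetters_one.out.flatten()).view { it.trim() }

    splitLetters_two(greeting_ch)
    convertToUpper_two(splitLetters_two.out.flatten()).view { it.trim() }
}
```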
So you can have multiple workflows within a pipeline. The take block is similar to input, main is conceptually similar to script, and emit is similar to output, as we studied them in the context of a process. Just as each process has inputs, outputs, and a script describing what it is actually supposed to do, you can think of workflows as having the same kind of conceptual structure: a workflow can have specific inputs and outputs, via take and emit, and a main body in which you connect other workflows or processes.

For any pipelines you have seen or written earlier in DSL1, we have published DSL2 migration notes which call out the specific things that are not backward compatible. But it should be very straightforward to separate those processes out into their own files and then connect them together using workflows. The overall logic of the pipeline stays the same; only the design is different.

With this, we are going to talk quickly about Nextflow configuration, and this is where the big portability that Nextflow provides comes in. We have already seen one nextflow.config file, and in this config file we can see a process scope and a docker scope; that is technically what they are called, scopes. Instead of writing process.container with dot notation, I could have written a process block with container inside it, and instead of docker.runOptions, a docker block with runOptions inside; both forms mean the same thing in a configuration file. Nextflow has a very extensive set of scopes, and you can use them to modify a lot of behavior and to encapsulate assumptions about the pipeline. For example, let's check the docker scope. The docker scope has a number of options: so far we have seen runOptions, but we could just as well specify our own Docker registry, or a fixOwnership setting. All of these scopes have many options available.

I will highlight one interesting one, which is params. Params are actually a scope by themselves, so while you can use params within the workflow script, you can just as well declare all of these parameters in your configuration file: in a params scope I can name a parameter and give it a default value, then another parameter with its default, and so on. Everything we discussed earlier about overriding these parameters from the command line is still valid, but this way you get cleaner process and workflow definitions, because we don't need to declare the defaults in the workflow definition file; we can keep them in the configuration file. So the best practice is to keep only the minimal things you need within the process definition and move everything else out to the configuration level.

Now, what is the benefit of that? The benefit of moving configuration out into a centralized config file is that if tomorrow you want to run the analysis with a different container system, you don't have to change anything in the processes or workflows. You want to separate the concerns of designing a pipeline and running a pipeline, right? A small configuration sketch follows below.
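As an illustration, here is a minimal nextflow.config sketch with a params scope, a process scope, and a docker scope. The parameter names, default values, and container image are made up for the example; the equivalent dot notation (for example params.outdir = 'results') would work just as well.

```groovy
// nextflow.config -- illustrative values only
params {
    reads  = "$projectDir/data/*_{1,2}.fq"   // default, can be overridden with --reads
    outdir = 'results'                       // default, can be overridden with --outdir
}

process {
    container = 'my-registry.example.com/rnaseq-tools:1.0'   // hypothetical image
}

docker {
    enabled    = true
    runOptions = '-u $(id -u):$(id -g)'      // run containers as the current user
}
```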
Configuration files can help you separate out these two concerns: the design of processes and workflows on one side, and the portability of the pipeline, how many CPUs a workflow should use, and so on, on the other. All of these are infrastructure concerns and you can easily move them out into a nextflow.config file. A Nextflow config file follows very similar rules to any other Nextflow file: it has single-line comments, multi-line comments, and the scopes we have already discussed. We have the params config scope, we have the env config scope, and I can show this to you in the Nextflow documentation.

The env scope is special in the sense that, when you run a task and some of your tools require configuration to be provided through environment variables, you can put those environment variables at the configuration level itself. The scope is env, and a quick use of it is shown over here: if I provide these values in my Nextflow configuration file and then run this process from the snippet, I can see they are now part of the environment of the task. I have not added them to the bash environment where I am running the pipeline, I have only added them to the configuration, and Nextflow makes sure these environment variables are available at the task execution level. So ALPHA and BETA are available to the whole process. You might think of passing command-line or environment tokens and secrets this way, but the scope is pretty open, so we would not really recommend providing secrets as environment variables; a username or some other configuration your tools might require is a better fit.

With this, I would quickly like to mention that the various process directives we covered earlier, like cpus, container, and tag, can all be set in the nextflow.config file itself. We actually saw the use of process.container, and yesterday, if you recall, I mentioned that these can be specified at the level of the process or at the level of the configuration. So what is the benefit of specifying this at the configuration level? Tomorrow, let's imagine you want to host your container in a different location, not Docker Hub. Docker Hub is the default registry Nextflow understands, but you can point to a different one with just a single change in the configuration file, related only to your infrastructure details, and you are again good to run your pipeline with this container. So there are a lot of benefits to keeping things at the configuration level. You can also specify how many CPUs and how much memory every process should receive; all of this can be controlled through the process scope. But, interestingly, if you write this simply as process.container, you are telling Nextflow to use that container for all of your processes, and that is probably not what you want; you don't want to set something like process.cpus = 10 and apply it to every process, right? There is a small sketch of both of these points below.
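A minimal sketch of those two ideas, the env scope and blanket process settings, could look like this; the variable names ALPHA and BETA come from the example above, while the CPU count and the container are placeholder values.

```groovy
// nextflow.config -- illustrative values only
env {
    ALPHA = 'some value'      // exported into the environment of every task
    BETA  = 'another value'
}

// without a selector, these settings apply to *every* process,
// which is usually too coarse-grained
process {
    cpus      = 10
    container = 'some/container:tag'   // hypothetical image
}
```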
What you want to do is apply this selectively, and to apply it selectively we can make use of process selectors within the Nextflow configuration, which let you apply settings only to the processes that match the selector. I can show you a quick example here; actually, we cover that example as part of deployment, so I think this is a good segue into deployment itself, because we are already talking about configuration and adapting the pipeline to run on multiple environments, and I will have to be quick on this one because we don't have a lot of time left.

The deployment scenario is basically that you want to take the pipeline you perhaps developed on your laptop to a Slurm cluster, a PBS cluster, an AWS Batch environment, or perhaps an Azure Batch environment. You don't want to change the main pipeline logic; what you want to do is keep all of the infrastructure-level details at the configuration layer. As Evan mentioned earlier in his presentation, Nextflow makes it very easy to concentrate the infrastructure-level details in the configuration file and the core bioinformatics part of the pipeline in the processes, workflows, and modules. So, for example, to change the pipeline execution environment you can specify process.executor = 'slurm', and tomorrow, if you want to run this pipeline on a configured PBS Pro cluster, you just change that single line and you are good to go. Again, this makes a few assumptions: that Nextflow is already installed on your cluster, that your cluster has a shared file system, et cetera. But that's the big idea.

We have already mentioned cpus and memory; on a cluster, time also becomes important, because you don't want something like a sleep running for 5,000 seconds. You want to limit the amount of time each process can take within your cluster or cloud environment, and that's why it's better to keep these settings at the configuration level as well.

So here is a great example of the process scope: you can put all of the Slurm executor-level configuration within a process scope, but you can also configure processes by name, and this is what we called a process selector. Let's say that by default you want all processes to run with four CPUs, but there is a process called, maybe, foo, a funny name in this case, but let's assume it is one of the processes of your pipeline. The default executor, default queue, and default memory for all processes you specify directly in the process scope, but if you want to limit or customize these directives for a specific process, you use a process selector. The withName selector here tells Nextflow that anything matching foo should run with only two CPUs, a maximum of 20 GB of memory, and the queue called short; you could even send it to a different queue, but again, this all depends on your infrastructure.

There is another process selector, called withLabel, and this one really relies on the design of your pipeline, because to use it you have to label your processes. A common label you would see in nf-core pipelines is process_high, or process_low, or process_medium, something like that. The configuration sketch below shows both selectors.
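Here is a minimal sketch of both selectors, assuming a process named foo and processes that declare a process_high label; the executor, queue names, and resource numbers are illustrative.

```groovy
// nextflow.config -- illustrative values only
process {
    executor = 'slurm'
    queue    = 'long'
    cpus     = 4
    memory   = '8 GB'

    // applies only to the process named foo
    withName: foo {
        cpus   = 2
        memory = '20 GB'
        queue  = 'short'
    }

    // applies to every process that declares `label 'process_high'` in its definition
    withLabel: process_high {
        cpus   = 16
        memory = '64 GB'
    }
}
```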
And then in the configuration file you use withLabel: process_high, so that all processes carrying that label are executed with that specific set of settings. So again, to control the infrastructure resources allocated to a process you can make use of names or of labels, and to make use of labels you have to actually go and add the label directive to your processes.

Another practical example is that you want to use different containers for different processes. Let's set a default container, say my_container, and then for any process named foo we want to specialize and use a different image, some image with an x tag, and for the process bar, yet another container. Again, you control all of this information at the level of configuration; it's not something you should worry about at the level of the process while designing the pipeline.

Now, how do configuration files compose? This is something very powerful within Nextflow: you can compose bits of configuration together and then activate them selectively. We have profiles; profiles is a special scope used to combine bits of configuration together and give them a name, so you can think of a profile as a named configuration. Over here we have three profiles, standard, cluster, and cloud, and this is a glimpse of the portability configuration. For example, if you want to run a pipeline on a cluster, you enable the cluster profile; if you want to run the pipeline on a cloud, in this case AWS Batch, you enable the cloud profile. Since we have already added this to our nextflow.config, we can use another tool from the Nextflow command line, called config: if you just type nextflow config, it prints the resolved configuration, process.executor = 'local', because it takes the standard profile by default. But we can tell Nextflow to use the cluster profile, and then whenever it runs a pipeline it will use the parameters here plus the cluster-specific settings. The same is true for the cloud. So you can selectively activate bits of configuration using profiles; there is a small sketch of this just below.

Since we don't have a lot of time, I would just like to share some tips on this. Everything we have covered until now can feel very theoretical; we have tried to make it something you can practice by yourself, but to really practice you need a good context, and I think the rnaseq-nf pipeline already provides a great context, because this is what we have used throughout the tutorial. We have technically built a good portion of this pipeline as part of the tutorial: you will see main.nf, similar processes, the log info, MultiQC, and the workflow, and then you can go and check the modules: fastqc, index, multiqc, and quantification. So we have really built a functional pipeline, and you can say the finished version of it is rnaseq-nf. This pipeline, again, can be run on multiple infrastructures: there is the standard profile, which uses Docker, and there are profiles for Slurm and for AWS Batch with data on S3. So you don't always have to limit yourself to executor-level configuration; you can also capture parameters as a profile.
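As an illustration, here is a minimal profiles sketch together with how you would inspect and activate one; the executor choices and the queue name are assumptions for the example, not taken from a specific pipeline.

```groovy
// nextflow.config -- illustrative profiles
profiles {
    standard {
        process.executor = 'local'
    }
    cluster {
        process.executor = 'slurm'
        process.queue    = 'long'
    }
    cloud {
        process.executor = 'awsbatch'
        process.queue    = 'my-batch-queue'   // hypothetical AWS Batch queue
    }
}
```

You could then inspect the resolved configuration with `nextflow config -profile cluster` and launch with `nextflow run main.nf -profile cluster`; with no -profile option, the standard profile is used.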
So yes, this pipeline can be run on multiple infrastructures, but there is one pipeline from nf-core which I highly recommend: fetchngs. You can just search for fetchngs and it will land you on the nf-core website for that pipeline. This is a pipeline you would often use for pulling data from public and private databases, and it is very simple to play around with: you can copy the run command from the right-hand side and even try it out on Gitpod. But notice one thing: it uses the test profile, and this is something common to all nf-core pipelines. All nf-core pipelines have a test profile, so you can run a pipeline on your infrastructure without even providing your own datasets, just to sanity-check your infrastructure with a run-through of the pipeline. So fetchngs is a great pipeline to try, and the best part is that you can combine multiple profiles together.

Maybe Gitpod is acting up, so let me start again: I run fetchngs, and what we need to add is the docker profile as well, so the profiles option becomes test,docker. This way we are telling Nextflow that, to run this pipeline, it should pull the associated Docker containers, and Nextflow does everything as we expect. This, I would say, is what the experience of a production pipeline looks like: you get a nice log of all the parameters that were set or not set for the run, then it shows that it is pulling the containers, it is running and giving feedback, and after a while we start to see some completed tasks.

But again, to understand how all the things we have discussed today and yesterday come together, I would really recommend you study the source code of this pipeline; my teammates will discuss a couple of things about it tomorrow. The source code of this pipeline is quite small and it builds on the same principles we have discussed: we have modules, we have workflows, and we have sub-workflows, which are workflows that can be used within other workflows. So I highly recommend you take a look at this pipeline, go through it, and ask any questions you might have on the Slack. Slack is open for conversations during the training and after the training as well.

And with this, we can conclude our training for today. Tomorrow we will build upon these concepts of running a pipeline and also designing a pipeline, so please do tune in tomorrow. Thank you so much for joining, and have a wonderful evening ahead.