Welcome to the Community Foundational Nextflow Training of March 2024. This is a two-day training starting today, March 5th, and running until tomorrow; each day takes about three to four hours. This is the event page, where you can find more information about it. On the first day, these are the topics we want to cover. We will start with a slide deck giving an introduction to Nextflow and nf-core. Then we have a Getting Started with Nextflow section. And finally, we will get hands-on by developing a proof-of-concept RNA-seq pipeline from scratch. After that, we're going to look at how to manage software dependencies and how to containerize the applications you're going to use in your pipeline. To end the day, we'll go through an overview of the Groovy programming language, which is useful when you're writing Nextflow pipelines. The goal of this first day is that by the end of it you will be able to write your own Nextflow pipelines: simple pipelines, but still functional ones. The proof-of-concept RNA-seq pipeline has multiple steps with containers, or Conda environments if you prefer, with real data and some output. Not results that are scientifically insightful, but real output. With that, you can port this knowledge to write your own pipelines, which can be much larger and more complex.

For the second day, tomorrow, you're going to go through the schedule with Chris Hakkaart. You'll learn about channels, processes, and operators, which are three primitives of Nextflow; it's very important to master these three concepts. You will also learn how to write modules, how to make your Nextflow pipeline more modular. You'll be introduced to configuration, different deployment scenarios, cache and resume, and troubleshooting. In the last section, you will learn about the Seqera Platform. So that's the schedule for the two days, each about three to four hours long.

Having said this, if you have any questions, you can go to the nf-core Slack, to the training channel. You can go here to "Join nf-core"; I'm going to open a new tab, and here you have the link to Slack. Inside the nf-core Slack, go to the training channel and ask your questions there; there will be people to answer them and help you with the training. Here you can see the code of conduct that you should abide by in order to participate in the training.

Let's go now to the slide deck. So again, welcome to this training session. At the beginning we're going to cover some concepts and background knowledge so you understand more about pipelines and the reason for Nextflow to exist. The concept of a workflow is basically using computers to collect, store, analyze, and disseminate data and information. You have some input data, you want to do something with it, some transformation, run some methodology on top of it, and at the end you expect some output as the result of your pipeline. Not every output is desirable in the end; maybe you just want to keep some of the output files, but your pipeline is still supposed to take an input, do some transformation, and produce an output.
The first thing worth mentioning here is that, depending on your field, your inputs can be very different. You could have a lot of very small files, a lot of large files, or a mix of them, some small and some large. In some scenarios you could even have a single file that is very large, which is not uncommon in data science. When it comes to bioinformatics, though, you usually have a lot of files, many of which are very large. As an example here, one raw human genome can take over 100 gigabytes of disk space.

In these pipelines, it's also common to have many different programming languages. Each step of your pipeline could be a Python script, an R script, a Bash script, a Perl script; you could have some MATLAB scripts, or compiled programs. So you usually end up with a pipeline that is very heterogeneous in terms of technologies: across its multiple steps, different technologies are used. And the interactions between these pieces of software, libraries, and other configuration in your operating system can be quite complex. These three things together describe reasonably well what these pipelines are, what they consist of and what they do, and also show how complex it can be to have a pipeline with so many different technologies, complex interactions, large files, and so on.

Thinking about this brings up the topic of reproducibility, because the more complex your pipeline is, the more difficult it is to repeat the steps on another machine and reproduce the same results. Today, even a simple pipeline is composed of so many different pieces of software, built on top of different libraries and other software, that reproducibility becomes genuinely difficult to achieve. We have this quote from the "Experimenting with reproducibility: a case study of robustness in bioinformatics" paper by Kim et al., where the authors say that first they tried to reproduce the analysis with the code and data provided by the original authors, but then they ended up reimplementing the whole method in a Python package, because just trying to repeat what the authors did wasn't working.

And as I said, even simple pipelines get quite complex once you take into consideration the data, the software used, the interactions between them, the dependencies, and so on. Here we have the metro map of the nf-core/eager pipeline. You have the legend on the right: basically, each colored line is one of the input files moving through the steps; the circles with black borders are intermediate outputs. You have input files entering at different stages, and conditions under which some tools are used or not. And this is a relatively simple pipeline; it can get much more complicated than that.

Talking about the primitives I mentioned earlier when describing the training schedule, there are at least three concepts that are very important to understand. The first one is the process: every step of the pipeline usually consists of a process. If your pipeline takes the input data and does one thing, then another thing, then another, these transformations are actually Nextflow processes.
You can think of them as boxes where something goes in and, after some transformation, something comes out. When you have these boxes, these steps, you want them to communicate: data goes into a box, something happens, it comes out, and then it goes into the next box, and so on. The way these boxes communicate is what we call Nextflow channels. And whenever you have Nextflow processes connected by Nextflow channels, we can say you have a workflow, a Nextflow workflow.

Visually, that's actually how it happens, because a Nextflow channel is just a queue, a first-in, first-out data structure, in which every input file, or every string, depending on how the data is handled, is a channel element. Here we have some data files on the left, and we make use of a Nextflow channel factory, a special function that creates channels. By using this function, our data is organized inside this input channel, with each file as one channel element. Once we have this channel, we can provide it as input to the next process, and that's what we see here.

The thing is, the process is just a recipe: whenever there is some input data, Nextflow creates an instance of the process, which is what we call a task. And depending on how you configure your Nextflow orchestration, if you have three input files in your channel, Nextflow may automatically create three tasks that run in parallel, at the same time. That's what we see here: each of these channel elements goes to a different task, some transformation is done, and then we have some output; here, outputs x, y, and z produced from the corresponding inputs. And when the tasks are over, Nextflow takes these outputs and puts them back into a channel, the output channel, which is then ready to be the input channel for the next process, or to be the final result of your pipeline.

But how are these processes written; what do they look like in terms of code? Let's think about FastQC, for example, a very common bioinformatics tool for quality control. Usually you have some input file, and on the command line you run fastqc -q followed by the name of the file. In the Nextflow process, that's basically what you have. There's the process block, and you give it a name; here we call it FASTQC. There's an input block where we say what this process is supposed to expect as input; here we're saying it's a path, and whatever is provided as that path we refer to through a variable called input (it could be any name). As output, we're also saying this process is expected to produce a path, so a file or a folder, and we give a shell glob saying: whatever ends with _fastqc.zip or _fastqc.html, I want that to be the output. It's important to understand this, because some programs create a lot of different files: error files, log files, intermediate files, folders, results, and so on. Sometimes you don't want all of that to go to the next process or to be stored; you just need the actual result files. So that's what the output block does.
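As a rough sketch, the process just described might look like this (the variable name input is the one mentioned above; the script block is discussed next):

```groovy
process FASTQC {
    input:
    path input                     // the file handed to this task by the input channel

    output:
    path "*_fastqc.{zip,html}"     // keep only the FastQC reports, ignore everything else

    script:
    """
    fastqc -q $input
    """
}
```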
Even though there can be a lot of output from this transformation, you're telling Nextflow: just pay attention to these files, because these are the ones I want sent to the next process or kept as a result. So that's what's happening here. And the third block we see is the script block, which tells what the process is supposed to do. We have this multi-line string delimited by three double quotes, and whatever is inside is going to be run. Here you can see that the input the process expects to receive is referred to at the bottom using the dollar sign: we're saying that whatever is given as input to FASTQC, which is a path, should be provided to the fastqc command.

At the end, we need a workflow block, and this is what actually tells Nextflow what is supposed to happen. Because a process is just a description of a process, and that's it; nothing happens, you're just describing it. In the workflow block, you are saying when to call this process, how, and so on. Here we're using the Channel.fromPath channel factory, a function that creates a channel based on a path. We give it this path, which is every file in the current directory ending in .fastq.gz, and it creates a channel in which every channel element is a file matching that expression. And then we call the FASTQC process, which means an instance of this process, a task, is going to be created for every channel element of the channel we just created.
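A sketch of that workflow block, assuming the glob just described:

```groovy
workflow {
    // one channel element per file matching the glob in the current directory
    reads_ch = Channel.fromPath('*.fastq.gz')
    FASTQC(reads_ch)   // Nextflow spawns one FASTQC task per channel element
}
```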
There are some very interesting benefits from using these channels, channel operators, and workflows. By using the dataflow paradigm, by using Nextflow as a language, you get some benefits for free. A very famous one is what we call implicit parallelization: you don't even have to know what parallelization is; just by writing Nextflow code, Nextflow will automatically try to parallelize your tasks, which means the same process will create multiple instances that run at the same time. That's what's illustrated here by the multiple arrows.

Another interesting thing is the resume feature, or re-entrancy, which means that if you have multiple steps in your pipeline and half of the pipeline has already been run, you don't have to rerun those steps: you can use the resume flag and Nextflow will start from where it stopped. There are multiple reasons why this is an awesome feature. One of them is pipeline development: you develop one step at a time. You create a step, you test it, everything's fine; you add a new step, you test, it's fine; and you keep doing that until all your steps run successfully. The thing is, say every step takes 10 minutes. You run the first step once: 10 minutes, working great. Now you add a step and run again: 20 minutes, 10 plus 10, even though you already knew the first step was working. The third step makes it 30 minutes, so you've already spent 60 minutes (10 + 20 + 30) to test these three steps. Using resume during pipeline development, the first run takes 10 minutes; the second run is just 10 minutes, because it takes advantage of the cache for the first step and won't run that process again; and the third run is just 10 minutes again, because the first and second steps are already cached.

So this is a great use of the resume feature for development, but it's also very good for daily operations. Say you have a pipeline taking 10 days to finish, and very close to the end there's a power outage, or something happens on the server: some bug that was out of your control, or maybe a bug you left there. You go there, you fix it, and you run with the resume option, and it restarts from where it errored, where it stopped. Without that, it would have to run everything again; you would need another extra week. When we think about shared clusters or cloud computing, it shows how troublesome this is: it may take a long time for your job to be scheduled again, or you'd have to spend money again because it failed the first time.

Another thing is reusability: with a Nextflow pipeline you can share a workflow or a sub-workflow, which means anyone can easily copy your code and it's going to work for them. It's very modular, and it's easy to take code from people and for people to take code from you. You can have processes as modules, which are very easy to share with other people, but also a set of modules, a set of processes, which is a sub-workflow.

It's been clear so far that Nextflow is a language. It's based on Groovy, which is sometimes described as a Python for Java: anything written in Java or Groovy, and any such library, can be used in Nextflow. We have these primitives: Nextflow processes, Nextflow channels, Nextflow workflows and sub-workflows. But Nextflow is also a runtime, which means there's a program called nextflow that orchestrates your Nextflow pipelines. So we have both a language and a runtime. And also a huge community: I'm going to talk about nf-core soon, which is a large part of the Nextflow community, but there are plenty of very interesting people doing amazing work, contributing different tools that can make your life easier when writing Nextflow pipelines.

So, as I showed earlier, you can write code in any language, because between the triple double quotes of the multi-line string you can put anything: Bash, Python, R, MATLAB, Perl, whatever you want. Nextflow doesn't care about the programming language you're using inside the Nextflow process. You orchestrate these tasks with dataflow programming, the paradigm based on queues that makes it very easy for Nextflow to automatically parallelize your code. You can define software dependencies via containers, so you can use Singularity, Docker, Podman, Charliecloud, and other container technologies to containerize the tasks of your pipeline, making sure they are well isolated and thus reproducible. But you could also use Spack or Conda to help you manage the installation of programs, either by themselves or inside containers. And then you also have built-in version control support that makes it very easy to integrate Git repositories with your Nextflow pipeline. Basically, this means that if you have a Nextflow pipeline on GitHub, you can just run nextflow run with the URL of the repository, and Nextflow will pull the repository, download everything, organize it on your computer, and run the pipeline.
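A minimal sketch of that Git integration, using nextflow-io/hello, a small public demo pipeline:

```bash
# Nextflow pulls the repository from GitHub, caches it locally, and runs it
nextflow run nextflow-io/hello

# the full repository URL works too
nextflow run https://github.com/nextflow-io/hello
```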
And also a very important thing is where you will run your pipeline: the computing environment. It could be locally on your desktop; it could be on Kubernetes; on a cloud with AWS, Azure, or Google Cloud Platform; or on shared computing, like a cluster with Slurm, PBS, Torque, and so on. Nextflow's support for all these technologies means you can write your pipeline on your local computer, on your desktop, test it, and then very easily enable support for any of these platforms, meaning you can run it on AWS or Torque or Slurm or Kubernetes and it's going to be fine, because Nextflow has this abstraction layer that allows the same pipeline to be run in different places.

So with that we have reproducibility, with this integration with code management tools, version releases, and so on. You have portability, because by using Conda or containers you make sure that what runs on your computer will run very similarly, if not identically, on other computers. You also have scalability, which is very interesting, because you can play with the pipeline while developing it on your laptop with five samples, and if it works, it's going to be just the same with 5,000 samples on an HPC cluster or supercomputer, or with 5,000,000 samples in a cloud.

Another interesting thing about open source projects like Nextflow is that you can inspect them and contribute to them, which is very nice; but at some point they become so large that contributing code becomes very complex. There may be something you would like Nextflow to have that it doesn't, and it's not so straightforward for you to just go there one day and contribute the new feature. So what we created was a plugin system, so that it's very easy for you to create a plugin and add that functionality to Nextflow. Here we have one example, nf-validation, which is a plugin to natively handle schema files: to validate that the inputs you're receiving are the inputs the pipeline is expected to receive. With that we have parameter validation, we have sample sheet validation, and we even have a channel factory, fromSamplesheet, to make it easy to create Nextflow channels out of sample sheet files. Here you have the URL for the documentation and the project on GitHub.
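A rough sketch of how the plugin is used, assuming the function names from the nf-validation documentation and a parameter named input declared in the pipeline's schema:

```groovy
// nextflow.config: enable the plugin with
//   plugins { id 'nf-validation' }

// in the pipeline script:
include { validateParameters; fromSamplesheet } from 'plugin/nf-validation'

validateParameters()                          // check params against nextflow_schema.json
ch_input = Channel.fromSamplesheet('input')   // channel factory driven by the samplesheet schema
```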
Having said all this about what Nextflow is and why it's important — we need reproducibility not only in science; in industry it's also becoming a much-wished-for capability — at some point we realized we could benefit not only from a great technology like Nextflow, but also from a curated set of analysis pipelines, best practices, and tools to help you develop your Nextflow pipelines. Around that, in 2018, the nf-core community was created, and right now it's huge. It has over 8,000 Slack users, thousands of GitHub contributors, a lot of GitHub repositories, thousands and thousands of GitHub commits, pull requests, issues, and so on. It's a community inside the Nextflow community, and a very large one.

nf-core came with some principles that are very interesting, and we have benefited a lot from these principles over time. The main one is: develop with the community. Instead of people being isolated in their institutions, writing their Nextflow pipelines alone, we invite everyone to get together and develop with us, so that we can help you with the best practices and so on. We worked on a common template to make sure everyone is starting from the same base, so that when we need to help, we already know the terrain and can help you better.

There's also this very important principle of trying not to duplicate pipelines within nf-core. If you want to donate a pipeline to nf-core, for example (I'm going to talk more about this in the next slide), and we already have a pipeline that does exactly the same thing, we will not accept it. We have people working almost like employees to keep track of these pipelines, making sure they work and are up to date; a lot of money, which we get from sponsorship by Microsoft Azure and AWS, and a lot of time is spent making sure everything works, and it wouldn't make sense to maintain two pipelines that do mostly the same thing. So we go with no duplication of pipelines within nf-core.

We also created many helper tools, like nf-core/tools, which does a bunch of stuff with pipelines, sub-workflows, and modules, to make it easier for you not only to write Nextflow pipelines but to write good Nextflow pipelines, with the best practices and so on. There's the idea of compatibility: we want these tools to work for any Nextflow pipeline. And we modularize things in a way that gives you components that are very easy to contribute; much like the plugin system, you don't have to learn or understand a whole pipeline, you could contribute a module, which is one process, one step of a pipeline.
With that, we now have almost a hundred different pipelines, and I come back again to the idea of no duplication: 95 pipelines means 95 different pipelines doing different types of analysis. It's a huge number if you stop for a second and think about it; since we don't repeat the same analysis, it's a huge range of analyses that we cover with these pipelines. We have over 50 sub-workflows, which are smaller workflows that do something common to many different pipelines. Think of quality control, for example: a lot of different pipelines in genomics do very similar quality control, so we have a sub-workflow for that, which means you can use nf-core/tools with one command to insert it into your pipeline, and then you don't have to write all the code for those steps anymore; you just take advantage of the sub-workflow that is already written. When it comes to modules, which are these software wrappers, we have over a thousand, which means there are about a thousand different tools for which you don't have to write the Nextflow process yourself: you can just import these modules, already tested and working, and use them as pipeline steps.

We have linting features in nf-core/tools to make sure you are following the conventions and everything is consistent and right. We also have a schema for validation, and we even have a user interface so that you can easily play with the parameters and inputs of your pipeline using your mouse, plus tooling to develop and deploy these Nextflow pipelines in different places. So the idea here is: you can create pipelines from a common template, which makes it easier for us to provide support; with nf-core/tools you can create, install, and update sub-workflows; and you can create, install, update, patch, and test modules. The idea of patch is having a modified version of a module when you want to add or change something, and the tests are unit tests you can create, or use if they already exist, to make sure the module is working. The schema has a GUI, so very easily, with your mouse and your browser, you can do this relatively tedious but very important part of developing your pipeline: making sure it receives what it expects. The tooling also provides the linting I just mentioned, and it allows you to download the Singularity images for offline use. If you are running your pipeline on a cluster, for example, sometimes you won't have access to the internet; so how can you pull the containers without internet access? With the nf-core download feature you can fetch all the Singularity images ahead of time and move them to the cluster, so they'll all be there when required, and then you won't need internet access for that.
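A few of the nf-core/tools commands mentioned above, as a sketch (the sub-workflow name is an assumption for illustration; see nf-co.re/tools for the full reference):

```bash
nf-core create                       # start a new pipeline from the common template
nf-core lint                         # check a pipeline against the nf-core conventions
nf-core modules install fastqc       # drop a ready-made, tested module into your pipeline
nf-core subworkflows install bam_sort_stats_samtools   # install a shared sub-workflow
nf-core download rnaseq              # fetch a pipeline (and optionally its Singularity
                                     # images) for offline use on a cluster
```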
So there are many different ways you can participate in the community, not only the nf-core community but the larger Nextflow community. We have bite-sized seminars pretty often in nf-core. We have training sessions like this one; we've had several in the past in different languages: Portuguese, Spanish, French, and Hindi. We have hackathons at least twice a year, with one coming soon at the end of March; you can go to the nf-co.re website and see all the hackathons and trainings on the events page. You can also follow us on social media: we are on Twitter, on LinkedIn, on Mastodon, always posting content there. There is the nf-core blog but also the Nextflow blog; you can go to nextflow.io or the nf-co.re website and you'll find the blogs there with lots of content.

For support we have the community forum at community.seqera.io, where you can ask questions and see questions that have already been answered by other people. For discussion we have the Slack workspaces; there are two separate Slacks, Nextflow and nf-core, but lots of channels are bridged between the two. We also have the official documentation at docs.nextflow.io, and mentorship programs in which we mentor people. There have been some rounds already, and soon we're supposed to have another round of the mentorship program, where we help beginners do what they are planning to do: porting some configuration, making Nextflow work on their local cluster, writing a new pipeline, making some workflow serve their needs, and so on. Having said that, thanks for your attention. I think at the beginning I didn't introduce myself: I'm a developer advocate at Seqera, and this is my email if you want to reach me.

Let's go now to some hands-on work and the real training material. The material we are going to use for this Fundamentals Training is hosted at training.nextflow.io. When you type this address into your browser and press Enter, you're going to see this platform. We have multiple trainings here; you should go to the Fundamentals Training and launch it. It has multiple sections, some of which we're going to cover today and some tomorrow. The main objective of this training material is that by the end of it you will be proficient in writing Nextflow workflows. Of course, we don't expect you to be able to develop very complex Nextflow pipelines, but you should be able to write some, mostly because one of the sections is writing a simple proof-of-concept RNA-seq workflow from scratch. By the end of it, you should also be aware of the concepts of channels, processes, and operators, which are primitives in Nextflow; it's very important that you have it clear in your mind what channels, processes, and operators are. It's also expected that you'll have some understanding of containerized workflows, understand that you can run Nextflow on different platforms, and have a basic understanding of what the Nextflow community consists of. There are some old recordings of this training in other languages — English, Hindi, Spanish, Portuguese, and French — but there have been changes and updates to the training material, so those recordings are not up to date.

So let's go to the environment setup. If you want to run this training material on your own machine, you will need Bash, Java 11 or later (up to 18), Git, and Docker, and, depending on the sections you want to try, also Singularity, Conda, Graphviz, the AWS CLI, and an AWS Batch computing environment set up; but we won't go through that section in this training. The next step, once you have the requirements, is to download Nextflow. Here is a single command you can type on your machine; you can use either curl or wget. After you have run one of these two commands, you should run chmod +x to make the nextflow file executable, and then move it somewhere in your $PATH, for example /usr/local/bin, so that wherever you are on your machine you can just type nextflow and it will work, because it's in your PATH.
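The install steps just described, as a sketch:

```bash
curl -s https://get.nextflow.io | bash     # or: wget -qO- https://get.nextflow.io | bash
chmod +x nextflow                          # make the downloaded launcher executable
sudo mv nextflow /usr/local/bin/           # put it somewhere in your $PATH
```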
Now, if you don't move it to one of these places, you will only be able to run Nextflow from the folder where it was downloaded. But we won't expect you to have all this installed, or to install Nextflow on your machine, for this training. We will provide you with a virtual machine on the internet that you can access through your browser using Gitpod; everything is already installed there, a machine with some free compute that you can use to test everything we're going to do today. What you basically need is a GitHub, Bitbucket, or GitLab account to log in to Gitpod, a browser to access it, and an internet connection.

So I would ask you to click on this link, which is basically the Gitpod website, a hash (#), and the address of the GitHub repository of this training. Let's click here; you'll be asked to log in (in this case I'm already logged in). I'll just go with the default configuration here and click Continue. This will clone the GitHub repository and use some information we have set up there for the virtual machine; it's going to create the container image and so on, so it will take some time. In the meantime, I'm going to tell you a bit more about Gitpod.

Soon you will see a window like this, which has a simple browser here previewing the training material; what we just saw, we'll also see inside this Gitpod instance. At the bottom we have access to a terminal to type commands, and on the left we have this file explorer with all the files that are already there to help us with the training material, plus any file you create. This is what we call the sidebar. If you have used Visual Studio Code before, you'll recognize some of these icons and the structure, because this is actually a version of VS Code in your browser: you have plugins, extensions, and everything you'd have in your regular desktop VS Code, here in the web version.

One way to test whether Nextflow is installed is to type nextflow info, which shows some information about Nextflow: the version, when this version was created, the system where you are running it, the runtime (the Groovy and OpenJDK versions), and the encoding. It's almost there; it's still opening the browser, but it's halfway through. One thing you can do here is close this debug console, so you have more room for your terminal, and you can also click this file icon to get more space.

From now on, let's just work through the browser inside the Gitpod instance. The environment is up, and we were talking about Gitpod. What you have to know about Gitpod is that there is a paid version, of course, with more powerful machines, but just by creating your account you get 500 free credits per month, which is equivalent to 50 hours of free environment runtime using the standard workspace; that's what I did here, I didn't choose anything special. You can also ask for a more powerful machine: the large workspace option gives you up to 8 cores, 60 gigabytes of RAM, and 50 gigabytes of storage. Being a more powerful machine, it uses more credits, so you won't get 50 hours; but maybe you won't use 50 hours anyway, so some people may choose to go with the more powerful machine.
If you are inactive in Gitpod for 30 minutes, it will time out and you'll have to reload the tab or window of your browser. The interesting thing is that once you refresh it, it will be as it was when it stopped, so you can continue from where you left off; you don't have to do everything again. You can go to this link that takes you to your workspaces and see all the workspaces you have opened (I just opened one, so there's only one here, but there could be many), and you can stop them, delete them, reopen them, and so on. One interesting thing in the file explorer: if you want any of these files — whether they were already there, you created them, or you added new ones — you can right-click and choose Download, to download the file to your local machine.

Another interesting thing you can do: you're probably aware of environment variables, variables you set in your environment to instruct how some software behaves. What this line does is tell Nextflow that we want to run this specific version, 23.10.1. So let's type this command. By doing that, Nextflow will use this version specifically; we could pick other versions, but this is the version this training material is tested with. You can also run nextflow -version and get the version with some more information: the version, the build number, when it was created, the DOI if you want to cite Nextflow, and the website.
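The two commands just mentioned, as a sketch:

```bash
export NXF_VER=23.10.1   # pin the Nextflow version used for this training
nextflow -version        # confirm the version, build, citation DOI, and website
```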
Having said this, we can go to the first section of this training, Getting Started with Nextflow. Some of the concepts you see here were already mentioned in the slide deck presented earlier: what a process is, what a channel is. We have a similar image to the one I showed you: a channel with channel elements, queued first-in, first-out; they're added to and removed from the queue. Elements go to the process, creating process instances, the tasks: something goes in, some transformation happens, some output leaves and is queued again in the next output channel, which will be the input channel for the next process, and so on, until the end of your pipeline, where you extract the values from the final channel and do something with them. One interesting point is that Nextflow has this layer of execution abstraction, which means you write your pipeline once, but you can run it locally, on AWS, on Microsoft Azure, on Google Cloud Platform, on a cluster with Slurm, on Kubernetes, and so on. The scripting language is another interesting thing: Nextflow itself is a domain-specific language, but the processes can be written in any language.

So here is the first script, the first workflow we're going to look at; it's called hello.nf. You can click it on your left; let's hide the file explorer for a second. If you go back to the training material, you can see the code with some plus signs, and whenever you click a plus you get some information about that line; I'm going to explain it here in the file. Let me move this somewhere so it doesn't use more space from the terminal window, and also increase the font a bit.

What we have in the first line is what we call a shebang. Basically, this is a line that tells the operating system which program should be used to interpret the script code: if it were Python, we'd have a Python shebang; for a shell script, the shell; here, it's Nextflow. The thing is, because we're always going to use the nextflow command to call the script, this line is actually optional; if you don't use it, it won't be an issue.

On this next line we are declaring and initializing a variable. The interesting thing to mention is that it starts with params.: it's a greeting variable, params.greeting. This means that even though we declare and initialize the variable here in the script code, you can actually set its value on the command line by using --greeting; whatever word comes after params., you can use it to change this variable on the command line. It's worth knowing the distinction: we just saw that nextflow -version does something, and whenever you have one dash, that means you're referring to an option of Nextflow itself; when you use two dashes, you're referring to a workflow parameter, which is this example here. We initialize it with 'Hello world!', but we can change that, and we'll see an example soon.

As we saw before, every process always receives a channel and always outputs a channel. So if we want to give this greeting to a process, we have to put it inside a channel; that's what we're doing here. Channel.of is a special function we call a channel factory, because it creates a channel, and here we're providing it with a string, which means: I want to create a channel with this value.

Our first process is called SPLITLETTERS. It's a best practice to name processes in uppercase, because then, when you're reading code that's not so close to the process block, you can easily tell regular functions from processes, since processes will always be uppercase. It's a simple process with three blocks: an input block saying what the process expects as input, an output block saying what it expects to have as output, and a script block saying what it is supposed to do. Here we're saying that something will be passed as input to this process and it's going to be a value — a string or a number; it's a value, not a path. As output we have a glob, chunk_*, which means one or more files whose names start with chunk_, and by path we mean these are paths: they could be files, they could be folders.

Why would I have an output block, you may ask: I have this script block that says what's going to happen, and then there's some output from my process. The thing is, sometimes your programs are going to generate a lot of files — error files, intermediate files, log files — and you don't necessarily want all of them to be passed to the next process; you just want to share the output of this process that is required to do something in the next one. By defining the output block here, we are saying: I don't care what new things appear in this folder, I only care about these files being passed to the next process. That's what the output block is for; there's a reason for it.
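Putting together what's been described so far, the top of hello.nf looks roughly like this (the split command line is unpacked next):

```groovy
#!/usr/bin/env nextflow

params.greeting = 'Hello world!'            // overridable on the command line with --greeting
greeting_ch = Channel.of(params.greeting)   // channel factory: one string element

process SPLITLETTERS {
    input:
    val x                 // a value (the greeting string), not a path

    output:
    path 'chunk_*'        // keep every file whose name starts with chunk_

    script:
    """
    printf '$x' | split -b 6 - chunk_
    """
}
```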
So here you see this maybe slightly tricky command line. This is not Nextflow, it's just Bash. You can do man printf, for example, and see it's a command-line program; you can do the same for split. So it's not Nextflow code, nothing Nextflow-specific, just some programs we're using here for the example, and they could be any program. By running printf 'Hello world!', it prints the string to the screen; but not only that, it also pipes it forward to the split command, which splits a string into groups of characters. With -b 6 we're saying we want six characters in each chunk, and with the chunk_ argument we're saying that the files, each containing a six-character piece of the string, should be named starting with chunk_. If we press Enter here and list the directory, we'll see some new files: chunk_aa, chunk_ab. If we open chunk_aa, we have 'Hello' plus a space, and if we open chunk_ab — oops — if we open chunk_ab, we see 'world!'. So the string was split into files, each containing no more than six characters of the original string. That's what this process does: as you see, it gets a string as input, runs these commands, and in the end we care about the files starting with chunk_.

The next process is called CONVERTTOUPPER; from the name you can already guess what it does: it converts strings to uppercase. It receives an input file, and the output is going to be the standard output, the screen — what this task writes to the screen is what gets passed to the next process, so pay attention to that. Here again we're just using some Bash commands: cat reads the content of a file, so cat chunk_aa shows 'Hello '; then tr does a conversion, and in this case, based on these expressions, it makes the string uppercase. That's what happens here.

But as we saw in the slide deck, the processes are just recipes; they're just describing what's going to happen, not doing anything. You need the workflow block to tell what's going to happen. And what it says here is: with the greeting channel that I created on this line with the Channel.of factory, call SPLITLETTERS, and write its output — these chunk files — to the letters_ch variable. Then call CONVERTTOUPPER, the process that makes strings uppercase; its input is the output of the previous process, to which I apply a function called a channel operator (a special function that operates on Nextflow channels; soon we'll understand what this one does), and I store the result in the results_ch channel. Then I use the view channel operator to see the content of this channel. Let's run nextflow run hello.nf and see what happens. What happens is: it calls SPLITLETTERS, calls CONVERTTOUPPER, and then prints the output of the channel, which is HELLO WORLD!. The string was one line, it was split into two pieces, and each piece was made uppercase.
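And the rest of hello.nf, again as a sketch matching the walkthrough:

```groovy
process CONVERTTOUPPER {
    input:
    path y

    output:
    stdout                // whatever is printed becomes the channel element

    script:
    """
    cat $y | tr '[a-z]' '[A-Z]'
    """
}

workflow {
    letters_ch = SPLITLETTERS(greeting_ch)
    // flatten turns one element (a list of files) into one element per file
    results_ch = CONVERTTOUPPER(letters_ch.flatten())
    results_ch.view { it }
}
```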
Why do we use flatten? Let's comment out these lines and just look at the content of letters_ch; I'm going to add a clear here to keep cleaning the screen before running Nextflow again, so we have a less cluttered screen to work with. As you see, the output channel of SPLITLETTERS is just a list of files, a list of paths, because we have here a single process that gets one string but outputs several different files; if we had a longer string, with 36 characters for example, we would have six files. But CONVERTTOUPPER only receives one path at a time: it gets the content of one file and makes it uppercase; that's why we had two lines before, HELLO and then WORLD!. To make that work, we have to flatten this channel, because here there is just one element: a list with two items inside, one element printed as a single line. By using flatten, we make it not a list anymore but two channel elements. We can view this channel to see its new content — oops, sorry, I want letters_ch.flatten().view() — and we cancel with Ctrl+C and run it again. Now, I don't want to run CONVERTTOUPPER yet, just SPLITLETTERS, but flattening the output channel before viewing it. Before, you saw a single line between brackets, a list with two items; now it's just two elements, one path per line. So we have two elements in the channel, and this way, when I call CONVERTTOUPPER, it runs once for each six-character string. Let's run it again: HELLO, and then WORLD!.

This is the first script we're seeing here, hello.nf, and as I said, these script blocks are shell commands, command-line programs, Bash. But you could use any language, for instance Python. We can open hello_py.nf, and we're going to see something very similar: the exact same pipeline, but with Python instead of Bash. Notice it's the same thing: I'm getting a string, splitting it into pieces, and writing files starting with chunk_, each containing an at-most-six-character substring; in the next process, I'm opening these files, reading the content, and printing it to the screen. Same thing, but using Python. So let's run this: the same thing happens, but now the scripts use Python instead of Bash. One important thing to mention: in the script block, if you are not using Bash, you have to put the shebang here, inside the script itself. Nextflow expects by default that the script block contains shell script, which is why you don't need the shebang for Bash; but if you want to use R, Perl, Python, MATLAB, whatever, you have to put the shebang there so that Nextflow knows which interpreter to use for this code.
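A sketch of what the Python variant of the second process might look like; the in-script shebang is the point, and the Python body is an assumption consistent with the behavior described:

```groovy
process CONVERTTOUPPER {
    input:
    path y

    output:
    stdout

    script:
    """
    #!/usr/bin/env python
    # read the chunk file and print it uppercased (no trailing newline)
    with open("$y") as f:
        print(f.read().upper(), end="")
    """
}
```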
Good. When we ran it, a lot of information appeared on screen, and now it's worth having a look at what each of these strings means. The first thing you see is the Nextflow version you ran this pipeline with; here it is 23.10.1. It tells you the name of the script it's launching, here hello_py.nf. It gives a mnemonic run name to this run, which is always an adjective plus the last name of a famous scientist. It shows you the version of the domain-specific language, the Nextflow language; here it's DSL2, the most recent one. And it gives you a revision ID, which is something like a fingerprint of the pipeline script: if you change the script code, this revision will change, so it's a nice way to track whether the script has been tampered with; if nothing changed in the script code, the revision won't change. It tells you the executor, here local: we are running the pipeline on the machine where we are typing these commands.

It also tries to estimate the number of tasks. Here it gets it right, which is three: two for CONVERTTOUPPER, because there are two files, and one for SPLITLETTERS, since it's one string. But understand that Nextflow pipelines can be very complex, and sometimes it's very hard to estimate the number of tasks: there are conditionals, and what the output of a process will be is not always predictable up front. Of course, if you run it again with nothing changed it will be the same, but before a run it can be very hard to guess how many tasks will be executed.

Then you have a list of processes. For each process you see, from one of its tasks, this hash, which is the location of the task. Every task is isolated from the others, so you can go to the task folder and see everything about it: the inputs, the outputs, the environment. By default, Nextflow shows one line per process, so when you have two processes, this is the hash of one of the tasks of each. For the first process there's only one task, so this is that task's folder and you can go there; by default the work directory is called work, so you can cd into work/ followed by the first characters of the hash, using Tab to complete. There you see the output files of SPLITLETTERS, the two chunk files: it receives a string and outputs two files. For CONVERTTOUPPER we can do the same thing with its hash, and we find one of the chunk files there, so that task was the one converting that chunk. Next to the process names we have the number of tasks, 1 of 1, 2 of 2, and this check mark means it completed successfully.

If we want to run this again with one line per task instead of one line per process, we can use -ansi-log false. Now you have one task per line, which is great because you can see each task's hash, each task's work directory; but usually you'd have lots of lines like this, making your screen very noisy, and that's why the default is one line per process. So here we have the task folder for the second CONVERTTOUPPER task and this one for the first; having all the task folders makes it easier for us to find where the files are.

One interesting thing to notice: because of the automatic parallelization Nextflow does, asking the operating system to run tasks at the same time, sometimes the second or third task finishes before tasks that started earlier. So instead of HELLO WORLD you might get WORLD HELLO. This is expected, and it happens because the operating system is doing its best to be as efficient as possible; but it can be troublesome if you rely on the position, the order of the channel elements, to do things later. If that's your case, what we suggest is to use a tuple, which we're going to see soon, so that an ID travels attached to the sample data and you always know which sample each piece of information came from. A different way is to use a process directive called fair, which makes the parallelism fair, meaning the first task to start will be the first to emit output, then the second, and so on; but this sacrifices some performance, so it's going to be slower. It depends on what you need and what your situation is.
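A minimal sketch of the fair directive applied to the second process:

```groovy
process CONVERTTOUPPER {
    fair true   // outputs are emitted in submission order, trading some parallel efficiency

    input:
    path y

    output:
    stdout

    script:
    """
    cat $y | tr '[a-z]' '[A-Z]'
    """
}
```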
One very nice feature is that you can resume pipelines, to only run what's new instead of running everything. So let's open hello.nf and run this pipeline again; we are not using resume or anything, so it's running everything again from scratch, and we get another mnemonic run name. OK, so the first process ran its one task and the second process its tasks. Good. What we're going to do now is that instead of converting to uppercase, we want to reverse the string. Again, this is just a shell command, rev, that we can use — oops, it expects input, so let's check: we have chunk_aa here with 'Hello '; if we feed it chunk_aa, we get it reversed. So we changed the script block of this process, but everything else is the same. If we run this pipeline again with -resume, Nextflow tries to use the cache: it doesn't have to run SPLITLETTERS again, because that's the same thing, but CONVERTTOUPPER changed, so it has to run from scratch. And as you see, the first process was cached — it didn't split the hello world message again, because it was already split — but CONVERTTOUPPER, which is now reversing strings instead of uppercasing them, had to run its two tasks again. If we run this again with -resume, without changing the CONVERTTOUPPER body, now everything is cached, because nothing changed.

There are two very nice things about the resume feature; I mentioned them already, but just to recap. When you are developing your pipeline — you make one step work, you add another, you make it work, and so on — you don't have to keep rerunning everything from scratch; by using resume you use the cache, which saves you time, compute, and so on. And when something breaks, you can just fix it and run again with resume, and it starts from where it broke instead of running everything again.

Now that we've played a lot with the default hello world message, we can run it again, but with two dashes we provide a new greeting. We could say --greeting 'Nextflow, developed by Seqera', and now we have a long string that's going to be split into six-character chunks, with several different files. Even though we used -resume, this string had never been run before, so everything ran from scratch, and now we have the whole sentence reversed. If you run it again, now everything is cached, because we ran it once already and changed nothing else: 1 of 1 cached, 6 of 6 cached. If we want to see one task per line again, we can add -ansi-log false, and now we have one task per line instead of one process per line, with the task hash folders for all the tasks.

Good. Here we're just looking at everything we did, in a DAG format, a directed acyclic graph. We have a string; we put it inside a channel, a FIFO queue, first in, first out. The first process consumes it, does something, and has an output; here, the output is two items in a list. We queue it, and we flatten it, so that we have two elements instead of one element with two items. Then we pass this to CONVERTTOUPPER, which has a single output, the standard output; it goes into a channel, and then we view it, and that's why we see HELLO WORLD on the screen.

So we're ending this section with a basic understanding of a Nextflow pipeline: input, output, the script block; the output of one process can be the input of the next, and so on. In the script blocks you can have shell script (Bash), compiled software, or scripting languages like Python, MATLAB, or Perl. So that's an easy example of a Nextflow pipeline; alongside compiled programs, we saw some Python and some shell script.
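To recap the commands used in this section, as a sketch:

```bash
nextflow run hello.nf                    # everything runs from scratch
nextflow run hello.nf -resume            # unchanged steps are taken from the cache
nextflow run hello.nf -resume --greeting 'Nextflow, developed by Seqera'
nextflow run hello.nf -ansi-log false    # one line per task instead of one per process
```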
So let's go to the next section, which will help us develop a simple proof-of-concept RNA-seq workflow. In this section we are going to write our own Nextflow pipeline for the first time; it's the closest we will get so far to a real pipeline. It's going to be a simple RNA-seq workflow, still far from the complexity of the nf-core/rnaseq pipeline, for example, but still we're going to have a few steps with realistic data (not really real data, but still). Basically, we're going to have a first step in which we create an index out of a transcriptome reference file; then we perform some quality control on our samples; then we perform some quantification; and at the end we create a final MultiQC report gathering logs and outputs from the other tools used in this pipeline. The interesting thing here is that if you go to the file explorer, you're going to see script1.nf through script7.nf; basically, these are the seven steps we'll go through in this simple RNA-seq workflow chapter. At every step we move to the next script, which brings something new but is built on top of the previous one.

In this pipeline there are basically three tools we're going to use. Salmon is one of them: a tool for quantifying gene expression, which will also create the index based on our transcriptome reference file. We also have FastQC, which does quality control of our samples. And then MultiQC, which searches a given directory that we prepare for it and, based on that, creates the main report of our pipeline.

So we're going to start simple. Let's open script1.nf. It's very simple: no process blocks, no workflow block, really nothing much. We are just creating three parameter variables, which means, because they start with params., that we can change them — we can override these variables — on the command line, by using --reads, --transcriptome_file, and so on. It has this $projectDir variable here, which points to the directory of the pipeline, and because we have double quotes, it means we want this variable to be replaced with its content. At the end we use the Groovy function println to print 'reads' and the content of this reads variable. If we run the script, let's see what happens: it expands the variable, but everything else stays the same, because it's just a string. If we run this other command here, which passes a new value for reads — the path for the lung samples instead of the gut ones — it replaces the end of the string with the equivalent for lung.

In this section, and in the rest of today's training, there will be some exercises, and what I recommend is this: whenever I say there's an exercise and I read the question, stop the video and try to do it yourself; once you've found the solution, or maybe you got stuck in some part of the exercise, come back and play the video, and you'll see the solution and my comments on it. So the first exercise: we have script1.nf with three parameters, reads, transcriptome_file, and multiqc, and the exercise asks you to add a fourth parameter, outdir, and to give it the string value 'results'. I'm going to show the solution in three, two, one — it's basically just doing that. Another thing: instead of using println to print the content of the variable on a new line, we could also use log.info, which not only prints to standard output, the screen, but also writes to the log file.
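A sketch of script1.nf after the two exercises; the exact data paths are assumptions based on the training repository:

```groovy
params.reads = "$projectDir/data/ggal/gut_{1,2}.fq"
params.transcriptome_file = "$projectDir/data/ggal/transcriptome.fa"
params.multiqc = "$projectDir/multiqc"
params.outdir = "results"   // the fourth parameter added in the first exercise

// log.info writes to the screen and to the execution log;
// stripIndent(true) removes the indentation used for code readability
log.info """\
    reads         : ${params.reads}
    transcriptome : ${params.transcriptome_file}
    multiqc       : ${params.multiqc}
    outdir        : ${params.outdir}
    """
    .stripIndent(true)
```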
I haven't mentioned the log file yet, but soon, when we study script 2 or 3, I'll show you the log files, among other things. For now, we can use log.info with the triple-double-quote multi-line string, like we do in the script blocks of processes, and this writes to the screen but also to the log file. This new exercise asks you to modify script1.nf to print all of the workflow parameters, not only reads, using log.info instead of println; there's an example here in the solution. Pause the video; I'm going to show the solution in three, two, one. So basically that's what we do. One interesting thing: you see quite often in Nextflow pipelines that we use indentation, spacing, to make the code easier to read, but when we print to the screen we don't want that spacing. That's why we use stripIndent here: it strips the indentation so that everything starts at the beginning of the line, just like you're seeing here. So in this step you've learned how to define parameters in your workflow script, how to pass parameters on the command line with the double dash, and how to use log.info to print information and save it in the execution log file. Having said this, we now go to the first real step of the pipeline, which is a process: we are going to create the process that builds an index out of the transcriptome reference file. We come to the file explorer and open script2.nf. As you can see, the beginning is the same as script1; we start from where we stopped in the previous step, but now we have a process named INDEX and also a workflow block. As you've learned already, the process block doesn't do anything by itself; it just describes what the process consists of. You need a workflow block to tell Nextflow what to do, which processes to call. This INDEX process is very simple. We have an input block saying it will receive a path (a file or folder, a path to something), and we refer to this something with the variable transcriptome. The output is also a path, but we know the name of the folder, because we can choose it on the salmon command line; so we say that the output is a folder named salmon_index. In the script block, we call the salmon command with the index option, passing a number of threads, the path to the transcriptome file, and saying we want the index in a folder named salmon_index. This task.cpus is very interesting: anything that starts with task. refers to a process directive of the current task. Process directives are instructions you give at the beginning of the process block, and they let you define how things are going to be executed. For now, let's just run this script number two, because there are some things we have to fix before we go on. An error will occur, because we don't have Salmon installed in this Gitpod instance, on purpose: we are not supposed to have all the software installed on a computer. So there's going to be an error here; let's try to interpret it. It says there's an error in the INDEX process, caused by exit status 127, which is the code for command not found; and indeed, when we look at the command error, it says command not found. Good. What we need to do: if we open nextflow.config, the configuration file for Nextflow, we can see a few things. One of them is that, if we are using containers, Nextflow should use the nextflow/rnaseq-nf container image.
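Here is a sketch of the INDEX process just described, as I recall it from the training script (treat the exact flags as approximations):

```groovy
process INDEX {
    input:
    path transcriptome

    output:
    path 'salmon_index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i salmon_index
    """
}

workflow {
    index_ch = INDEX(params.transcriptome_file)
}
```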
Because we are not specifying any container repository, any container registry, by default it looks on Docker Hub. The thing is, we are not telling Nextflow to use containers; we are only saying that, if containers are used, use this one. To run this pipeline with Docker, for example, you have to type -with-docker, and by doing that it now runs the task inside a container which has Salmon, and everything works fine. Of course, you don't really want to write -with-docker every time you run a pipeline with Docker, so instead you can just go to the nextflow.config file and, at the end, say docker.enabled = true. With that, you don't need -with-docker anymore; you can just run the script the simple way and it will use that container image. Now that we have the task here, let's do something interesting: let's go to the work directory (cd work, then tab to auto-complete into the task folder) and list all the files. As you can see, there's a bunch of stuff. We have a symbolic link for the input file that points to the original location of the file, but we also have this salmon_index folder, which is the output of the process. There are still a lot of hidden files here; we know they are hidden because they start with a dot, and if we just do ls we won't see them. The .command.begin file is created whenever the task really starts, so if you are debugging your pipeline and don't know whether the task started or not, you can check for the existence of this file. You can also look at .command.err: errors are registered there. You can look at .command.log, which has the logs, what you write with log.info for example. You have .command.out, which contains anything printed to your screen, to the standard output. You have .command.run, which is a very powerful script containing a lot of functions Nextflow needs to make sure your pipeline runs locally, in the cloud, with containers, without containers, and so on. We're not really supposed to meddle with this file, but sometimes you want to see, say, what Docker command is going to be used; that's the file to check. And then you have .command.sh, which is, most of the time, the file people come looking for: by opening it, you see the actual command line that was run. As you can see, the variables have been expanded, replaced, and so on, so by checking this file we know exactly what command is run in the end. You see that instead of task.cpus we have 1, and that's because cpus, the number of CPUs you want to request from the operating system for that task, is 1 by default. But we can come to the script file and, at the beginning of the process, say cpus 4, telling Nextflow in script2 that we want to request four CPUs for the task. Running the script now creates another task directory; we can go there, see what's inside, and now you see it's been replaced by 4. So that's what task.cpus does. The next exercise asks you to print the output channel of the INDEX process, this index channel that was created here. Let's open it: the exercise is to print to the screen the content of this channel. I'm going to open the solution in 3, 2, 1. It's basically just adding the .view channel operator to the output channel.
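A sketch of the two configuration ideas just mentioned: the container image, and enabling Docker so that -with-docker is no longer needed:

```groovy
// nextflow.config
process.container = 'nextflow/rnaseq-nf'   // image to use when containers are enabled
docker.enabled = true                       // run every task in Docker by default
```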
We can run this pipeline again, and you'll see a path to the Salmon index. Now, we already did this, actually: changed the CPUs to something else so we could look at .command.sh. There's this command in bash called tree that helps you see the structure of some folders; it's handy. So in this step we learned how to define a process executing a custom command, how process inputs are declared, how to view a channel, and how to add a directive to a process, cpus in this case. In the next step we're going to learn about a channel factory; channel factories are special functions to create Nextflow channels. This one is called fromFilePairs, and from the name you already know it's used to load file pairs into a Nextflow channel. Let's open script3: it basically has a bit of script1, and then the channel factory, Channel.fromFilePairs, which creates, based on the reads path, a channel that gets stored in this variable. We have an exercise here to add the view instruction and see what it looks like. I'm not going to stop for this one, it's very simple: we just add .view, and if you want to pause for a second to try to guess the structure of this channel, you can pause now, but I'm going to show it in 3, 2, 1. Here we see it's one channel element: everything is on one line, so we know it's one element, but it has two items. It has a value, which is the beginning of the filename, gut (for the others it would be liver and lung), and the second item is a list with a pair of paths, the reads: the first one and the second one. We can override the reads parameter, as we know; here I'm going to use a star, so we want gut 1 and 2, liver 1 and 2, and lung 1 and 2, and we'll see three lines like this: gut with gut 1 and gut 2; liver with liver 1 and liver 2; lung with lung 1 and lung 2. Good. Another thing you learn in this section is to use the set operator to define a variable. You can click on these green words, which are links to learn more about the set operator, but the idea is to replace the assignment symbol, the equals sign, for defining variables. Some people prefer one way, some the other; I personally like this one, because I think it reads better: I create a channel from file pairs and I want to call this channel read_pairs_ch. Other people, who like the equals sign, would say: I want to create a read_pairs_ch variable that contains a channel from file pairs. It's a personal preference; there's nothing special about either. Another interesting thing is that channel factories and operators may have options. Here we mention one of the options of the fromFilePairs channel factory, checkIfExists, which checks whether the path exists. There's a link here in case you want to see more, but let's look at the official docs: this is the fromFilePairs channel factory, and you see there are many options: checkIfExists, followLinks, flat, hidden, maxDepth, size, and so on. Whenever you want to learn more about some operator or channel factory, go to the official documentation at nextflow.io. So the exercise is to use the checkIfExists option of the fromFilePairs channel factory to check whether the specified path contains file pairs, and I'm going to open the solution in 3, 2, 1: you just add a comma, the name of the option, a colon, and then true, false, or whatever value the option expects; and in the end we use set to save this as the read_pairs_ch channel.
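Putting those pieces together, a sketch of the script3 snippet with view, set and checkIfExists as described (the default path is an assumption):

```groovy
params.reads = "$projectDir/data/ggal/gut_{1,2}.fq"

Channel
    .fromFilePairs(params.reads, checkIfExists: true)
    .set { read_pairs_ch }

read_pairs_ch.view()   // prints e.g. [gut, [.../gut_1.fq, .../gut_2.fq]]
```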
So in this step we learned how to use the fromFilePairs channel factory to handle paired files, how to use the set operator to define a new channel variable, and how to use the checkIfExists option to check for the existence of input files. In the next step we perform expression quantification, so we open script4.nf, and we see everything written so far (the parameters, the log.info, the INDEX) and now we have the QUANTIFICATION process. This one is interesting because, for the first time, we are receiving multiple channels. We get two channels: the first has a path to the index, and the second is a tuple, which means multiple items; a tuple in which the first item is the sample ID and the second item is a list of paths. Based on that, we already know the second input channel is the one we create with fromFilePairs, because that's its structure: a sample ID, like gut, liver, lung, and then a list of read paths. The output is going to be a folder, a path, whose name is the sample ID: gut, liver, lung, and so on. And the command that was salmon index is now salmon quant. Same thing with the threads. Now we provide the folder of the index, and because we know reads is a list, we can get the first path with reads[0] and the second path with reads[1]; and we use -o to create the output folder, which is the sample ID, gut, liver or lung in this example. In the workflow, we create the channel and call QUANTIFICATION with index_ch, which is the output of the INDEX process, and the read_pairs channel. We ran script2 already, which had the INDEX, right; so now we run the fourth script with -resume, which adds the quantification part. Because we had already run the INDEX process, it's cached here. That's the power of -resume: you don't have to do everything from scratch; you only do what hasn't been done yet. If we run it again, everything is cached now, because we just ran the quantification process. Another interesting thing (oh, it's actually written here) is to run with all the samples, not only gut. What happens is that one is cached, which is gut, but then 2 and 3, which are lung and liver, are not cached. We see it here: 1 of 3 was cached, and it needed to run the other ones. Here we have an exercise: it asks you to add a tag directive (you can click here to learn more about it) to the quantification process, to provide a more readable execution log. I'm going to open this in 3, 2, 1... go. It's also interesting that we can run this pipeline with -ansi-log false; we actually put the tag in the quantification process, and by using -ansi-log false we get every task on its own line instead of one line per process, and because of the tag we now know that this cached, this cached, this cached is actually Salmon on gut, this one is Salmon on lung, and this one is Salmon on liver. Another exercise: we also need to click here on publishDir to investigate this process directive; it publishes the files you think are interesting to a results folder. It asks you to do it specifically for the quantification process, so that only the outputs of this process are published to a results folder.
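A sketch of the QUANTIFICATION process with the tag and publishDir directives from the two exercises folded in (flags follow the training material as I recall it, so treat them as approximations):

```groovy
process QUANTIFICATION {
    tag "Salmon on $sample_id"
    publishDir params.outdir, mode: 'copy'

    input:
    path salmon_index
    tuple val(sample_id), path(reads)

    output:
    path "$sample_id"

    script:
    """
    salmon quant --threads $task.cpus --libType=U -i $salmon_index -1 ${reads[0]} -2 ${reads[1]} -o $sample_id
    """
}
```

In the workflow block it's wired as quant_ch = QUANTIFICATION(index_ch, read_pairs_ch).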
I'm going to open the solution in 3, 2, 1. It's basically just giving a path and the mode; here it says copy: you don't want a symbolic link, you really want to copy the files from the work directory to this outdir, these results. In this step you've learned how to connect two processes together through channel declarations (in this case we even had a process with two input channels), how to use -resume to skip cached tasks, how to use the tag directive to provide a more readable execution log, and how to use publishDir to store process results in a path of your choice. In this next step we have quality control with FastQC. You can just type this command here to resume with script5; part of it was cached already, FastQC wasn't, because it's the first time we run it, and that's basically it. We can open script5 here to have a look. Basically we have this new process, FASTQC. It uses tag too; as input we have a tuple, sample ID as the first item and a list of paths as the second one; obviously that's again the structure from the fromFilePairs channel factory. The output is a folder with the sample ID in its name: we interpolate it with the rest of the string, so that the folder name is going to be fastqc_gut_logs or something like that. And here we have not just a single command line: we run mkdir to create a folder, because FastQC doesn't create it by default, and after creating it we run fastqc to do the quality control on these samples and store the results inside this folder. For the workflow, we just call it on the output of the fromFilePairs channel factory, and that's it. In the next step we work on the MultiQC report, so we open script6.nf, and you'll see a new process here, MULTIQC: basically, take everything in the current folder, the task folder, and as output create a single file, an HTML file. The command is basically multiqc .
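Here is a sketch of that FASTQC process (option flags as I recall them from the training material):

```groovy
process FASTQC {
    tag "FASTQC on $sample_id"

    input:
    tuple val(sample_id), path(reads)

    output:
    path "fastqc_${sample_id}_logs"

    script:
    """
    mkdir fastqc_${sample_id}_logs
    fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads}
    """
}
```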
It's going to take care of everything in the current folder, which is the task folder, so we have to move things there. In the workflow block we have this line here which is slightly confusing, so let's create a snippet file and play a bit with these channel operators, mix and collect, so you understand what's going on; a runnable sketch follows below. Let's create here a channel from 1 to 5, called numbers_ch. Let's also create one with some letters, called letters_ch. If we do letters_ch.view() and run it, you see one letter per line, because each is a channel element. The mix channel operator combines two or more channels, so we can do .mix(numbers_ch).view(), and with that we create one single channel, but with lots of elements, each on its own line. The thing is, as we've seen, the MULTIQC process expects everything to be in the folder. I don't want MultiQC to be run ten times; I don't want to run MultiQC once per tool and per sample; I want to collect everything into a single element, so that I call MultiQC once, one task, and that task produces the full report. For that I use the collect channel operator: with collect, all these channel elements become items of one single channel element. We also learn about flatten, which does the opposite: it takes a single element with multiple items and converts it into multiple elements. I can do flatten here, and you'll see it come back to this vertical format, let's say; and then we could collect again, and flatten again; I'm just trying to show you how opposite they are. That's why we have the mix here and the collect: because we want everything to be provided to MultiQC at once. So let's run this. Because we have publishDir here for MULTIQC and also for QUANTIFICATION, there's going to be a results folder, here, containing the reports: we have the MultiQC report, which we can preview, and we also have the task outputs, you see here gut, liver, and so on. Here is the MultiQC report. As you can see, there's a lot from the different tools, and the more tools you use, the more you'll see. You can go to multiqc.info, the website, and see the list of over 100 supported tools whose logs and results are automatically picked up for the MultiQC report. You can also handle the completion event, which means that when the pipeline is really finished it sends you a message or something. That's what we have in script7.nf: it uses the ternary operator here on workflow.success, which means the workflow finished successfully; it can be true or false. If it's true, do this, it's log.info, right: log.info "Done! Open the following report in your browser". If it's false, show the other one: something went wrong. You can do many different things here. You could, for example, do email notifications: you configure your SMTP server, and when the pipeline is over it mails an account saying, you know, the pipeline has finished, whether there was an error or not, here's the link to the report. You can go to the mail documentation for more information on that. And you can also have custom scripts. Many times you write your commands in the script block, like we did in FASTQC, where we have those two lines: create the folder, run fastqc.
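As promised above, a small snippet to play with mix, collect and flatten (the exact values are just for illustration):

```groovy
// snippet.nf
numbers_ch = Channel.of(1, 2, 3, 4, 5)
letters_ch = Channel.of('a', 'b', 'c')

letters_ch
    .mix(numbers_ch)   // one channel, every element printed on its own line
    .collect()         // gather all elements into one single list element
    .flatten()         // and split that single element back into many
    .view()
```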
Sometimes you don't want that, and that's what we do here. We create, in the project directory, a folder called bin; we go inside and create this fastqc.sh file. You can see from the code that it stores in sample_id the first argument of the command-line call of the script, and the second one is the reads; create the folder; run fastqc. Good. So we give this file execution permissions, then we come back, open script7, and for the FASTQC part we remove the inline commands and add this one: we now call the fastqc.sh script, the first argument is the sample ID, the second one is the reads. When we run this with -resume, so we don't waste time, lots of it was cached; FASTQC wasn't, because we changed the code, even the revision changed, and it had to be rerun because the FASTQC step is new. So everything worked as expected. Good. The next part is metrics and reports. There are many options you can turn on to get nice information about your pipeline execution. One of them is -with-report, which gives you an execution report. -with-trace gives you a tab-separated file with lots of information for every task: the amount of CPU that was used, memory, disk. You have -with-timeline, which shows how long it took to move files around and to run tasks, and so on. You have -with-dag, which creates a directed acyclic graph, a visualization of your pipeline. So it asks you to execute all of these; let's do that, but instead of using png for the DAG I want to use html, which means it's going to use Mermaid, the JavaScript library, for rendering. When we run this command, some files start to appear here during the execution of the pipeline. We get the report; we can click the right button and choose show preview to see it. This report gives you a lot of information about the run: when it was run, the number of tasks and whether they all succeeded, the actual Nextflow command you typed, CPU hours, launch directory; lots of information, lots of plots. It has the raw usage for every process; it also shows the percentage of the allocation used, which means that if you asked for 10 CPUs and only one was used, you'll see here that you didn't actually need to request that many resources. You have the same for RAM, for job duration, and for I/O reads and writes. Per task you also have the percentage of CPU, the container that was used; there's a lot of information, a very rich report about the pipeline. You can also see a timeline; again, let's show the preview. The timeline shows you, in gray, how long it took to move files around, and in blue the actual execution of the process, the tasks. You can also go to the trace, the trace file with lots of information about the run. Let me see if there's anything else; ah, the DAG. The directed acyclic graph is the visualization of the pipeline, and here it is: it starts with Channel.fromFilePairs, and the transcriptome goes to INDEX; both go to QUANTIFICATION; the reads are also sent to FASTQC; then the outputs of FASTQC and QUANTIFICATION are provided to MULTIQC, which generates the final report. Another interesting thing is that you can run a pipeline directly from GitHub; it's very nice. You can just do nextflow run nextflow-io/hello: because you didn't specify GitHub, Bitbucket or GitLab, it tries GitHub by default.
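A sketch of that custom script; the argument layout follows what was described above, so treat it as an approximation:

```bash
#!/bin/bash
# bin/fastqc.sh
sample_id=$1   # first argument: the sample ID
reads=$2       # second argument: the reads

mkdir fastqc_${sample_id}_logs
fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads}
```

It needs execution permissions (chmod +x bin/fastqc.sh); Nextflow automatically adds the project's bin folder to the PATH, so the process script block can simply call fastqc.sh with the sample ID and the reads as arguments.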
That's why it's doing it. If you don't want to give the full URL but still want to ask Nextflow to go to Bitbucket or GitLab, you can use the -hub option on the command line. So here, with nextflow run, it pulled and downloaded the pipeline, because it hadn't been downloaded before, and ran it, and it worked like a charm. You can also use nextflow info to get some information based on the manifest file in the GitHub repository, and here we see the name of the pipeline, the repository, the local path where it's stored (it's in the .nextflow home folder, inside assets), the main script file, which is main.nf, and the description of the pipeline along with the author name. Here you can also see the revisions and the branches; it's master; and by providing -r you can pick a different branch or tag. So let's pick version 2.1 here, and we want to run it with Docker. So basically, in this last step, we learned how to execute a project directly from GitHub and how to specify a revision of the project. Sometimes the default branch of a GitHub repository is not master, it's main; sometimes it's a custom name; so being able to select it with -r is very important. Let me see if there's something I missed here. Well, I think the main message is that processes are connected through channels, and sometimes you have multiple channels for the same process. We can open script7 to see that: for the quantification we call the process providing two channels; the first one has the output of the INDEX process, which means a folder with the index of the transcriptome reference file, and the second one (the second channel, sorry) is the read_pairs channel, which contains the sample ID and the reads of the samples. You connect processes this way, by taking the output of the previous one, and so on. Sometimes you need to mix channels, which means making two or more channels into one, and sometimes you want to give everything at once to a single task; in that case you use collect: you want MultiQC to create one report from many inputs. And when you have all these items in a single element and want them split into multiple elements, you use flatten, as we used in the getting-started section with the SPLITLETTERS and CONVERTTOUPPER processes. With that, we finish this proof-of-concept RNA-seq workflow. The next section is dependencies and containers, and it will help us understand how to manage the installation of software, the versions, the interactions between the dependencies that the tools themselves have, and also how to isolate tasks in containers, so that we can really achieve reproducibility in our pipelines. The manage-dependencies-and-containers section is a very important one if we think about reproducibility. The idea is to make it easier for you to install and manage all the software in your pipeline, and also to run it in an isolated manner. Think about it: so far we've been using FastQC and Salmon and MultiQC, and it's a very simple example in terms of software interactions, but you still have to install them, in the specific versions you want, and once they are installed and working on your computer, you want them to run isolated from each other, so they don't interfere with one another. That's what we are going to do in this section.
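A sketch of those commands; the repository name and the revision tag are assumptions based on the demo pipeline used in the training:

```bash
nextflow run nextflow-io/hello            # no URL needed: GitHub is the default hub
nextflow info nextflow-io/hello           # manifest info: repo, local path, main.nf, author
nextflow run nextflow-io/hello -r v2.1 -with-docker   # run a specific branch or tag
```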
We're going to start with Docker, and we'll do things the Docker way; after that we'll see how much easier it is with Nextflow. I already briefly mentioned that containers are isolated areas in your computer. I don't want to go into much detail here, but think of it as a way to have a small, sealed-off space in your computer where things happen, tasks are executed, without interference. Docker is a program for that, and if you have a container image (like a recipe, let's say, already prepared for you to use) you can just do docker run with the name of this container image, and it will be pulled, because it's not local, downloaded and run for you. We don't have the hello-world container image locally; it says so here, it won't find it locally; so it downloads it, expecting it to be on Docker Hub, which is like GitHub but for Docker container images. It downloads, then runs, and we get this message here: a very simple container image, like a hello world for Docker. If you want to just pull without running, you can do docker pull with the container name; that works too. And if you want to list all the images on your computer, you type docker images: it lists them all, hello-world, which we just pulled, but also nextflow/rnaseq-nf, which we used before. Here, as an exercise, it asks you to pull the publicly available debian:bullseye-slim container image and check that it has been downloaded; you can do this with the commands we just learned. I'm going to open the solution in three, two, one... go. We use the docker pull command with the name, debian:bullseye-slim; it pulls, and then we can type docker images and it appears here; so yep, it worked. You see that the first part of the container image is its name, and after the colon you have the tag, which here is the bullseye-slim name, like a version of Debian. So we ran the hello-world container image and saw some text on the screen, but what if we want to get inside the container and interact there, running commands and so on? For that we use the -it option, with bash at the end, to run bash, which gives us the prompt, the command line, so we can type commands. If we type this, we are inside: as you can see, we are root; we can do whoami and it says root; we can list the files, and we see the root filesystem of the container image. If we do ls home, for example, there's nothing. But after typing exit and leaving the container, if we do whoami we get gitpod; if we do ls, you see all the training folders; and if we do ls home, in this case also empty; but it's clearly separate from the container environment. How do we create our own Docker image, this recipe, right? The first thing is to create a Dockerfile, a file with a specific Docker syntax for creating an image. Whenever you use Docker to build an image, it takes into consideration everything in your folder, so it's best practice to create a new folder. I'll call it container-image here; I go inside it, and from there I create my Dockerfile; it has to have this name. I'm going to paste here; I could put my name, Marcel, here, and my email. This line tells Docker which image you start building your new image from: I'm using debian:bullseye-slim as a base, and then inside it I use apt-get, a package manager for Debian, to update and to install curl and cowsay, two programs. This ENV at the end is just for cowsay to be easily accessible in your PATH, or something like that; I don't quite remember now.
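A sketch of that Dockerfile; label values are placeholders, and the ENV line reflects the fact that Debian installs cowsay under /usr/games:

```dockerfile
FROM debian:bullseye-slim

LABEL image.author.name="Your Name"
LABEL image.author.email="your@email.com"

RUN apt-get update && apt-get install -y curl cowsay

ENV PATH=$PATH:/usr/games/
```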
You save, and now you can use this docker build command to build your image. The name I'm going to use is just my-image; -t gives the tag name, and the dot means the Dockerfile is in the current directory. That builds; afterwards we can type docker images to see what images we have on this machine, and we should now have my-image; it's here, cool. Here it asks you to create an image containing cowsay, but that's what we just did; this is the Dockerfile we created. Now we can do docker run my-image and then cowsay, which is the command, plus some parameter for it; we'll say "Hello Nextflow users", and based on what the cowsay program does, it should draw a cow saying whatever you told it to; here, "Hello Nextflow users". If you want to add more software, you just change the Dockerfile. For example, I'm going to add another RUN command to download the Salmon program; you can put it after the first RUN or after everything; I'll do it after everything here. Actually, I should open it with VS Code, sorry about that. I'm adding here the RUN to download this file, untar it, and move the binaries to the right location, so that you can type salmon from anywhere. Now I rebuild with the same command as before, but the Dockerfile is different, so it can use some cache: you see the first two steps are cached, but the third one, installing Salmon, is being done now. Then docker images shows my-image created 11 seconds ago, so clearly it was just rebuilt, and now we can do docker run my-image salmon, a program plus a parameter, --version: I want to see the version of Salmon, and here it is. I could also, like before, run it interactively, docker run -it my-image bash, and from inside the container do salmon --version. If I exit the container and type salmon --version, I get an error, because I don't have Salmon installed on my machine: command not found. As you saw, the container is separated from the Gitpod machine, from your computer, so one cannot see the files of the other. What you usually do is mount the filesystem, so that it's visible inside. With this command here, for example, what happens, basically, is that I run my-image with the command salmon index -t and the path to transcriptome.fa; $PWD is the working directory, the nf-training folder, and then we go to data/ggal/transcriptome.fa. It should work; it won't, because the container cannot see my filesystem. Here's the error: it says the file does not appear to exist. I mean, it does exist on the computer, but not inside the container. So what you have to do is mount a volume, saying: this file on my computer, I want it at this location in the container's filesystem. Now, finally... it says it appears to be a directory; I'm in the wrong folder, I was inside the folder I used to create the container image; so now it worked, good. But where is the index folder? It doesn't exist, because even though I made the input file accessible inside the container, I didn't let the container see anything else, so it cannot write the output folder back. To do that, I'm going to mount a volume.
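A sketch of the kind of command this builds up to, mounting the project directory so the container can both read the input and write the output (paths assumed from the training layout):

```bash
docker run --volume $PWD:$PWD --workdir $PWD my-image \
  salmon index -t $PWD/data/ggal/transcriptome.fa -i transcript-index
```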
This time the volume is the whole current directory, and now, finally, if I run this, I get the folder with the results. So now my container can see my folder and also write back to it. Mounting the whole directory is not usually what you want to do, but it's a solution here. You can also use environment variables to do the same thing, and that works too. The next step I'm going to skip: it's basically that, just as you can put your pipeline or the source code of some software on GitHub, you can put your custom, personal container image on Docker Hub. It takes a while, you have to sign up for an account, so I'll skip it. Now that we were able to run container images, run commands, and make them see our filesystem and vice versa, what I'm going to do is run script2.nf as we saw before. script2 basically has an INDEX process that calls the Salmon program to compute the index of the transcriptome, and we had a container image for that; but because we installed Salmon in our own container image, I want to run script2.nf with my-image. And basically (again, I was in the wrong folder) by running this with Docker, providing the name of my container image, the one I just created with Salmon installed, everything works as expected, and for the first time we ran the pipeline with the container image that we created. We have a whole section here on Singularity; you can also play with Singularity in this Gitpod instance, but I won't spend much time on it here. Singularity is another way of managing containers. In the beginning, a lot of people on HPC clusters, in some environments, chose Singularity because it was safer; it allowed you to have container images as plain files, which is easier to handle (if you have a cluster without internet access, you can just ship these files); it had several advantages over Docker at the time, mostly security. But Docker has evolved a lot, and personally I don't see much benefit in using Singularity if you can choose, because Docker apparently has caught up on everything Singularity had over it. So let's go to Conda, which is definitely not as good as containers for reproducibility: you don't really have isolation. Conda is more about making it easier to install and manage the installation of software; it doesn't give you isolation. Still, let's play a bit with it. If it's the first time you're using Conda, you have to do conda init and then open your terminal again; I'm doing that here by typing bash. It loads Conda, and the name of the environment appears in parentheses here. You could just type conda install and keep installing software, but an easier way is to have an environment file, like the one shown here; you also have the file in the file explorer. Basically it gives a name to the Conda environment, some Conda channels (repositories where to find the packages), and the dependencies: the packages, from the channels you chose, in specific versions. By asking Conda to create an environment from this file, it will create a set of folders on your computer and install these programs there, in such a way that if you type salmon now it will still say command not found, but if you activate the environment, it will find it. Everything is shared, though: filesystem, libraries, configuration; so it doesn't give you real isolation.
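A sketch of such an environment file; the salmon version matches what's mentioned below, while the other entries and the channel order are assumptions:

```yaml
# env.yml
name: nf-tutorial
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - salmon=1.5.1
  - fastqc
  - multiqc
```

It's consumed with conda env create -f env.yml.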
It's better than not having anything, but if you can use containers, it's much, much better. So here we create the Conda environment. It takes a while, usually at least a few minutes, because, think about it for a second: we have Salmon, but for Salmon to work it relies on a few other pieces of software that do small things for it, and one of those in turn depends on other software. So you have a network of software interactions that makes it not so trivial to guarantee one program will work. When you ask Conda to create an environment, or to install a package, it builds a dependency tree to install everything required by all the requirements of the software you're installing. It takes a while at the beginning, because it's building this dependency graph, but once that's done and it has the list of all the files and packages it needs, you see a pretty long list of software being installed so that these four tools are available and working for you. So here you see lots of different packages being installed; this is the easy and quick part; resolving the dependency graph usually takes longer. After this is done, we run the conda env list command to list all the Conda environments we have on this machine. We have two: base, which is the default, and now also nf-tutorial, the name we provided in the file, right? There's an asterisk on base, because the default is activated, but we can either do conda deactivate, to have no environment loaded, or just activate ours on top, and it will work. So let's type: salmon, command not found; conda activate nf-tutorial; and now, if we type salmon, it's installed, in the version we asked for, 1.5.1. Now we can run script7, the final script of the RNA-seq pipeline we built in the previous section, but with -with-conda, providing this path, which is the path of the nf-tutorial Conda environment. And apparently everything worked; we are not using Docker here, we are using Conda. As I mentioned, it may take a while for Conda to resolve dependencies, so some developers ended up creating a program called Mamba, which uses Conda packages, Conda repositories, channels and everything else, but is much faster at resolving. So some people like the Conda repositories but use Mamba or Micromamba to manage the installation and configuration of these programs. Then we get to the gold standard when manually creating container images: we use containers, but inside the container image we use Micromamba and Conda to install and manage the software. That's what we have here: a Dockerfile, but instead of apt-get we have Micromamba creating a Conda environment and installing software from the environment file with the Conda channels and everything else. Here we also clean up afterwards; install, don't ask, just install everything; the name is going to be nf-tutorial, and the file is this one. How can the container image find the file inside it? We use the COPY instruction, to copy every .yml we have here into this folder inside our container image; that's how it works. So we can just copy this Dockerfile content.
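A sketch of such a Micromamba-based Dockerfile; the base-image tag and internal paths are assumptions:

```dockerfile
FROM mambaorg/micromamba:0.25.1

# make the environment file visible inside the image
COPY --chown=$MAMBA_USER:$MAMBA_USER env.yml /tmp/env.yml

# create the environment without prompting, then clean the package caches
RUN micromamba create -y -n nf-tutorial -f /tmp/env.yml \
    && micromamba clean --all --yes

# put the environment's binaries on the PATH so salmon & co. are found
ENV PATH=/opt/conda/envs/nf-tutorial/bin:$PATH
```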
Let's go to the container-image folder we created. I'm going to remove the old Dockerfile and data; oh, I created this with a container, so I cannot remove it, oops. So let's create a new Dockerfile inside, paste this, save. This is an exercise, but we can do it together anyway. Let's build this container image; I'm not going to push it to Docker Hub or anything. Oh, some issue here: we have to say that the .yml is in the folder above the one we're in right now. Still an issue, file not found; for the sake of it, let's just build this container from the previous folder. So now it's installing everything; you'll see how it's faster than Conda. After that, we can open nextflow.config and, instead of nextflow/rnaseq-nf from Docker Hub, let's just put my-image, the one we're creating right now. With that done, we can just run; see, it's already installing, so it was much faster to build the dependency graph. With that done (actually, I think docker.enabled is already true, yeah) we don't even need -with-docker here; we can just run this part, and with that we run the full pipeline we built before, but now using a container image built with Docker, with software managed by Micromamba, using Conda channels and Conda environments. In this step we learned how to create Conda environments using Micromamba, and also how to create Docker containers using Micromamba. Even though my Conda environment is activated, because docker.enabled is true it's going to use Docker and not Conda; we're not telling Nextflow to use Conda, not using -with-conda here. Everything is working. But, you know, do I have to build my container image myself every time? Actually, depending on the container image you're trying to build, depending on the software you want to use, probably someone has already built it, right? There is a project called BioContainers, and they try to have a container image ready for every package and version you have on Conda. FastQC, for example, in version 0.11.5: you have it on Conda, so you have it on BioContainers too. You can just type this command, for example; you don't have to create your own container image with FastQC, it already exists, and you can use it right away with Nextflow. So here I'm going to docker run it; it finished, actually it's pulling, right, downloading the container image; and we can play a bit with it. Let's open an interactive session in this container; once inside, we can do fastqc --version, and here it goes. Let me deactivate the Conda environment here: if I do fastqc, not found. So BioContainers is very interesting if you use commonly used bioinformatics tools, because you don't have to create your own container image; you can just use the one created by BioContainers. But still, that's not the whole story. What we have now is a software by Seqera called Wave. We can go to the Wave documentation, where there's plenty of information on how to use it. Wave is a service that builds containers on the fly for you: you can just give your conda directives in Nextflow, or your Conda environment file, or something like this, and remotely it builds your container image on the fly, during the execution of the pipeline, and provides it back to you very, very quickly. It's an amazing feature; maybe not so much for beginners, but still very interesting. You can also see the Wave showcase, I think in the seqeralabs namespace on GitHub.
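Going back to BioContainers for a moment, a sketch of pulling and poking at such an image; the exact tag is hypothetical, so look it up in the BioContainers registry before using it:

```bash
docker pull biocontainers/fastqc:v0.11.5_cv4        # tag is an assumption
docker run -it biocontainers/fastqc:v0.11.5_cv4 bash
# then, inside the container:
fastqc --version
```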
There are a lot of different things you can do with Wave: you can authenticate private container repositories during pipeline execution; you can build and deliver Nextflow module containers, including to a private repository; you can build a container based on Conda packages or on Dockerfiles; a lot of different things. You can also interactively debug remotely executed tasks as if they were running on your machine, thanks to containers and everything else. It's a very powerful tool for using container images, for using containers in your pipeline, and for achieving reproducibility. So, we've mentioned process directives a few times already (we used cpus, among others), and we also used process.container in the nextflow.config. That is the way to say: for all the processes in my pipeline, use this container, which is my-image. But you can also go to the body of the process, in main.nf or in your script files, and add container followed by the name of the container in quotes. Be aware that you don't need an equals sign here; you need it in the config file (we can open it again, and you see it's = 'my-image' there), but in the process body it's not required: just a space and the name of the container, and this specific process will use this container. Conda is the same: you can have a directive, just like this, conda, giving the name of the Conda package and the version (you can also add a channel), so that when you run your pipeline with -with-conda it takes advantage of this information to build an environment, install the software for this task specifically, and run it. With that we get to the end of the dependencies and containers section. I think the important thing to grasp from this section is that installing and managing software is not straightforward; it's not so easy, and you need tools to help you with it. apt-get is one of them; you also have Conda, and Micromamba, which is a faster alternative. But even when you've finished installing the software, you still don't really get reproducibility, because you don't have isolation for every task. For that you need containers, and there are different technologies that provide them, like Docker and Singularity, and many others like Podman, Charliecloud and so on, but Docker is the most famous one. With containers you create isolated areas in your computer, where you manage the installations inside with apt-get and Conda and Micromamba and so on, and it's very isolated, including the filesystem and everything else, so you can really make sure a task runs the same way on my machine, on yours, in the cloud, on a cluster, on a different machine, because everything required for the task is managed by Docker together with apt-get, Conda, Micromamba and so on. With that we go to the last section of today, which is an introduction to Groovy, the programming language at the root of Nextflow. Nextflow is a domain-specific language, as we learned, on top of Groovy, which is on top of Java; so if you have a library that works with Java or with Groovy, it works with Nextflow. For this final chapter of today, the first part of the Nextflow training, we are going to talk about some basic Groovy structures and idioms. It should be clear by now that Nextflow is a runtime, a software, but it's also a domain-specific language, which means you write your pipeline in the Nextflow language and use the Nextflow runtime to orchestrate it.
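A sketch of the two in-body directives just described; the process name is hypothetical, and the image tag and package version are assumptions:

```groovy
process FASTQC_VERSION {
    container 'biocontainers/fastqc:v0.11.5_cv4'   // note: no '=' inside the process body
    conda 'bioconda::fastqc=0.11.5'                // used when running with -with-conda

    output:
    stdout

    script:
    """
    fastqc --version
    """
}

workflow {
    FASTQC_VERSION().view()
}
```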
Of course, inside every process, in the script block, you can use any language you want, even calling compiled programs. I also mentioned that, as a best practice, we write process names in uppercase. You can see it here again in the script .nf files, where we have FASTQC and QUANTIFICATION; they are all uppercase because they are process names. When you go to the workflow block, you very easily see which ones are the process names, because they are uppercase, and the lowercase ones, in this case, are channel operators. Good. The println that you saw is Groovy, and some other functions could be Nextflow code on top of Groovy, but it's still Groovy. In some circumstances it's useful to know Groovy: tomorrow, with Chris, you'll learn about operators, for example channel operators, and with them you'll be able to use closures, something we'll see soon, still today; inside the closures you can write Groovy code that does something with every element of the channel. So knowing Groovy can prove very useful for your Nextflow pipeline development skills. Let's start with the most basic thing, printing values, which we already used at the very beginning of the simple RNA-seq workflow section. If you want to print something to the screen with a new line after it, you can just use println "Hello world"; and in simple cases like this you can even write it without the parentheses. So here you have Hello world, and that's it. For comments, we already saw a bit of this: for a single-line comment you use two slashes; for a multi-line comment you use slash-star and star-slash. For variables, you use the assignment operator, the equals sign, to create them. Let's copy this and create a file called snippet.nf to put it inside. You could use Groovy itself to interpret it, but because Nextflow is built on top of Groovy, you can just use Nextflow to interpret this Groovy code; it's all Groovy here, no Nextflow (actually, this one here is Nextflow, creating a channel). So here we print 1 to the screen, which is the value of x (why is it not working? now it should work, everything is Groovy here), and then each new value we assign to x: here a floating-point number, -3.14-something; here x is a boolean, false; and here the string "Hi". Everything you see here. Good. If you want to define a local variable, you use the def keyword; otherwise it's global. Local means that, if you put it inside a function, for example, that's the scope of the variable: outside the function, if you call the x variable, it won't have this value. We already saw that values delimited by square brackets are a list, so let's play a bit with lists. List elements are accessed by position, and indexing starts at 0, so list[1] would be 20. Here I print the entire list, and then only the second element. You can also use the .get() method to get the element at a position; those are two different ways to do the same thing. You can also use the .size() method to get the size of the list. Here it's easy to count (we have four elements), but if there are so many that you can't count them by eye, .size() is very useful; also if you want to use the size programmatically, without inspecting the result yourself. Here it's 4.
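A small runnable sketch of those basics, following the values mentioned above:

```groovy
// snippet.nf
x = 1
println x

x = -3.1499392   // a floating-point number
println x

x = false        // a boolean
println x

x = "Hi"         // a string
println x

def list = [10, 20, 30, 40]
println list        // the whole list
println list[1]     // 20: positions start at 0
assert list.get(2) == 30
assert list.size() == 4
```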
OK, we can also use assert to test a condition. Here, for example, we assert that the first element of this list equals 10; the assertion is true, so nothing happens. But if we assert 15, which is wrong, because the first element is 10, not 15, we get an assertion error; that's what the exercise asks you to do here, to make it incorrect. I did: I changed this 10 to 15, and we get an assertion error. There are many different things you can do. You can get the last element with index -1; you can also use the range -1..0 to reverse the list, which is the same as calling the .reverse() method. And here you have a lot of different operations you can do in Groovy: on one side what you do with operators, on the other the result. You can use << to append an element to the list; same thing with plus and the brackets. You can remove elements; you can repeat elements (here I want 1, 2, 3 repeated twice, and that's what we see on the right). With flatten, everything is put into separate entities; we can reverse the list; apply something to every element (here I add 3 to every element, so 1 plus 3 is 4, and so on). You can get a copy of the list with only the unique values, so here it's going to be only 1, 2 and 3. There are many other things: min, max, sum, sort, find, findAll and so on. Next we have another data structure, maps. Instead of referring to values by position, we refer to them by key. If I want the value 0, I use the key a; if I want the value 2, I use the key c; and so on. Here, again, we do some assertions: I want the value for the key a in this map called map; it's 0; referring to the key with the dot notation also works; and the .get() method works here too, same thing. Then we rewrite the values: the key a from 0 to x, b to y, and c to z, and in the end that's the new map we have; that's why the equality asserts here. For string interpolation we use double quotes, and .join() is a very nice method to play with: we have foxtype, which is 'quick', and foxcolor, a list of characters, and the instruction prints the sentence by joining all the characters of the foxcolor list: "The quick brown fox". It's also supposed to write hello world to the screen; well, actually it won't: it will just show a literal $x and $y, because we have single quotes; we need double quotes to expand these variables. So after running it once, we change it to see the result: "The quick brown fox" worked, but then the literal $x $y; if we replace the single quotes with double quotes, now it expands the x and y variables and prints Hello World to the screen. That's what the exercise asks you to do: with double quotes it works, Hello World. Here it shows how to create multi-line strings, but we already knew that from the script blocks in the processes; it also shows we can delimit them with slashes instead of the three double quotes. You can also have conditions in your pipelines: you use if with some boolean expression and something to be done, and else for otherwise, something else to be done.
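Before moving on to conditions, here is a sketch of those list, map and interpolation idioms in one runnable piece (values follow the examples above):

```groovy
list = [10, 20, 30, 40]
assert list[-1] == 40                 // last element
assert list[-1..0] == list.reverse()  // a reversed range reverses the list

map = [a: 0, b: 1, c: 2]
assert map['a'] == 0       // subscript notation
assert map.b == 1          // dot notation
assert map.get('c') == 2   // .get() works too

foxtype = 'quick'
foxcolor = ['b', 'r', 'o', 'w', 'n']
println "The $foxtype ${foxcolor.join()} fox"   // The quick brown fox
```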
We could do something like this: let's say x is 1; if x is larger than 10, say "x is larger than 10", otherwise say "x is smaller or equal to 10". Let's run it: it says x is smaller or equal to 10, which is correct, because x is 1; but let's make x 11 now: now it says x is larger than 10. This is interesting because you can even have a pipeline with a process that does things differently based on some condition; tomorrow you'll see more of that in the processes and the channels-and-operators sections. Here it just shows that empty strings and empty collections always evaluate to false; so because this list has content, if (list) evaluates to true, for example. These are just some examples; this Groovy introduction is something you should go through at your own pace, running the code and studying the Groovy truth links and all the other links that appear here. The Elvis operator too; I'm just trying to give you a basic idea of the language. We have a ternary operator again here: if list is truthy (if it's empty, that's false, otherwise true), print the list; otherwise say the list is empty. We can play with that: when the list is empty, it prints "The list is empty", but if the list has anything, it prints the content of the list. Let's run it once so we can change the list: "The list is empty", right; now I put something inside the list, the value 1, so list is truthy, the first branch is evaluated, and it prints the list to our screen, 1, as you can see. Good. Here's an exercise asking you to write an if statement that prints "Hello" if the variable x is greater than 10 and "Goodbye" if it's less than 10. So you have a variable x holding a number, and you compare: if it's larger than 10, say Hello; if it's less than 10, say Goodbye; and you could even add an otherwise, for when it's equal to 10. I'm going to open the solution in 3, 2, 1; it's here; let's add that third condition on top of it. So now x is 11: 11 is larger than 10, so it prints Hello. After that I replace it with 9, which is smaller than 10, and it should say Goodbye. Let's wait for it to run; let's see; ah, it's the quotes, my bad, sorry: because there's already a single quote inside the string, we should use double quotes instead, or escape the single quote with a backslash. With double quotes, now it says Goodbye. And if we set 10, the remaining condition, it says it's 10. Good. You can also use the ternary operator here: if the condition is true, the first value; otherwise, the second. There's also repetition, for when you want to do something multiple times: this is the for structure. You declare and initialize a counter, we call it i here; then there's the stopping condition: while i is smaller than 3 it keeps going; it increases i and runs the body. So i is 0, run the body; now i is 1, run it; i is 2, run it; then i is 3 and it stops, because it's not smaller than 3 anymore, it equals 3. Here we have an example with a list with three elements, a, b and c: we use for to iterate over the list and print every element. Let's open the snippet, run it, and you see a, b and c; it's iterating over the list. Here you can also see how to create functions.
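A sketch combining the conditional from the exercise and the for loops (literal strings chosen to sidestep the single-quote issue mentioned above):

```groovy
x = 11

if (x > 10)
    println "Hello"
else if (x < 10)
    println "Goodbye"
else
    println "It's 10"   // double quotes, so the apostrophe needs no escaping

for (int i = 0; i < 3; i++) {
    println "i is $i"
}

for (elem in ['a', 'b', 'c']) {
    println elem
}
```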
Here you can also see how to create functions. Sometimes you want to create functions to be used inside the closures of your processes and your channel operators, so we're really close to closures: soon you're going to see one of the most important features of Groovy, and for Nextflow a very, very useful one.

Basically, what happens here is that when you call this function, called fib, and you provide a number n, you get the nth number in the Fibonacci sequence. So fib(10), for example, gives you 89. The return keyword can be omitted: the function implicitly returns the value of the last expression evaluated in its body. Here we only have one line, but we could have multiple lines, and the last one would be evaluated and returned from the function.

And now we get to closures, which are like the hottest thing when it comes to Groovy and extremely useful for Nextflow pipelines. Closures are blocks of code that can be passed as an argument to a function. You define closures with curly braces, and whatever you put inside is applied to what is passed to it. So you can just define a square closure that multiplies whatever you apply it to by itself; applied to a channel, for example, it would act on every element. It's going to be much clearer tomorrow when you talk about channel operators with Chris, but today we can see some examples here. This square of 9 basically does 9 times 9, which is 81. Another way to call a closure is with the .call method, and here we provide 5: 5 times 5 is 25.

There are different ways in which you can use them. Here, for example, you have a list that we are calling x, with four items: 1, 2, 3, 4. We use collect, which is going to do something to each element of the list. Note that this collect is different from the Nextflow collect, okay? This is a Groovy method; in Nextflow we use the map channel operator for that, a very commonly used channel operator that we're going to see tomorrow. But here we're going to do something to each of these items, and square, which we defined here, just multiplies whatever number you give it by itself. So we're going to have 1 squared, 2 squared, 3 squared and 4 squared, and that's the output: 1, 4, 9 and 16.

As you see, we are always referring to the element as it, which you can think of like an iterator. But if you want to give it a different name, you just write the new name you want followed by this arrow symbol, a dash and a greater-than sign (->), and then you can use whatever name you chose to refer to the item of the structure. Here, for example, we are creating another closure between curly braces which is going to receive two things: we're going to call the first a and the second b, and we're going to print to the screen a string saying a 'with value' b, and because we have double quotes, we are expanding these $a and $b variables. Then we have a map: the key is Yue and the value is Wu, the key is Mark and the value is Williams, the key is Sudha and the value is Kumari. The same way as with collect, when we use .each here, it's going to apply the printMap closure to each element: it's just going to say Yue with value Wu, Mark with value Williams, Sudha with value Kumari.

Here we have another example. We set result to 0, and then we have this map with China 1, India 2 and USA 3 as the keys and values. What we do is call keySet() on the map and apply to each key this closure here, which is actually changing the content of a variable that is defined outside of it: result. The += means result equals result plus whatever we have after it, and the values in this case are 1 for China, then 2, then 3. So basically what this line is doing is summing every value in this map onto the initial 0: 1 plus 2 plus 3, so we're going to have 6, and that's the output we have here.
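A sketch of the fib function as described, assuming the usual recursive definition with fib(0) = fib(1) = 1, which matches the fib(10) == 89 result mentioned:

```groovy
// The nth number in the Fibonacci sequence
def fib(int n) {
    // the return keyword could be omitted: a function implicitly
    // returns the last evaluated expression in its body
    return n < 2 ? 1 : fib(n - 1) + fib(n - 2)
}

assert fib(10) == 89
```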
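The closure examples above, gathered into one runnable sketch (names like square and printMap follow the narration; the exact identifiers are assumptions):

```groovy
// 'it' is the implicit single parameter of a closure
def square = { it * it }

assert square(9) == 81
assert square.call(5) == 25    // .call is another way to invoke a closure

// Groovy's collect (not the Nextflow collect operator) applies a closure
// to every element of a list
def x = [1, 2, 3, 4]
assert x.collect(square) == [1, 4, 9, 16]

// Naming the parameters with the -> arrow instead of using 'it'
def printMap = { a, b -> println "$a with value $b" }
["Yue": "Wu", "Mark": "Williams", "Sudha": "Kumari"].each(printMap)
// Yue with value Wu
// Mark with value Williams
// Sudha with value Kumari
```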
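And the accumulation example, reconstructed with the China/India/USA map:

```groovy
def myMap = ["China": 1, "India": 2, "USA": 3]

// += updates a variable defined outside the closure
def result = 0
myMap.keySet().each { result += myMap[it] }
println result   // 1 + 2 + 3 = 6

// the same thing inspected in separate steps
println myMap.values()   // [1, 2, 3]
println myMap.keySet()   // [China, India, USA]
```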
We could actually do this in multiple steps. Let's comment this out. We could do a println of the map's values and of its keySet, and what happens? We get the values, and China, India and USA, which are the keys. Cool. And then we can use .each, naming each element, for example country, and it's going to be the same thing; the map itself isn't changed by this. The next step would be to change the value of result, so we can just uncomment this println and we're going to see 6. If we had 30 here instead of 3, for example, it would be 33, and so on.

There are some links here for you to go further with Groovy. There are multiple links that I didn't click through, but you can look at the documentation here, and at the multiple links in this section and in the others, with more information.

But for now, that's what we had planned for today. Tomorrow you're going to see channels, processes, operators, configuration and some more sections. If you have any question, you can go to the channel prepared for that, which is in the nf-core Slack: you basically go to the nf-core website, About, Join nf-core, and you click on Slack. This will let you join the nf-core Slack, and from there you can join the training channel, where you can ask questions and people will be around to help you.

With that, I'm done for today; we concluded the schedule that was prepared. I will probably be around helping with questions, but tomorrow's session 2 will be given by Chris Hakkaart, who, just like me, is a developer advocate at Seqera. See you tomorrow!