…and kicking things off here. You can see that we're in the first session here, and if you scroll down, you can find information for all of the sessions that we're going to be doing. So this is day one of three sessions. We're going to kick things off with a little introduction to Nextflow, jump into our first Nextflow script, and then spend the rest of the day looking at a very basic RNA-seq pipeline and its dependencies.

So just to kick things off: my name's Evan Floden. I'm the CEO and co-founder here at Seqera Labs and one of the early members of the Nextflow project. I'm going to run you through the introduction to Nextflow, and then I'll hand things over to Marcel, who's also here, and he'll take it from there with the RNA-seq pipeline.

I'd like to stress a couple of things. This is a live session, and we do have people in Slack; that's the best place to join if you want to ask questions or follow along with the conversation. If you scroll down, you'll see a section in here for asking questions. We're on the third one here, this Training October 2022 channel, and if you select that, you should be able to find it inside the nf-core Slack. If you click on this link and you're not part of the Slack already, you can join there and you'll find everyone. So chime in and ask questions, and hopefully you'll get some benefit as well from seeing the questions that other people are asking.

I'm going to kick things off with a small presentation. In terms of prerequisites, you don't need to have anything installed to follow along. All the material we have is available online, and we're going to be using the Gitpod environment for all of the actual code. Gitpod is the live coding environment; we can spin it up, and it has Nextflow and all the resources, data, scripts, et cetera installed in it. From there, the only things you really need are an internet connection and a browser, which I assume you've got, as well as a GitHub account for logging in.

We've got quite a bit of material to cover today, so without stalling too much, I'd like to jump into the first set of slides and give you an overview of Nextflow: discuss a little the problem we're trying to solve with Nextflow, then dive into some of the theoretical background for what's going on. We're going to cover a lot of material over the next couple of days. It starts with a gentle introduction, but you may also see some repetition. Don't worry if you don't catch the concepts the first time; some of these things just take a couple of passes, and a little practice, to get going.

So without further ado, let's start with the problem definition of writing a modern scientific pipeline. Here's an example of an ancient DNA pipeline, which is very timely given the Nobel Prize that was awarded earlier today for work on ancient genomes. This is a pipeline from nf-core.
If we think about some of the characteristics of what this is: we've got lots of different tools here that we're trying to chain together, as well as orchestrate the data through the pipeline. In bioinformatics we very often have a whole bunch of different tools and resources. Maybe you've got something from GitHub, some script you've written yourself, maybe even some proprietary software, some containers from all over the place. And you have to pull all of that together into a piece of software which you can rely on. On the left-hand side here, you can see we've got some data going into it. Bioinformatics, and most of the life sciences dealing with sequencing data, is very file-heavy: usually big, large files, and those in themselves present some challenges for how we're going to process them. And we don't just want it to work on our computer, or even in our own environment. We really want to be able to take these analyses and share them: maybe take them to scale across different environments, or share them for publication, or at least make sure that they are reproducible in the broader sense of the term.

So, summarizing the problem: we all have larger and larger datasets which need to be processed and from which we need to get some sort of answers. These can be sequencing data, imaging data, or larger, more structured datasets. And within bioinformatics pipelines, with the tools we've got, we often end up with an embarrassingly parallel problem: we want to run our pipeline, but we want to run each sample through that pipeline, or, for example, take our genome, split it by chromosome, and run each chromosome through the pipeline. So you end up with this parallelization problem where you're essentially splitting up your work and running maybe tens, hundreds, or thousands of jobs over some sort of distributed compute. It's typically the kind of analysis you can't do on a single machine or on your laptop, so you're sending it out to a cluster or to a cloud. And we have this big mashup of different tools, as I mentioned before, from all sorts of different sources. That creates a bigger problem in terms of managing all the dependencies: in the old style, you had to go and install each piece of software, and you'd spend so long just trying to get that going before you could move on to the next step. But with the advent of containers and systems like Nextflow, a lot of that has become much easier.

So let's go through some of the core problems we're trying to solve in this regard. If you think about it from a more goal-oriented perspective, everyone's heard of FAIR data. We like to take FAIR and apply it not to the data itself, but to the data analysis. So the idea of making our analyses findable, and these are principles of open science as well: your pipelines can be available from GitHub or other repositories. That they're accessible and can be used by everyone; there's no real limitation, given that we can now spin up environments pretty much at will for everyone to use and run. And obviously interoperability is key to this, whether we're running in the cloud or on a cluster.
All these different systems, and Nextflow is a big enabler of that. Reproducibility I'm going to stress a little in a moment; it's a key tenet of Nextflow. And then we've extended FAIR a little to include equitable and scalable, so you can run the same analysis from your laptop to a supercomputer, as well as tested, really using modern software engineering practices. You'll see later on, with some of the nf-core pipelines, that we've got unit testing in there. We recommend always having a small test dataset as part of your workflow that you run through, and applying these good practices, using Git for example, in all of the work that you do. That pays off a lot when it comes to making your software more robust.

Okay, so I mentioned before all of the different pieces of software you have to install. This is an example of a pipeline; it's a genome annotation pipeline, looking at parasites and trying to annotate their genomes. Each of these little circles, hopefully you can make them out in the live stream, is a piece of software, a process. And you've got these black arrows going between them; those black arrows are essentially directing the data: data comes from one process into another, and a transformation occurs. Installing and managing all of this software is a real hassle. It's a key example of how doing this in the pre-workflow-manager era was a real nightmare: all these different pieces of software and tools needing to be put together just to run. That presents a big challenge not only to usability, which is of course a key part, but also to reusability, and, going further, to making the whole thing reproducible.

Here's an example of a paper that looked at taking a computational science paper, basically a computational biology paper, and trying to replicate it based just on the methods provided, all the information provided by the authors. Even when you've got software names, versions, command lines, et cetera, even when all of that is provided, it's still very difficult for a typical person to reproduce the results. And here, it took a couple of months of trying to do that. Obviously most people are not going to spend two months just trying to replicate the results of a paper, but it presents a challenge to us. If we're going to be basing so much of our work on results coming from this kind of analysis, maybe even outside academia, thinking about applications in the clinic, et cetera, it's not really something we want to be relying on. There has to be a better way than describing our work, our science, and our analysis in this kind of way.

The problem is even a little worse than that, though. Even when we can reproduce the setup, when we've got the same software versions and we know what the pipeline is doing, there are situations where the same workflow essentially gives us different results depending on the environment we're using.
This is what we showed early on in the Nextflow paper: we looked at a couple of pipelines, including the genome annotation pipeline I showed you before, and we saw that we got different results; essentially, the genes were annotated in different locations. You can see here that depending on whether we ran on Mac or on Linux, we ended up with these genes being differently located. We also did a transcript quantification with Kallisto and Sleuth, very common tools, a very simple pipeline, and the genes called differentially expressed were different depending on whether it was Linux or Mac. That presents, again, some big challenges: even when we've got everything controlled for, to a certain extent we end up with these differences, and this is likely down to underlying differences in the libraries and the low-level code, which was very hard to control for, particularly before the widespread use of containers in the work that we do.

Okay, so Nextflow provides a big solution to that, and now I'll give some background on how Nextflow works in that regard. Nextflow is obviously solving the reproducibility part, and a lot of that comes from containers, but the wider problem Nextflow is solving is conceptual: we have a language, a way to write the pipelines, and that's what you can see in this top section here. Nextflow is a syntax, a language. You can take code from any other language, your R scripts, your Python, your Bash, and you essentially link those pieces of code together with dataflow programming. We're going to spend a lot of time looking at command-line tools, but it's the exact same thing whether you're running command-line tools or anything else: the concept is that you wrap those sections in process blocks, link them together, and then define all of the dependencies with containers. All of that gets wrapped in a Git repository, and from there you can take any Nextflow pipeline and run it in any environment.

Nextflow is both the language and the runtime, so those two things are linked together here. And it's really from that idea, that you can take any pipeline and run it in any environment, that a lot of the community aspect of Nextflow comes, because people can come together independently of whether they're running on Azure, on their local machine, or maybe on a Slurm cluster: the whole code base is exactly the same. Those people can work on that code base, really contribute to it, and then run the actual execution in their own environment. So how does it work, and why is Nextflow a little different, I'd say, from some of the other workflow systems out there? Because there are definitely many out there.
Nextflow is a DSL, a domain-specific language: a language written on top of another programming language, and that other programming language is called Groovy. For most cases, for 99% of what you need to write, you have access to this domain-specific language, which is specific to writing pipelines; it makes it very easy to define processes and the key parts of your pipeline. If you hit a case where something you need isn't in Nextflow, of course you can open an issue on GitHub, but you can also access the underlying programming language: you can reach down into Groovy, or even Java, to get yourself out of those corner cases.

Nextflow also has this reactive programming model, and the reactive programming model is where the dataflow part comes from: the data is essentially pushing, or flowing, through the pipeline, and it's reactive in the sense that the processes react to that data. The processes sit there waiting for data, and once it's received, they react and submit, or emit, their tasks. You'll see this in practice a few times later on.

Very early on, we made a decision to keep Nextflow with this concept that each task is self-contained. What that means in practice is that each task lives in its own working directory; it's essentially its own unit of compute. As part of that, each task could very easily be containerized. This actually came slightly before the use of containers, so it was an advantageous, or fortuitous, decision; many of the systems built afterwards, things like AWS Batch and the managed cloud services, all fit this model very nicely. And finally, as I mentioned before, there's the concept of keeping the workflow definition separate from the execution: the configuration of where you run those pipelines.

So I'm going to show you an example here. This is a DSL1 example, and I'll point out that we're going to spend the rest of the workshop on DSL2, but this is purely to show you the linking between two processes. This is a task, a BWA-MEM task: a simple thing you may see in a script, or maybe you run from the command line. To use this in Nextflow, we can keep it exactly the same. The idea is that we're not trying to recode it; we just have to define what we've got. We wrap it in what's called a process block, where we define the inputs, the outputs, and then the script section itself. Looking a little more closely, you can see that we're defining the inputs as a reference file, a FASTQ file, and a sample ID. And the key part is that when the task is finished, we're going to take the sample.bam, this output here, which is generated by the task. That's the piece we care about; we're going to use it downstream. You'll notice something in this image that may be new to you: this concept of channels. You've got "from the genome channel", "from the reads channel", and then "into the BAM channel". Those are the definitions of the channels, which are the things that are going to link our processes together.
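As a rough sketch of that pattern (not the exact slide code: the file names, channel names, and the samtools step are illustrative, and this legacy DSL1 style needs an older Nextflow release or DSL1 explicitly enabled):

    // DSL1-style sketch (legacy syntax): channels are wired directly
    // into the process with 'from' and 'into'.
    genome_ch = Channel.fromPath('data/genome.fa')    // illustrative paths
    reads_ch  = Channel.fromPath('data/sample.fq')

    process align_sample {
        input:
        file ref from genome_ch
        file reads from reads_ch

        output:
        file 'sample.bam' into bam_ch   // picked up by a downstream process

        script:
        """
        bwa mem $ref $reads | samtools sort -o sample.bam -
        """
    }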
Now, in DSL2 those aren't written inside the process, but they work exactly the same; it just happens behind the scenes, and we'll see that. The purpose of this, though, is that if I wish to use my BAM files downstream, I can take the output of the align-sample task and use it as the input of my index-sample task, my downstream task. And this link between the two is what drives the execution. When this pipeline starts, both of these processes essentially begin; they sit there and wait for their inputs. The first process, on the left-hand side here, receives the sample and the reference and submits the BWA task. Once that's completed, the sample BAM file enters the BAM channel and is driven into this second process, which fires off its execution. And of course this whole thing happens in parallel, across the whole pipeline. So these channels are a key driver of that.

We can visualize this in a couple of ways, and we'll see some examples in a moment. In DSL2, and this is what we're going to be seeing, you can see the definition of the process on the left-hand side. It looks very similar; we're just missing the 'from' and the 'into', and we can now call, essentially reference, the process when we want to run it, in this case the quant process, from inside the workflow script. You can see that the quant process takes the index and the reads channel, and we'll see this several times over the next couple of days.

With regards to dataflow: I mentioned that the key part of the dataflow model is these channels, and channels are what we call first-in, first-out queues, asynchronous FIFO queues, hence the name. You can think of the channels as the linkers between the processes, the things connecting them, so that processes fire off as soon as data enters and the correct data is received. You'll often see this in practice. So consider here: we've got a channel with three elements. These elements could be values, they could be files, they could be whatever you want; in this case we've just got three elements inside this channel, data X, Y, Z. And you can see we have a process, with a process definition like we saw before. The fact that we have three elements means we're going to have three tasks emitted from that process. Each task is essentially one execution of a process, and each task itself generates outputs, as defined by the process.

What does that look like? Say you had, for example, a single file inside this channel. In this case, we're just creating a channel called the samples channel. Because there's a single file, when we execute this pipeline, FastQC runs once. But if I expand that out and say my channel now contains all of the samples, all of the FASTQ files, then the FastQC process runs as many times as I have files, and likewise all the way through the pipeline. Here's a slightly different view on the same thing, and this is how the parallelization comes about.
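A minimal DSL2-style sketch of the same idea (the FastQC wrapper and the file glob here are illustrative, not code from the slides):

    // DSL2 sketch: no 'from'/'into' in the process; the wiring
    // happens in the workflow block instead.
    process FASTQC {
        input:
        path reads

        output:
        path 'fastqc_out'

        script:
        """
        mkdir fastqc_out
        fastqc -o fastqc_out $reads
        """
    }

    workflow {
        // One matching file -> FASTQC runs once; many matching files ->
        // one FASTQC task per file, all running in parallel.
        samples_ch = Channel.fromPath('data/*.fastq.gz')
        FASTQC(samples_ch)
    }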
So now, shifting gears a little, let's consider the syntax and some of the differences you see with other workflow tools. Nextflow, compared to CWL, has this concept of the language and the runtime being tied together, and this idea of a DSL on top, versus being essentially a YAML specification. We like to think Nextflow is a little more concise and fluent, easier to read from top to bottom and understand what's going on. There's also the fact that we have a single implementation, so the runtime is very tightly tied to the syntax, and those two things together allow a lot of quick iteration and a quick development process.

On the comparison with Snakemake: Nextflow is obviously a lot more similar to Snakemake, if you're familiar with it. The main difference is that Nextflow uses a push model, this reactive dataflow model, whereas Snakemake is sort of the opposite: it has to start from the bottom and essentially build up the DAG. Sometimes you have to wait a little while for that DAG to be built before execution starts, whereas Nextflow starts from the top and pushes down. In terms of container runtimes, Nextflow supports four or five at the moment, and that continues to expand; and its support for AWS Batch, Google Cloud, Azure Batch, et cetera is, I think, fairly well known.

Okay. Now, setting the syntax aside, let's consider how we can run pipelines. Over the next couple of days we're going to be doing mostly local execution; we'll see a small section tomorrow on the different deployment models, with regards to configuration and how you can run things. Typically, when you're developing, you're submitting with the local executor. This is just 'nextflow run', and it's the default: the tasks run on your local machine, on the local operating system, using containers or not (that's optional, depending on how you set it up), with local storage. This is most common for development, but also if you're just running on one big box, a single virtual machine where you want to run everything. You do lack fine-grained control over the individual tasks, though: if tasks need different resources, or you have a very large pipeline, you could end up waiting a long time.

When it comes to cluster orchestration, Nextflow has support for ten or twelve of the major schedulers: Slurm, LSF, Grid Engine, et cetera. Here Nextflow submits each task to the cluster itself: it wraps each task appropriately, say as a Slurm job, and runs the submission command (sbatch for Slurm, or bsub for LSF, for example). Nextflow takes care of a lot of that, and the scheduler itself places the task onto the compute nodes. The compute nodes typically share some sort of file system, and this is a pretty typical setup in academia or in a large organization with a lot of compute resources.

What's become really popular in the last few years is the managed services in the cloud. This is an example of AWS Batch, but the exact same thing now exists for Azure Batch and Google Batch.
It's very similar: in this case Nextflow submits each task to the cloud provider's API, the API spins up virtual machines depending on the resources those tasks require, and the individual tasks then run on those virtual machines. The great thing is that you can spin up resources as you need them. Maybe data is coming off a sequencer, or you've got a whole batch of analysis to run at once; maybe you need access to GPUs or some very large memory machines. All of that can be done now, spinning up those resources on demand for those jobs. The kicker is that the storage typically used there is object storage. These VMs may be located all around a very large data center, and the connection each has to the individual input and output files differs, so being able to use some sort of object storage is key, and we have quite a few solutions around that as well.

Finally, there's Kubernetes. This is maybe more for the future, or for teams who already have that infrastructure set up. Instead of it being a managed service, you have to manage, for example, the horizontal scaling yourself, but we see it becoming a very key part of pipeline execution as we move towards a more cloud-native approach to all of this.

What does that portability look like? Here's an example. If we take a Nextflow script and run it locally, which is what we're going to be doing in five minutes or so, it's just 'nextflow run': the script runs on the local machine, by itself. If we then want to take that same script, and notice that we won't change any of its code, we simply add in a config file saying that we want to run on Slurm, that we want to use a particular queue, eight gigs of memory, four CPUs, et cetera. You simply add that, and Nextflow will wrap each task as a Slurm job and submit it that way. Likewise, if we wish to switch over and run that same job on AWS Batch, the same thing happens. So we've got these abstractions for the concepts of queue, memory, and CPUs, which make pipelines very portable across different environments.
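As a sketch of what that configuration-only switch can look like, assuming placeholder queue names, resources, bucket, and region:

    // nextflow.config sketch: same pipeline code, different target,
    // selected with e.g. 'nextflow run main.nf -profile slurm'
    profiles {
        slurm {
            process.executor = 'slurm'
            process.queue    = 'long'       // placeholder queue name
            process.memory   = '8 GB'
            process.cpus     = 4
        }
        awsbatch {
            process.executor = 'awsbatch'
            process.queue    = 'my-batch-queue'     // AWS Batch job queue
            workDir          = 's3://my-bucket/work' // object storage for tasks
            aws.region       = 'eu-west-1'
        }
    }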
A lot of that portability is made possible by the fact that we've decoupled the configuration from the actual workflow definition, and by containers. As I mentioned before, containers are key to the reproducibility, but also to the portability, of the pipelines. The way to think about this is that containers were typically developed for long-running web applications; that's the classic use case, running a web service. What we've essentially done is take containers and use them for data pipelines. Earlier on, people tried, and still do to some extent, to use virtual machines to define all of the dependencies required for an analysis. There are disadvantages to this, though: virtual machines are typically very large, so they're difficult to move around, and you've got a slow startup time, what we call the cold-start problem. If you try to spin up a VM, you'll often wait five or ten minutes for it to start, versus a container, which can spin up in seconds. There's also all the tooling that's available: containers, and in particular the Docker tooling, made it really easy to build containers and to have registries to store them in. And we've got this concept of composability: we can make layers, we can build on top of other containers, which makes them fantastic for the kind of work we do. There'll be a session later today where we'll start to build some containers and show you that.

As far as Nextflow is concerned, Nextflow treats each task as an independent unit of compute, and that lent itself very well to the adoption of containers. Nextflow submits each task as a job, each of those jobs becomes a container in itself, and from there they can be scheduled out and submitted to different execution engines. All of the managed services in the cloud now use containers as their default; AWS Batch, for instance, sells itself as a container batch service, so it lends itself very nicely to that. Nextflow has support for other container engines too, and there's actually a lot of interoperability between the container formats and the ability to convert between them, so moving towards these kinds of standards helps a lot for the work we're going to do. When to use containers? I would say always. Once you get used to building them, it's a really big advantage to your work; you're really helping your future self. Often, as part of a pipeline, we'll define how we build those containers as well, and with things like GitHub Actions or other CI/CD and DevOps tooling, you can build containers on the fly now. So it becomes a very easy and manageable way to take care of your environment as well as your code, and then of course the data itself, which we'll see later on.

As for Nextflow's growth: on the community side it's been growing quite dramatically. A lot of that growth, I'd say, comes from this connection between the execution environment and the definition of the workflow. This slide is a little out of date; I think we're up to 12,000 or so developers now working with Nextflow, and that's just looking at contributions to the runtime; considering the broader Nextflow ecosystem, there are of course many bigger use cases as well. If you're looking to get into Nextflow and editing it, there are a couple of editors available with syntax highlighting; in fact, we're going to see the Visual Studio Code extension in a moment as part of our training. And finally, as I mentioned, there's nf-core, which is available here and which we'll spend a lot more time on on day three, when I believe Phil will come in and provide more information on that.

But with that, I'm going to head back over to my screen here, and we're going to jump into the practical part of this. If you go to the Slack channel, people will be pasting some links there.
The key point is that we're going to go to the training material, which is at training.seqera.io, or if you type 'nextflow training' into Google, you'll see it come up as one of the first hits. On this website we have everything available, and it's going to stay here; it's public, so you can use it. All of the material, in fact this whole repository including the environment, is available over in the nf-training public Git repo. So if you want to open any pull requests, or you have any suggestions for the material, please feel free to have a look there. The other key resource we're going to be looking at is obviously the Nextflow docs. If you have any questions as we go through, there's always the Nextflow documentation, which is good to get familiar with in terms of learning how to read it, and hopefully you'll pick up the key concepts as we go, which should make it easy to find information as you continue your journey.

Okay, as I said, we're going to be using Gitpod for this, and Gitpod has everything we need for the environment. What I am going to do, though, is show you how you could do this if you don't want to use Gitpod and want to install Nextflow yourself. There are a couple of requirements. Nextflow requires a POSIX file system, so that's essentially Linux, macOS, or the Windows Subsystem for Linux, and the two main requirements are Bash and Java. With those two things installed, you can pretty much get going. There are also Git and Docker, which are used for this workshop, and for a couple of the exercises, if you want to use AWS Batch for example, the AWS CLI and a few other things. For our case we're going to use the Gitpod environment, so everything is set up there for you.

If you go to section 1.2 on the left-hand side, you can select the link which says to click the following URL. What this does is spin up a Gitpod instance, which is actually a container running in the cloud, and you'll notice the URL is just the same repository, with gitpod.io/# at the beginning of it. Because of the .gitpod.yml file that we have in the repo, it spins up everything that we need accordingly: it's got Nextflow installed inside, it has our container already, it has, for example, the syntax highlighting from our VS Code extensions, and so on. So I'm going to give you 30 seconds or so to go through, find 1.2.1, and select that; it will spin up. It's going to exit my previous one here, and it may ask you to log in to GitHub, so you can put your GitHub login there. Give it a few seconds to go through; I'm just going to find the training again, go here, and you can see mine's spinning up now. Just give it a few seconds... okay, and now we're all good to go.
On the left-hand side you can see a file browser; if you're familiar with VS Code, this is that kind of layout. There are other options here you can select, but for the most part we just need this Explorer. I'm going to shrink that sometimes, just by clicking on the Explorer button, to give you more screen real estate, and I'm also going to increase the font size a little so you can see more of my screen as I'm typing. At the top here you can see the space for the file editor, and down the bottom you can see we've got a terminal. If I do 'ls' here, you can see all of the material we have: some data, some scripts, some config; everything's available in here for us to get started.

The first script we're going to start with is a Hello World script, and I'm going to use the training material to go through it and show you exactly what it looks like; I can move this to the side here. This starts from the section "Your first script". It's an example which looks a little trivial to start with, but there's quite a lot going on; it tries to give you a flavor of the different elements of a Nextflow pipeline. We've got some parameters, the definitions of a couple of processes, we're creating some channels, and we're using an operator that manipulates those channels; all these elements are in here.

In terms of what the pipeline does, it's hopefully very simple to see, and I'm going to run through it without Nextflow first, so we get a feeling for what's going on. The basics of the pipeline can be described as: take a string, in this case 'Hello world!'; print it with this printf here; then split the string using the split tool, a Unix tool available on your machine. That creates some files which just contain the text itself, and the reason for this is that the creation of files, and the parallel processing of files, is something we see often in pipelines. The next step down is that we take those files, cat them, and turn their contents from lowercase to uppercase: in our case we take 'Hello', make it uppercase, and 'world', make it uppercase, and then print that to the screen. So what the pipeline is doing is really easy, actually trivial, but hopefully we get a feeling for how it works.

First, without Nextflow: you can imagine the first process is simply saying printf 'Hello world!', so I'll copy that, and then we're simply piping it to this tool, split. What's happening here is that split is splitting the string into six-byte chunks, six characters each. So you can count one, two, three, four, five, six: the first chunk is going to be 'Hello ', and the next six characters, the second chunk, 'world!'.
Then this last part says: write that out with the prefix 'chunk_'. So this creates two files, both prefixed by 'chunk_'; that's just a convention here. When I run this, if I do 'ls chunk*', you'll see that we have two files, chunk_aa and chunk_ab, and if we cat them we see their contents: we've got 'Hello ' and we've got 'world!'. That's all the first process is doing: taking the string, splitting it into two files, and making those available to us.

The next process downstream of that is all about the parallel processing of those files, taking those files and using them for something else. In our case we're going to cat the actual files. As a quick reminder, if you look at this process here, we could even copy its script, like this, and simply replace the y with the actual file name, chunk_aa, and you can see it takes 'Hello ' to uppercase 'HELLO ', and lowercase 'world!' to uppercase 'WORLD!'. The final part of the pipeline is some Nextflow syntax around how we're going to view the contents of that, which is essentially printing it to the screen.
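Putting those manual steps together in one terminal session (expected output shown as comments; note the trailing space in the first chunk, which comes from the six-byte split):

    printf 'Hello world!' | split -b 6 - chunk_
    ls chunk_*                            # chunk_aa  chunk_ab
    cat chunk_aa chunk_ab                 # Hello world!
    cat chunk_aa | tr '[a-z]' '[A-Z]'     # HELLO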
So now that we've hopefully got a feeling for what we want the pipeline to do, or at least what the Unix commands in it are doing, what about the Nextflow part? How is Nextflow handling all of this?

First, we have a shebang here, something you'll often see in scripting languages, just defining that we're going to run this as a Nextflow pipeline. Next we have params.greeting, which is a parameter. Nextflow has a special scope for parameters; they're treated slightly differently. This is a parameter assignment: we're creating a parameter called greeting, and giving it the default value 'Hello world!'. Parameters can be defined in many places: in a config, from the command line, in a parameter file, and all these different ways of setting them let us modify the parameters so the pipeline is useful for the ways we'll want to use it in the future.

The next thing we've got is the definition of a channel, and this is the first time we see what's called a channel factory. We're creating a channel called greeting_ch; again it's an assignment, greeting_ch equals Channel.of(params.greeting), which is basically just saying: this channel contains a string.

Then we can see the process definitions. The first process is defined as taking a val, a value. In our case the value is a string, but it could be an integer, any basic data object. The key difference is that this is a value, not a path; Nextflow treats those two things differently, because file objects have to be handled differently from other basic objects, which can easily be passed around. The way this runs is: it takes our string as an input and essentially performs this script on the string. You'll notice there's a dollar x, and that's a Nextflow variable: the string is assigned to this x variable. It could be y, it could be whatever you want to call it; whatever you call it here has to match what's used there. This is the mixing of programming languages: we've got some command line, we're mixing it with a Nextflow variable, and all of this gets interpreted and run. Finally you've got the output: when this task is complete, when there's essentially a successful exit code, the output will capture 'chunk_*'; anything which matches that, any path which matches, will be the output of this process.

Downstream from this we have the convert-to-upper process, and this one, instead of taking a val, takes a path: we want it to take a file, a path, as its input. What it does is cat that file when it runs, taking the contents of the file and piping it through this transformation from lowercase to uppercase. All of this is captured in the output, and in this case, instead of a val or a path, it's a special output qualifier, stdout, which captures the standard output: whatever is printed to the screen here goes into this channel.

Now, you'll notice that we've defined these two processes, but we haven't actually referenced them; the processes are not actually linked yet, and this greeting channel is not linked to the processes. This is one of the things about DSL2: you can define processes, they can even be in different files, all over the place, and you've got this modularity, so you can then reference them inside a workflow section. So the workflow says: I want to run split-letters on this greeting channel, which creates for us the letters channel. The letters channel here is the output; the greeting channel is the input, because you reference the name of the process and then its input (this process has one input, so it's this one here). In the next step we take the letters channel, which we know contains our files, and pass it into the convert-to-upper process, and then we do a flatten; I'll show you this later on, but it's our first example of an operator. And finally the results channel, which now holds the standard output, essentially just gets printed to the screen.

Don't worry if you don't get all of that; that's basically all of the key parts of a Nextflow pipeline explained to you in 30 lines or so, and there's a full description of every line in the notes below, if you wish to see it.
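For reference, the hello.nf script being walked through looks approximately like this (reconstructed from the description; the version in the training repo is canonical):

    #!/usr/bin/env nextflow

    params.greeting = 'Hello world!'
    greeting_ch = Channel.of(params.greeting)

    process SPLITLETTERS {
        input:
        val x

        output:
        path 'chunk_*'

        script:
        """
        printf '$x' | split -b 6 - chunk_
        """
    }

    process CONVERTTOUPPER {
        input:
        path y

        output:
        stdout

        script:
        """
        cat $y | tr '[a-z]' '[A-Z]'
        """
    }

    workflow {
        letters_ch = SPLITLETTERS(greeting_ch)
        results_ch = CONVERTTOUPPER(letters_ch.flatten())
        results_ch.view { it }
    }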
Let's go through and start running this, to get a better feeling of what's going on. I'm going to open up the hello.nf file on the left-hand side here, and you can see we've got the exact same pipeline. If you wish to edit it, you can just change the code there and press Ctrl-S to save. To get things running, I'm going to say nextflow run and point to the file itself: 'nextflow run hello.nf'. When I do this, it takes a few seconds to launch, and you can see the version of Nextflow that we're running, you can see we're launching this local file, we get a run name here, and we get this revision. You can see that we had three tasks which ran, all locally, directly on this machine: split-letters ran once, and the convert-to-upper process ran twice.

Let's do this again and see what we get. I've just pressed up and I'm running the exact same code, and notice that now we get 'HELLO WORLD', whereas before we had 'WORLD HELLO'. Why is that? What's happening is that the convert-to-upper process runs twice: two tasks of convert-to-upper, and they ran in parallel at the CPU level. That means we don't know which one will finish first, essentially which one will print to the screen first, because of that true parallelization, and so we can get a different order in the way they're printed. This is extremely powerful when you consider that you may have tens of thousands of samples to process, each one running in parallel, not on a single machine but potentially across the cloud, hundreds or thousands of tasks at the same time. And all of that parallelization was enabled just because two files were generated to begin with.

The other interesting thing to look at is these letters at the beginning, what we call the hash. These are hexadecimal characters, the task hash, and they're unique every time you run a task; if you don't change anything and you're not using resume, which I'll show in a moment, this is a unique hash. The other thing about it is that it defines where the task runs. I mentioned before that each Nextflow task runs in its own working directory. To see that, I'm going to quickly remove my work directory, run the same thing again, and do a quick 'tree' on the work directory. You'll see that inside my work directory I get three subdirectories created, and inside each of those, another, and these correspond to the hashes: split-letters showed 83/49…, and you can see a directory 83/49… was created, and that's where the files themselves are. The individual task actually runs inside there, and the files generated by that task end up there. We can also see that in the subsequent downstream process, the files are in this case symbolically linked: for example, the convert-to-upper task here, 36/e1…, is actually pointing back to the original location in 83/49…. In this case, because it's a shared file system, there's no need to pass the files around.
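Illustratively, with shortened hash prefixes (the real directory names are longer hexadecimal strings), the work directory tree looks something like this:

    work/
    ├── 83/49…/                # SPLITLETTERS task ran here
    │   ├── chunk_aa           # outputs created in place
    │   └── chunk_ab
    └── 36/e1…/                # one CONVERTTOUPPER task
        └── chunk_aa -> ../../83/49…/chunk_aa   # input symlinked, not copied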
And this is why I was stressing before the difference between val and path: Nextflow treats the two very differently, particularly in cloud environments, where you need to be very careful about where the files actually are when you reference them, versus normal values, which can just be passed around.

Okay, now let's consider: what if we wanted to make some changes? I'm happy with my pipeline, let's say, content with how it's running, but maybe I want to change the second process. Split-letters is working fine, but maybe I made a mistake, and instead of converting to uppercase, I now simply want to reverse the contents of those files. What happens if I want to do this, but I don't want to rerun the pipeline from the beginning and wait for all the processes? This is a trivial example, since as you saw it takes a second to run, but in cases where your pipeline takes days, that's obviously not an ideal situation. What we can do is run exactly the same command as before, but now specify resume: 'nextflow run hello.nf -resume', with a single hyphen. When this runs, you'll notice that the first process, the split-letters task, is cached, which means it hasn't actually been rerun; if that task was going to take two days, it's simply cached here, and we don't need to rerun the whole thing. You'll notice that the second process, convert-to-upper, has run; that's the piece which ran, because we made changes to it. You can use resume whenever you want; in some cases, leaving resume on may be the default behavior you want. If I leave resume on and haven't made any changes to the pipeline, you'll notice that everything is cached: the whole pipeline essentially doesn't run; it uses the caching mechanism. And if I look a little deeper now at the split-letters process, you'll notice it's got this hash, 83/49…, and the hash remains the same. The hash is actually the key piece that Nextflow uses to decide whether to run a task again: it's made up of the actual input files, the timestamps on those files, and the script section, the piece I changed here, and all of that comes together to decide whether to rerun.

So far this pipeline is simple enough, but it's only running this 'Hello world!' piece. What if I want to update it so I'm not just running 'Hello world!' but running it on my own data? Maybe I've got my own data, I come along, you've given me this pipeline, and we want to run it that way. With this, we can say 'nextflow run hello.nf' again, everything the same, but now change this parameter. Parameters can be changed using a double dash: I can say '--greeting', because that's the name of the parameter, and override it from the command line. I'm going to change it from 'Hello world!' to 'Bonjour le monde!'.
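To recap the two kinds of invocation (single dash for Nextflow's own options, double dash for pipeline parameters):

    nextflow run hello.nf -resume                          # reuse cached tasks
    nextflow run hello.nf --greeting 'Bonjour le monde!'   # override params.greeting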
When this launches, you can see that the first task looks very similar: split-letters has one of one, but convert-to-upper now has three tasks. The reason is that 'Bonjour le monde!' splits into six-character chunks three times, not just twice as with 'Hello world!'. And this again shows the data-driven parallelization: if I did the exact same thing with whatever string, however many chunks I can split it into, however many files I'm creating, that's the parallelization I end up with. Here you can see we end up with 10 tasks run for this one, so it's something that's useful for driving that forward.

Okay, let's consider one other thing which I haven't dived into yet, before I hand you over to Marcel, and that is what's taking place with this flatten here. I didn't mention it so far, but I want to examine what happens if we remove it, and look at exactly what the letters channel really is. This is something which can be very useful when you're trying to debug your channels or your pipeline: simply call view on the channel and see exactly what's taking place. I'm going to go back to the default, 'nextflow run hello.nf', and ask: what actually is the contents of that channel? You can see here that the contents of the channel is two files, and those two files are the chunk_aa and chunk_ab files we saw originally. You can see that they're both in the same working directory, because the task was executed from that working directory. But importantly, when I look at them, you'll see these square brackets, and the square brackets, and the fact that both files are on the same line, indicate that they are a single element. So we could say the letters channel contains a single element, and that single element is both of those files together.

The problem with that is, if I now remove flatten and run this, you can maybe guess what's going to happen: because the letters channel contains a single element, convert-to-upper runs just once, essentially on those two files together. So flatten takes the letters channel and flattens it. Do a view, and look at where the brackets are now: we no longer have square brackets, which means every element we have, every file, is its own element. The fact that we have two elements now means two tasks will be generated. Likewise, in the example with the string that creates 10 files: every file that gets generated goes all the way down here, you can see we've got a lot of different files, and each would be processed in parallel by this convert-to-upper process as well.
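Here's a minimal sketch of the difference flatten makes, using plain strings in place of the actual chunk files:

    workflow {
        // One element that happens to be a list -> one downstream task.
        Channel.of( ['chunk_aa', 'chunk_ab'] ).view()
        // prints: [chunk_aa, chunk_ab]

        // flatten unpacks it -> one element per item, one task each.
        Channel.of( ['chunk_aa', 'chunk_ab'] ).flatten().view()
        // prints: chunk_aa
        //         chunk_ab
    }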
There's a little more of a description of what's taking place in the pipeline here; you can see all of the different pieces. Otherwise, I haven't had a chance to check questions, but I'm sure there are lots of questions about that very basic pipeline. Don't worry if you didn't capture all of this; we're going to go through an RNA-seq pipeline now where we'll see many of these same things again, maybe a little more applied to bioinformatics, and then the first thing tomorrow we're actually going to go through every one of the pieces: how we make processes, how we make channels, et cetera. So you're going to see these things many times over the next little while; this was just a primer on the main basic pieces. With that, I'll pass you back over to Marcel. I believe there's going to be a ten-minute break or so to take a breather, and when we come back, we're going to take on the simple RNA-seq pipeline, which is a great example of using some real data, real FASTQ files, real FASTA files, as well as some real software, where we'll be doing some alignment too. So thanks for your time, everyone; I'll pass you back over to Marcel now, and see you shortly.

Okay everyone, so now we're going to have a 15-minute break. Feel free to use this time to go to Slack and ask questions, or to redo some of the snippets you had questions about during Evan's presentation. In 15 minutes we'll be back.

Hello everyone, let's get back to it. I'm going to share this here; we're back in Gitpod. You've seen the first two sections with Evan, so now we're going to talk about section three and section four, as you saw earlier in the schedule of the training. The idea with section three is that we're going to build a full, if simple, RNA-seq pipeline. We're going to build it step by step, using a few things that you've already seen about Nextflow and some things that you haven't. If you get a bit lost or confused during this, don't worry, because the idea is to show you a full pipeline; we're going to build it together, and in the next sessions, like tomorrow, you'll understand it bit by bit, and then everything will be clear. So don't focus too much on the details of Nextflow; focus on how Nextflow can make this pipeline work. We're going to build seven scripts, and basically it's a bioinformatics pipeline, the RNA-seq pipeline: we're going to index the transcriptome file, then perform some quality control, then some quantification, then create a report, step by step.

So let's start. First, as you saw previously, we can come here into nf-training, and the scripts are here; and here is the training.seqera.io site that we've seen before. My name is Marcel; I'm the Nextflow and nf-core developer advocate for Latin America, hired by Seqera Labs, which is the company that created Nextflow, and one of my goals is to grow the community in Latin America. So if you're in the region and you need some support, whether you're at a startup, a biotech company, or a university, in academia in general, feel free to get in touch with me for help: some tips, some explanations about Nextflow, and of course if you want me to give a talk or a training, just reach out; any conversation would be great.

The first thing is that we're going to start with script1.nf, which is pretty simple, and in a way covers something that was already shown in the previous subsection with Evan. We see here that we have a parameter, reads, which holds a path to paired-end reads;
we have a transcriptome file, which is a path again; and a multiqc path. Then basically what this .nf file does — there isn't really a pipeline here — is just print the contents of these variables. So we can click to open script1.nf (I'm going to increase the zoom a bit so it's easier for you), go to the terminal, and run it, and it just prints the contents of those variables. The reads value starts with the project directory, which is where the .nf file is — here, /workspace/nf-training — followed by the path to where our FASTQ files are.

One thing we can do is set a parameter on the command line, as you saw with this reads keyword. And there's something very interesting to know: if you do nextflow -h, you get the help for Nextflow, and all the options you see there have a single dash. So whenever you're wondering whether to use one dash or two: one-dash options are for Nextflow itself, the Nextflow engine, and two-dash options are parameters for pipelines, like the ones we've seen here — reads, transcriptome file, multiqc, and so on. So we run again, setting --reads, and now we're not using the default values from the file; again, it just prints, nothing new.

Instead of only using the parameters already in the file, let's try to add a new one. We come back to script1.nf — I'm going to use the editor I like; I'll have to zoom out a bit — and add a new parameter, outdir, a directory where we want files to be saved. It won't do anything yet, it's just a variable I'm setting, and I'll point it at results in the current directory.

Another thing we can do is start printing some information — not just where the reads are, but information about the pipeline run. I'm going to copy-paste here to make it easier: once I've added this new outdir parameter, I add a log.info call that prints a few things to the screen. It says which transcriptome file I'm using — in this case the default one; if the user sets it when running your pipeline, they'll see the transcriptome file they set, or otherwise the default — the reads, and so on. The thing is, if you run it this way, the output is a bit ugly: the text is pushed off to the side, because in the file we indent it for readability, and that indentation ends up in the output. To have the info flush against the left edge of the screen, we use stripIndent: it strips the indentation, so the information is printed right at the border of the screen, which is what command-line software usually does.
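Put together, script1.nf ends up looking roughly like this — paths abridged to match the training data layout, and the exact banner text is illustrative:

```nextflow
params.reads         = "$projectDir/data/ggal/gut_{1,2}.fq"
params.transcriptome = "$projectDir/data/ggal/transcriptome.fa"
params.multiqc       = "$projectDir/multiqc"
params.outdir        = "results"

log.info """\
    R N A S E Q - N F   P I P E L I N E
    ===================================
    transcriptome: ${params.transcriptome}
    reads        : ${params.reads}
    outdir       : ${params.outdir}
    """
    .stripIndent()
```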
Okay, that's the first step. At this point we can't even call it a pipeline, because it isn't really doing anything: it's just taking some parameters and printing them to the screen. So now we're going to really start building a pipeline by creating a process, which is the atomic unit of work in Nextflow — every task that runs is an instance of a process. We're going to create a process called index which, as we said at the beginning, creates an index from the transcriptome file we're providing. This is already in the script2.nf file, together with everything we did before: as I said, we're building this step by step, and these script files accumulate until they come together as the final pipeline at the end. So everything we've done so far is still here, and now we add this index process.

A very simple process has basically three blocks. There's an input block, which says which input channels enter the process. There's an output block, which says what leaves the process — whether we're writing to a file or to standard output, for example. And there's what the process actually does: because we usually have multiple lines, we tend to use the multi-line string, which is the three double quotes. Here the process calls the salmon binary to create an index: we give it a number of CPUs and the transcriptome file, and the index we create will be called salmon_index.

But just defining the process doesn't run anything: you're only describing what a task would look like. You still have to call it, and that's the workflow block. That's where you say how things will be run: which process gets called, and what the inputs and outputs of each process are. Here we call the index process, giving it an argument: you saw it has an input which is a path for the transcriptome file, so we pass params.transcriptome as the argument to this process. And we store its output in a variable we call index_ch. The _ch suffix is not mandatory, but it's good practice to make it very clear in your code which variables are channels, by adding _ch at the beginning or the end; here it tells us this variable is the output channel of the index process.
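Putting that together, here's roughly what the index process and workflow block in script2.nf look like (the cpus directive shown here is explained in a moment):

```nextflow
process INDEX {
    cpus 2                                    // a directive; discussed just below

    input:
    path transcriptome                        // the input channel entering the process

    output:
    path 'salmon_index'                       // what leaves the process

    script:                                   // the multi-line command each task runs
    """
    salmon index --threads $task.cpus -t $transcriptome -i salmon_index
    """
}

workflow {
    index_ch = INDEX(params.transcriptome)    // _ch suffix marks channel variables
}
```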
Once this is all understood, let's run it: nextflow run script2.nf. One thing is going to happen — let's wait a bit — salmon is not installed, and the message makes it pretty clear: salmon: command not found. That's no accident; we made it this way, because what we want to show you is that you don't need to install everything on your machine, or wherever you run your pipeline — cloud, cluster, or your local desktop or laptop. You can use containers, or other tools that help you isolate your environment. So we're going to run with the -with-docker option — and as you can see, because it has one dash it's a Nextflow option, not a pipeline parameter; indeed, if you look in script2.nf there's nothing about Docker in there, because it's not a pipeline argument. We run nextflow run script2.nf again, now with -with-docker, and this time Nextflow runs a container image with Docker. That container image has salmon inside, and that's why it ran just fine. Using the hash we see in the output, we can go into the work directory and see all the files created by salmon.

But then you might ask: okay, Nextflow knows we want it to use Docker, but how does it know which image, and how is salmon installed? There's a file called nextflow.config, and in it you can provide configuration information for Nextflow. This is a very nice feature, because your script file, the pipeline file, is specifically about the pipeline, and any other configuration goes into nextflow.config. What it says here is that all the processes will run with a container in the nextflow namespace whose image name is rnaseq-nf. We didn't say where it lives, so implicitly it's on Docker Hub, and if you go to Docker Hub under the nextflow namespace you'll find this rnaseq-nf container image, which we created and put salmon inside. You don't always have to create your own images — sometimes there are already images exactly the way you want, ready to use, and in this case there's this one for us. So that's how Nextflow knows which container image to use and how salmon is installed inside it, and that's why the command worked.

If you don't want to keep typing -with-docker all the time, you can go to your nextflow.config and add, at the bottom, docker.enabled = true. Some people make a typo here and write enable, and it doesn't work: it's enabled, ending in "ed". With that, we can still use -with-docker if we want, but we no longer need it, because Docker is enabled by default in the config. So now, as you'll see, there's no "command not found", because it uses Docker by default. You can do the same for other container technologies, but for now let's focus on Docker.

You saw at some point in the code that when we call salmon we pass it some variables, including the CPUs. By default this is one, but you can say how many CPUs you want to request from the operating system: you simply go to the beginning of your process and specify cpus there. These optional settings at the beginning of the process block are called directives; cpus is one example, but there are many others, like queue and so on. Here we can say 2, and automatically, when we run, salmon is started with two threads instead of one. We can actually see what's going on: go to the work directory — we can use the hash to know where things are — and look at .command.sh. As you see, this is what gets called inside whatever environment is used — a virtual machine, a container — and it runs with the two CPUs that we set.
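To recap the configuration side of what we just set up, the nextflow.config is essentially this:

```groovy
// nextflow.config
process.container = 'nextflow/rnaseq-nf'   // no registry given, so Docker Hub is implied

docker.enabled = true                      // note the spelling: "enabled", not "enable"
```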
Back in the script: at the end we're just producing the output path with all those files you saw a few minutes ago. But beyond that, I want to see what's inside this channel, so I'm going to use the view operator — these functions that we apply to channels are called operators. We apply view, run Nextflow, and see what it shows: the path. This is very close to what Evan did when he was presenting a while ago; we had already used tree to see what was in the directory.

Now, to a question someone asked in Slack a few minutes ago, before we came back: in bioinformatics we very often have paired-end sequencing, right? So how do we read that into Nextflow in a way that makes sense and isn't messy? Let's go to script3.nf. Here we're going to create a channel using a channel factory, fromFilePairs — Evan showed some channel factories, and we'll see many more tomorrow. With this one, you give it a path and it understands that you have file pairs. Looking at params.reads, on the first line, we see we give a path starting from our project directory, and at some point it contains a glob pattern. If we go to the data folder you'll see we have gut_1 and gut_2 — two files — and likewise samples from liver and from lung; they're paired-end sequences, so each sample has two files. When we use fromFilePairs with the glob matching gut_1 and gut_2, it understands that and does something very nice (I think someone asked about this in Slack a few minutes ago too): we get a tuple. The first element of the tuple is a key — gut — and the second element is a list of the files, here the two files of the pair. So we get the pair of files plus an identifier for them.

Now, instead of the default glob, which only matches gut, we can set on the command line a glob matching all the files that end in 1 or 2 — gut, liver, and lung. As you can see, now we have three tuples, because there are three elements in this channel: gut with its two files, liver with its two files, and lung with its two files.

One interesting thing some people do, instead of assigning a channel variable with an equals sign to whatever the channel factory expression evaluates to, is to use set: it means the same thing, but instead of the equals at the beginning, you pipe into set at the end, and for readability some people even split the expression across several lines. It behaves exactly the same; sometimes it's just easier to read.
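As a sketch, the factory and the .set alternative look like this — the glob matches the training data layout, and it includes the checkIfExists option explained next:

```nextflow
Channel
    .fromFilePairs("$projectDir/data/ggal/*_{1,2}.fq", checkIfExists: true)
    .set { read_pairs_ch }    // equivalent to: read_pairs_ch = Channel.fromFilePairs(...)

read_pairs_ch.view()
// [gut,   [gut_1.fq,   gut_2.fq]]
// [liver, [liver_1.fq, liver_2.fq]]
// [lung,  [lung_1.fq,  lung_2.fq]]
```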
Another thing you can do, to make the errors that may come out of your pipeline easier to interpret, is to use checkIfExists, just like we did before: in the fromFilePairs channel factory, along with the path, we can add checkIfExists: true, and it will report a missing file in a more Nextflow-native way, let's say.

Now that we've discussed processes, log.info, and a new channel factory, let's get back to something more like a real pipeline: script4.nf, which again contains everything we've done so far. We have the log.info printing information about the pipeline run; we have the index process, which receives a transcriptome file and outputs an index created with salmon; and now we have a new process, which we call quantification. This process has two inputs: a path to the salmon index, and a tuple containing the sample ID and the reads — which is exactly what you just saw: gut and its two files, liver and its two files. It outputs a path, and here we don't want a folder with a static name like salmon_index: we want a folder named after the sample ID, which is why we have the dollar sign inside double quotes. We use salmon again, the same tool, but with a different command, quant, because now we really want to quantify. Again we use task.cpus, which defaults to one but can be changed with a directive at the beginning of the process block, plus some other parameters. For the tuple, we have the reads, and because we know there are two of them, we can refer to them as reads[0] and reads[1]. The output is the sample ID, as we declared in the output block: on the command line it tells salmon where to write, and in the output block it tells Nextflow what the output is.

Then of course we have our workflow block coordinating all of this: we create a channel with fromFilePairs, checking existence and passing the reads path, and save it to a channel called read_pairs_ch; we call the index process to create the index; and we call the quantification process, saving its output to a channel variable called quant_ch. We can now call nextflow run script4.nf and everything works, because docker.enabled is set, which turns on Docker for us automatically.

One thing to notice is that it did everything again. That's fine here, because we have few samples and it's a toy example, but sometimes a single process can take a long time, and you don't want to repeat everything just because you changed a small thing at the end. So one thing we could have done is add -resume — one dash, so we know it's a Nextflow option. With resume, everything that didn't change since the last run is cached and not re-run, which is much quicker; only new things are executed. Here, because everything had already been run, everything is cached: the index was cached and the quantification was cached. If we had new samples, part of the run would be cached and only the new samples would be executed.

Another thing we can do is use the tag directive — again, directives go at the beginning of the process block. So we go to quantification and add a tag mentioning salmon and the sample ID.
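Putting those pieces together, the quantification process looks roughly like this — the exact salmon flags approximate what the training script passes:

```nextflow
process QUANTIFICATION {
    tag "salmon on $sample_id"               // the tag directive just mentioned
    cpus 1

    input:
    path salmon_index
    tuple val(sample_id), path(reads)

    output:
    path "$sample_id"                        // one output folder per sample

    script:
    """
    salmon quant --threads $task.cpus --libType=U \\
        -i $salmon_index -1 ${reads[0]} -2 ${reads[1]} -o $sample_id
    """
}
```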
When we run it now, you'll actually see that tag for all the samples — not only gut, but liver and lung as well. It still doesn't change much here, because one line is printed per task — per process, sorry — so you see them while they're running, but at the end you only see the last one. One thing you can do is use -ansi-log false. It's a bit more verbose, but it tells you in more detail everything that's happening. With a thousand samples it would be a mess, but here it's interesting, because you see quantification on lung, on liver, on gut, and so on.

But then, where are all these files? From here it doesn't look like anything happened. We've already seen they go into the work directory — the default is work, and you can change it by passing -w to Nextflow. This last task, for example, was 74/4a..., and the files are in there. We can find them easily because we have the hashes and everything, but it's a bit annoying, let's say. Again, this is a toy example, so it's easy, but sometimes you have a real monster of a pipeline and this gets bothersome. So what you can do is use another directive, publishDir. Using publishDir in quantification, for example, we write the name of the directive, then where we want the results to be published — here we use outdir, the parameter we created at the beginning specifically for this, the path where we want our results — and then the mode. By default, if you don't specify a mode, a symbolic link — a shortcut to the file in the work directory — is created in the results folder; by saying copy, we ask for the files to really be there. So if we run Nextflow again, now that we have the publishDir directive, the files appear inside the results folder. Everything was cached, and since I ran three samples, all three results are now there.
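As a minimal, self-contained illustration of publishDir — a toy process of my own, not from the training repo:

```nextflow
params.outdir = 'results'

process SAY_HELLO {
    publishDir params.outdir, mode: 'copy'   // without mode: 'copy', results/ would only
                                             // get symlinks into the work directory
    output:
    path 'hello.txt'

    script:
    """
    echo Hello > hello.txt
    """
}

workflow {
    SAY_HELLO()
}
```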
The next step is quality control, something very common in pipelines: you do quality control, and then — in this case — you align, or quantify. Let's look at script5.nf which, again, has everything we've been working on so far: the log.info about the pipeline, the index process, the quantification process, and now a fastqc process — called that because there's a piece of software, FastQC, that does quality control.

Let's first go to the workflow block so the overall flow is clear: we create the channel with the channel factory; we call the index process to create the index from the transcriptome; we quantify, using the created index and the read pairs; and we run fastqc to do quality control on the read pairs. Notice that this step doesn't depend on the output of quantification — we didn't have to put it in this order — because for quality control all we need are the read pairs; that's what we want, quality control of the samples.

In this new fastqc process, we again use the tag directive to make it easier to see which task is running — and remember, the process is the definition block, and each run of it is called a task, so we have one task per sample, per pair of files. The input is again a tuple: the ID (gut, for example) and then the collection with the two paths, which here we call reads. The output is a folder, a path, where part of the folder name is the sample ID: fastqc_gut_logs, fastqc_lung_logs, and so on. And here, for the first time, we have two lines in the script block — as I said at the beginning, the triple double quotes make it a multi-line string, so we can have many lines. We first create the folder, because it doesn't exist yet, and then we call fastqc, telling it with -o where the output folder is, the format, which is fastq, and then the paths to the reads.

Let's run it, and this time I'll use -resume so we don't have to redo everything — again, a toy example, but with lots of samples it would take forever. All the previous steps are cached; only fastqc is new, and that's what we see: index cached, quantification cached, and fastqc not cached, so it runs. It shows one-of-one for quantification and fastqc because we only ran gut. So let's run again with -resume, this time specifying all three samples — gut, liver, and lung — and with -ansi-log false so we can see each task. Quantification on liver, on lung, and on gut are all cached, as you can see; and for fastqc — on lung, liver, and gut — gut was cached, but the other two were not, because we hadn't run them before. So one fastqc task was cached and the other two were run.

Quality control is something you do at the beginning; but at the end, after everything is done, you often want a report of everything that happened. MultiQC is a very nice tool for that — it supports many different bioinformatics tools — and we're going to use it at the end. So we open script6.nf which, again, has everything so far: the outdir, the log.info, the index process, the quantification process, the fastqc process, and now also a multiqc process. It uses publishDir with the params.outdir we set, with mode copy; its input is a lot of paths; its output is an HTML file; and the script is simply multiqc followed by a dot, meaning the current folder, and it builds a report over everything we've produced.

In the workflow block we have everything as before, but now we also call the multiqc process. We take the output of quantification, which we called quant_ch, and we use the mix operator — again, operators operate on channels; whenever you have these functions working specifically on channels, they're operators. We use mix to combine quant_ch and fastqc_ch, then collect, another operator, to gather everything into a single list, and we provide that to multiqc. So in the end, what multiqc receives is simply a lot of paths.
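In sketch form, the two new processes and the wiring in script6.nf look like this (process and channel names follow the transcript):

```nextflow
process FASTQC {
    tag "FASTQC on $sample_id"

    input:
    tuple val(sample_id), path(reads)

    output:
    path "fastqc_${sample_id}_logs"

    script:                                   // two lines: the folder must exist first
    """
    mkdir fastqc_${sample_id}_logs
    fastqc -o fastqc_${sample_id}_logs -f fastq $reads
    """
}

process MULTIQC {
    publishDir params.outdir, mode: 'copy'

    input:
    path '*'                                  // just "a lot of paths"

    output:
    path 'multiqc_report.html'

    script:
    """
    multiqc .
    """
}

workflow {
    read_pairs_ch = Channel.fromFilePairs(params.reads, checkIfExists: true)
    index_ch  = INDEX(params.transcriptome)
    quant_ch  = QUANTIFICATION(index_ch, read_pairs_ch)
    fastqc_ch = FASTQC(read_pairs_ch)
    MULTIQC(quant_ch.mix(fastqc_ch).collect())
}
```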
We can run this and see what happens. Again, most things are cached; for multiqc nothing is cached, because it's the first time we've run it. And now we can see that in the results folder there's a multiqc_report.html, and we can even open it — it's not rendering right here, it's just showing the HTML source, but in a browser this appears as a page, and a very nice one. (I can't close this... oh my god. Okay.)

Actually, one thing we can do, instead of calling multiqc, is to inspect that channel. Everything is cached, so at the end let's just view it. As I was explaining, mix combines the two channels and collect puts everything into a list — so here you see the brackets, with everything separated by commas: the fastqc logs for lung, for liver, for gut, and then the quantification folders. Everything we've generated so far, together in one flat list, and this is what's provided to the multiqc process, whose input, as I showed, is just paths — many paths in this case. That's how it works.

The thing is, as I keep saying, this is a toy example; sometimes a run can take forever, and you may want a customized message at the end, or even an email notification, so you get an email when everything is done telling you your run finished. Often it's not on your local machine — it can be in the cloud or on an HPC cluster — and you're not sitting there waiting for it; you're doing something else and want to be notified. Looking at script7.nf — again, everything we've done so far — at the end we attach an onComplete handler to the workflow, which means that whenever the workflow completes, it calls log.info. If it was a success — again, another property on the workflow object — it prints a newline (that's why you see the backslash-n) and says "Done! You can open the following report in your browser"; otherwise it says oops, something went wrong. For email notification you can go further: you can build a structure with the details of your SMTP server and actually send an email when the run completes.
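The handler itself is essentially this (message text approximated from the demo):

```nextflow
workflow.onComplete {
    log.info ( workflow.success
        ? "\nDone! Open the following report in your browser --> $params.outdir/multiqc_report.html\n"
        : "Oops .. something went wrong" )
}
```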
There's more documentation on this — and on a lot of what I'm telling you — at docs.nextflow.io, with much more detail. The idea of this training is to be very focused on a few things we want you to see, but there's much more in the official Nextflow documentation.

Another thing you can have is custom scripts. Our examples here were pretty simple in terms of the script block: for the index process we had basically one line calling salmon; for the quantification process, again one line with salmon; for fastqc we had two. But sometimes your scripts are much more complex — or maybe not even complex, just longer, with more instructions. That's what custom scripts are for: instead of writing everything in the Nextflow file, you write it in a script file and call it from the script block, just like we call mkdir and fastqc and salmon and multiqc — we could call Marcel's custom script, for example. One nice thing is that you can write your script file in Python or anything else, like a shell script, and put it in the bin folder, inside the folder where the .nf file lives. By doing that, Nextflow automatically adds this bin folder to the PATH, so your script can be called from inside the pipeline. That's what we're doing here: we take this fastqc.sh — this script here — and make it executable; we create the bin folder with mkdir, move the script inside, and then, just by calling fastqc.sh in the script block, it runs. That's how you work with custom scripts. One important thing: when you do this with a scripting language, you must say which interpreter is going to be used. Here it's bash, but for a Python script you'd have something like #!/usr/bin/env python3 at the top. When Nextflow calls the script it actually hands it to the operating system, and the operating system knows how to interpret the file because of that first line — what we call a shebang — which names the software that should interpret it.

Another very nice thing is that you can generate reports and metrics about your pipeline. What we're doing here is this long command line, which basically runs everything we've done so far — because this pipeline that we built from script1.nf to script7.nf is actually the toy pipeline called rnaseq-nf that we created at Nextflow, so you can run it just by name. Since you didn't specify any hub, by default it assumes GitHub, and it already knows it lives under the nextflow-io organization, so it downloads it from GitHub if it isn't already cloned or pulled, and runs it for you. (What's happening here... okay, this should work; anyway.) What we're saying with the command is that we want to run with Docker, and we want it to generate a report, the trace, and a timeline, plus a picture of the directed acyclic graph — a graph of the connections in your pipeline: the processes, the inputs, the outputs, and so on. It will tell you how long each process took to run, the amount of memory used, all these things, so that you can tweak your pipeline to be as efficient as possible with the resources you have. That's what it all does: it generates an HTML report.

Hmm, that's why this isn't working... look at the log here, it says it's a directory; let's go to a different folder and run it there. Okay, I'm going to investigate what's going on and come back to it. Let's actually take a ten-minute break — I think that would be ideal — I'll investigate, and when we come back I'll talk about dependencies and containers.

Okay, let's get back to it. The first thing: there was a change in the revision of the repository, so we have to set one that works — in this case it's -r dev, which is where this pipeline lives. When I provide that, the pipeline works: it pulls from GitHub and runs everything. As you can see, I've started it here, it's still running, but it will do everything.
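For reference, the reporting command was along these lines — all standard one-dash Nextflow options, with -r dev being the revision fix we just found:

```bash
nextflow run rnaseq-nf -r dev -with-docker \
    -with-report -with-trace -with-timeline -with-dag dag.png
```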
Another thing I forgot to show — I tried earlier, but we only got the HTML source, not the rendered page — is that we can render these reports. One second, let's finish with the MultiQC one. We go to the MultiQC report — before, it was just raw HTML — click with the right button, choose Show Preview, and now we really see the rendered HTML page: the MultiQC output, with the percentage of aligned fragments and a lot of other information that MultiQC gives you.

But we also want the other outputs: in the command we used -with-report, -with-trace, and so on, so we also have this report.html, and we do the same thing, Show Preview, to render it. This is the workflow report I told you about earlier. It has information about the CPU used by every process — we have four processes, so there's a box plot for each of them — with the percentage allocated and the raw usage. The same for memory: fastqc is the one using the most memory here, then multiqc, while quantification and index have a much smaller footprint — though again, this is a toy example without much data. We have the job duration: fastqc by far took the longest compared to the others; multiqc a bit, but fastqc took a lot more. We have I/O, the number of bytes read; and we have the tasks, whether each one completed or not, and the CPUs requested — we went with the default, so it's one. There are six entries here: the processes multiplied by the number of times each was called, once per sample. It's nice to have this, because people often ask: how many CPUs do you use, how much memory should I request from my HPC cluster or from the cloud? These questions are very difficult to answer if you've never run your pipeline, but using these reports you can keep tweaking — more CPUs, less memory — and end up with machine specifications to request that are really fitted to how your pipeline behaves. These reports help a lot with that.

One more thing: you can run your project straight from GitHub, as you just saw. If you just give a name, Nextflow automatically assumes GitHub; if you want something else, the -hub option lets you say whether it's Bitbucket, GitHub, GitLab, and so on. And here we used the revision flag, -r dev, as I showed before.
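So running straight from GitHub, with the hub made explicit, is a one-liner along these lines (the org/repo name is the one mentioned above):

```bash
# The short name resolves to the nextflow-io organization on GitHub by default;
# -hub lets you point at github, gitlab, bitbucket, etc. explicitly.
nextflow run nextflow-io/rnaseq-nf -hub github -r dev -with-docker
```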
Again, there's much more documentation and detail about everything I've said — and much more — on the official Nextflow documentation page, docs.nextflow.io. There's also the Nextflow Patterns page, which is very nice: it's a collection of implementations for situations that are a bit more tricky but come up often. How do you collect outputs from many different processes, like we did with multiqc? That's a pattern there. Many things you can't work out at first because they're a bit complicated — someone has already faced them and written a pattern for you to look at.

Now, the last section of today, section four: managing dependencies and containers. This one is very interesting, and it ties into many of the things said at the beginning about reproducibility and isolating environments. Docker and Singularity were mentioned, so here we're going to do a very quick introduction to using Docker.

Docker is already installed and running in this Gitpod instance, so we can just use docker run with any container name. hello-world is a very common one to use the first time you try Docker. As you saw when I typed the name incorrectly, Docker tries to pull from the internet whenever it can't find the image locally — and that misspelled one doesn't exist; but the correct one does: again it isn't found locally, so it's pulled from Docker Hub, and it runs. You get "Hello from Docker!", a message that your installation appears to be working correctly, and so on. It's a toy container image, so there's not much to do with it.

If we want a real one we can use, for example, a Debian image — Debian is a Linux distribution. For now we don't want to run it, just pull it, i.e. download it; and the tag is stretch-slim, because an image can have many different versions, and this is one particular flavor of Debian. With docker pull we download it, and it reports a new image downloaded for debian:stretch-slim from Docker Hub. If you type docker images, you see all the images downloaded on your machine — here are all of ours.

To run it, as you've seen, you use docker run with the name of the image and then what you want to execute inside. The hello-world image was built to print some content at us, but here we want more: I want to run a container interactively, with a terminal, so I call bash — a shell — to interact with the container. Typing that, we're inside it: we're not on our machine anymore — I mean, we are on our machine, but inside this isolated environment, a running container.

So now we know how to download a container image, how to run it, and how to run it interactively so we can work inside. But that's of limited use on its own, because when you kill a container, everything is lost: the information in there is not persistent (you have to set that up explicitly if you need it). And we don't want to use only container images that other people created — we want to create our own.
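Collected in one place, the Docker commands used so far:

```bash
docker run hello-world                    # pulls from Docker Hub when not found locally
docker pull debian:stretch-slim           # download an image at a specific tag
docker images                             # list images available on this machine
docker run -it debian:stretch-slim bash   # run interactively with a shell inside
```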
One way to do that is with a Dockerfile. I saw someone asking why I'm using vim as the editor here: I could simply create the file in the graphical editor — the zoom overlay is always at the top bothering me — by creating a new file called Dockerfile and writing into it, and it would be exactly the same. I'm using vim because when I type in the editor dialog I cover the content I'm trying to show you, and while I'm typing, some people may want to keep reading what's on screen in the broadcast; typing in the terminal makes it easier for people to follow. But you don't have to use vim — use whatever you want. So I'm going to create a Dockerfile, and basically copy-paste to make it quicker, but I'll explain everything.

In a Dockerfile you can have a maintainer field with your name, the creator of the container image, but I'm going to skip that here. Then: every line is a layer — that's what we call it. The first layer uses FROM, which sets the base image, the starting point we build on top of. As you saw in Evan's talk, container images are much smaller than virtual machines, and the idea is that, depending on what you want to do, you pick a very light, simple image to build on. Here we use debian:stretch-slim, a small image. Then we RUN one layer, a command that updates the package lists and installs curl and cowsay, two pieces of software. Then we set an environment variable: with ENV I add /usr/games/ to the PATH.

Once we have the Dockerfile, we have to build the image. We use docker again, but this time the command is build; we give the image a tag, or name — I'll call it my-image — and we add a dot at the end to say the Dockerfile is in the current folder: create an image based on this Dockerfile. When it's done, docker images shows all the images on the computer, and my-image is now there: we didn't specify a tag, so it's latest; it has an ID, it was created seconds ago, and its size is about 130 megabytes.

Again we use docker run to run our container — the name now is my-image, not hello-world — and instead of bash to enter interactively, which I don't want this time, I run cowsay, which I know is installed because I put it in my Dockerfile, and I give it an argument, a phrase. Run it, and a cow appears and says something. This cowsay isn't running locally in the environment we're using: it's inside a container, inside the machine we're using. It's isolated, and that's what guarantees that on other machines — your computer, somewhere on the internet, the cloud, HPC — using this container gives the same results.

But curl and cowsay aren't much; we want to do more. So we add this line — again I'll copy-paste to make it quicker. I use curl, a tool for downloading files, to download salmon, and I move salmon's files into folders that are already on the operating system's PATH, so that when I type salmon, the container knows where it is. You can see that I'm using curl, then tar to uncompress, then mv to move things elsewhere in the system — several commands, all in a single line. So why not use separate lines: RUN curl, RUN tar, RUN mv, RUN mv again? As I said, every line in the Dockerfile is a layer, and the more layers you have, the larger the image gets — and not only larger: the way Docker works, if a layer changes, even a tiny change, that whole layer has to be downloaded again. So you have to think carefully about whether you want one RUN expression doing a lot, or many separate RUNs.
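The Dockerfile as it stands at this point — the salmon download URL follows the pattern of the official releases, so treat it as illustrative:

```dockerfile
FROM debian:stretch-slim

RUN apt-get update && apt-get install -y curl cowsay

ENV PATH=$PATH:/usr/games/

# One layer: download salmon, unpack it, and move it into directories already on the PATH.
RUN curl -sSL https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz \
      | tar xz \
 && mv /salmon-*/bin/* /usr/bin/ \
 && mv /salmon-*/lib/* /usr/lib/
```

It's built with docker build -t my-image . and exercised with, for example, docker run my-image cowsay Hello.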
If, for example, you have a single RUN installing many different packages, that's not ideal, because sometimes you want to change only one package, and then the whole layer has to be rebuilt and re-downloaded; with separate RUNs, only the affected layer would change. It works both ways — you just have to think about which fits the situation.

So, with salmon added, I build my Docker image again: docker build -t with the same name, and a dot because the Dockerfile is in the current directory. It does its thing — and it's much quicker than the first time, because the earlier layers were already there and didn't have to be rebuilt. Now I can do the same thing I did before with cowsay: I run my-image, but I call salmon with the argument --version, and we see the output: salmon 1.5.2. But if I type salmon --version directly in the terminal — oops, command not found, because salmon isn't installed there. I'm stressing this to make absolutely clear that salmon only exists within the container we created; it doesn't exist in my normal environment. I can also, as before, enter the container interactively and play with salmon from inside: salmon --version works there, because I'm inside the container, but as soon as I leave it, I can't use salmon anymore.

A few minutes ago I said that things inside the container don't persist; they're isolated. So if they're isolated, how can we create files in there, or read files from out here? What we do next is very similar to what we've done before: docker run my-image, calling salmon index and giving the path to the transcriptome reference file we used in the RNA-seq pipeline, with the output going to transcript-index. This is not going to work, and the reason is that even though we know that here, in data/ggal and so on, we have this transcriptome.fa, inside the container it doesn't exist — the container has no idea where that is. We can see it on the host if we go into data, but the container can't.

So we create a volume: docker run again, but now with the -v option, saying "this file here on my local machine — I want it at this path inside the container", followed by the image to run, the command, and so on. Now it finds the file. But again, everything happened inside the container, so I don't get the output of the command. For that I have to do the reverse as well: make what's in there visible out here. This time we mount a larger volume: I say that my working directory — the directory I'm in right now — will be the same path inside the container, and I also make it the working directory in there. By doing this, the container sees the transcriptome.fa in my local directory, and I see the output generated in there, the transcript-index folder. As I said, there are other ways to do this — you're free to dig into the details — this is just one example, and we can list everything inside the folder locally.
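The two volume-mounting attempts, sketched (host paths match the training workspace layout):

```bash
# Mount a single file: the container can now read the input,
# but the output stays trapped inside the container.
docker run -v $PWD/data/ggal/transcriptome.fa:/transcriptome.fa my-image \
    salmon index -t /transcriptome.fa -i transcript-index

# Mount the whole working directory (and make it the working directory inside too):
# the container sees the input, and the transcript-index output lands on the host.
docker run -v $PWD:$PWD -w $PWD my-image \
    salmon index -t $PWD/data/ggal/transcriptome.fa -i transcript-index
```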
This is a very simple container image, but I could have a very complex one that everyone in my lab wants to use, or the whole company, or that we want to ship to clients — and right now it exists only locally. So one thing we can do is push it to Docker Hub, or to any other image registry like quay.io, and then anyone can pull that container image, just like we pulled the hello-world example or the Debian one with docker pull. I'm not going to do it here, but basically you create an account on Docker Hub, you log in, you tag your image, and you push — instead of pulling, as before, we push our image — and then anyone can pull it the way we pulled the Debian one. We already used an image this way through nextflow.config: we saw the process.container setting with nextflow/rnaseq-nf, and what we could do now is put your-username/my-image there instead. This is something you can try in the break, or tomorrow, or today after the training — and remember, the Slack channel isn't only for questions during the training: today, after the training, tomorrow, all three days, you're free to ask questions there and we'll do our best to help. So try to redo everything I did here, and feel free to do this part too — pushing an image you created to Docker Hub or quay.io and then using it with salmon for script2.nf. To recall what's inside script2.nf: it's just the very beginning, using salmon to create an index; here you'd use a different container image, the one you created by following these instructions.

Also, Docker isn't the only way to run containers: there's Singularity, Charliecloud, Podman — many different container technologies — and depending on your infrastructure, one may be preferred over another. On HPC clusters people usually prefer Singularity or Podman over Docker; in the cloud it depends. And they're quite interoperable: images built for Docker can be run with Singularity, with Podman, and so on.

Now let's go to the section on Conda. Another thing you can use to install software is Conda. In the Dockerfile we used apt-get to install curl and cowsay, and then curl to download the salmon binary. In bioinformatics it's very common to use Conda to install applications — and not only that, but to create environments on your machine where you can run those applications in a fairly isolated way. They're nowhere near containers in terms of isolation, but they're much better than installing everything globally. Basically, you write a YAML file with instructions for Conda on how to build the environment you want. So you can create this env.yml — here it's already created. It contains the name the environment will have on your machine; the Conda channels that will be searched for the software you want; and then the package names — here salmon, fastqc, and multiqc. You can request a package specifically from one channel, since several channels may carry the same software, and you can pin the specific version you want. With that file, you create the environment by typing conda env create and passing the YAML file; it collects the package metadata, resolves the dependencies, and so on.
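The env.yml is along these lines — the package versions here are illustrative pins:

```yaml
name: nf-tutorial
channels:
  - conda-forge
  - bioconda
dependencies:
  - salmon=1.5.2
  - fastqc=0.11.9
  - multiqc=1.12
```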
Then it creates the environment and installs all the software you requested — here salmon, fastqc, and multiqc. This may take a while, so I'll keep talking while it runs. The idea is that all this software is downloaded and installed into a place on your computer such that, just as before with salmon, calling salmon on the command line won't find it, because it lives in an isolated environment. You can use conda env list to list these environments, and conda activate with the name of the environment — the one we just created is nf-tutorial — to activate it. Once activated, calling salmon works: no "command not found", it really runs. And with that we could run script7.nf — the final script, the whole RNA-seq pipeline we created in the previous section — by adding -with-conda and the path of the environment we created, nf-tutorial. It will use salmon even though it isn't installed "normally" on this machine; it's installed in a Conda environment. So if you don't want to use containers, you can use this to keep your installed software isolated in an environment on your machine. Still, this isn't really good practice for reproducibility, because there's still a lot on your computer that can interfere. (It's taking too long here...)

Another option is Mamba, or Micromamba, which is closely related to Conda; Micromamba is much faster, but apart from that it's very similar, so we have the same kind of YAML file with the name, channels, and dependencies. What we're going to do now is write a Dockerfile where, instead of curl or wget or apt-get, we use Micromamba to install all these packages. So inside our container — already a very isolated, very reproducible setup — we say: I want this software, like fastqc, at this specific version, from this channel in a Conda repository, installed with Micromamba. That gives you real control over how things are installed; and because it's all inside a Dockerfile, container technology, it's also isolated in that sense — very reproducible, you can share your image and everyone can reproduce what you did. So let's create this Dockerfile. You can see the Conda environment is still being created — it's taking longer than usual — so I'll open another terminal and write the Micromamba Dockerfile there, just so you can see how you'd do it: I copy the YAML file with the instructions into the container, I create the environment, installing everything, I clean up, and I tell the container where those things live. So first I need my Micromamba YAML file, and then I can build the Docker image. Going back to the other terminal: it finished creating the environment, so I can activate my nf-tutorial — as you saw, there was no salmon before, but if I activate it (what was the name again? nf-tutorial)... I think the terminal closed in the end. Anyway, here we're building a container image that has everything we need, with Micromamba managing the package installation.
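The Micromamba Dockerfile follows this shape — the base image tag and environment paths assume the mambaorg/micromamba conventions:

```dockerfile
FROM mambaorg/micromamba:0.25.1

# Copy the same env.yml into the image and let Micromamba build the environment from it.
COPY --chown=$MAMBA_USER:$MAMBA_USER env.yml /tmp/env.yml

RUN micromamba create -n nf-tutorial -f /tmp/env.yml \
 && micromamba clean --all --yes

# Tell the container where the installed tools live.
ENV PATH=/opt/conda/envs/nf-tutorial/bin:$PATH
```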
Once that image is built and pulled, we could run script7.nf with Docker, providing this image we just created, with Micromamba managing the installation of the packages. This is taking a long time, though, so I'll skip it and go to BioContainers, which is the last subsection of our fourth section on dependencies and containers.

BioContainers is very interesting. You might say: okay Marcel, you said Conda is used to manage installations in bioinformatics, we have containers for real isolation, and Conda and Micromamba manage installation very well — so the best approach is what we just did, creating a container image with Docker (or Podman or something else) while managing the installation of the packages with Conda or Micromamba. But I don't want to do that every time: whenever I want a container image with some package, I'd have to build everything myself. The BioContainers project was created to have this ready for you: you can go to the BioContainers website and look for container images that already have the software you need installed. For FastQC, for example, there's already a BioContainer with FastQC inside, and you can search the BioContainers images for everything you need — then you don't have to build anything; it's already there, very lightweight, very simple, with only the things you need.

So what we can do, for example, is run script2.nf — which, again, is the very simple index process we created at the beginning of this session — with the BioContainer that has salmon (I was in the wrong folder there). Remember, script2.nf only has the index process, calling salmon once, so the only software we need is salmon; I'm going to use a BioContainer that's very lightweight but has salmon inside, and it works again, even though salmon isn't installed on this machine. When this finishes, I'll type salmon and you'll see "command not found" again — but in the run I'm saying: use this container image with Docker, and inside it salmon is there.

There are some bonus exercises and other content here that you can check; I really suggest you go over the whole of today's training material, covering the things that Evan and I showed. And feel free to go to the Slack channel and ask every question you have — no question is too simple, there's no problem at all; there are a lot of people really looking forward to replying to your questions.