So this last chapter treats two fairly distinct things. One of them is Singularity and how it relates to other container software like Docker. We have only been using Docker in the exercises until now; now we'll do some exercises with Singularity as well, and I will of course introduce you to the software. In addition to that, there is some information about how you can use containers inside pipelines, and we will show some small examples of how to use containers in, for example, Snakemake or Nextflow. All right, so why Singularity? We already have this great software called Docker, so why would we need other software to work with containers? Well, one major reason for using Singularity is to be able to easily run containers on a high-performance compute cluster, or more specifically on a computer that is used by multiple people. On HPC clusters, and on servers in general, users have different levels of privileges: you usually have one or two people who have, for example, root access, while the others have access to some directories and not to others. In addition, what's also often the case, especially on HPC clusters, is that users submit jobs to a specific node, with certain time, memory, or CPU restrictions. That's usually what you do if you use a job scheduler like Slurm. These are two things that Docker doesn't really facilitate very well. To use Docker to its full extent you almost always require superuser privileges, meaning that you more or less have to be root on the computer you're using Docker on. That makes it difficult to use on HPC: you don't want to give everybody superuser privileges, of course, because otherwise people might destroy things.
In addition, these Docker commands, as we have learned, are an API to a daemon, and the daemon always runs in the background. That means you cannot really ship the state of a daemon to a different node and do the calculation there; it is difficult to take the collection of images you have on your computer and ship that entire thing to a specific node to do a specific calculation. That's why Docker is not really made for high-performance compute clusters and we need something else. Well, Singularity can help with that. One very important feature of Singularity is that your user identity, and therefore also your permissions, are the same inside the container as outside it: if you run a container with Singularity, you have exactly the same permissions inside the container. That does cause some issues when you're developing a container. If you are on your HPC cluster and you want to develop a Singularity container, you still have the same permissions, so you are not root inside your container and you cannot install certain things; there are ways to work around that, though. Also, Singularity runs containers as a child process; there is no daemon. That means you can just take an image, execute it as a container somewhere on a distant node, then stop it, and of course restrict the memory, time, and CPU usage of that job. It also means that images are files, because you have to be able to move your images around, or at least execute them somewhere, and Singularity has a different image format compared to Docker. Because images are files, you largely lose the nice advantage you have with Docker of reusing layers: if you store an image with Singularity, it is stored on disk as a single file containing the entire image.
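To make the identity point concrete, here is a minimal sketch, assuming Singularity is installed and you have some image file `my_image.sif` (a hypothetical name):

```shell
# Outside the container: check your user and group IDs
id

# The same command inside the container gives the same identity,
# so files you write to mounted directories are owned by you,
# and you have no more privileges inside than outside.
singularity exec my_image.sif id
```

Both commands should print the same uid and gid, which is exactly the property that makes Singularity safe to offer to unprivileged users on a shared cluster.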
So when should you use Docker and when should you use Singularity? Basically, you can do most things with both Docker and Singularity, but in practice, what often happens, especially with bioinformaticians but also with people from other professions, is that they tend to use Docker for container development, for testing scripts, and for continuous integration and continuous deployment. When you are making changes to a software package, for example, it's very nice to use Docker containers to test those scripts in, and Docker is also used for sharing images, particularly through Docker Hub. Most bioinformaticians use Singularity when they just want to deploy an image on a high-performance compute cluster. There is a very basic Singularity command called singularity pull, like docker pull: you can directly pull images from Docker Hub, convert them into a Singularity image, and use them directly with Singularity. You can also use Singularity without Docker. There are, for example, files similar to Dockerfiles, called Singularity definition files or recipes, and based on such a recipe you can build an image. There is also Singularity Hub, meaning you can both share and build images directly on Singularity Hub. But you have to take into account that you have the same permissions inside and outside the container, so in order to build a Singularity image you would need to do that somewhere where you actually do have root access, which can be a Linux computer, for example. What you can also do is write a Singularity definition file and then build the image on a runner somewhere; I did that a few months ago, and I think Singularity Hub has a runner that you can use to actually build your Singularity images. So you can build either with the fakeroot option or with an external runner.
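As a sketch of what such a recipe and build look like (the file names are made up, and the fakeroot option is only available in recent Singularity/Apptainer versions):

```shell
# A minimal Singularity definition file, ubuntu_base.def,
# analogous to a Dockerfile:
#
#   Bootstrap: docker
#   From: ubuntu:20.04
#
#   %post
#       apt-get update && apt-get install -y curl
#
#   %runscript
#       echo "Hello from the container"

# Build it on a machine where you have root access...
sudo singularity build my_image.sif ubuntu_base.def

# ...or, where supported, without root via the fakeroot option:
singularity build --fakeroot my_image.sif ubuntu_base.def
```

The `%post` section plays the role of the `RUN` instructions in a Dockerfile, and `%runscript` is roughly the equivalent of `CMD`.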
It's not strictly necessary, though. As I just said on the previous slide, what people tend to do is develop with Docker, test with Docker, and ship with Docker, but when they want to use the result on a high-performance compute cluster, they just use singularity pull to convert the Docker image into a Singularity image. In the exercises we will use Singularity to do exactly that: you have uploaded an image to Docker Hub, and we will now use Singularity to pull that image, convert it into a Singularity image, execute it with Singularity, and see whether it still works. Then a few words about pipeline development. The whole idea, or maybe not the whole idea but 80% of the idea, of pipelines is reproducibility. By reproducibility we mean that you, or a colleague, can build a pipeline once, move it to your specific cluster, for example, and do exactly the same calculations as your colleague. You can almost say the only way to do that is with containers: with containers you can have an exact duplicate of a computational environment that exists somewhere else. So containers are very often used inside pipelines. Luckily, you don't have to develop these containers yourself, although of course you can if there is a specific part of your pipeline for which you have developed a specific script. A lot of very common bioinformatics tools can be found ready-made: pretty much everything that is on bioconda is also packed inside a container, so you can directly pull the image for your tool of interest from BioContainers and use it inside your pipeline. So how do you use them in a pipeline? I have two examples, one for Nextflow and one for Snakemake.
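The pull-and-convert workflow described above looks roughly like this (the namespace and repository are hypothetical placeholders):

```shell
# Pull an image from Docker Hub and convert it into a
# Singularity image file (.sif) in the current directory:
singularity pull docker://myuser/myimage:latest
# This produces a file named myimage_latest.sif

# Run a command inside the converted image to check it still works:
singularity exec myimage_latest.sif cat /etc/os-release
```

No Docker installation is needed on the cluster for this; Singularity fetches the layers from the registry itself and flattens them into the single `.sif` file.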
First, an example for Nextflow. With Nextflow, and this also holds for Snakemake, you can specify containers at multiple levels. Here I have an example of specifying a container directly in a process: you really say, within this specific process I want to use this specific container, and you can just use the namespace, repository, and tag on Docker Hub. Nextflow then takes care of all the user identities, whether you're running Docker or Singularity, and executes the command directly inside your container. It will use the default shell of that container, if I'm not mistaken, I'm pretty sure, so if the image has an entrypoint you might run into issues here. For Snakemake it looks pretty similar, except that you specify the container in a rule: what Nextflow calls a process, Snakemake calls a rule. You have your input, your output, and, for example, your shell command in a Snakemake rule, and then you can also specify a container in which that rule should run. For both tools you can also specify containers at higher levels: you can have containers that serve multiple different processes or rules, specify the container only once, and refer to that container, for example with labels in Nextflow. You can take it a step further, because both Snakemake and Nextflow support the use of conda, Docker, and Singularity, so you can choose whether you want to run your processes or rules with conda, with Docker, or with Singularity, and you can combine those. With both Snakemake and Nextflow, you can specify a conda environment in a conda YAML file and then say to the pipeline software: build me my containers based on these conda environments. That is quite nice.
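A sketch of what such a process-level container declaration looks like in Nextflow (DSL2); the exact image tag is an assumption, so check the registry for current versions:

```nextflow
// Process with a container specified directly on the process itself.
// The image reference is a Docker Hub namespace/repository:tag.
process FASTQC {
    container 'biocontainers/fastqc:v0.11.9_cv8'

    input:
    path reads

    output:
    path "*_fastqc.zip"

    script:
    """
    fastqc ${reads}
    """
}
```

Whether this runs under Docker or Singularity is then decided at launch time, for example with `nextflow run main.nf -with-docker` or `-with-singularity`, or via `docker.enabled`/`singularity.enabled` in the Nextflow config; the process definition itself stays the same.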
It makes your pipeline very flexible, meaning that a user can choose to run your pipeline with conda, with Docker, or with Singularity, and the only thing you have to specify as a developer is the conda YAML file with all your dependencies and their versions. By doing that, you are very close to making very scalable and reproducible pipelines, because you can use all three of those tools to make reproducible calculations. So how does that work? Here's an example for Snakemake. Let's say you have your conda environment defined in a YAML file; this is how one of those YAML files typically looks: you specify channels, and you specify dependencies with their versions. Then you can say in your Snakemake rule: if you're using conda, use this particular YAML file. You then execute Snakemake with the conda flag, which will create a conda environment based on the YAML file and execute this particular rule inside that conda environment. Once you have that specified, you can use Snakemake to containerize these conda environments, and it will create a Dockerfile for you. So, entirely automatically, you can build that Dockerfile into an image and then specify that particular container in your Snakefile. What's nice is that you then have both your conda environment specified and the image to use if you're running that particular rule, or that particular set of rules, in containers. And when a user runs your pipeline, at the level of the snakemake executable they can say: I want to execute my pipeline using conda, Singularity, or Docker, and everything will work, all based on that particular conda environment YAML file. There's a question, I think.
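A minimal sketch of that Snakemake workflow, assuming Snakemake 6 or later (file names and the fastqc pin are illustrative):

```shell
# Conda environment file, envs/fastqc.yaml:
#
#   channels:
#     - conda-forge
#     - bioconda
#   dependencies:
#     - fastqc=0.11.9
#
# Rule in the Snakefile referring to it:
#
#   rule fastqc:
#       input: "reads.fastq.gz"
#       output: "reads_fastqc.zip", "reads_fastqc.html"
#       conda: "envs/fastqc.yaml"
#       shell: "fastqc {input}"

# Run the pipeline using conda environments:
snakemake --cores 1 --use-conda

# Auto-generate a Dockerfile covering all conda environments
# in the workflow:
snakemake --containerize > Dockerfile
```

After building and pushing that image, you can point the workflow at it (via the `containerized:` directive in the Snakefile) and run the same rules with `--use-singularity` instead; the conda YAML file remains the single source of truth for the dependencies.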
Yeah, I mean, I see some similarities here, maybe, with Makefiles and similar systems, where you give some recipes and some dependencies for building a pipeline. Is that similar to, for instance, this Snakefile? Yeah, so I don't have too much experience with Makefiles. Basically, me neither, I just started a few weeks ago, but basically you can write a list of recipes, and those recipes are also commands, which are dependent on other files or other tasks. So in order to produce, let's say, in your case, the final plot, you probably need some input files and some processing steps in between, and so on, so you build the whole pipeline somehow, and at the end you have your result. Yeah, so Snakemake and Nextflow, that's of course a whole other topic than containers. Snakemake and Nextflow can use containers while running the pipeline, but the concept of Snakemake and Nextflow is indeed describing these pipelines: what should happen when, and given different input files, they can very nicely perform the calculations for you in the correct order, based on everything you have specified in these rules. Yeah, thank you. But yeah, I would probably share some information on that, so everybody can have a look. Sure, thanks. So, the big advantage of this: if you are interested in reproducibility, really have a look at this, in particular if you're using Snakemake; Nextflow has similar functionality. For Snakemake I put the link to the documentation on how to do this at the end of these slides. The big advantage is that you specify your environment only once, and that's only in the YAML file; you don't care too much about what kind of installations you have to do beyond what is specified in the YAML file.
In this case Snakemake will make a Dockerfile for you; the only thing you need to do is build it. Then you can just run your pipeline with either conda, Docker, or Singularity. It very much improves reproducibility, because you have everything specified in this YAML file, which is relatively easy to read. And compare it to conda alone: if you have run pipelines with conda before, you know that when you rerun a pipeline, especially with Snakemake, it may need to re-download and re-install all the conda dependencies. That's not necessary when you have containers available, because it will just reuse the already downloaded container. So, we have some exercises. They will consist of converting your own Docker image to a Singularity image, which is basically just running the command singularity pull. We will do some execution of containers and mounting with Singularity; with Singularity you can also mount directories, but it happens in a different way: by default, much of your environment, such as your home directory, is actually already mounted inside the container. And you will also use a BioContainers image to do some actual bioinformatics: we will use the FastQC container from BioContainers to do some quality control on sequence reads.
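As a preview of the exercises, a sketch of mounting and of running the BioContainers FastQC image (the paths are hypothetical, and the exact image tag is an assumption, so check the BioContainers registry):

```shell
# Singularity binds your home directory and the current working
# directory by default; additional paths are bound with --bind:
singularity exec --bind /scratch/data:/data my_image.sif ls /data

# Pull the FastQC image from the BioContainers registry on quay.io
# and run quality control on a reads file:
singularity pull docker://quay.io/biocontainers/fastqc:0.11.9--0
singularity exec fastqc_0.11.9--0.sif fastqc reads_1.fastq.gz
```

Because your identity and home directory carry over into the container, the FastQC output files land in your working directory with your ownership, with no extra flags needed.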