Hello everyone and welcome to another Bite Size talk. With me today is again Marcel from Seqera Labs, and he is going to talk about how to use Wave containers. Thank you very much, and off to you.

Hello everyone, it's a pleasure to be here again. Let's talk about Wave containers today. I'm going to share my browser so we can discuss a few things. Let me see what's the right page to share. Okay, can you confirm you can see my screen? Yes, we can see it. Okay, so Wave containers. The tagline says "next-generation container provisioning for data analysis". So what's the thing? Containers are an old idea; the building blocks started landing in the Linux kernel a long time ago, around the late nineties and early two-thousands. So the idea of containerized applications has existed for a long time, but it took a while, until around 2013 I think, before it really got to us as a nice application, nicely packaged, and very clear how to use. That's when Docker came along with all these nice tools and examples and really kicked things off. And it was at the right time, because around 2012 and 2013 we also had pipeline orchestration technologies emerging. So it was very nice that just as people were trying to fix the reproducibility crisis in science, we had a technology that allowed us to containerize applications. Most of these pipeline orchestrators, like Nextflow, Snakemake, and many others, at some point started incorporating Docker, and not only Docker but other container technologies too, to make sure you could isolate tasks. The idea of a container is just what the name says: you put stuff inside a container. You can put your dependencies, your libraries, your software, maybe even move your data inside, so that you have a container, like a box, which is very isolated from everything else, and you can do stuff inside.
This means I can move my container to your machine, you can try to do the same thing I did, and everything points to the output probably being the same, because everything is isolated in a way that gives you this reproducibility. Because of that, a lot of different projects and fields started heavily using containers. However, at some point we saw there were some limitations, and Paolo, that is Paolo di Tommaso from Seqera, the creator of Nextflow, saw an opportunity to create a technology that would empower us to use containers even better. So that's Wave containers: the next generation of container provisioning for data analysis, where provisioning means building, deploying, and everything related to containers. I'm going to move through the website, which is very short, so I'll also be short on the description here, and then we'll go to GitHub, where I prepared a few things for us to see. There are a lot of things you can do with Wave; I picked a few of them for the short timeframe we have today, so you can get an idea of the amazing things it can do. With Wave you can build container images on the fly and deliver them on the fly. What does that mean? You're probably aware that the best way to use containers with Nextflow, most of the time, is to have a single container image per process. So instead of one container image for the whole pipeline, we choose to have one image for every process. These images are light: they're easy to pull, easy to push, easy to start, and easy to stop. Both easy and fast, because they're really light containers that only hold a single application or a few applications. And because they're so simple, they're easy to debug.
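The one-image-per-process idea can be sketched in Nextflow like this (a minimal sketch; the process name, image tag, and commands are hypothetical, chosen only for illustration):

```groovy
// Each process declares its own lightweight image via the `container` directive,
// instead of one monolithic image for the whole pipeline.
process FASTQC {
    container 'quay.io/biocontainers/fastqc:0.11.9--0'  // hypothetical tag

    input:
    path reads

    output:
    path 'fastqc_out'

    script:
    """
    mkdir fastqc_out
    fastqc -o fastqc_out ${reads}
    """
}
```

With `docker.enabled = true` in nextflow.config, Nextflow pulls and runs this small image only for the tasks of this one process.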
If you run into issues in your pipeline and you have one container for the whole pipeline, there are too many pieces that can go wrong. By having small, simple container images for every process, it's easy to debug and quick to pull. You may think pulling is not an issue, but imagine you have a thousand samples, you're using cloud computing, and you're creating virtual machines for every task: you may have to pull a container a thousand times. Whether it takes a minute or five seconds to pull, to run, or to stop makes a difference in the end, not only in time but also in the money you're spending on cloud computing, or the time on your own infrastructure. So the idea here is that instead of having to pre-build your images, you can build them on the fly. You give instructions to Wave through Nextflow, and there are many ways you can do that, and Wave will build the container image for you, provision it, and deliver it so you can run this freshly built container. The idea is to simplify pipeline development, because you don't have to worry about building your images anymore; you just work on your pipeline, and Wave with Nextflow creates the image. You can also improve pipeline performance, because your containers can be optimized for the target platform. If you're running on ARM, like Apple Silicon, it's going to build the container for Apple Silicon. The same goes for Graviton on AWS, or if you are using Intel: Wave identifies the target platform and builds the container image optimized for that platform, so no emulation will be required. That gives you an improvement in pipeline performance. You can also authenticate against container registries with centralized credential management.
You probably saw this announcement by Docker saying that soon users will face several pull limits. You might think: okay, if I pay for Docker I get better rates for pulling and pushing, and if I don't pay, nothing changes. Actually, it does. As you can see here, anonymous users, meaning you just type `docker pull` for some image on your command line without logging in, will be limited to 100 container pulls per six hours, which is almost nothing. Free users, meaning you are authenticated even though you're not paying, get twice that, 200 pulls. And other container registries, not only Docker Hub, are starting to move in the same direction. So being able to authenticate is turning out to be a crucial thing when you're running your data pipelines. And since all of this is automated, if you have 20,000 tasks you don't want to authenticate manually. So the first interesting thing about Wave is that it can authenticate for you. You can also improve productivity by focusing on pipeline logic: you only work on your pipeline and don't worry about building your container or augmenting it. And you can do all this in a secure, regulated environment. I will stop here; you can go to the Wave website and read all these topics in more detail. Instead, I'll take you to this repository, github.com/seqeralabs/wave-showcase. Here you have eleven interesting things you can do with Wave. I will pick some of them to run here, so from now on I'll focus on the demo.
If at some point I'm running over time, please let me know and we'll stop, okay? I think it will fit, but we never know. So the first thing I'd like to mention is authentication. We can go to the examples folder, example one. In every example you have this run.sh file, which shows you the command you should run, which is basically `nextflow run demo.nf -with-wave`. The same way you have `-with-tower`, you have `-with-wave` here. I created a Tower access token specifically for this; let me get it, so you can also watch on Tower when things run. It's a bit slow, I think. I'm using Tower cloud with this access token, so we can do `nextflow run demo.nf -with-wave -with-tower`. Let me open this demo.nf so you can see. Oh, it does not work for me directly, because there's a container here which is private: it's Paolo's private container, and I cannot authenticate to it myself. But here's the thing: when I simply run this pipeline with a container that requires authentication, and I don't do anything but say `-with-wave`, it's going to use my Tower token to pick up the credentials that I added in Tower. By doing that, automatically, without us doing anything else, Nextflow is able to pull this container. So that's the first example: you basically don't do anything special. You just run with `-with-wave`, and because you have your Tower access token configured like I did, by exporting `TOWER_ACCESS_TOKEN`, it will be able to authenticate and pull the container from a private container registry. Example two is more interesting: it's building and delivering a container. Let's see how the folders look here.
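What example one relies on can also be written down as configuration instead of command-line flags (a sketch using the standard Nextflow `wave` and `tower` config scopes; treat the exact values as assumptions):

```groovy
// nextflow.config — enable Wave and let it use your Tower (Seqera Platform)
// credentials for container registry authentication.
wave {
    enabled = true
}

tower {
    enabled     = true
    // Picked up from the environment, e.g. `export TOWER_ACCESS_TOKEN=...`
    accessToken = System.getenv('TOWER_ACCESS_TOKEN')
}
```

With this in place, a plain `nextflow run demo.nf` behaves like the `-with-wave -with-tower` command line in run.sh: Wave uses the registry credentials stored in Tower to pull the private image.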
You see, I have a demo.nf again, which is just importing a module called hello and calling it. Let's look at the source code of this module, in modules/hello/main.nf. It's a very simple module: a hello process that uses cowsay to say hello. The interesting thing here is that in my nextflow.config I have `docker.enabled = true`, and I'm going to run this with `-with-wave`. It's going to work just fine and print the hello we saw. It's pulling the nf-tower and nf-wave plugins, and it takes a few seconds because it's building a container image, pulling it, and running it. But then I want to show you something, because when you look inside the module folder, you're going to see a Dockerfile. It's based on Alpine and it installs cowsay. That's why cowsay works here: the plain Alpine image doesn't have cowsay. That's the whole Dockerfile, and that's how we know the image was built. If you run `docker images` here, you see this wave.seqera.io image: this is the container image that was built by Wave and pulled by this machine so we can use it. (This notification is really annoying and I don't remember how to turn it off. Oh, it's here, I think. Okay, great. I see something in the chat; I'll answer the questions later.) So, about this second example: let me try `docker run -ti alpine cowsay`. I don't have the Alpine image locally, so it pulls it, and then: executable file not found. So I'm proving to you that the plain Alpine container image does not have cowsay, and this Dockerfile really was built. And how does Wave know what Dockerfile to build? Basically, inside the modules folder, in the folder for this module, we have a Dockerfile.
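The Dockerfile described above is tiny; here's a sketch of what such a module-level Dockerfile can look like (the base tag and install line are assumptions — on Alpine, cowsay sits outside the main package repository, so the exact command may differ):

```dockerfile
# modules/hello/Dockerfile — picked up automatically by Wave
FROM alpine:3.18

# cowsay is not in Alpine's main repository; the edge/testing repo is one way to get it
RUN apk add --no-cache cowsay \
      --repository=https://dl-cdn.alpinelinux.org/alpine/edge/testing
```

Wave builds this on the fly, and the resulting wave.seqera.io image is what `docker images` shows on the machine afterwards.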
That's the default behavior for Wave: if it finds a Dockerfile in the module folder, it builds it. There's no container directive here in main.nf, but because there was a Dockerfile in this folder, Wave adds the container it built, the one we pulled. That's the most straightforward way of using Wave: you have a pipeline, you have your modules, you want every module to have its own container image, which is the default, but you don't pre-build your images. You just create Dockerfiles in these folders and Wave builds the images for you. That's example two: it built the image and delivered it to me, and the Nextflow integration with Wave, the nf-wave plugin that appeared at some point, coordinates everything, so we don't even have to look at it; it just builds and loads everything. The other example, which is very nice, is example four. This one is about building based on Conda. The run command is the same. Here we have rnaseq-nf, which is nextflow-io/rnaseq-nf, a pipeline you can find on GitHub. We don't have to say github.com here because it's the default; if it was on GitLab or somewhere else, we would have to add that prefix. The interesting thing here is that if you look at nextflow.config, you see this configuration scope called wave, and the strategy is conda. Because of that, Wave won't look for Dockerfiles. It will look for the conda directives that we have, and based on those directives, on those Conda packages, it will build a container image. So basically, you don't have to create a Dockerfile: you just say what Conda packages you need, and Wave will resolve the dependencies and everything else. It takes a while, because it has to resolve the Conda dependencies, build the environment, and create the container image.
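The Conda-based build in example four boils down to configuration like this (a sketch; the `wave.strategy` key follows the showcase, while the process name and package pin are hypothetical):

```groovy
// nextflow.config
docker.enabled = true

wave {
    enabled  = true
    strategy = 'conda'   // build images from conda directives, not Dockerfiles
}
```

Then in a process, the usual Conda directive is all Wave needs:

```groovy
process SALMON_QUANT {
    conda 'bioconda::salmon=1.10.1'   // hypothetical package/version

    script:
    'salmon --version'
}
```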
So I won't run the demo here, but just bear in mind that simply by setting this strategy to conda, Wave will find the conda directives in your pipeline and build images for them. So even if you didn't do anything related to containers in your pipeline and just used Conda, by using this Wave strategy and adding `-with-wave` on the command line, it will build the container images and your pipeline will run on Docker. The last example is about augmenting containers. That's very, very nice; it's one thing that I like a lot. The command is the same again: `nextflow run demo.nf -with-wave`. The interesting thing now is that we have our demo.nf as always, the pipeline script, and a modules folder with foo again and its main.nf, but we have another tree here: resources/usr/local/bin/say-hello.sh. We have the nextflow.config again; let me open it: docker enabled, that's fine. If we look at demo.nf, it's just importing foo; I'm importing and running this module again. Nice. Let's open the main.nf of this module. It's using a container, a very light one that contains bash, and it's running say-hello.sh. Okay. So let's run this bash container and get inside it. It's not here locally, so it's pulling it, downloading it, and then running it; that's what `docker run -ti` does, it gives me an interactive terminal. So I'm in, and I run say-hello.sh: command not found. Let's look for the file directly. So I'm proving to you that in /usr/local/bin here, this say-hello script doesn't exist: no such file or directory. So this container, as is, won't work.
This task would fail, because I'm telling it to use this container and run this bash script, which doesn't exist there. The thing is, because I have this resources folder inside the module folder, Wave knows: we're going to use this container image, but I want to augment it, I want to add something to it. It will create a new container image, just like the one we specified, but with another layer containing this script, placed in /usr/local/bin. I want the file there because that's in the PATH, and it will be made executable. So I'll leave the container and go to example five, which, as we were showing before, has this resources structure. We're going to run it now the same way as before, just with `-with-wave`; let me clear the screen so we can see it better. Remember, this was a failure: I showed you that this script doesn't exist in the container image. But what actually happens, as we saw, is that Wave pulls this container image and augments it with another layer, which is the script we saw in resources/usr/local/bin. And then I'm able to run it and it prints hello world, which is what's inside modules/foo/resources/usr/local/bin: the script basically says echo hello world. How can I prove to you that this happened? If you do `docker images`, you'll see all these images; grab this one, run say-hello, and now it works. But you saw that we tried it a few minutes ago and it didn't work: it said no such file or directory. So this container image is indeed the same one, but augmented with a layer containing this script. I can't overstate how important this is.
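The augmentation setup in example five is just a directory convention next to the module (a sketch; the module name and base image mirror what was shown on screen, but treat the exact tags as assumptions):

```groovy
// modules/foo/main.nf — the module whose container Wave augments
process foo {
    container 'bash:latest'   // assumption: some light bash-only base image

    output:
    stdout

    script:
    'say-hello.sh'            // provided by the augmented layer, not the base image
}
```

The companion script lives at `modules/foo/resources/usr/local/bin/say-hello.sh` (containing `echo hello world`); because the `resources` folder is there, Wave layers its contents onto the base image, so the script ends up in `/usr/local/bin`, on the PATH and executable.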
We have a technology called Fusion, for Nextflow, that lets you run pipelines on the cloud faster, spending less money, and all these things. Fusion makes buckets look like shared file systems. The issue is that every time you run a container and want to use Fusion, you need the Fusion client inside that container, to connect to the shared file system and manage all of this. This augmentation works exactly there: every time you have a container and you're using Fusion, Wave will augment it to include the Fusion client. Same thing if you work at a company with some authentication software that has to go into every container: you don't have to rebuild all your containers from scratch. You can just take the images you already have and use Wave to add a layer with the authentication software. Or maybe there's this amazing container someone built, with some tools installed, BWA for example, but you need to add a second tool to it: you can augment it with Wave. So Wave allows you to take already-built container images and augment them with the new tools or files you want, and you don't have to do it manually; Nextflow does it for you. As I said, you can come to this GitHub repository and see all these nice examples. There are some very nice ones, even with Fusion, with Kubernetes, with interactive debugging of remotely executed tasks. There are many things you can come and look at by yourself, and every folder has a README going into more detail about what's happening, the run.sh with the exact command you have to type, the nextflow.config, among other things. With that, I will stop. I'm sorry if I went over time, and I will be happy to answer your questions. Oh, thank you very much. There are indeed already questions.
I'm going to read the ones in the chat, but after that, anyone is also welcome to just unmute themselves. For now, let's go to the questions that are already there. First one: what are the differences between Docker containers and Wave containers for high-performance computing workloads, and what are the advantages of Wave? So Wave containers are not a different type of technology, like Docker containers, Singularity containers, or Podman containers are. Wave is functionality that sits on top of these. It's a new technology, released last year in October at the previous Nextflow Summit, so there's still a way to go, but it already supports Docker and Singularity, and I think Podman, I'm not sure. Which means it's not a specific, different kind of container: you just keep working with the container images you have with these technologies, and Wave adds functionality on top; it makes things easier for you. As for the advantages for workloads, well, augmentation is a very interesting one. If you want to build containers easily, or augment containers easily, Wave is going to help you a lot. And there's building for your architecture. Say you have a cluster on AWS Graviton, which is ARM; I believe one of the top supercomputers in the world right now is also ARM, which is a different architecture. If your HPC system isn't Intel and the container image was built for Intel, you can build your own with Wave, targeted at your platform, so no emulation will be required and it will be faster. Or the other way around: your HPC system is a regular Intel architecture, but the container image someone built is for ARM. That would require emulation, and you don't need that, you don't want that.
So with Wave you can automatically build your container image specifically for your architecture, which makes it faster. And not only faster: sometimes emulation hangs. Emulating containers is not an easy thing. That's why people don't usually run pipelines on macOS: even though it works, with the emulation it not only takes longer, it sometimes hangs, it's very unpredictable. You should always have containers built for the target platform you're using. So with Wave, that's it: the flexibility to focus on your pipeline logic instead of having to fight with creating containers and all of that. There was kind of a follow-up question, part two of the question by Simon, focusing specifically on the on-the-fly ability. He's asking: how does local container construction affect the reproducibility aspect? Okay. So, indeed, it takes a while to build containers, and if you're using Conda it takes a bit longer. But there's a cache, so Wave won't rebuild something it has already built, and there's caching at the layer level, so layers can be reused even when the whole image isn't identical to one that was already built. And you don't necessarily rely on Wave: you can use Wave to build, and ask it, after building the image, to push it to your own container registry. That could be a public one like Docker Hub or a private registry you have. So you don't depend on Wave: after Wave builds the image, it can push it to your registry and you can use that in your pipeline. There's no unnecessary rebuilding; there's this cache. I think that more or less answers the question. Yes, there are more questions. It seems that locally rebuilding a container or a Conda environment is very expensive, given that each user will build their own platform-dependent version of it instead of sharing one pre-built container.
Yeah, that's the one I was just answering, actually, sorry. So there's the thing: there's the cache, so you won't be rebuilding multiple times. And you can share pre-built images if that's what people want. If there's an Intel image with everything I need, I won't use Wave; I'll just use that container image. But if I want to change something, rebuild it, or augment it, then Wave is going to help me. Wave is not supposed to replace everything regarding containers; it's there to be useful when you have to fight with them. If there's a BioContainers image that fits, you just add it to the container process directive in your pipeline and that's great; that's the usual case. But if you need to rebuild it, rebuild it for a different platform, change something, or augment it, then Wave is very useful. And sometimes a suitable container image doesn't exist. With BioContainers you have a container for every Conda recipe, but what if you want two Conda packages together, in very specific versions, and there is no BioContainers image for that combination? What do you do? You can use Wave to build that for you, so you don't have to write your own Dockerfile or open a PR and wait for someone else to build it. Good. Then moving on to the next question. Matthias is asking: can Wave also build signed and encrypted containers suitable for confidential computing on Kubernetes? I'm not sure, Matthias. I think so, but I won't claim that, because I'm not sure. For every question you may have in the future, including this one that I couldn't properly answer, there's the wave-containers channel on the Nextflow Slack, and it's a great place to ask. So this one about Kubernetes I don't know specifically, I'm sorry.
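The two-packages case at the end can be expressed directly as a Conda directive (a sketch with hypothetical packages and versions):

```groovy
process ALIGN_AND_SORT {
    // Two pinned packages in one environment; there may be no single
    // BioContainers image for this exact combination, but Wave can build one.
    conda 'bioconda::bwa=0.7.17 bioconda::samtools=1.17'

    script:
    'samtools --version'
}
```

With `wave.strategy = 'conda'` enabled, Wave resolves both packages into a single environment and delivers one container image for the process.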
Then the last question in the chat, from Jung: can Wave be used independently from Nextflow? Yes. There's a command-line interface called wavelit; it's on GitHub. That's W-A-V-E-L-I-T; I'm spelling it out because people tend to read it as "wavelet", but it's "wavelit". Great, Manish just posted the link, thanks. By using this command-line interface, wavelit, you can use Wave independently of Nextflow. And that's the idea: even though Nextflow has a very nice integration with Wave through the nf-wave plugin, you're not tied to Nextflow to use Wave. You can use Wave manually; you can do yourself what Nextflow does behind the scenes. Yes. Then I also have a question: can Wave be used for free, or is there a fee? It's free for now. Using Wave, of course, is free; there's the Wave backend, but it's free. Okay, cool. Are there any more questions from the audience that have not been asked and answered? Just one thing: last year in October, at the Nextflow Summit, we had the release of Wave, as I said. It was very nice because Paolo put me in charge of pressing the merge button, so during his talk I pressed the merge button and Wave went public. It was a great summit; we heard about Wave, Fusion, and so many other things. In October this year we have the Nextflow Summit in Barcelona again, and this year we're also going to have one in Boston, in November. I'm not going to say anything more; I'm just going to say: come to the Nextflow Summit, some new things are going to be shared. Oh, there is one more question: do you have any docs about encrypted containers? I don't, I'm sorry; maybe that one was for Matthias. Okay. If there are no more questions from the audience at this point, then I would like to thank you for the talk and this very nice discussion at the end. I would also like to thank the audience.
And of course, as usual, the Chan Zuckerberg Initiative for funding our Bite Size talks. Thank you, everyone. Bye-bye, everyone.