So we would move to the next talk. And I'm delighted to introduce you today to Björn Grüning, who will introduce us to Bioconda and BioContainers in his talk today. So welcome, Björn. You should be able to unmute yourself and share your screen with the presentation. Does it work? And we are very much looking forward to hearing your talk about Bioconda and BioContainers.

OK. Hey, all. I hope you can hear me. Yeah, so I was asked to talk a little bit about packages and containers. So, kind of going a step back from workflows, and talking about tools and how we can make sure that tools executed anywhere are actually reproducible, and also easily deployable. And if we start really from the beginning, I think the usual development is that you have a question, you develop code, or you develop a program, and then you try to deploy it. And this step in between developing code and deployment has been the packaging problem since the beginning of the IT world. So the simplest packaging system that you can imagine is probably a tarball, where, I don't know, 40 years back, we were shipping around directories or tarballs, and people needed to compile them on their own. But still, even such a tarball you can think of as a packaging system. But as you can imagine, compiling this stuff, getting it to work on your computer and then maybe on your neighbor's computer, is very, very challenging. So we have had this problem since the beginning of computers: how do we actually install software on a given architecture, on a given computer?

And you probably know all of these different package managers around. I kind of consider the CPAN, pip, CRAN package managers as very language-specific; more or less every computing language that we know has its own package manager. And then there are the operating-system-dependent package managers, for the Red Hat operating systems, for the Debian-based operating systems, Alpine, and so on. They all have their own package managers as well. But for scientific research we have kind of special needs, and there is no standard available which we could embrace. For example, in Debian there is a Debian Med community, and they try their best to package a lot of scientific packages for the Debian operating system. But of course, if you are bound to CentOS or some Red Hat system, you cannot easily use the Debian packages. So this is still a problem, or this was still a problem. And a few years back, I think that's now 2016, we still had this problem that it was not easy to deploy software on the cloud, on HPC systems, and so on. And there was also no package manager available where you could think: this is the right approach for scientific computing.

So, I mean, as you know from your Nextflow workflows, these programs are written in all different kinds of languages. So we cannot just rely on pip or CPAN or whatever. It also needs to be operating-system independent, because we are forced to use HPC computers, we are forced to use cloud environments, and they probably all have different operating systems installed. Ideally, you don't need root privileges to install it. If you know these module systems on HPC environments, the normal procedure was to ask your administrator to install this software for you as a module, and then you needed to wait a few weeks or months until the software was deployed. And this is a no-go.
I mean, you have seen in the previous talks that we have a huge turnaround in developing our tools and workflows, and especially in a pandemic, where you need to spin up a workflow in a week to actually process data, you cannot simply wait for software to be installed on the HPC. So you want to do it on your own, and you want to do it in user space. And for reproducibility reasons, you probably want to manage multiple versions of one given software. And I think what we have as a special requirement in bioinformatics is that those packages need to be very easy to maintain. Bioinformaticians are not strictly hardcore computer scientists who know how to compile software efficiently, or who can package software for different architectures and so on. So we need an easily understandable language to maintain all these packages. And last but not least, and this is where we will focus in the second half of the talk, something new came up in the last five, six years that is called containers. And you can also think about containers just as a different kind of packaging system. Containers have different requirements and a few nice features, but essentially, if you have a Docker container, it's nothing other than a package that you install, that you run, and so on.

And the question was: where is the community going? How can we address this problem efficiently for computational scientists, for bioinformaticians? And what came up in the years 2015, 2016 was a new package manager called Conda, which more or less addresses all the needs that we just defined. We actually evaluated a lot of these package managers back at the time. We also considered Brew, so Linuxbrew, Homebrew, and so on, and also other package managers aimed more at HPC, like EasyBuild. But finally we decided to go for Conda, for a few reasons. It's very easy to use, so there is a potential that actually biologists can use it; it's that easy. And it adheres to all our requirements: it's language independent, it's fast and robust, and you can use it on any operating system. It ships pre-compiled binaries, so you don't need to compile on your host system, which also makes it very fast and very reliable. And you can manage multiple versions of a particular tool, which is a key argument for reproducibility. And Conda packages are also kind of easy to write. I mean, we saw from 2010 on that YAML was more or less coming to dominate the field, and you can write a Conda package in YAML to a very large degree. So you define more or less the URL where you get the source code, you define build requirements, runtime requirements, and a few other metadata elements if you like; you can have tests as part of your package; and then you pass it on to a continuous integration system, it will build your package, and everything will hopefully work. So this is how you build packages in Conda.

And back in 2016, we actually created a community called Bioconda that is specifically meant for bioinformatics packages. Nowadays we have a little bit more in Bioconda, like cheminformatics and so on, but to a large degree Bioconda is still a bioinformatics package archive, and we created a community exactly around that: creating packages for all the tools that you are using, for example, in your pipelines, and making these builds reproducible.
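To make the recipe format concrete, a minimal Bioconda-style recipe looks roughly like the sketch below. The package name, URL, and checksum are illustrative placeholders; the authoritative schema is in the conda-build and Bioconda documentation.

```yaml
# meta.yaml -- illustrative sketch of a Bioconda recipe, not a real package
package:
  name: mytool                  # hypothetical tool name
  version: "1.0.0"

source:
  url: https://example.com/mytool-1.0.0.tar.gz   # placeholder source URL
  sha256: <checksum of the tarball goes here>

requirements:
  build:
    - {{ compiler('c') }}       # build-time toolchain
  host:
    - zlib                      # libraries linked at build time
  run:
    - zlib                      # runtime dependencies

test:
  commands:
    - mytool --version          # the CI system runs this after building
```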
There's also a different community called conda-forge, and we are working together with them. Actually, Bioconda depends on conda-forge. conda-forge maintains more or less the entire basic computational stack: they are maintaining Ruby, the Python packages, the entire Python ecosystem, all CRAN packages, and so on. And Bioconda is more specialized: the Bioconductor packages, SPAdes, all the nice utilities that we need to deploy for our workflows. If you're interested in Bioconda, please visit the website. We have good documentation on how you can get started, how you can use these packages, and how you can build these packages for various languages. So we build for C++, Rust, Java, Perl, Python, Go; all these different languages are supported and we build packages for them. Yeah, you have here a more or less very sophisticated contribution guide that you can use. And as I already said, this community is still growing. We are currently maintaining more than 7,000 packages for you. But we also see that the growth in packages is kind of saturating, so it does not seem that there are many more highly used bioinformatics packages than, let's say, 8,000 or 9,000. This is kind of the number that we need to deal with in bioinformatics, that's what we estimate. And I think we already have the most commonly used bioinformatics packages packaged and maintained here. So have a look at that if you care about it.

From a community perspective, it's kind of a large community. We have nearly 1,000 contributors, and we have merged more than 20,000 pull requests over the last years. You see it's an international effort. And what I'm particularly proud of is that Conda, or Bioconda, seems to have established itself as the common base for all different kinds of workflow managers. I mean, we can all disagree about different workflow managers, but it seems that at least the most common workflow managers support Conda packages or BioContainers in one way or the other. And this is, I think, a very good achievement that brought the entire community a step further, because we can agree on the package manager level, and at least here we can ensure that we are all reproducible and that we use the same binaries, more or less, on different architectures, on different workflow managers, across the globe.
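In practice, consuming such a package is a couple of shell commands, and isolated environments are what give you the side-by-side versions he mentions. A minimal sketch (tool versions are illustrative):

```bash
# no root needed: conda installs into your home directory.
# Separate environments let two versions of the same tool coexist.
conda create -n samtools-1.9  -c conda-forge -c bioconda samtools=1.9
conda create -n samtools-1.15 -c conda-forge -c bioconda samtools=1.15

# activate whichever version a given workflow was built against
conda activate samtools-1.9
samtools --version
```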
Yeah, but my talk is also called BioContainers. So let me talk a little bit about containers, the new cool kid in town. So probably Docker; if you are on an HPC environment, you're probably using Singularity, I assume; but also rkt, or the native container engines in the Linux operating system, are getting more traction recently. And the question is, how can we actually combine both worlds? How can we make it so that you run your workflow in Singularity containers on an HPC environment, but maybe with Conda on your development machine, and in the cloud maybe with Docker? And how can we guarantee that no matter which execution engine you are actually using, the same binary is included, and it's reproducible even across these different kinds of technologies? So let's go back to our initial figure where we talked about packaging, and let's assume we now agree that Conda, Bioconda, and the Conda community are kind of the de facto standard at this point in time; who knows what it will be in five years. But let's assume we have these Conda packages and we are kind of happy, but now, for reproducibility reasons, or for various other technical reasons, you want to create containers and you want to use containers.

So what we are doing here is we have developed a system that converts your Conda package into a container. This approach is called mulled, and we don't use Dockerfiles. It's a completely automatic process where we have a build environment and a runtime environment. In the build environment, we do more or less a conda install step, and everything that gets installed into the build environment as an additional layer is just transferred to the runtime container. So if your runtime container is just a BusyBox of one megabyte, and you install samtools, for example, which is 10 to 11 megabytes, your final container is 12 megabytes in size. So the file size of the container is as minimal as you can get with this technology. And one design principle was really that we don't write Dockerfiles. I consider Dockerfiles as kind of reinventing a package manager again. We have seen many people start to do configure, make, make install in a Dockerfile just to create a container with samtools. I don't want to reinvent the wheel here again. So we invented, hopefully, a smarter approach to create these containers. Another aspect is that we can completely automate this entire setup; we can run it on CI systems. So essentially, what you need to do, and this is why my introduction was so much about Conda, is create a package for Bioconda, and nowadays you will get the container for free. And by container, I mean Docker, Singularity, rkt, whatever you need. And you don't even notice it. Many people just contribute to Bioconda because they want to have a Conda package, and then they are really surprised that there's also a BioContainer available for their package, because all of that is completely automated and part of our CI system on Bioconda. So over the last years, we have created more than 40,000 containers for roughly 8,000 tools, and, as I said, for these different container engines.

So how does this automated container build work, and what are the advantages, or what could be the advantages, for the Nextflow community? What I would like is that, finally, we could specify just 'I want to have samtools in this version', and not the entire container path. And Nextflow, for example, or other workflow engines, could take this information, just the pure metadata, samtools and the version, do a little bit of magic, and figure out whether a given container is actually available, whether you can download it, and which type of container you need. Do you need an ARM container? Do you need a, whatever, x86 container? Do you need maybe Singularity? Do you need Docker? But this should be part of the engine. If your workflow engine has these hard-coded container paths, I think you are limiting yourself too much. In contrast, if you have just the metadata, and you let the engine figure out which container is best suited for the architecture where you schedule the workflow, you gain way more flexibility. And this is what we are aiming at here. So when you build a container, and this is what happens on the CI infrastructure, but we also have command line utilities for it, you can simply say: build, samtools equals a specific version; run it; and what you get is actually this container. So this is created automatically, but you will also get the Singularity image and so on.
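The command line utility he refers to ships with galaxy-tool-util as mulled-build; the invocation below is a sketch from memory, so check the current documentation for the exact subcommands and flags:

```bash
# mulled-build comes with galaxy-tool-util (flags may differ between versions)
pip install galaxy-tool-util

# build a container for one Bioconda package; targets are name=version strings
mulled-build build 'samtools=1.3.1'

# the predictable result: quay.io/biocontainers/<name>:<version>--<build>
docker run --rm quay.io/biocontainers/samtools:1.3.1--0 samtools --version
```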
An interesting aspect here is that this name, the one in bold, is now predictable, because it's always the Conda name and the Conda version. So given the annotation samtools 1.3.1, your workflow engine can be smart enough to construct this container name and just pull it down for you. So this is very easy to do. It gets a little bit more tricky if you consider containers with multiple tools inside. Let's assume you need a container with samtools and BEDTools in parallel. One question that is still up in the air, and everyone can define best practices here, I guess, is whether these multi-tool containers are actually needed, or whether we can decompose workflows in a smarter way. But let's assume you need a container with both packages inside. How do you name it? You could name the container samtools-bedtools, but you could also name it bedtools-samtools. So it's not really unique, and what we need to make this metadata idea work, and the findability work, is a unified naming scheme. Because in the end, we want to treat containers like cattle, right? We don't want to handcraft containers. We don't really care about the name of the container. We just want to use a container that is deployed somewhere and run our workflows in it. Essentially, you as a workflow developer should not have to care about the container. It should just magically work. All you need to provide is which binaries need to be there to run this step of my workflow.

And this brings us to the big question of findability. How can a workflow engine find a given container for you, given annotations? What we developed here, especially for these multi-tool containers, is a naming scheme that is very trivial in the end, but you don't need to search: you just retrieve or fail. So you don't need to go to the BioContainers registry, or to Docker Hub, and search for an arbitrary container that maybe contains your binaries, maybe not, because most containers are also just black boxes. You just give the annotation, samtools 1.3.1, and your workflow engine will just pull it down, or fail at this step and give you a hopefully meaningful error message. We could also think about building these containers on the fly, right? Other workflow managers have implemented that already: if you don't find your container, but you have your Conda package, build your container on the fly, ship the container to the cluster, and then execute it inside the container. And what we could also think about, or what I will show you in a minute, is that we can also build containers in advance, for example by scraping GitHub, looking at all the different workflows lying around on GitHub, and creating specialized containers for all of them in advance, before they are executed. So, to make a long story short, the same step that I showed you with the simple samtools container also works for these multi-tool containers, by simply normalizing the package namespace and hashing the names, and the same we do for the versions. And what you get is a string like this one here. So this is actually your container name. And yes, it's not readable; for a human being it's completely useless. But the point is that we don't create these containers for human beings. We create them for workflow engines.
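The scheme fits in a few lines. The sketch below mirrors the idea as described, sorting the package names so order doesn't matter and hashing names and versions separately; the exact separators and suffixes are assumptions here, and the authoritative implementation is the mulled utilities in galaxy-tool-util.

```python
import hashlib

def mulled_v2_name(targets: dict) -> str:
    """Sketch of the mulled-v2 naming idea; targets maps package name -> version.

    Sorting makes samtools+bedtools and bedtools+samtools resolve to the SAME
    name, which is the whole point: retrieve or fail, no searching.
    """
    ordered = sorted(targets.items())  # canonical package order
    name_hash = hashlib.sha1(",".join(n for n, _ in ordered).encode()).hexdigest()
    version_hash = hashlib.sha1(",".join(v for _, v in ordered).encode()).hexdigest()
    return f"mulled-v2-{name_hash}:{version_hash}"

# both spellings yield one predictable container name an engine can construct
print(mulled_v2_name({"samtools": "1.3.1", "bedtools": "2.26.0"}))
print(mulled_v2_name({"bedtools": "2.26.0", "samtools": "1.3.1"}))
```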
And workflow engines, on the other hand, can just take this metadata, construct the hash, and pull the container down, without looking up some API, without doing any other magic. You just create the hash and try to pull it down; if that fails, you can build your container locally on demand. And this works quite nicely, for a few of our users at least. And this is what is very special in BioContainers: we try to have this predictable namespace for all your containers. Because then workflow engines can just make use of it, without you as a developer worrying about finding a container that has samtools and BEDTools inside, is minimal in size, has the correct versions, and so on. So this is more or less the problem we are trying to solve here.

How can you create such a multi-tool container? As I said, the single-tool containers are created automatically for you via Bioconda. For the multi-tool containers, you just need to create a pull request on GitHub in a special repo. You provide the string here with all the dependencies that you want to have in such a container, and as soon as someone merges the pull request, you get your container. You can even go a step further. We also have bots running around that search for workflows, for example. And if they find a workflow where no container is available, so, for example, here again our example of BEDTools and samtools, and there was no container with this particular BEDTools version and this particular samtools version, then the bot creates this pull request, and on merge, the new container is available and can be used by your favorite workflow engine.
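The special repo in question is BioContainers/multi-package-containers on GitHub; a request is essentially one added line of comma-separated name=version targets, roughly like the following (line format from memory, so check the repo's README before opening a pull request):

```text
samtools=1.3.1,bedtools=2.26.0
```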
So, a few things about bots, because these bots are everywhere nowadays. Even on the Bioconda level, we now have a bot so that if a developer, if Phil, is releasing a new MultiQC version, there's actually a bot running around and creating a new Bioconda pull request that just needs to be merged. So even that process is kind of automated. We still require a community review, but essentially the pull request is created, someone reviews the pull request from the bot and merges it, and it's in. If it's merged, the bot, or the CI system, actually creates the container for you and pushes it. There is a bot that updates tool descriptions, or your workflows, if you think about it that way. Then there's another community review, kind of reviewing your workflow updates, if you think of it in this way. If you have a step in your workflow which requires multiple packages, there's a bot that creates these multi-package containers for you. And finally, there's a bot that triggers the workflow tests for you. So, I mean, the whole system is a little bit scary if you think about how far we have driven it, that most of this is now automated. But on the other hand, that's probably what we need. We want to care about science. We want to analyze data, and we don't want to care about, yeah, how do I compile a package, how do I get samtools installed on my cluster environment, and so on. But also keep in mind that all these bots need to be maintained. It's hopefully less maintenance effort than not having the bots, but the people who take care of the bots are far fewer than the people who would otherwise create manual pull requests. So it's a shift in responsibilities, which is probably not good.

On the other hand, a lot of people are quite happy that this bot keeps track of new versions of all the software we maintain. So essentially, you can assume that nowadays, for all high-quality bioinformatics tools, you have every version as a Conda package and as a BioContainer. Yeah, who's using it? As I said, this is kind of cool, because all these different workflow engines, or at least the most used ones, are actually making use of it. And they are actually sharing code, at least the ones I know that are written in Python: they share library code to pull down these containers and so on. So I think that's really cool, that we have a common ground, a common denominator, to work together on and improve the packaging ecosystem, the tool side of our ecosystem.

Yeah, and with that, I would like to thank all the different communities that have contributed to this, and all the different funding agencies. There's a lot of infrastructure needed to keep this running. I mean, just think about 40,000 containers: what is that in size? And we replicate all these containers in different regions of the world. We have CVMFS repositories for all Singularity containers, also distributed and mirrored across the world. So there are a lot of funding agencies involved that try to back this up, and also to make the download and availability of all these containers hopefully super robust in the future. So even if, whatever, the NIH cuts funding, we have mirrors in Europe and Australia and so on, to keep the ecosystem and our community still kicking. Okay, thanks a lot for listening. Happy to take any questions.

Thank you very much, Björn, for this very interesting talk. We can also say that for the nf-core community, Bioconda and BioContainers are indispensable. Already for including tools in nf-core pipelines there is a guideline that they need to be added to Bioconda or some Conda channel; otherwise we cannot use them inside the pipeline. And now, moving to DSL2 and modules, BioContainers are getting more and more important. So we have a lot of questions for you. Thank you for joining us today. And we can get started with a question by Phil, which is: could you tell us a bit more about this mulled tool? Is it a tool that you wrote, or is it a general tool that you are using?

So, yeah, this is a tool that we developed here in Freiburg. Again, it's super simple. I mean, if you think about it, you have these two containers, the build container and the runtime container. And what you are really doing is triggering the installation in the build container and making sure that everything that gets added on top is copied to the runtime container. The mulled script, or the mulled technology, actually works for different package managers. So, for example, you could create Debian containers for all Debian packages. The disadvantage with Debian is that if you trigger an apt-get install to get new software installed, the runtime container also needs to be a Debian base runtime image. So that's, whatever, 60 megabytes in size, and you end up with bigger containers in the end. But mulled is, in principle, from a technology standpoint, able to use any package manager. And this was the main idea. I don't want to redo package management or package creation with Docker. So our initial idea was: just take a package manager, repackage its packages, and distribute them as containers.
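The build/runtime split can be illustrated with plain Docker commands. This is only a sketch of the idea under simplifying assumptions; it is not how mulled is actually invoked, and the real runtime base is a tiny BusyBox image that still provides a libc for the binaries, which this bare-bones version omits:

```bash
# 1. build step: a throwaway build container resolves and installs the package
#    into an isolated prefix that lives on a shared volume
mkdir -p rootfs/usr/local
docker run --rm -v "$PWD/rootfs:/rootfs" continuumio/miniconda3 \
    conda install -y --prefix /rootfs/usr/local -c conda-forge -c bioconda samtools=1.3.1

# 2. runtime step: only the freshly written files become the new image;
#    no Dockerfile, no compiler, nothing from the build environment leaks in
tar -C rootfs -c . | docker import \
    --change 'ENV PATH=/usr/local/bin:$PATH' - samtools-sketch:1.3.1
```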
And this is also the reason why we did it initially for Debian and for Alpine packages. But then, considering that Conda actually goes all the way down to zlib with all these dependencies, we based BioContainers on Conda packages. But the technology is completely independent of Conda. And maybe I forgot to say that. I mean, I'm completely aware of the disadvantages of Conda and so on, and who knows if Conda will survive the next five or ten years. But I don't worry too much about that. The only thing I worry about is that our ecosystem and the things we developed can be transferred to new technologies, right? We have packages, we have package managers; so just move them, or convert them, to whatever freaky new technologies come along. And if that is Docker or Singularity or whatever comes next, I couldn't care less. I mean, this is not what really motivates me, right? I just want to make sure that whatever comes next, we don't start from the baseline again and do configure, make, make install again and again. That is wasting too many resources. So yes, mulled is independent. It's a very small Python script, with a little bit of Lua code to define different tasks, but essentially it's converting from one package manager to another packaging format. And it can easily be replaced by fancier code. There's really nothing special in it, and there's an entire talk about it, actually.

Can I ask a very quick follow-up, just to clarify something? So, correct me if I've got my understanding wrong here, but you've got this one-megabyte kind of base image that you're building the new stuff into for the package. So presumably you don't have a whole operating system in that one megabyte. So what happens if my tool, which is going in from Conda, needs some kind of underlying operating system commands? Where do those come from?

Yeah, I should paste here maybe the conversation with Paolo. This is exactly the point where I discussed this, I don't know, two or three years ago, with the Nextflow core developers: if a workflow engine assumes Unix command line tools in a container, that's probably a bad design decision. And this is where we kind of disagreed in the beginning. Because Nextflow was, I'm not sure if it still is, relying on a specific version of ps and other very bare-metal command line tools. But these are still different: if you don't use the GNU ps or coreutils, but the, whatever, BusyBox or BSD versions of these command line utilities, you get different behavior. So I think if your workflow engine depends on the GNU version of a particular tool, it should actually declare this dependency, right? And there we are back to these multi-package containers. So nowadays you can just define, whatever, awk in this version and sed in this version, and actually specify that you are talking about the GNU sed version and not the macOS BSD sed version. But I think the Nextflow developers fixed that, and this is not an issue anymore.

But I'm also thinking, say, for example, MultiQC wants to run, I don't know, some kind of base commands. Nextflow is a perfect use case because it's been a problem in the past, but presumably there are plenty of Conda packages which do use underlying system tools. So they'll have to start declaring those dependencies?

So, I mean, my take on that is: the only thing that you can rely on is bash, and everything else should be declared as a dependency. If you use sed, if you use curl, if you use awk, you should declare this dependency, because even awk changes its syntax between tool sets. I mean, there are different sed versions running around, and you cannot even use the same sed command on OS X and on Linux. And that's why I'm saying: if you use sed in MultiQC, you should probably declare this dependency, even if it blows up your container by, whatever, 10 or 20 megabytes. This is just good for reproducibility. And I know that other workflow engines really just say: the only thing you can assume is bash. And even that was debated for a long time.
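Concretely, declaring those system tools just means listing them as targets like any other package, since conda-forge packages GNU sed and coreutils, so a multi-package request could look like this (versions illustrative):

```text
multiqc=1.9,sed=4.8,coreutils=8.31
```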
Thank you very much.

Yeah, no problem.

Yeah, thank you very much. And now that we are talking about reproducibility and containers, we have a question by Harshil: how do you ensure reproducibility with multiple builds of the same container if the tag is the same?

So, if you look at this container version, you will get here --1, --2, --3. So you have more or less the same thing that you have in Conda, the build number; you have the same in the container as well. And I talked about this smart way for workflow engines to pull down a container. So it would be smart, I guess, if, when you don't specify the build number in your metadata here, you always retrieve the latest build, until you actually pin a specific build. But I guess we are all going in the direction that only the latest build is actually the one you should use. Yeah. So this is part of the container name.
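In tag form, that build number is the part after the double dash, so pinning it is just spelling out the full tag wherever your workflow references the image (the tags below illustrate the pattern; specific build numbers may not exist for every version):

```bash
# the tag encodes version AND build number: <version>--<build>;
# pinning the full tag means every run resolves to the identical image
docker pull quay.io/biocontainers/samtools:1.3.1--0   # first build of 1.3.1
docker pull quay.io/biocontainers/samtools:1.3.1--2   # a later rebuild of the same version
```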
Harshil, actually, we cannot hear you. You are muted.

Thank you for the answer to the question.

Okay, cool. And thank you for all this work again, amazing.

No problem, no problem.

Thank you very much for the clarification on this. Yeah, multi-tool containers are something that a lot of people had questions about as well. And I was wondering, so you've shown an example with two tools in that container; where would the limits be here? Can you imagine, is the sky the limit?

I mean, the only limits that I can imagine are limits on Quay.io and Docker Hub, if your containers at some point get so big that we cannot push them to these registries. Or, simply put, the restrictions of a Docker container. Otherwise, there are no restrictions. You can just give a comma-separated list of all the tools that you want to include and go for it. But again, my personal opinion is that this should not happen. We should create workflows that are decomposable, that ideally have only, whatever, one tool in one step. I know this doesn't always work, because, whatever, certain mappers create a SAM file and you probably want to convert it directly into BAM in the same step. But I think we should aim at minimal containers with minimal dependencies. Yeah.

Thank you. Thank you very much for clarifying this. And Harshil had another question as well: are there any plans to build more standardized R or Bioconductor images with multiple packages in them, or would these be handled in a similar way to the multi-tool containers?

Not sure what you're... So, Bioconductor is a little bit special again, as always. We are actually building all Bioconductor packages automatically on Bioconda. They release every six months, so a month after their release we most likely have most of their containers built, but these are only single-package containers. So we don't put DESeq2 together with, whatever, goseq in one container. And the reason, I mean, I don't think it's Bioconductor's problem, and it should not be Bioconductor's concern. We have developed this URL here, we have developed this repository, exactly for this purpose, in the sense that if you need two Bioconductor packages in one container, just create such a pull request here, and then you get your container in ten minutes. But again, I would say these should probably be two steps in a workflow; but yeah, if you need it, create it. So you see here an example with the XCMS package, and they also need RColorBrewer for whatever reason. So we have this container now. And the same goes for two Bioconductor packages.

Yeah, this often comes up, even in our group, as to how we maintain Bioconductor within the group and try to keep things reproducible, because often you need multiple packages loaded within one environment to run a single script. And if those packages are in single containers, then obviously you can't run that same script across multiple containers. So you need some way of having everything within one environment. And I think Phil and the group have gone with a Conda approach, so we have a local Conda installation which we try to update with packages when we need to. But I guess this is a common problem for a lot of people: how to maintain Bioconductor packages locally, reproducibly, and with minimal maintenance. Which is, I guess, why I was asking if there are plans for maybe even Bioconductor to build their own images for standardized sets of packages within the same environment, which you could just use without doing anything.

But I guess this is an exponentially complex problem, right? I mean, if your arbitrary R script depends on seven different packages, we cannot create all the different combinations of tools a priori. So that's why I'm saying your R script should maybe, in the first place, be split up into three different R scripts, if possible. And if that's not possible, okay, then let's create a container specifically for your group, and keep that container updated. And what I can also recommend is creating these kinds of bots, right? I mean, this bot is really... we have an allow list that contains a lot of GitHub repositories that we scrape. And if we find a workflow, if we find a tool, in your case, if we find an environment file from your group, we just take it and create a container, and this is more or less a snapshot of your R environment in 2019, and you can treat it like that. I mean, again, everyone is free to use that. We are currently not at a stage where we have a storage problem. So let's keep that going and then see how many terabytes we actually produce in the long run; until now it doesn't seem to be a problem. So just use it in that way and it should work, hopefully.

Oh, thank you. Yeah. Yeah, we were wondering, then, if all nf-core repositories could also be added to that allow list.

So let's do that in the first place. And then, I mean, I'm always a fan of: let's try it, let's see how far we come, and then yell at people to give us more storage. I mean, this is what we are doing for the Galaxy community, for all Galaxy tools and workflows. And these are now, whatever, 3,000. And I don't think it's a problem; not yet. It might be in the future, but let's wait for that time.
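An environment file like the one sketched here is exactly the kind of artifact such a scraping bot can pick up and freeze into a container, turning one file in a GitHub repo into a snapshot image (names and versions are illustrative):

```yaml
# environment.yml -- a group-level R environment a scraping bot could snapshot
name: our-r-analysis
channels:
  - conda-forge
  - bioconda
dependencies:
  - r-base=4.0                   # pinning r-base also speeds up solving
  - bioconductor-deseq2=1.30.0   # illustrative versions
  - r-rcolorbrewer=1.1_2
```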
Okay, so one last question regarding BioContainers. The Docker images are currently hosted on Quay.io; are the Singularity images hosted online anywhere? A question by Phil.

Yes, that's correct. What was the question? Sorry.

So, if the Docker images from BioContainers are currently hosted on Quay.io, are the Singularity images hosted anywhere?

Yes, yes. So the beauty of Singularity images is that an image is a file. So you can actually share it via FTP, rsync, whatsoever. And this is what we are doing. There's an FTP, or a very simple static HTTP server, where you can download all 40,000 images. But I mentioned it briefly: there's a technology called CVMFS. I pushed this idea to the Nextflow developers a few years ago already. This is kind of a read-only file system over HTTP. So you don't have any problems getting this into your cloud, because it really just speaks HTTP. And it mounts a directory over HTTP, read-only, into your workspace, and you immediately see 40,000 images. But only if you access one image, so only if your Nextflow workflow actually wants to run this one image, does it get downloaded and cached. So the first time it's downloaded; the second time you run your workflow, it's already cached and just used. So I would always recommend using CVMFS, and working together with our community to shape the ecosystem around the CVMFS mirrors. For example, we also share reference data on CVMFS. I'm not sure how the Nextflow community maintains reference data, but we use this CVMFS idea for images, for reference data, for databases, and so on. Yeah, and we just mount it in over HTTP. It's a FUSE-based technology maintained by CERN. And this is what I would tell at least the hardcore people to look at. But the easiest way is obviously to just go to the FTP and download your Singularity images.

Super cool, thank you. Sorry, I dug it up actually while you were doing questions and found it; it's a Galaxy FTP somewhere, I guess, that you're keeping?

Yes, exactly. So this is hosted by the Galaxy community, but yeah, as long as we have funding for that.

We've been looking for something similar for years. We started off using Singularity Hub, but then we were hit by the usage requirements. And then we wanted to use Sylabs.io, but we can't automate that, because you need to manually renew your key every 30 days or something, and I sort of gave up. If we can have more of our containers as Singularity images, that's great, because the main thing we want that for is actually people running offline. So, to make it easy to grab the Singularity image onto whatever little small laptop you're using, which doesn't even have to have Singularity installed, and then you can just pop it over. And if you can just download it from an FTP instead of sort of building it every time, that's really helpful.

You could create a staging script or something similar that just stages the containers you need for a workflow onto your notebook, and then you can go offline, yeah?

Yeah, exactly. No, this is good to know about, and I'll look up the other file system that you're talking about; sounds exciting.

Yeah, just drop me an email. We have an Ansible role that will install CVMFS for you and set everything up. So CVMFS is an awesome technology. It has some kind of geolocation included as well, so it will actually pick the mirror that is nearest to you. And what we are doing in several European projects, like ELIXIR and EOSC, is building CVMFS mirrors in Europe, Australia, and the US. So I think it's a neat way to solve the reference data problem, for example, but also the container problem in the long run.
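Both access routes he describes can be sketched in two commands each. The depot URL and CVMFS repository name below are the Galaxy project's as commonly used (for example by nf-core pipelines), but treat the exact paths and tags as assumptions that may change:

```bash
# 1. flat-file route: every Singularity image is just a file behind HTTPS
wget 'https://depot.galaxyproject.org/singularity/samtools:1.3.1--0'

# 2. CVMFS route: mount once (CVMFS client required), then images are fetched
#    lazily on first access and cached locally for later runs
sudo mount -t cvmfs singularity.galaxyproject.org /cvmfs/singularity.galaxyproject.org
ls /cvmfs/singularity.galaxyproject.org/all/ | grep 'samtools:1.3.1'
```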
Very nice. Thank you very much. I still have some questions regarding Bioconda, the first part of the talk. A question more for beginners; we've had this situation sometimes in the nf-core community. Can anyone publish a package on Bioconda, or do you need to be the main developer of that tool?

No, everyone can do that. I mean, you need to go through a review process, but you can update Phil's package. So if someone reviews it and approves it, yeah.

People often do, because I'm too slow to do releases.

Yeah, no, everyone can do that. But the nice point is that we have a lot of upstream developers now. This is actually how you know that a project is kind of successful, right? At the beginning you crowdsource everything, and everyone is building all the packages that you need, but they are not maintained. But when the upstream developers start popping up, then you are in the second stage, and that's pretty cool. But everyone can contribute, and I warmly invite everyone to contribute here.

Perfect, thank you very much. So there should be no excuse left now. And a final question: do you have any tips for people who suffer from long-lasting environment builds?

Huh, that depends on what you're doing. If you deal with R packages, include r-base in your environment definition; so pin r-base, more or less, to 4 or to 3.6. This will help. There are some discussions of... So, okay, the short answer is: there's an issue on Bioconda, I mean, we also use the Bioconda issues to communicate with our users and so on, and there's a description of what you can do to actually speed up environment builds. There are some suggestions, so have a look at that. The long answer is: we are well aware of the problems. conda-forge is more or less also driving in the direction of supporting multiple Conda implementations. I mean, Conda is more or less just one implementation of the metadata scheme, and there are others, like Mamba. And Mamba is way faster than Conda currently. But I wouldn't recommend Mamba yet for beginners. And I personally would like to get the people involved around one table, talking to each other: the Conda main developers, so actually the Anaconda company that is behind it, talking to the Mamba developers, talking to conda-forge and Bioconda and so on, to figure this out. I don't want a separation into, whatever, different Conda implementations; I would like everyone to work together on improving things. But if it's a really urgent problem, look at the issue and consider using Mamba. Mamba's syntax is the same as Conda's, so you can just exchange conda for mamba and then do your magic; it still works. And I think Snakemake, for example, goes a step further and actually recommends Mamba to its users now. Yeah. But first of all: if it's R packages, include r-base; that's probably solving your problem already. And we know that Perl packages are a nightmare. We don't have a strong Perl community, or anyone who is, let's say, old enough to understand the Perl packaging system. So if you have Perl packages in your environment, this could also be a reason; we would need to invest more time in our Perl ecosystem, but we don't have the capacity for that, and the main developers we have are really not familiar with Perl anymore. So this is a problem. Yeah, sorry, no other answer, but have a look at the issue, and I can probably put it in the chat afterwards.
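Since the syntax is identical, trying the Mamba swap he describes is literally a rename, and the r-base pin goes straight into the create command. A short sketch with illustrative versions:

```bash
# install the faster solver once, into the base environment
conda install -n base -c conda-forge mamba

# then 'mamba' is a drop-in for 'conda'; pinning r-base up front keeps solves fast
mamba create -n deseq2-env -c conda-forge -c bioconda \
    r-base=4.0 bioconductor-deseq2
```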
Yeah, thank you very much. I'm sure this will already help us a lot. And thank you in general for the talk; it was really helpful to get this insight into Bioconda and BioContainers, and how to build these BioContainers and multi-tool containers. So thank you again for coming to the nf-core Hackathon. With this, we will end the session of Hackathon talks for today, and we'll see you later today, at half past four Central European Summer Time, for the final Hackathon wrap-up.