are recording and we'll be sharing the presentations individually. If you are a tweeter, feel free to use the ARCOS 2021 hashtag. We are also very welcoming of little people and pets; it's been a couple of years now, and we're used to these kinds of interruptions. And we very much invite you to contribute and have fun during this event.

So, a quick recap of the program. The symposium today is divided into four sections, four hours. The first section will feature a couple of local Australian presentations. Then we have a break. In the second hour, we have an international speaker from the NVIDIA Corporation. The third hour will be an opportunity to have some in-depth discussions, and this will be done using breakout rooms. We have some topics there already, and somebody will be facilitating each of these discussions; you'll be able to join one of the breakout rooms and be part of that discussion. If you would like to talk about something else entirely and you are willing to lead a discussion on it (you don't have to be the expert on the topic; you might want to know more about something yourself), please let me know and we can add that to the list of topics to be discussed. And then in the final hour, we have our second international speaker. This speaker is based in Amsterdam, and it will be quite early for him, so hopefully he'll have had plenty of coffee by then; I imagine he's still asleep. And yes, apologies for the incorrect Zoom link, a bit of a gremlin. I should have known better, but I did send out the correct Zoom link eventually, so thanks for making it here.

I will now stop sharing my screen, and Steffen, you can get yourself set up. Our first speaker today is Dr. Steffen Bollmann from the University of Queensland. He will be giving us, well, as the shared screen says, a behind-the-scenes look at the NeuroDesk project, which you've been working on for quite some time now, haven't you, Steffen?

Yeah, that's correct. We started with a hackathon at OHBM 2020, and we were basically involved with ARCOS from the beginning, because we use a lot of containers. In this talk today, I would like to show a little bit of what we built there, and I hope some of the things that we learned are useful for others as well. Before I get started, I would like to acknowledge the traditional owners of the land; for us here in St Lucia, these are the Turrbal and Jagera people. I also would like to acknowledge all the people who are actually working on this project: although I'm presenting today, I'm presenting on behalf of a lot of people. It's an open source project started at OHBM 2020, and we're supported by lots of organizations. For example, the ARDC supports us because we're one of the platform projects funded in 2020; we were very lucky to convince people that this is a really cool idea and should be supported. We're also supported by Oracle Cloud, which sponsors the cloud credits that currently host the whole infrastructure in the rest of the world. In Australia, we are supported by Nectar, which hosts our infrastructure here. The National Imaging Facility supports us with staff and contributes a lot in kind. And of course ARCOS, where we learned a lot about what containers are and how we can use them. I want to highlight one particular person, Aswin Narayanan. He's a National Imaging Facility fellow.
And he contributed a lot of the code that I'm actually showing off today, so a lot of this is actually not made by me. I'm always saying: I'm just a researcher, I just want to solve problems. Aswin is actually a software developer, and he put a lot of these things together.

So, with this introduction: I don't have time today to explain why we actually built NeuroDesk. For that, there is a really cool YouTube video, produced by the same guy that you see there, from the eResearch conference, which explains a little bit why we did all of this. Today I just want to start with a very quick "why NeuroDesk" in a nutshell. The problem is that our neuroimaging research tools basically all require Linux, and this is a problem because not everyone has access to Linux. Even when people do get access to Linux, these tools are not available in the standard package systems, so they can't be installed via apt, yum, and the like. Compiling from source is unfortunately a nightmare, as many people probably know, and it's unfeasible for a lot of researchers because it's just too hard. Then we also have conflicting dependencies, which means we can't easily run different versions of the same software next to each other. Reinstalling these tools on different computing platforms... sorry, there's a timer on my slide; let me quickly deactivate that. Clear all timings. OK, this is it. OK. When we move these tools, we have to reinstall them on different computing platforms, which takes a lot of time, and it's not what a researcher should be doing. Also, we get different results between software versions, which is really annoying for reproducibility. So this is why we did it.

Now, what I want to achieve today. First, I would like to motivate people to join the project, of course, because it's an open source project, it's really cool, and I think people can really help us a lot, as you will see. I also want to show a little bit of what we learned during this last year of building it, because we basically had a proof of concept that we regularly threw away to rebuild everything from scratch; we have now done a couple of iterations of this, and I hope some of our lessons are useful for others. I also would like to get input on how to do things better, because, as I said, most of us are not software developers; we're just researchers who want to get stuff done. So if you know how to do things better, let us know.

Before we get into what we've built, I'll show what it looks like. This is a Docker container that the users run, and this Docker container contains a full Ubuntu Linux desktop. Currently this thing is starting up, and then users get a link that they can paste into the browser, and they get a full desktop environment running with Apache Guacamole, as some people might recognize. And then we have all the applications that people might use installed in there. We also link data directories: here you see I opened a data directory on my local Windows computer and I'm now dragging in a file that I would like to use in this environment, and it appears in the linked folder. That's simply a mount passed through to Docker. And now users can use any tool that we provide, that is, the tools that are hard to install.
So here, for example, I just grabbed a visualization tool, and this tool now gets live-mounted from a CVMFS mount. I will show how all of this works at the end, but the cool thing is that with this we keep the container very lightweight. Now we launched that tool live and we can do our image processing work. In this case I just opened a file, and then people would start, for example, manually annotating things or checking the quality of the data. You can see the graphical performance is quite nice; it's all very fast. And that's basically what we built.

So now, what is this architecture, and why did we do it this way? We started from an open source tool called NeuroDocker. NeuroDocker is a community that provides Docker recipes for building neuroimaging containers, because it's actually quite hard to figure out how to install all of this software. I will show a little later how we use all of this. Then we have NeuroContainers, which builds these container recipes automatically on GitHub with GitHub Actions. Then we built a tool around it to make these containers easier to use, because our users don't want to type singularity exec or singularity shell; they just want to run the command, right? As I said, we're just researchers and just want to get stuff done. Then we built NeuroCommand around this, which is a tool that brings multiple containers together. And on top of this sits the Neurodesktop that you just saw in the demo, which brings all of this nicely together in a usable interface.

And then we have multiple entry points for people to use this, because we are heavily based on the FAIR principles: what we build should be reusable and interoperable. The NeuroContainers, for example, can be used directly by other developers and integrated into their tools. The NeuroCommand tool can run, for example, on high-performance computing systems, or, as at the Centre for Advanced Imaging where I work, on Linux workstations where researchers use these packages. Basically everything I show today originated from an early prototype that we used at the Centre for Advanced Imaging for a couple of years, because we had exactly this problem: we needed people to have access to different versions of software without the big hassle of reinstalling software all the time. So it was really our own problem that we tried to solve, and we put it together in this project. The Neurodesktop, as I said, is the entry point for the normal users, the normal researchers, and it runs on Windows, Mac, and Linux. We need an operating system that supports Docker, and whatever supports Docker we can run on.

Okay, so today I'll go through all of these layers and show what we did there, why we did it, and what we learned. NeuroDocker, as I said earlier, is a Docker/Singularity recipe generator for neuroimaging software. It takes a very simple syntax: neurodocker generate, you define a base image, you say which package manager you have in there, then you can say, I want the software called ANTs in a certain version, and you output this into a Dockerfile. So these four lines generate the Dockerfile on the right, and as you can see, there is a lot of complexity involved; most of you have probably built Dockerfiles before, right? So you have these RUN commands.
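For readers following along, here is a hedged sketch of what such a call and a fragment of the Dockerfile it generates might look like. The exact flags depend on the NeuroDocker version, and the download URL is a placeholder, not the real ANTs location:

    # Hedged sketch; flags vary between NeuroDocker versions:
    #   neurodocker generate docker \
    #       --base-image ubuntu:18.04 \
    #       --pkg-manager apt \
    #       --ants version=2.3.4 > Dockerfile
    FROM ubuntu:18.04
    # Chained with && in a single RUN so the apt cache removal actually shrinks the layer:
    RUN apt-get update -qq \
        && apt-get install -y --no-install-recommends curl ca-certificates \
        && rm -rf /var/lib/apt/lists/*
    # Pipe the download straight into tar so no archive file is ever written into the layer
    # (placeholder URL for illustration only):
    RUN mkdir -p /opt/ants-2.3.4 \
        && curl -fsSL https://example.com/ants-2.3.4.tar.gz \
        | tar -xz -C /opt/ants-2.3.4 --strip-components=1
    ENV PATH="/opt/ants-2.3.4/bin:$PATH"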
You see, for example, one hack you have to do when you build Docker images: you have to chain commands with && and a backslash at the end of each line, because if you remove things and don't do it in the same chained command, you end up with additional stuff in the layer that just inflates your images, making them larger than they have to be. So there are a lot of tricks you can use, for example this curl command where we pipe the download directly into tar and unpack it on the fly. These little hacks are quite useful if you don't want to end up with a 20 gigabyte image, and that really makes our life a lot easier. So the reason we use NeuroDocker is that writing efficient Dockerfiles is very tedious: chaining commands, cleaning up, and also figuring out how to install this neuroimaging software in the first place is a lot of work. As I basically said, it takes a whole village to solve some of these dependency issues. And we were lucky, because NeuroDocker already has a big user base: there are 232 stars on GitHub and 82 forks of the repository. It's a long-existing project, funded by ReproNim in the United States, and we work closely with these guys. Mainly we use their recipes, but we also contribute back when we find improvements; here I just have a screenshot of a pull request we recently sent them, which they merged. We're basically one of these 82 forks, and we really rely on that community to bring the actual software recipes to us.

Okay, now we move one level higher. So now we have these recipes, and now we build them automatically. What I didn't show in the beginning: when you go to the start menu and click on Neurodesk, all of the categories in there contain many, many tools, and every one of these little tools, for example ITK-SNAP or GIMP, is a separate Singularity sub-container. So we basically have a Docker container that runs the whole desktop, and then every application is one Singularity container. Why did we do this? We solve dependency issues with this, because, as you can imagine, we can't easily install different software versions next to each other; some software just needs Ubuntu 16.04, or it needs 18.04 and doesn't work anymore on 20.04. It's really, really annoying with these dependencies. What we can also do is use legacy operating systems. For example, we have software that only runs on Ubuntu 16.04. No one today would run Ubuntu 16.04 in production because it's end-of-life, no updates anymore; but because we isolated it in a Singularity container, and because we put it all inside a modern Ubuntu box, we believe this is actually quite secure. A Singularity container also has minimum privileges, so we're quite sure that with this we can do it without compromising on security. We also chose Singularity because it supports graphical user interfaces very well. As I showed earlier, most of our workflows are visual. High-performance computing workflows do exist and lots of people use them, but I would say in neuroimaging a lot of the work is actually looking at the data, creating plots, and going back and forth between these things, so we need graphical user interfaces to work very well. And then all of these containers are downloaded on demand; I will show how we do all this magic later. They're basically not installed, so this whole menu is basically just a dummy.
And when people click on ITK-SNAP, that's when the magic actually happens and it gets downloaded. So, how do we build these containers? I always try to link to the actual examples, so if this is something that could be interesting for your project, feel free to look at what we did there and ask us; we're very happy to help. What we basically did is lean heavily on GitHub Actions, because we wanted everyone to be able to contribute to this project very easily on GitHub, with everything automatically built, tested, and deployed. Every one of our imaging application containers has a GitHub workflow YAML file, and these YAML files just define what a runner should do. There we generate the recipe with NeuroDocker in a shell script, then it gets built by Docker and deployed to Docker Hub and the GitHub container registry. (A hedged sketch of such a workflow is shown at the end of this part.) One thing we ran into very quickly is that these runners on GitHub have very limited disk space, because GitHub installs pretty much everything any developer might want: the .NET framework, lots of Docker images, and so on. So we actually couldn't build our containers there, because our containers are 10 to 20 gigabytes in size. Luckily, if you run into this problem, there's a really cool maximize-build-space GitHub Action that we found very useful; it's now in almost all of our recipes. With it we first wipe the runner, removing everything preinstalled on it, and then we also move the Docker storage around. That was the hack we needed to get enough space to actually build these big Docker containers. So I really think GitHub Actions are cool. What we learned there: GitHub Action runners can be too small, but cleanup actions help.

We also found that Docker Hub was pretty useful when we started, but they introduced more and more rate limiting, and it's now at a point where it's quite unusable. So we use the GitHub container registry as well. We still put our containers on Docker Hub too, but they can't be pulled efficiently. And so we hope for the ARCOS registry; I hope we hear more about this later. I'm a bit involved, because I'm a user who really needs this thing to work, hopefully soon, and I believe there are many, many more in Australia who would like to see it working.

Next: running Singularity containers within Docker containers. This sounds like a Christopher Nolan movie, but it actually works very well. If you ever had the idea that this might be tricky: it's actually quite nice. Singularity can run with quite low privileges, and we can nicely run it inside a Docker container, and with this we manage the whole dependency-conflict problem. That was something people said in the beginning, oh, this probably doesn't work, but it works surprisingly well.
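As referenced above, here is a minimal, hedged sketch of what one of these per-application build workflows could look like. The action names, versions, paths, and image tags are assumptions for illustration (the space-freeing action is commonly easimon/maximize-build-space; check the actual NeuroContainers repository for the real workflows):

    # .github/workflows/build-ants.yml  (illustrative sketch, not the project's actual file)
    name: build-ants
    on:
      push:
        paths: ['recipes/ants/**']
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          # Free disk space first: the default runner is too small for 10-20 GB images
          - uses: easimon/maximize-build-space@v4
            with:
              remove-dotnet: 'true'
              remove-android: 'true'
              remove-haskell: 'true'
          - uses: actions/checkout@v2
          # Generate the Dockerfile with NeuroDocker, then build and push
          # (registry login steps omitted for brevity)
          - run: ./recipes/ants/build.sh
          - run: docker build -t ghcr.io/neurodesk/ants:2.3.4 .
          - run: docker push ghcr.io/neurodesk/ants:2.3.4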
Okay, now we're at the next layer, which we called Transparent Singularity. This is one of those projects that existed years before the whole NeuroDesk architecture was conceived; it's something we used at the Centre for Advanced Imaging. What is this thing? The problem is that users are familiar with running application commands directly, and workflow systems, too, just run a certain command or script. So what users want to type is fslmaths, and then they want the program fslmaths to run.

But once this is in a container, people have to run singularity exec, then give it a current working directory, then tell it which container to run, and then which application inside it. If you show this to a user, they will run out of the room. They will say this is nonsense, right? I will not use this. And they'd have a point: these paths might change, it's way too long, and all your scripts look horrible if you write them this way. So what we built is an automatic wrapper-script generator. We go through all the applications inside the container and write a wrapper script named exactly like the binary inside the container. So now we have a wrapper script called fslmaths, exactly the name of the actual binary; we set up the working directory, put in the singularity exec command with the paths generated correctly, and pass through all the parameters the user gives to the command. So this wrapper script behaves exactly like the original binary, but it now runs in a container. The advantage is that users don't even realize they're running in a container. We did this pretty early on, and it's always a good sign when people don't complain and everything just works. That's why we call it Transparent Singularity: we put everything in containers and people didn't even notice, and that was quite a success.

Then we needed to solve another problem, because, as I said, every one of our tools is in a separate Singularity container, so now we needed to combine these different containers. For this we use the Lmod module system that we already used on our workstations and on the clusters we were using. What is this? We basically write a module file per application package, and then we can load it. So we can, for example, module load fsl, and here we see it's loaded; and we can also load another package, FreeSurfer, so we module load freesurfer as well, and then we have these two modules loaded. Now, when a user asks, okay, which freeview am I running? You see: we're running the wrapper script. This freeview is the wrapper script that points to the application inside that FreeSurfer container, with every path and everything set up correctly. So when a user runs freeview, they run in that container; when a user runs fslmaths, they run in this container. And with this, users can manage their dependencies perfectly. They can say, I want FSL 6.0.3, and combine that with any version of FreeSurfer, and things don't conflict; things are 100% reproducible, because users are forced to use module load to get these things on the path. If they use these in their scripts, the environments are perfectly reproducible, and it doesn't matter where they run, they will produce the same outcome.

Okay, so what we learned there: wrapper scripts work very well for Singularity, and our users continued using their old scripts and haven't even noticed they're running inside containers. I still sometimes get an email asking me, oh, how does this whole container thing work? And I say, oh, you've actually been using containers on the workstations for four years already, you just haven't noticed. So that's quite cool. The Lmod system is really useful for combining these tools from different containers. (A hedged sketch of a generated wrapper script follows.)
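To make this concrete, here is a minimal sketch of what one of these generated wrapper scripts could look like. The container path and module name are illustrative assumptions, not the exact files the generator produces:

    #!/usr/bin/env bash
    # Wrapper named 'fslmaths', exactly like the binary inside the container,
    # so calling it transparently runs the containerized tool.
    # (The container path below is illustrative.)
    singularity exec \
        --pwd "$PWD" \
        /cvmfs/neurodesk.ardc.edu.au/containers/fsl_6.0.3/fsl_6.0.3.simg \
        fslmaths "$@"

After something like module load fsl/6.0.3 puts the directory holding these wrappers on the PATH, a plain fslmaths input.nii -bin mask.nii runs inside the container without the user noticing.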
And with this, we have no dependency conflicts, we have full isolation between tools, and combining tools from different containers becomes very straightforward and transparent to the user.

Okay, now, the next thing: NeuroCommand. NeuroCommand is another wrapper we wrote to combine all these different containers and install them on an actual Linux system. What does this sub-project do? Here we define which application containers are available, we convert Docker containers to Singularity containers and upload them to object storage, and we unpack Singularity containers and store them on a special file system called CVMFS. It's developed at CERN, and it's a web-based file system that allows us to load these containers on demand. I'll show how these things work. Here we went very simple: we have a JSON description file where we say, these are the applications that exist. They have a software version, and they also have a date version, because, as you know, builds are not reproducible; which dependencies end up in there depends on the day we build the software, right? When we run apt-get update on Ubuntu, we get different packages and different system libraries, so the actual date the thing was built is crucial for reproducing the whole setup. That's why the date is tracked underneath, but again, we hide this from the user. The user only sees the tool name and version, say 2.3.4, and our system internally knows which exact build date was used. We also had cases where we had bugs in these containers and had to update them; then users get a warning that the container was deprecated and a new container is now in place. Then we give each tool a category, and these categories control our menu: based on this JSON file we generate the whole menu system, and we drive all the GitHub Actions that automatically run builds from it. (A hedged sketch of such a description entry follows.)
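A hedged sketch of what entries in such a description file could look like; the field names and layout are illustrative assumptions, not necessarily the project's actual schema. The human-visible part is the tool name and version, while the date-stamped version underneath pins the exact build:

    {
      "fsl": {
        "apps": {
          "fsl 6.0.3": { "version": "20210917" },
          "fsl 6.0.4": { "version": "20211102" }
        },
        "categories": ["image segmentation", "image registration"]
      },
      "freesurfer": {
        "apps": {
          "freesurfer 7.1.1": { "version": "20210827" }
        },
        "categories": ["image segmentation"]
      }
    }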
The next step is converting these Docker containers to Singularity. How do we do this? Quite simple. Well, first: why do we do this at all? Because the build tooling and caching of Docker are actually really good. Some people ask why we're not building Singularity containers directly, and the reason is that Singularity doesn't have caching and doesn't have good tooling for building these things on GitHub, so it's all a little less polished. There are, for example, no security scans for Singularity containers. Docker simply has a better ecosystem. The problem is that Docker is a pain if you want graphical user interfaces, which is what we needed, so that's why we use... (Can that person mute? Okay, I'll mute them. Cool.) Okay. So, GUIs: they work very well in Singularity. What we now do is simply have a GitHub Action that grabs the Docker container we built, and that Docker container then gets converted into a Singularity image. Then we had another problem: when we started this project, the last open Singularity registry was just about to die, and the problem still exists. There is currently no nice Singularity registry; there are commercial options, but nothing good right now that I know of, so if anyone knows of something, let me know. What we currently do is build our own registry based on simple object storage and a couple of scripts. We build the image and just upload it to an object storage location on the Oracle Cloud, and then we mirror these things.

And that's something I learned recently: rclone is really good if you want to mirror object storage across different clouds. Because, as you know, S3 is not S3; it's only S3 on Amazon, and all other providers have slightly different implementations, which can be quite annoying. But rclone is really good: it handles all these different ways object storage is presented. With this, we can synchronize our primary bucket to Australia, into the Nectar cloud, and also into the Oracle Cloud in Sydney, and we get worldwide distribution of these images. Then we download them using aria2. aria2 was also a tool I wasn't aware of, but it's really cool, because it can download from multiple HTTP endpoints at the same time. With this, we basically do load balancing on the client side: we point our users to four buckets at the same time, and wherever the image pulls fastest from, that's where it comes from; aria2 manages everything. So we don't have to do any fancy load-balancer setup or mirroring logic; we just have these mirror locations and pull from all of them at the same time. And even if one cloud is down, which apparently happens, the other cloud just takes over, and with this we're basically multi-cloud with a very, very simple setup. As you see, this comes back to our theme: we are researchers, we just need to get stuff working. I know this is not what a developer would do, but that's what we did.

Then, finally, we unpack these Singularity containers and store them on CVMFS. This was a problem we hit quite early on: we have these, nowadays, 10 gigabyte containers, and people tried to download them. I had a user in rural Australia with a very, very bad internet connection, because this was during the COVID lockdowns and people were trying to do their work from home, and they said, well, I can't finish your container downloads, my internet connection is not stable enough. So we said, okay, let's see if we can do something else. We looked around at what people are doing and found what CERN is doing. There was a really cool workshop series, I think with some of the EGI folks in there as well, who present later, where they said, oh, we use CVMFS to distribute software across HPCs. So I thought, well, if it's good enough for CERN and HPCs in Europe, it might be good enough for us as well. We just played with it, set it up, and used a very simple proof-of-concept setup in the beginning: one stratum 0 server, three stratum 1 servers in the US, Europe, and Australia hosted on the Oracle Cloud, then a simple GeoIP service that picks the closest one, and then we distribute this to laptops, desktop computers, and high-performance computing systems. Of course, on an HPC it makes a lot of sense to have a local squid proxy; otherwise, if 10,000 nodes ask our stratum 1 in the US, the stratum 1 might say, I'm not available for the next five minutes. So that's something we built and tested, and with this, users get these unpacked Singularity containers and can run singularity shell or singularity exec directly on that CVMFS file system. I was surprised at first how easy the setup actually is. Everything we did there is documented: if you go to neurodesk.github.io, under the developer CVMFS documentation, we wrote down everything we did.
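For a flavor of how small the client side is, here is a minimal, hedged sketch of a CVMFS client setup. The stratum 1 URLs are placeholders and the repository name is an assumption for illustration; the authoritative steps are in the NeuroDesk documentation mentioned above:

    # Install the client (after adding the CernVM-FS package repository):
    sudo apt-get install cvmfs

    # /etc/cvmfs/default.local
    CVMFS_REPOSITORIES=neurodesk.ardc.edu.au
    CVMFS_HTTP_PROXY=DIRECT         # laptops: DIRECT; on an HPC, point at a local squid proxy

    # /etc/cvmfs/config.d/neurodesk.ardc.edu.au.conf
    # Several stratum 1 mirrors; the client picks a working one:
    CVMFS_SERVER_URL="http://stratum1-us.example.org/cvmfs/@fqrn@;http://stratum1-au.example.org/cvmfs/@fqrn@"
    CVMFS_KEYS_DIR=/etc/cvmfs/keys  # public key of the stratum 0 goes here

    # Activate and test:
    sudo cvmfs_config setup
    cvmfs_config probe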
And so it's just a couple of lines, the packages install nicely, and the setup was straightforward. And it has just worked since then. The proof of concept worked very well, and then we simply rebuilt it a little bit bigger, took security more into account, and that's currently our production system. So it only took one iteration to get this working nicely. What we learned there: a Singularity registry would be nice, but plain object storage works, so I'm hoping the ARCOS group does something there. CVMFS is great and fast. If you have the problem of wanting to distribute software to multiple machines, NFS and weird software mounts and cross-mounting are not so great, but CVMFS actually works; with it, we currently distribute more than 200 gigabytes of software on demand into our lightweight desktop container. And Docker Hub used to be good, but now it's rate-limiting our container pulls. For example, because of this we can't use the CVMFS ducc tool, which pulls Docker containers and unpacks them, and we had to write our own tool that starts from our Singularity containers on object storage, because that was the only reliable way to avoid being rate-limited, and we can't pay the amounts of money that Docker Hub is asking from us. So that also screams that we need a nice registry in Australia.

Now the last item on the list, which you saw earlier in the demo; let me show a little bit of how we built it. We use a standard Ubuntu 20.04 with an LXDE desktop, because it works very nicely in remote environments, and it's a simple Dockerfile where we define all these things. We use Tomcat and Guacamole; we build Singularity in there, the CVMFS client, Lmod, and Visual Studio Code to enable nice coding of these workflows; plus Git, Python, and Julia, a basic development setup so that people can code and combine different containers, because that's ultimately the goal of this whole project. And then, you probably guessed it, we use GitHub Actions to build this thing automatically. We currently have a daily build: it runs every day, tests whether all the CVMFS servers are online and up to date, then builds a new container and deploys a daily version. So we do a daily push, and that's pretty much it. It's quite straightforward once you've figured out how to do all of this. And again, a shout-out to Aswin, who actually built all of this; I really haven't done anything there. (A hedged skeleton of such a Dockerfile follows.)
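A heavily condensed, hedged skeleton of what a desktop image Dockerfile along these lines could look like. Package names and the entrypoint are assumptions for illustration; the real file lives in the NeuroDesk repositories:

    # Illustrative skeleton only, not the project's actual Dockerfile
    FROM ubuntu:20.04
    # Desktop and remote-access stack (Guacamole itself needs a few more steps):
    RUN apt-get update && apt-get install -y --no-install-recommends \
            lxde tomcat9 \
        && rm -rf /var/lib/apt/lists/*
    # ... Singularity, the CVMFS client and Lmod are built and installed here ...
    # Basic development tooling for scripting workflows:
    RUN apt-get update && apt-get install -y --no-install-recommends \
            git python3 python3-pip \
        && rm -rf /var/lib/apt/lists/*
    # VS Code and Julia come from their upstream installers rather than apt
    CMD ["/startup.sh"]   # placeholder script that starts Guacamole and the desktop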
What we learned there: seamless copy and paste between host and container in the browser is important to people. In the beginning we had noVNC, and if you've worked with noVNC, the clipboard integration is really annoying: you have to click a button, a clipboard panel opens, you paste into that panel, and only then does it become available inside the desktop. This really annoyed people, and they said, this is broken. So we changed from noVNC to Guacamole, and Guacamole actually has proper clipboard integration and supports quite a few browsers as well, which was quite nice. There were also lots of other issues with noVNC, where things just got stuck, shift keys and things like that. So I think noVNC looks good, but it didn't work for us; it might be the future, but for us it wasn't.

The Neurodesktop container needs to be as lightweight as possible. That's where we found one bottleneck: for example, when we run on Macs, Mac computers use a virtualization environment to run Docker, and this virtualization is quite limited in the disk space and resources it's given. So if you try to pull a 10 gigabyte container there, that thing just fails and goes belly up. We are currently at 3.47 gigabytes for the desktop, and we're trying to get this even smaller right now by shifting more things out into separate sub-containers on CVMFS. One more thing we learned is that Docker caching can really speed up the builds, because most of our base image doesn't change; the Guacamole setup, for example, doesn't update very often. We reduced our daily build time from 21 minutes to only two minutes: basically everything is cached, and the last layer is just whatever latest applications need to be present; put them in, test everything, and push it out to Docker Hub. Also there, I think it really shows that Docker is well thought out and works well for big deployments like this.

Okay, I think I have two minutes left, so I'll just quickly show the roadmap: what do we want to do, and what are we working on right now? Of course we want more neuroimaging containers, and there we're working closely with the community to provide them. Some people criticize us because we don't build all of our containers ourselves, but I think that is a big strength: we actually want everyone to be able to contribute containers. And I don't think it's a security problem, because we run these containers unprivileged, and if you can't run an unprivileged container on your computer or your cluster system, then I think you have a security problem. That's exactly the point of Singularity containers: they shouldn't have any power to destroy your system, so you shouldn't have to trust what's inside them. Support for GPUs: we would like to support deep learning workflows. We have done a little in this space already, but we need more work to get it running nicely; Docker is quite nice for GPU pass-through, and Singularity also supports a few things there, but it's not yet at a point where it's really, really user-friendly. We would like to build a graphical user interface for managing these desktop containers, because currently people have to paste a long command line into a terminal, which is not so user-friendly. We want to build an open deployment on MyBinder, where everyone can just run this in the browser without needing Docker installed on their computer, and we also want to run our own Kubernetes cluster hosting our own MyBinder instance to make this available to researchers. People are also asking us: can you support M1 and ARM processors? Theoretically yes; the desktop container is easy to port to ARM, but all of our sub-applications need to be compiled for ARM as well, and that is quite a bit of work. I think we need to wait for the communities to update their software there as well, which is why it's at the end of the list. And there is a Brainhack I want to make you aware of: from November 29th to December 3rd we're running Brainhack Australia, part of the global Brainhack 2021, where we will work on all of these projects.
So if you're interested in this, please join the hackathon and help us build this further and add more features. With this, I think I'm perfectly on time. I'll thank everyone involved in the project, and I'm looking forward to questions.