Good. So the question is on EasyBuild; we handled that. So the last part of the workshop is an introduction to the EESSI project. And that has sort of evolved from the EasyBuild community, not directly, but lots of people involved with EasyBuild are also involved in EESSI. This is really my favorite cartoon. It shows that we have some very good ideas here on what we could do in EESSI, but we have a day job and we need to keep the researchers happy with what we currently have. So we're almost too busy pushing our cart on square wheels to stop and pick up the proper round wheels being offered to us. So this is, let's say, not progressing as fast as we would like it to, but we are actively working on it and we are making progress. So what is EESSI? EESSI is short for the European Environment for Scientific Software Installations. We pronounce it as "easy". That's not an accident: there's a clear link with EasyBuild, and this is supposed to make things easy, right? So that's why we gamed the name a bit. The "European" part we often get questions on: if I'm not in Europe, does that mean I cannot join? Also here in the UK, that's a relevant question. We could change that first E to another word that starts with an E, and we actually considered that at some point. It's really just to game the acronym and have a project name that sounds like "easy". It was also done deliberately, to some extent, to get some European funding for this, which eventually worked out. So there are some reasons why we have the "European" in there, but there's definitely no limitation in terms of which countries can join or who can use it. It's an open source project, just like EasyBuild is. What we want to do here is work together to build a shared repository of scientific software that's optimized, like EasyBuild does, for specific hosts. And it's a repository of the installations themselves, and that's an important detail.
We're not distributing software packages like RPMs. We're not distributing recipes like easyconfig files that you then still have to install yourself. We're sharing the installations themselves. That's an important detail, and we will show what that means when we go through the demos and the actual structure of how this is organized. The main goal we have for this is avoiding duplicate work. Even if people work together on EasyBuild, you still run into weird issues that you only see on your system, and you have to figure those out. So even if somebody already wrote an easyconfig file that worked for them, you still have to go through that installation, which may take time, and it may fail because of specifics of your operating system or your setup. And that's what we want to get rid of. We want to start working together on the software stack itself rather than on tools to get that in place. We also see this as a uniform way of providing software to users. Right now, even if the software is installed with EasyBuild across multiple systems, things are going to look a bit different. Maybe the module naming scheme is a bit different, or the location where the modules are installed is different. Some systems are set up so the modules are available to load directly, while on other systems you have to do a module use first; maybe they're using a hierarchical module naming scheme and you're not. So things are a little bit different, and different enough that the researchers get confused. If we have a shared repository which you can use in multiple locations, it looks the same everywhere. That's already a big hurdle removed for the researchers. The goal of this project, and that's a very ambitious goal, we fully realize that, is that the installations we provide should work on any Linux operating system, regardless of whether it's Ubuntu or Red Hat or whatever other variant, or whatever version it is.
We shouldn't really care about that. It also works in WSL, the Windows Subsystem for Linux; you can play with those installations there as well, since that's really just another Linux environment, just like a VM. And we're also considering supporting macOS. That's different enough that it's going to be another big effort; right now we're not working on this, but the door is open to also start doing this and make that possible. So the goal is really to give you a set of software installations that work on your laptop, on your personal computer at work, on an HPC cluster and even in the cloud: AWS, Azure, Oracle, Google, whatever you want to use, it should work there as well. That's mostly in terms of operating system, but it should also work on different types of CPUs: old generations of Intel, new generations of Intel, AMD, also ARM CPUs. POWER, we're currently still playing a bit with POWER9, but we'll stop doing that because that's sort of a dead end. In the future, RISC-V is coming up as another CPU family, let's say, and we want to add support for that as well. So lots of operating systems, lots of different types of CPUs, and then we're also looking at the interconnect: InfiniBand, or in the cloud EFA on AWS, for example. Different generations of NVIDIA GPUs, but also AMD GPUs, Intel GPUs; ideally, we want to support all of those as well. That's very ambitious, but because we're working together and we're sharing the software installations themselves, we believe that actually becomes feasible. And we'll see how far we get with that in the next couple of years. So the focus of the project is very much on performance, since we want to use this for scientific software that's used on HPC systems. Performance is a very important aspect here. We'll have to automate as much as we can to make this feasible; if we have people manually building software for all these types of system architectures, that's not going to work.
So that needs to be fully automated. We want to also make sure that not just the installations work, but that the software actually runs and runs well: that it functionally works, and that it performs well. So we're going to be testing all of these things. And we'll have to collaborate to make this possible. There's a website, there's documentation on GitHub, and I'll show you the pilot setup, the proof-of-concept setup we have for this, and explain how this works. This slide is just to zoom in on performance. I probably don't need to explain this in detail here, but it gives you an idea of what the impact may be if you're not being careful about the binary that you're running on a very capable system. This graph shows you the performance that you get for GROMACS, with one of the PRACE benchmark inputs for GROMACS. All these tests are run on the same system, an Intel Cascade Lake system, and all we're really doing here is using a different binary for the exact same version of GROMACS. We didn't touch the code at all; we're just building it with different compiler options and seeing how the performance differs. If you build it with only SSE2 instructions, so it runs on any modern x86 hardware, we call that the generic binary, you get a performance of about, let's say, one simulated nanosecond per day. That's a measure of how fast the simulation is: the larger that value, the better the performance. If you start using AVX instructions as well in your binary, that goes up, and at the very end we also use AVX-512 instructions, which are supported by Intel Cascade Lake, and then you get way better performance. So performance goes up by, let's say, 70%, only by using proper vector instructions in the binary. The impact can be quite big. GROMACS is probably a pretty extreme example of that; maybe for other software we're more talking 10 or 20%, but that's still significant, right? That could still make a big difference.
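The different binaries in a benchmark like this can come from one and the same source tree, just by varying the SIMD level at configure time. A hedged illustration using GROMACS's documented `-DGMX_SIMD` CMake option (the option and its values are GROMACS's own; the exact builds used for the graph are not specified in the talk):

```shell
# Same GROMACS source, three configurations that differ only in SIMD level:
cmake .. -DGMX_SIMD=SSE2       # "generic" binary: runs on any modern x86_64 CPU
cmake .. -DGMX_SIMD=AVX2_256   # uses AVX2, available since Intel Haswell
cmake .. -DGMX_SIMD=AVX_512    # uses AVX-512, e.g. on Intel Cascade Lake
```

The binaries are interchangeable functionally; only the performance differs, which is exactly why EESSI ships one installation per CPU target instead of one generic one.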
So that's a very important point. So what do we have as major goals in the EESSI project? First of all, we want to avoid duplicate work, not only for the people who are installing the software, the HPC support teams, but also for the researchers. We don't want them to have to relearn the software stack when they jump from one system to another; ideally, they can use the same software installations everywhere. Tools like EasyBuild and Spack already do some of that, but they're not really sufficient, because they only automate the installation procedure. If it works for you, that's good, but you're still doing that installation yourself. And if it doesn't work, you still have to figure out why it doesn't work for you and try to fix that. So there's still lots of duplicate work here. We want to go way beyond just sharing the build recipes; we want to go towards sharing the software installations, the actual binaries themselves. We want to build a uniform software stack that runs, let's say, anywhere. So regardless of whether you're running in the cloud, on your laptop or on an HPC cluster, you're basically using the same installations. It looks the same, it feels the same, you know how it works, and you don't lose time figuring all of that out again. Mobility of compute is a term you often hear when talking about containers and conda: you just take your software with you. We're doing sort of that as well; it's more like the software follows you automatically. And we do it in a way that's aware of this performance issue, which with containers is usually just silently ignored: people run one binary everywhere and assume that will be okay. It's often not, so you shouldn't be cutting that corner, and we have a better way of dealing with that. If we manage to do this, it will help with HPC training.
We can spin up a virtual Slurm cluster in the cloud, train the scientists on it, make sure their LAMMPS software, for example, is there so they can play with it. And then when they get home, if they also have access to EESSI, they get the exact same installations on their laptop; they move to their institute cluster, they have the same binaries there as well, and they basically know how things work. So the jump from being trained to actually using it after the training becomes a lot smaller. We think this can also help developers of scientific software. Take OpenFOAM as an example. The OpenFOAM developers probably know their own code base very well, but they need a bunch of dependencies, and they don't want to go through the pain of having to install those dependencies, or having to figure out how to build different versions of GCC or Clang so they can experiment with all these compilers. Ideally, these things are just available somewhere for them to experiment with. And also in CI environments like GitHub Actions, for example, they can get easy access to a wide variety of compilers, which is probably going to help them test their code with those compilers. So there are lots of things that become possible if you can get this to work. All right, so how is this project organized? It's really a layered structure, three main layers: a filesystem layer, a compatibility layer and a software layer. The filesystem layer is what's responsible for distributing the software installations we provide. For this we rely on an existing project that was created at CERN, which is called CernVM-FS, the CernVM File System. This was built exactly to distribute large amounts of software across the world in a very easy way. And I'll explain what it does. It's a bit difficult to grasp what it's actually doing, because it's so different from other ways of distributing software, like building packages or downloading container images.
But it is a very powerful concept, and we're very thankful that the people at CERN who developed this have been sharing it as an open source project. The middle layer, the compatibility layer, is what we need to shield ourselves from the host operating system. What we're basically doing there is building our own mini Linux environment, so we don't have to rely anymore on what the host operating system provides. The main part in there is glibc; we have our own glibc in there. And then whether we're running on Ubuntu or Fedora or CentOS Linux, we don't really care, because we're not going to use the glibc from the host, or any of the other host libraries, to the extent that we can avoid that. There are technical reasons why we sometimes can't: InfiniBand drivers, GPU drivers, those will still need to come from the host, because they're too tightly coupled to the kernel. That we cannot avoid. But we'll have ways of detecting what's there and, if needed, installing missing stuff; we're actively playing with that already, so that's still feasible. On top of our compat layer, we install the actual scientific applications: OpenFOAM, TensorFlow, all these things. They link to the libraries in the compatibility layer, and that way they can run anywhere, because we're only relying on the host for the kernel, essentially, and whatever kernel drivers we cannot avoid. And that works, and it works well. This software layer is being installed with EasyBuild today. There's actually nothing specific in here that requires EasyBuild; we could use Spack or other installation tools in this top layer as well, as long as we can make sure that whatever is built in here links directly to the compatibility layer and not to the host. We cannot pick up anything from the host, otherwise it won't work as expected. Next to EasyBuild, we're using Lmod, so we're generating environment modules alongside these installations, so you can pick them up.
And we're using an extra library here called archspec, which does detection of what type of CPU you have. So not only whether it's x86 or ARM, but also whether it's Intel or AMD, and which generation of Intel or AMD it is: is it one that supports AVX2 or AVX-512 instructions, or not? This tiny library basically tells us what type of CPU we have, and based on that we can pick installations that are optimized for that type of CPU. The whole software layer is structured in such a way that it becomes very clear that this is happening; I'll show how this works. So those are the three main layers. We're also actively playing with ReFrame for testing. ReFrame is a regression testing tool for software, created at CSCS in Switzerland. Like I said, we want to do a very good job of not only providing those installations, but making sure they work. We're going to be testing those installations on Ubuntu, on CentOS, on ARM, on Intel, on AMD, and make sure that it all works. We'll do functional tests to make sure it runs, but also performance tests to make sure it performs well, so that we're on the right end of that GROMACS benchmark plot. Today we're looking at Intel, AMD, ARM and POWER9. POWER9 is a dead end, we'll stop wasting time on that, but we will very soon start looking at RISC-V as well. There are big European projects actively looking into building accelerators that use the RISC-V instruction set, but also very capable CPUs that use RISC-V. Let's say five years from now, these will be very relevant, and we want to be ready for that. So when the capable CPUs are there, we'll basically have the software ready to run on them, whereas usually it happens the other way around: the CPUs are there, and then people start figuring out how to build for them. We can actually prepare for that already and hit the ground running as soon as, let's say, the first RISC-V supercomputer is there. Okay, so that's a lot of information. Let's see how I should continue.
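The effect of that CPU detection can be sketched in a few lines of shell. This is not what archspec actually does internally (archspec is a Python library with a full microarchitecture database, and it also checks the CPU vendor, so an AMD Zen 2 chip maps to an amd/zen2 directory rather than an Intel one); it is a minimal stand-in that maps CPU feature flags from `/proc/cpuinfo` to an optimized-software subdirectory, with names patterned on the layout shown later in the session:

```shell
# Minimal stand-in for archspec's CPU detection: pick the most specific
# x86_64 subdirectory that the CPU's instruction set extensions allow.
# (Real archspec also distinguishes vendor and many more generations.)
pick_subdir() {
    flags=" $1 "
    case "$flags" in
        *" avx512f "*) echo "x86_64/intel/skylake_avx512" ;;
        *" avx2 "*)    echo "x86_64/intel/haswell" ;;
        *)             echo "x86_64/generic" ;;
    esac
}

# On a Linux host, feed it the real flags line from /proc/cpuinfo:
if [ -r /proc/cpuinfo ]; then
    pick_subdir "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
fi
```

The fallback to `generic` mirrors the idea that any x86 CPU can at least run the SSE2-only installations.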
I'll first explain a bit more what the filesystem layer does. That's how we distribute the software that we provide in the EESSI project, and it's basically what CernVM-FS does. This is not something we created, but this should give you some idea of what CernVM-FS provides us. CernVM-FS allows you to build a filesystem, something like NFS, that you can mount somewhere, and it's always going to be read-only for the clients. The people who use the filesystem cannot make any changes at all to the files; they can only consume what is there. It's mostly targeted towards software installations; some people use it to distribute data files, data sets, as well, and that also works. What happens is that you create what's called a CVMFS repository, so a filesystem in a central location, on a central server, which is called a Stratum 0. And you make sure there are enough mirrors of that Stratum 0, which have a full copy of the software stack available and sync automatically with the Stratum 0. So whenever software is added to the repository, which can only be done centrally, it just syncs up to everything around the world. You typically have multiple of these mirror servers to make it redundant. If one of them dies, everything is fine, because the others all have a copy; as long as there's one mirror server somewhere, you can still continue to consume those software installations and use them. In the repository you could have, and we eventually will have, thousands of installations, and the mirrors have a full copy of that. When you, as a friendly user, start using that software, for example on an HPC cluster or on your local laptop or in the cloud, you'll basically be mounting from one of these mirror servers; you're talking to that server. From a CVMFS point of view, you're a client computer. If you fire up TensorFlow, CVMFS is going to check your local cache: do I have that TensorFlow binary already?
I don't. Okay. So it asks the caching layer, a proxy cache or Squid cache: is TensorFlow here already? If not, it asks the mirror server: please give me the TensorFlow binary. It comes back, it gets copied into your cache, you have the binary and you can run it. That binary is probably going to need libraries, so the same story happens for the libraries. So what's basically happening, very simplified, is that you're streaming your software installations. It's like Netflix: when you say, tonight let's watch this movie, you click the movie, and it quickly starts downloading the first part of the movie so you can get started. It's the same thing here: when you start running TensorFlow, it says, oh, I don't have that binary, I'll have to download it. It downloads it to your cache, and you can run that binary. There's a small startup delay, of course, because it has to do that, but the next time it's going to be very quick, because you have a local copy in your cache. So that's one thing: you have this streaming idea, and it's fully transparent to the end user. If they start typing module avail, what is there? module avail means I need to know what's in that directory; it will copy the metadata for that directory, which is very quick. You're loading the TensorFlow module? It needs the module file, so it's going to download that. All of this happens behind the scenes, and it feels like everything is local, maybe with a small delay the first time you hit something. Other than that, you can't really tell that it's all streaming in the background; CVMFS hides all of this from you. And thanks to these multiple caching levels, you have a cache on the client, you have the proxy cache, which could be in the network of your HPC cluster, and you could even have your own full mirror server next to your HPC cluster to reduce the latency for downloading stuff.
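That multi-level, read-through caching can be sketched as plain shell logic. This is a toy model, not the real client (one artifact, one-slot caches, no metadata), but it shows the lookup order and how each level gets filled on the way back:

```shell
# Toy model of CernVM-FS caching: a lookup tries the local client cache
# first, then the site's proxy cache, then a Stratum 1 mirror server.
local_cache=""
proxy_cache=""
stratum1="tensorflow-binary"   # what the mirror server holds

fetch() {
    if [ -n "$local_cache" ]; then
        result="$local_cache (served from local cache)"
    elif [ -n "$proxy_cache" ]; then
        local_cache=$proxy_cache          # fill the local cache on the way
        result="$proxy_cache (served from site proxy)"
    else
        proxy_cache=$stratum1             # fill both cache levels
        local_cache=$stratum1
        result="$stratum1 (served from Stratum 1)"
    fi
}

fetch; echo "$result"   # first access: streamed from the mirror
fetch; echo "$result"   # second access: local, hence fast
```

The small first-access delay and the fast repeat access described above are exactly these two code paths.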
Even if that mirror server dies, the client will just automatically fail over to another one and download from there. So as long as there's one mirror somewhere around the world, everything keeps working nicely. If the Stratum 0 dies, no problem: for a couple of hours we cannot add new software, but everything else has a full copy, so it's all good. It's like a peer-to-peer network for streaming your software. The big advantage is that wherever you are, in the cloud, on your laptop, as long as you mount this EESSI CernVM-FS repository, you're getting the same software installations everywhere. So that's how we distribute stuff. But that's not enough, because if you build a binary for me on Ubuntu, and I'm on CentOS, it's hopeless, right? There's just no way that's going to work. That's why we need the second layer, the compatibility layer. We construct this with Gentoo Prefix. Gentoo is a Linux distribution where you typically build everything from source, and it has a subproject called Prefix, where you can install your own Linux distribution into a prefix, an installation directory that you choose yourself. When you're using CVMFS, everything has to go into /cvmfs, so we need to be able to build binaries that work in that prefix; that's why we use Gentoo Prefix for this. Again, the biggest component here is glibc. There are other libraries as well, to figure out user names and things like that, but it's pretty minimal: only the stuff that we really need is there. We construct the compatibility layer once for every CPU family we support: once for ARM, once for x86, for now once for POWER as well, and eventually once for 64-bit RISC-V. So you basically have four mini Linux installations included in the CVMFS repository. That way we shield ourselves from the operating system, and we can build binaries in there that work on Ubuntu or CentOS or SUSE or WSL, whatever, any Linux distribution.
So it looks like this in terms of structure. All CVMFS repositories live in /cvmfs; pilot.eessi-hpc.org is the name of our EESSI pilot repository. In there we have a couple of versions, because over time we'll actually do versioning of the compatibility layer as well: every now and then we will rebuild this Linux environment to get a new glibc version, maybe once a year or once every two years, whatever makes sense. And in there we have a directory for Linux, because eventually we plan to support macOS as well, and a subdirectory for the CPU family that you're using. I can show this interactively as well, to get a better view of what this does. Then on top of this, we have the software layer. This is where EasyBuild and Lmod kick in, which we've been playing with for, let's see, the last day. In here we build our big scientific applications and all the dependencies they need, and they link to the compatibility layer, glibc and whatever other libraries, not to the libraries provided by the host OS. That way they work as long as that compatibility layer is available. We're currently using EasyBuild for those installations; we could be using other tools as well. Anything that we can control to only use stuff from the compat layer works. Spack, I think, cannot do this today, but it could be enhanced to also make sure it doesn't go outside of this little box that it's supposed to stay inside. And then archspec is used for detecting what type of CPU you have. A small part of that is which one of these three families you're using, but it goes way more specific than that: it checks whether it's Intel, and whether it's an Intel Haswell or an Intel Skylake, like that. So that means this software layer is not one set of software installations. It's actually one set for Intel Haswell, one set for Intel Skylake, one set for AMD Rome, one set for ARM Graviton 2, which is an ARM CPU in AWS.
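Putting the layers together, the module tree for one CPU target lives at a predictable path inside the repository. A small helper composes it (the repository name and layout follow the pilot setup shown in this session; the specific version and CPU subdirectory arguments below are just examples):

```shell
# Compose the EESSI module path for a given stack version, OS and CPU target,
# following the layout /cvmfs/<repo>/versions/<ver>/software/<os>/<cpu>/modules/all
eessi_modulepath() {
    version=$1; os=$2; cpu_subdir=$3
    echo "/cvmfs/pilot.eessi-hpc.org/versions/$version/software/$os/$cpu_subdir/modules/all"
}

eessi_modulepath 2021.12 linux x86_64/amd/zen2
eessi_modulepath 2021.12 linux x86_64/generic
```

The init script effectively computes exactly such a path from the detected CPU and prepends it to MODULEPATH, which is why the researcher never has to navigate this tree by hand.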
So we basically have every installation multiple times, for as many CPU types as we need to care about. Okay, I'll get back to this later; let me show you what this looks like on a system where we have EESSI available, and it is right there in our prepared environment. If you go into /cvmfs, you can start looking into this yourself; you will find the EESSI repository. Right now it's a bit messier than we would like it to be. There's an old version here that dates from the time before we had a versions directory, so eventually that part is going to disappear. In versions, we currently have two versions; this one is a symlink to the other one. This is our latest version, which is getting quite old; we're actively working on a new one, and we will probably ingest that next week. We've built one in April, 2023.04, which is going to be our next version. We have a latest symlink as well, which now points to 2021.12, so let's just use this latest link, like this. In here you have the compatibility layer and the software layer, the two layers that are included in our filesystem layer, which is CVMFS. And we have an init subdirectory as well, with some scripts to set up your environment. Let's take a look at the compat layer first. Like I said, there's a subdirectory for Linux, because eventually we hope to also support macOS. In here we have three CPU families: ARM, POWER and x86. And if we look into one of them, this is where you'll see something that looks like a Linux filesystem hierarchy: binaries, libraries. In here, glibc is somewhere; I always forget where. Here it is. So that's the glibc that we will be linking to in the software layer. That's the compat layer. Our software layer has a very similar structure: what type of OS are we using, Linux or macOS? What type of CPU do we have: ARM, POWER or x86? But then here it gets a bit more fine-grained. In x86 we have AMD, Intel and generic. Generic means any x86 CPU, I don't really care.
But in the Intel and AMD directories, we get a bit more specific. Currently we have Haswell and Skylake for Intel, so basically AVX2 and AVX-512, and in AMD we have Zen 2 and Zen 3, so Rome and Milan. And then in here you'll find the modules and software directories that EasyBuild produces for the installed software. You can see we have this whole structure in the repository, and that's absolute madness: you would never let a researcher manage this by hand, they would never figure it out. That's where the init script comes in. We have an init directory with a couple of scripts; the most important one is our init script here, and this does some magic. It will use the archspec Python library to detect what type of CPU you have and use that to set up your environment. So if we source this script, it's going to change stuff in our current shell environment; we need to source it, not run it. And this is very silent. Why is it very silent? Because it's probably already been done. Let me do it. This confuses me; it's supposed to give me some output, but it was done by default. So let me do it on our system to show you what kind of output you should be getting. You can see it drops us into an EESSI environment, but it's not really producing the output I'm expecting. There's a way to make it silent, so maybe that's enabled by default in the prepared environment. Let's do it on our system, where I know it's not going to be silent. On our systems in Ghent, we already have the EESSI repository mounted, but we're not telling anyone yet, so the researchers don't really know. If they went looking for it, they could find it, but we're not promoting this yet, because we know it's not really, let's say, stable and reliable. So what this script is doing is CPU detection using archspec, and it produces some output. archspec says we're on an x86 CPU, AMD Zen 2, which matches our login nodes, so that makes sense.
Using that information, it says: okay, this is the subdirectory I'm going to use in the software layer. And it's telling you here: in /cvmfs, in the pilot EESSI repository, in the current version of that, in the software layer, I'm going to use the Linux subdirectory, and this particular subdirectory for the modules. So it's zooming in on the software that was built for AMD Rome, and it does that automatically. It adds this path to your MODULEPATH, it finds the Lmod configuration file for that, makes those changes in the environment, and then you're ready to go. Now, what does that mean? If I now do a module avail of, let's say, OpenFOAM, I should be seeing the OpenFOAM installations that are included in the EESSI repository. Now, this feels a bit sluggish, right, because the first time CVMFS says: okay, you need to know what's in that modules/all directory, I'll have to download that metadata and cache it locally, otherwise I don't know what's there. That's why the first time it takes a while; the second time it should be a lot quicker. It should be, maybe not always. One thing I didn't do: we're still picking up stuff that's installed, in this case, in my account on the system. Even if you emptied your MODULEPATH beforehand, you will still see the things from the EESSI repository. Right now we have three versions of OpenFOAM installed in there, and those should work fine. So that's the setup you do to get started, to get access to those modules. And now, let me do a module unuse of the stuff in my account, since I'm only really interested in the EESSI things. Like this: we have OpenFOAM, we have GROMACS, and I can load one of these, and again this will make CVMFS pull in some stuff in the background. So all these latency aspects are annoying, but you can limit them a lot by having a proper caching setup at your site. Right now it's not properly set up here, so it's a bit slow initially, but it does work.
That gives us a bunch of modules. And let's see which Python we are now using. The python command has changed, because there was a Python dependency loaded for, God knows what, something here. The Python binary we're using is one that's coming from CVMFS, from the EESSI repository, and it's the one that's specific to AMD Zen 2, which is optimal for our current CPU. So that all just works automatically. From a researcher's point of view, let's assume EESSI is available, it's there. All they need to do is somehow initialize their environment; right now we have the source script, and we may have better ways of doing that later. They do this, then they can start loading modules and using their software. And it should work regardless of whether they're on an HPC cluster, in the cloud or on their laptop; it should just all work fine, it auto-detects what you have. Say you're on an Intel Cascade Lake system. We only have optimized installations for Haswell and Skylake currently; if you check in the Intel directory, we have Haswell and Skylake. If you're on a Cascade Lake system, that means there's no exact match. But archspec is smart enough to say: okay, if you just take the Skylake binaries, you'll be doing pretty well. It knows what is compatible with what, and it will take the best possible match for your CPU even if there's no exact match. So that should also work fine. Okay. Any questions on this? Yeah, let's use the mic. So the question is: when the user is running this on a cluster, could they do the sourcing in their batch job and pull everything in? Yes, that's the better way. If you do this up front and then submit your job with Slurm, which passes down the environment, and your login nodes are different from the cluster you're submitting to, you're in trouble. So it's better to do this from the job script itself. Yeah. Next question: so you've followed a layered approach.
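Following that advice, a job script would initialize EESSI on the compute node itself rather than inheriting the login node's environment. A sketch of such a batch script (the init path uses the pilot repository's latest symlink as shown in the demo; the module name, input file and Slurm resource lines are placeholders, not from the talk):

```shell
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --time=00:30:00

# Initialize EESSI inside the job, so CPU detection runs on the compute
# node, which may be a different CPU generation than the login node.
source /cvmfs/pilot.eessi-hpc.org/latest/init/bash

module load GROMACS               # placeholder module name
srun gmx_mpi mdrun -s input.tpr   # placeholder run command and input
```

Because the detection happens at job start, the same script picks Haswell binaries on one partition and Skylake binaries on another without any changes.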
Does that mean that you could remove the filesystem layer, for example, and in future replace it with another one? Is that the kind of thought process? Yes; for the filesystem layer we're deliberately not calling it CVMFS, because that's just one way of distributing this. What you could do, and we're planning to do this at least for archiving, is take this whole directory, so everything that makes up one particular version of EESSI, and throw it into a container image. Then you can take that container image, jump into that container, and source the init script from there. That's going to be a pretty big container image, because it's going to have AMD, Intel, ARM, POWER, everything. So maybe you want to zoom in on something very specific, or even on specific installations, and throw those into a container image, and then you don't need CVMFS at all. So there are options there. If you don't like CVMFS, you could sync everything to an NFS filesystem, and it will still work fine. Next question: I also saw in the diagram that there was a branch from the host OS that bypassed all the layers and went straight up, past the compatibility layer and the filesystem layer. Why is that? Yeah, that's because for some things we still need stuff from the host. We're not going to give you Slurm in here, that doesn't make sense. The GPU drivers need to come from the host, because they're too tightly coupled to the kernel. Your InfiniBand drivers need to come from the host, because they're too tightly coupled to your hardware and the kernel. So some things need to leak in, but they leak in a controlled way, and we take that into account. When we configure our Open MPI library, for example, we do it with libraries like UCX, which basically also detect what you have and use what's provided by the OS, and that works.
One thing I didn't mention but should have: one question we often get is, well, this is a good idea, but is this going to work in practice? Yes, because Compute Canada has been doing this for five, six years — exactly this system, this layered approach. They used to use something different than Gentoo, but they have now also switched to Gentoo because it's a better option; they used to use Nix for the compat layer. But this idea of a layered approach and using CVMFS is basically what they do in Canada. They have one software stack that's used on all the Canadian systems, and they have a team managing that central software stack. They have a mix of InfiniBand and Omni-Path interconnects, they have a mix of Intel and AMD — they're not playing with ARM yet, but that's a detail in this setup; that's really just another CPU, we don't really care too much. So it works absolutely fine. Excellent, thank you. Another question there, Jörg? Stupid question, but just for my understanding: if my operating system is quite out of date, but a user wants the latest, greatest piece of software, this helps me get around that? As long as you can get CVMFS running on it, that's enough. That's really enough, yeah. And we will have to be a bit careful with what we build, because there are some things that check what kernel version and kernel headers you have and take that into account. That's something we'll need to keep an eye on as well. So for example, when we're building something, we can actively test for that. We can test these installations in a CentOS 6 VM that has a very old kernel, just to make sure things still work. We can run our tests in there as well, and that's something we plan to do. So we want to make what we support here as broad as we possibly can.
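As a sketch of the kind of check this implies — glibc, for instance, can be built with a minimum supported kernel version — a test could compare the host kernel against a required minimum. The version numbers here are illustrative, not the project's actual requirement:

```shell
# Return success if kernel version $1 is at least version $2.
kernel_at_least() {
  # sort -V orders version numbers; if the required minimum sorts
  # first (or they are equal), the actual version is new enough
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# e.g. require at least kernel 2.6.32 (an illustrative minimum)
if kernel_at_least "$(uname -r | cut -d- -f1)" "2.6.32"; then
  echo "host kernel is new enough"
else
  echo "host kernel is too old"
fi
```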
macOS is the annoying one, because you cannot run a Linux binary on macOS — that doesn't work — so that creates a fork in the directory structure, and for now we're not paying attention to that. But Gentoo Prefix works on macOS, so we could build a special Gentoo Prefix compatibility layer for macOS, and then we're doing good again. We'd probably need to do that for every major macOS version, which we don't have to do for Linux, but yeah — if we have to go through that pain, we can, and if all of this is automated, it's not too big of an issue. I see it seeping in slowly what this could enable, so that's good. Let me continue. We have an easy paper, an open access paper, which explains in detail what we want to do, what all of these layers mean, how we work together, how this thing is designed, what kind of use cases it could enable, and so on. This paper was published in February 2022, and by then we had already had our pilot repository for a while — we'd been playing with this, so we knew technically quite well what we were getting into. It's definitely not finished work, but the ideas are there.
One experiment we did: in the 2021.12 version of easy, which is the one I was showing, we have a GROMACS in there, and someone ran it as a regular user — so not a sysadmin, just a regular user account — playing with the GROMACS coming from easy and comparing it with the GROMACS they have installed in Jülich, which is optimized for their system and their interconnect. He was doing a simulation on up to 16,000 cores, so pretty big, and we're seeing this kind of performance. The dotted line is ideal scaling. The black dots are what he was getting with easy, with the auto-detected Zen 2 — this was an AMD Rome system. The red is the performance he was getting with the optimized system installation. Higher is better. You see some of these black dots hovering above the red ones; that basically means we were getting better performance than the system installation, but that could be in the noise, so I'm not going to claim our installation is better. They're quite close to each other, but at least we're in the same ballpark, and that's already pretty impressive, because our easy installation knows nothing about the system, nothing about the interconnect. It just does the detection of the CPU, Open MPI does the detection of the interconnect, and it scales and works fine — and that's the main message here. The blue dots are the generic binaries that we also have in easy. You can force easy to use those by setting an environment variable, so you can basically disable the detection and say: run these binaries, because I want to test this or see if it works. If you force it to use the generic binaries, you get the worst performance, of course, but it still scales. So there is a gap in terms of performance, but it still works fine, and this was a very important result — that's why we put it in the paper. It basically shows that the idea can work and that you can scale quite well. And again, I'm not going to overstate this, but this difference in performance here could be because we're using a newer
glibc — I think this was still CentOS 7, which means an old glibc, and we were using a very new glibc in our compat layer, so that could give us a performance boost as well. Again, it's close enough that it could be in the noise, but at least it's the same ballpark, definitely. Alright, so what's the current status of this whole idea? On the tutorial page this morning I wrote up a small history of the project. It basically started after a meeting in Delft with some Dutch universities that invited me there to talk about EasyBuild, because they had some crazy idea to work together on something big, and they wanted to get some funding for this and were figuring out what to do. We had a bit of a brainstorm at the end of that meeting, and the conclusion was basically: try and do what the Canadians do, but on a bigger scale — make it a community project, make it a European thing even, maybe we can even get some European funding for this — and let's see if this works. That was March 2020. It was a bit of a ruse also to have lots of beers together and have the Dutch people visiting Belgium and the other way around. Then the world changed a bit, so we couldn't really travel much, but we did use that time to work out the proof of concept, this pilot repository, and we've been working on it since then. We set up a GitHub organization for easy, and we've been doing monthly online meetings, every month since basically April 2020, to see what the next step is we should take, how we can tackle the automation, who's going to work on testing — basically getting ourselves organized. Writing the paper happened in that community as well, and we applied for funding at the European level as well. So we ended up with a proof of concept setup where the central server is running in Groningen in the Netherlands, and we have four Stratum 1 servers: one in Groningen, one in Oslo in Norway, one in Azure, which is I think running on the east coast in one of their data centers, and
one in AWS, which is running, if I'm not mistaken, in Ireland. So we have four of these mirror servers, which means we have a relatively robust network as well, and we just wanted to see what happens if one of these mirror servers dies — it's fully transparent, you don't really notice. We also wanted to have a good set of software in there already. For now it's CPU only, because supporting GPUs is a little bit more tricky and we still have to figure that out, but we have some big things in there, like OpenFOAM, like TensorFlow, like GROMACS. You can load those modules and it should just work. In terms of targets, there's already a good set of CPUs supported: Intel, AMD, a couple of ARM ones, and we're still mucking about with POWER9, but nobody's really interested in that and it's a pain to get it to work. The interesting news here is also that we have pretty good contacts with both Microsoft and Amazon, and they're basically throwing cloud credits at us: whatever you need, tell us — if you want to build binaries and test stuff in different operating systems, we'll basically do that for free. That helps us a lot, because we get very easy and very quick access to a big variety of CPUs, which is exactly what we need to build all those binaries. So that's been very helpful. If you look at what we're combining here: lots of open source packages — all of these things basically existed already, we're puzzling them together, and the Canadians showed us how to do that. We're leveraging the cloud, which gives us very easy access to all these different CPU architectures. There are also changes in Open MPI, and there are these companion libraries like UCX and libfabric that do auto-detection of what kind of fabric you have — that's definitely an enabler for us as well. So everything is basically there to make this possible and make it work well, and we're building that puzzle and making it happen. That's basically what's going on. We'll do the hands-on and demo at
the end, once you have a good idea of what's going on. Now, one thing we're actively working on and thinking about: we want to make this a community project. We want to bring the community together and work together to get installations in there. How do you do that? We don't want you to come up with a binary and say, here's a binary, throw it in there — from a trust and security point of view that's a bad idea — and we also want to make sure that these things actually work, that we can test them, and so on. So what we're doing is setting up a way that you can essentially send us a pull request — it's a bit EasyBuild-focused, but it applies to other tools as well — that somehow expresses: okay, I would like to have OpenFOAM version 10 added to easy, and if you use this easyconfig file, you should be able to build that binary for all the CPUs that you support. So they make a pull request, and — I'll skip ahead here a little bit, because there's a lot of technical stuff that's not that interesting — this is basically what we're going for: you open a pull request to our software layer that says, please add OpenFOAM 10 with this toolchain. A reviewer says, okay, that makes sense, and our bot starts building it: some in Azure, some in AWS, maybe some on-premise. We build in a container, to isolate it from the host OS as much as we can. Once we have some tests, we'll run tests on those builds as well. If that all looks good, we put those installations in tarballs — each one is nicely in its own installation directory, so we can make tarballs to easily ship them to other places. Those tarballs get uploaded to an S3 bucket, a place where we can collect stuff, and they get copied over to the central server. Before we add them to the easy repository we have another step, and this is more to keep control of what's actually going in. This part is automated with a cron job: as soon as a new tarball appears on
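Because each installation sits in its own prefix, the packaging step is conceptually just tarring up that subtree. This is a self-contained sketch with made-up paths — a temporary directory stands in for the real prefix in the repository:

```shell
# stand-in for something like .../2021.12/software/linux/x86_64/intel/haswell
PREFIX=$(mktemp -d)
mkdir -p "$PREFIX/software/OpenFOAM/10/bin" "$PREFIX/modules/all/OpenFOAM"
touch "$PREFIX/software/OpenFOAM/10/bin/simpleFoam" \
      "$PREFIX/modules/all/OpenFOAM/10.lua"

# package relative to the prefix, so the tarball can be unpacked at the
# exact same location on the central server during ingestion
tar -C "$PREFIX" -czf /tmp/OpenFOAM-10.tar.gz \
    software/OpenFOAM modules/all/OpenFOAM

tar -tzf /tmp/OpenFOAM-10.tar.gz   # lists the packaged files
```

The key design point is that both the software and its module file travel together, and the archive is relative to the prefix, so unpacking it on the Stratum 0 reproduces the installation byte-for-byte at the same path every client sees.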
the Stratum 0, a pull request is opened to our staging repository that basically says: this is a tarball, these are the files in there, does it make sense to add this? When a reviewer says okay, makes sense, they hit okay in that pull request. If we want to, we can have another bot here that re-runs those tests, and we could run them in the build container — the same build environment — but also in a totally different container with a different operating system, and it should still work. So it has to pass all those tests. If that all looks okay, the installations get added to the easy repository — and again, the easy repository is just the CVMFS repository, and it's streaming. That means if OpenFOAM 10 gets ingested here, it automatically appears on all the clients that mount easy. They don't need to do an update at all, it's just streamed in. Like when Netflix adds a new movie: you don't need to update Netflix, it just appears. And that's exactly what happens with all these installations as well. So we're now working on building this whole pipeline, automating it all, and making sure it works. We have a bot that you can tell: go ahead and build all these things, and it reports back whether that worked or not. But we're still improving on this — the testing step is now very light to almost nonexistent, but that's the next iteration. Build and deploy is our biggest goal, and the test part we can enhance later. So that's really the explicit goal: to make this a community project. That's more text, which basically explains what I just said. The bot that I talked about is a GitHub App, which means that when you open a pull request, an event gets sent to some Python code that can decide what to do. The event could be the reviewer hitting okay on that pull request — that's an event, and the bot would say: that means I can start the build on all these things automatically. So we're taking the human out as much as we can. We still want people to have approved things, so we're not getting
anything malicious in there, but that's the idea: it builds automatically. There are some slides on this as well that explain it step by step. Someone opens the pull request, and our goal there is to make this an easystack file — a new experimental feature in EasyBuild. It actually looks a bit different now, but it basically says: I want OpenFOAM built with this toolchain, in these two versions. Okay, that looks good, and then EasyBuild knows how to do the installation. There's an approved review; the bot says, okay, I'll submit some Slurm jobs to build all these things on different types of CPUs, and that gives me tarballs for all that software. The bot says: the builds worked, now what do I do? And the reviewer says: okay, let's go ahead and test this and make sure it works in another container. Maybe we're building in CentOS and testing in Ubuntu — that should work because of the compat layer; if it doesn't, we overlooked something, and you could have a ReFrame test for that as well. Then the bot says: the tests look okay to me, now what do I do? And the human says: okay, that's good, let's get it in there, looks good. This cycle is what we're now building, and when the bot gets the okay, it can do all the uploads and ingestion, so it basically gets added to the easy repository, and then you're done — everybody can start running that software. Good. Like I mentioned, one of the goals of this initial collaboration — this is a very good idea, but nobody has time, square wheels and round wheels and all that stuff — so getting funding for this was very important. It took us a while; it's a lot of effort to write up this idea from scratch. We realized that getting funding for a service-like thing is not easy — it's not research. So what we did is find some researchers that have ambitious ideas as well, that we can help. They want to develop software to do multiscale modeling — batteries and helicopters and all these complex things where they need lots of software.
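For illustration, an easystack file is a small YAML file that lists what should be installed. The exact syntax of this experimental feature has changed between EasyBuild releases, so treat this — and the easyconfig names in it — as a sketch:

```yaml
# hypothetical easystack file asking for two OpenFOAM variants
easyconfigs:
  - OpenFOAM-10-foss-2022a.eb
  - OpenFOAM-v2206-foss-2022a.eb
```

EasyBuild would then be invoked with something like `eb --easystack openfoam.yml --robot` to resolve dependencies and build everything listed.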
It's a headache to get all of that in place and be able to migrate between systems, so we are helping them through one of our use cases. We're combining forces with scientists from the CECAM consortium and with people that are already active in easy — we're joining forces there. We have a scientific use case, and we are going to help them achieve it. That's what the MultiXscale project is about. We proposed it as a EuroHPC Centre of Excellence and got accepted, and on January 1st of this year the project actually started. So we now have more dedicated manpower to make easy possible, and we're going way beyond this pilot repository — the development of the bot has really sped up a lot in the last couple of months, we've made very good progress there. So we're slowly working our way towards making easy more reliable and going beyond a pilot repository. We have a website as well, where we're starting to make noise about the things we do; we had a kickoff meeting, and so on. So, demos. The one demo I already did was on our HPC-UGent infrastructure. What I will also do — let's see if this works, because I didn't test it at all this morning, but it should — is create an empty Ubuntu VM in AWS and show you how quickly I can get access to easy on a totally empty system, a blank operating system. So I'll fire up the VM. Somewhere in here: launch a new instance — it's been a while since I've done that — easy demo, let's make it Ubuntu 22.04, that makes sense. Let's make it a bit more interesting and go for ARM; we should have some c6g instances, I think. Let's go for 8-ish CPUs and 16 gigabytes of memory, that should be enough for a demo. I have a key in here, and finding where that key is is going to be interesting, but I'll figure it out. The rest is pretty standard. I'll make sure it has enough disk, so 30 gigs, and the rest should be fine. I'll fire that up; it should only take a couple of minutes, and then we can check if we
can actually get easy working. So that's basically what I'm setting up there — there's nothing on it. CVMFS will not be there; that's the standard package we'll have to install first, and we have a tiny script for that. Then we can at least show, on a single VM, how quickly we can get access to this. Of course this requires admin privileges: you need to be able to install CVMFS, configure it, and mount the file system. If easy is already there, you don't need to do that — on our cluster I was just a regular user, so I could just source the init script and start running scientific software. There is another option — I'll get back to this when the VM has spun up — and that one you can actually try yourself in the prepared environment, or on any system where you have either Singularity or Apptainer. This is part of the instructions: if you click the link to our documentation, it shows you the steps. You have to pre-create some directories and bind-mount some paths, because these locations need to be writable for CVMFS — that's where it puts the cache — and this place needs to be writable as well. Then you can do a singularity shell to get into our container. The container we provide here, our easy client container, basically only has CVMFS in it; it has no software like OpenFOAM or GROMACS. It's just a way to get around not having admin access to install CVMFS, and we mount the repository using the fuse mount option of Singularity or Apptainer. If you copy-paste these commands on any system where you have Singularity or Apptainer, it should work, and it should give you a shell where you can check that the easy CVMFS repository is mounted, source the init script, and try something. Let's see how our VM is doing and if I can get access to it. It looks
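Condensed, the container route looks roughly like this. The directory layout, image location, and repository name are recalled from the pilot-era documentation and may have changed since, so double-check against the current docs before relying on them:

```shell
# writable scratch space for the CVMFS client cache and for $HOME
U=${USER:-$(id -un)}
mkdir -p /tmp/$U/var-lib-cvmfs /tmp/$U/var-run-cvmfs /tmp/$U/home

# bind-mount those writable locations into the container
export SINGULARITY_BIND="/tmp/$U/var-run-cvmfs:/var/run/cvmfs,/tmp/$U/var-lib-cvmfs:/var/lib/cvmfs"
export SINGULARITY_HOME="/tmp/$U/home:/home/$U"

# fuse-mount the repository inside the container; no root access needed
command -v singularity >/dev/null && \
  singularity shell --fusemount \
    "container:cvmfs2 pilot.eessi-hpc.org /cvmfs/pilot.eessi-hpc.org" \
    docker://ghcr.io/eessi/client-pilot:centos7 \
  || echo "singularity not available here"
```

Apptainer accepts the same options (with `APPTAINER_`-prefixed environment variables); the `--fusemount` option is what lets an unprivileged user mount the repository inside the container.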
like it's running — it's still initializing, but it should be close. I'll need to figure out where that key is; I think I have it in here somewhere. I kicked off a VM in eu-west, so it must be this one. I'll need "identities only", and then the fun part is always: what is the user name on the VM that you start? I think it's ubuntu. So we're getting access to our empty Ubuntu 22.04 VM. There's no module command here — absolutely hopeless — and there's no EasyBuild. We have an empty operating system, so what now? What I will do is clone the easy demo repository that we have on GitHub. We can just git clone it, so all we need is Git, which should be easy to install. In there we have some test scripts for GROMACS, OpenFOAM and so on, and there's also a scripts directory with some installation scripts — this one is for Ubuntu, of course. This is all you need to do to install CVMFS and the easy configuration for CVMFS. This whole part is CVMFS; this part is our tiny configuration package, which just installs a configuration file for CVMFS that tells it about the easy repository. Very soon this will no longer be needed, because the CVMFS people are asking us to include easy in their default configuration — they see the value in this as well — so this part will disappear. And this creates a tiny configuration file for CVMFS, where here it says: I'm not using anything special in terms of proxy or cache, I'm basically connecting directly to the mirror server — not ideal for latency, but good enough. And this says: you're allowed to use 10 gigabytes of cache, don't go beyond that. So it's like — what's it called — as soon as something in the cache hasn't been needed for long enough, it gets kicked out, so it recycles. It has a name, I forget what it was. I have sudo rights in this VM, so I can just run the script. It pulls in CVMFS, which pulls in some dependencies as well, so it's going to take a minute or two, but once that's done you
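The client configuration being described boils down to a couple of lines in something like `/etc/cvmfs/default.local`. The parameter names are real CVMFS settings, but the values here are just the ones mentioned in the talk:

```shell
# no site proxy: connect straight to a Stratum 1 mirror server
# (fine for a demo VM, not ideal for latency on a busy cluster)
CVMFS_HTTP_PROXY=DIRECT

# local client cache limit in megabytes; once the cache grows beyond
# this, the files unused for the longest time are evicted first
# (a least-recently-used policy)
CVMFS_QUOTA_LIMIT=10000
```

On a real cluster you would normally point `CVMFS_HTTP_PROXY` at a local caching proxy instead of `DIRECT`, so that many nodes don't all hit the Stratum 1 independently.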
can get access to the easy repository and start playing with the software that's there. You can imagine even a researcher could do this, right, since it's only running a single script. And if they're spinning up VMs, you could have a pre-configured VM image that has this already installed — that's what the AWS and Azure people we're working with are considering. You could offer pre-baked VM images tagged with easy, and then all you need in the image is CVMFS, because everything else is pulled in as you start using it. That makes it very attractive: a small VM image that gives people access to lots of stuff. So, let's see. This looks empty, which looks wrong — but CVMFS auto-mounts, so if you actually ask for what's there, it mounts it, and it will mount the easy repository. From here it's exactly the same as before: source the latest init bash script. This is some ARM CPU — I'm not sure which, it's relatively recent, so the detection gets a little bit confused, but it's basically detected as an ARM Graviton 2. Then it does the same thing as before, and a module avail shows we suddenly have GROMACS, TensorFlow, OpenFOAM all available — that all looks good, so we can start. Now I'm in the TensorFlow directory; there's a run script here which basically does a module load and runs a Python script, so very, very basic. And again, when you run it, it's like, come on, let's go, right — but it is pulling in all that stuff in the background: not only the module file, but we're firing up Python, so it has to download the Python binary, all its dependencies, the Python packages. You can see it takes a couple of seconds and then starts running TensorFlow. I think it's very difficult to make getting TensorFlow running easier than this. For now this is CPU-only — there's no GPU support yet, but we're working on that, and that will also be possible; that's very important, of course.
And this you can try yourself: in the prepared environment you should be able to just clone the easy demo repository and run one of these example scripts, and as long as you have set up your environment to use easy with the source command, it should work fine. If you have Apptainer on your local cluster, you can do the same thing, as long as you follow the instructions here. These are, let's say, the technical details on how to set up your environment and do the singularity shell. If you look in our documentation, in the easy container part, we actually have a wrapper script as well, which does all that magic for you and drops you into a container that has easy available. So as long as a scientist has this script, they can just run it and do single-node stuff very easily. Any questions on that? Yes: you showed us the pipeline where you deploy the new software automatically, and one of those steps is basically testing — I'm assuming that's the ReFrame tick mark? It's there; we're not actively doing that yet, but we will. So that's the functional testing you're referring to, but we could also do performance testing in there, yeah. But what about the security aspects — you know, if you're taking software built by some unknown party? That could also be handled there, yeah. So this approval step here can be human — it just says okay, right — but you can have security scanners in here as well, which first scan the source code that you're pulling in and are going to use for the installation. There are lots of tools that do this already, and they're usually quite cheap if you're only scanning source code. You could even rerun the scanners after the build, on the binaries, to check for watermarks or fingerprints. That's definitely possible, and as we become more and more serious about this, we're going to do that, of course. And this is a
one-time cost, right? You only have to build those binaries once, thoroughly check them, and then it's okay. But even after stuff is already in the repository, we will do weekly retests to make sure everything keeps working. Because at some point — and we bring back this guy — in the compatibility layer there's a glibc that we build ourselves, and glibc also has security issues that pop up every now and then, so we will have to update our glibc in there. I have to make sure that doesn't break anything, because glibc is supposed to be a drop-in replacement, a drop-in update — in practice that's not always the case, right? So that's something we're very careful with. What we can do is do the glibc update in a sandbox environment, re-run our test suite, and see if something breaks — and that's exactly what we're going to do. Okay, I have a couple more slides on what this could enable if it's actually working. It has a bit more software — there's stuff to play with in there, let's say about 100 modules. Eventually, when we have this automation in place, we'll start installing everything that EasyBuild has, because why not, right? As long as it's open source, we could include all of it, and then you'd have a collection of, let's say, about 3,000 software packages waiting to be used. It's like a catalog, and you can just start using them if you think you need them. So that's the demo — did I forget anything here? Not really. So please give it a try. The easiest way is with Singularity or Apptainer; you don't need anything else. If you have Apptainer installed, you can pull in our client container, and as long as your environment is set up correctly, you can do single-node tests with what is in easy quite easily. So this enables a couple of things that haven't really been possible before — the interesting things start to happen. When you give people a uniform software stack that works everywhere, there are new
opportunities that arise, and we discuss them in our open access paper as well. Uniform access across a wide variety of systems means you can run anywhere. You can play around on your laptop, and if you're confident enough that your script works and your input files are well prepared, you can jump to the bigger system with a lot less effort. Your operating system becomes mostly irrelevant. We can leverage high-speed interconnects, like we showed with the GROMACS benchmark, and so on. And we can prepare ourselves — that's part of the MultiXscale project — for the time when there will be RISC-V supercomputers. That's something the EU is betting on quite heavily: they want to become self-sustaining, essentially, so build their own processors and become independent from China, and RISC-V is one way they may be able to do that. There's lots of research going on currently, and a small part of the MultiXscale project is to see how difficult it is to start building binaries for those CPUs already. We can do this with emulation: you can have an emulated VM where you're building stuff for RISC-V, which is going to be quite slow while you're building, but once you have the binary, it should run. You can actually start building the software before the CPUs are there, because it's a very standardized and predictable instruction set, so we know the nasty details we need to know, and we can make this happen. Lots of software, like Open MPI and Python, already works fine on RISC-V, so it's getting to the point where this is becoming realistic, even though there are no very capable CPUs yet. Like I showed with the auto-detection of CPUs and building for different generations of CPUs, you can do this without compromising on performance — very different from what you do with a container image, where you build one binary that works everywhere. And even that is going to slowly start changing: one binary that works everywhere
works only as long as you stick to Intel and AMD. If you go to ARM, it's game over: you have to rebuild your container image, and the same goes for RISC-V. So you end up with at least three container images to juggle — that's sort of a dead end to me. But in this collaborative software stack, you can build for different generations of CPUs, different families of CPUs, and you can either auto-detect or tell it which part of the stack it should use. That's a very large contrast with the generic binaries you typically see in containers. It facilitates cloud bursting as well: if people have a job and the queue is too long on your on-premise cluster and you have some credits in the cloud, just throw your job into the cloud, and as long as your data is there, your software will be there waiting for you. One thing this also enables is using easy in a CI environment — that, to me, is very interesting. Some scientists definitely run tests for their code: every time they change something, they have test cases that they run over and over again. Something that's very painful is that this means they have to compile their software, make sure all their dependencies are there, and make sure the compilers they want to play with are there. All of that could come from easy — because why not? All you do is mount a file system and load modules, and things become magically available; they are streamed in as they are needed. So all you really need to install is the CVMFS package — and I showed you how quick that is — and everything else streams in as needed, file by file, not huge packages of gigabytes that you need to download. And this works in other environments too: Jenkins or GitHub Actions is quite typical, but it also lets you run those same tests on your laptop, in the same software environment. Now, we've actually done this — we do this for our demos. For the TensorFlow demo I was running, we're running those
tests in GitHub Actions as well, in a workflow, to make sure our demo script still works and doesn't break. And all we need here — and this is very small — is a community GitHub Action for mounting CVMFS repositories in that environment. You just tell it: okay, I want to mount the easy pilot repository, and the configuration for that can be found at this URL. The action knows it should install that package and do the mount, and then we're done: you source our init script and start running your tests. You load modules in those tests and everything just streams in as needed, so it's very, very quick. We're doing this in the easy demo repository — I'm not sure this link will still work, because they clean up stuff in there every now and then — but you can see we're running our tests for TensorFlow, OpenFOAM, GROMACS and Bioconductor, and we do each twice: once with CPU detection and once forcing the generic binaries, just to make sure both of those aspects work. That's to test our own demo scripts, but you can imagine a scientific software developer writing tests that just load a bunch of modules for the dependencies, load different modules for different generations of GCC they want to test with, and make sure their code keeps working with all of that. So forget about figuring out which RPMs to install — it could all come in from easy. That's at least very different from what's currently possible. And here's another example, where we have our own dedicated action for making easy available in GitHub Actions — you give it the version you want to use, and everything else it knows. That's a bit more minimal, but it works just as well, and here, for example, we're loading GROMACS and checking that the version works fine — and that's that in action. Like I already mentioned, this also facilitates HPC training. If you're giving a training
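A workflow step along these lines can look roughly like this. The action name is the one published under the eessi GitHub organization, but the input names and version pins here may be out of date, so check the action's own README before copying:

```yaml
# part of a hypothetical .github/workflows/test.yml
steps:
  - uses: eessi/github-action-eessi@v2
    with:
      eessi_stack_version: '2021.12'
  - name: run a test in the easy environment
    run: |
      module load GROMACS
      gmx --version
```

The action takes care of installing CVMFS, mounting the repository, and sourcing the init script, so the test steps that follow can just load modules as if they were on a cluster.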
session on OpenFOAM, maybe you also want to explain how to install OpenFOAM on day one of the tutorial. I'm not sure you want to spend your time on that if you want to focus on the actual science they do. You could say: for this training we're going to use the EESSI environment, because OpenFOAM is installed in there; we even have two or three variants of OpenFOAM, so you can pick one. When the attendees get home, they can try it on their laptop, as long as they mount EESSI. They can go to their cluster and send an email to their sysadmin saying: this EESSI thing, can you please make it available, because it makes my life a lot easier. So they hit the ground running with the stuff they learned in the tutorial. For the trainers it's also very easy: they can set up a Slurm cluster in the cloud, they mount EESSI, and as long as the software they need for the tutorial is there, the work is done. When the training is over, you just throw away the cluster, and you set it up again for your next training session. That helps a lot compared to getting every attendee an account on an existing system and going through the security vetting or whatever administration is needed for that. Just set up your own cluster in Azure or AWS; there are lots of tools for that, like Cluster in the Cloud or Magic Castle, there's ParallelCluster for AWS, there's AzureHPC for Azure. Lots of tools allow you to do this quite easily, and integrating EESSI in there is trivial; like I showed, it's five or ten lines of bash. As long as you can figure out how to make the tool run those, you're good to go.

I also briefly mentioned that we think this could be a step towards software developers as well. Someone was asking, Stefano I think: are people opening pull requests for easyblocks? Not really, because EasyBuild, even though many people are using it, is just one way of installing the software. But if we
have EESSI, and we go talk to the GROMACS developers, for example, and say: look, you could add this to your documentation, this is an easy way to get access to GROMACS binaries that run anywhere and that are properly tested, then that makes EESSI a platform that's maybe more interesting for them than helping out some random build tool that only a handful of people are using. It could relieve them of many installation questions: as long as people can figure out how to get EESSI working, they can get GROMACS running, and then they should be happy. That means it's maybe more attractive for developers to actively help validate the installations we provide. They could help us figure out which tests we need to run to make sure it's all functional, and if we're doing a performance check, they could tell us: one nanosecond per day on an Intel Skylake, that seems a bit low, right, so something must be wrong there. We really have no idea; we can say we get a one, and we'll try to make sure we keep getting that one, but if the developers say you should be getting five nanoseconds per day, that helps us as well, because then we can figure out what we did wrong. Getting that kind of feedback, I think, becomes more interesting from the developer's point of view, because they'll probably be helping a bigger set of researchers, as long as those researchers use EESSI. And for the developers themselves, like in the CI case, maybe they can get their dependencies and the compilers they want to play with from EESSI, and that helps their development a bit as well.

Another thing is portable workflows. Bioinformaticians especially do a lot of this: they string different tools together in a pipeline, and each tool does something small with their genomic data or whatever they're working on. That's all very good, but if you need a thousand tools to get your research done, it becomes painful to get all those installations in place. If those tools are part of EESSI, a big pain
of that goes out the door, and they can run their Snakemake or Nextflow or whatever on top of EESSI. That's very different from the container approach that some of these tools take: when you're running Nextflow, you're pulling in containers, which include a whole bunch of binaries you're probably never going to run. With CVMFS, only the stuff you're actually using is pulled in, so you're downloading way, way less than you would be downloading in full container images.

All right, that sort of wraps up the content I've prepared. There's our open access paper, there are our websites. We still have a mailing list for EESSI, which is really only used to announce our monthly meetings; there's very little activity there. But we do have a very active Slack channel, where people jump in all the time and ask us questions. We have documentation, which could definitely be improved (we don't get much help with it), but it's not that bad. We have a Twitter account, and we have a YouTube channel, just like we do for EasyBuild, where we post any talks we give. We had a community meeting last year in September in Amsterdam, where we also explained EESSI from scratch, at a bit slower pace than I did today, but going more in depth on the use cases, the bots we're building, and so on. And we have monthly meetings which are open to anyone to join. That's what I have in terms of content, so we can still handle some questions, of course.

So I think the model is that there's a central repository of all the EESSI software, and then there are all these mirrors; potentially, if you have many, many users, you're going to need lots of mirrors. So who's managing the mirrors, who's owning the mirrors, who's paying for the mirrors?

Okay, that's a good question. From a naive point of view it looks
like the more mirrors the better. That's actually not true: the CVMFS people tell us to go for a dozen mirrors or so, spread around the world, but not many more than that. Every time CVMFS starts or needs software, it checks what its best option is, which comes down to geolocation of the mirror: based on the IP address, it figures out which mirror is closest and which it should be talking to, to minimize the latency and the delay in downloading. The more options it has to choose from, the more servers it needs to ping to see if they're still alive, and that's actually going to slow things down. So we're looking at on the order of a dozen or so mirrors. What we envision is that the maintainers of the project also maintain the mirrors, at least the public ones, because if something goes wrong with a mirror, if it's somehow not syncing anymore, that would be bad: it means new software is not streaming in for anyone coming in through that mirror. So we're looking at a core team of however many maintainers we need, who also keep an eye on those mirrors, set up monitoring, and make sure everything works as expected. So, say, a dozen mirrors: a couple in the US, several in Europe, one in Australia, one in Asia, to be close to those people if they want to use it as well, though the focus is probably going to be on Europe, and that should be enough. There will be a team managing that. As part of the MultiXscale project, there's the idea of setting up a rotation among five or six of the partners involved in the technical parts of the project, so they'll keep an eye on the mirrors. They'll also set up a support portal for EESSI, where people can ask questions or say: I tried running this and it didn't work. Then we can figure out whether it's a problem with their system or a problem with EESSI, whether we need to talk to the software developers, pulling in the right people and
trying to figure out what's going on and fix it. One thing I didn't mention, which is also interesting: the mirrors I'm showing here are the public ones, the ones that are part of the EESSI network; if you install our EESSI configuration package, those are the mirrors CVMFS will know about. But you can set up your own mirror server as well; you don't have to ask anyone, and it's all documented. You could put it next to your HPC cluster, in the same network, so you reduce the latency, and it will automatically sync with the public servers of the network. That's your own mirror, only for your use case, not serving anyone else, and as long as you maintain it yourself, that's absolutely fine. It's just one or maybe two more servers that you add to the list in the configuration; CVMFS will then also ping them, but not all the other mirrors that people run at their own clusters. That helps in another way too: maybe your cluster nodes don't have access to the web, so they're effectively offline, but as long as your mirror server is inside the HPC network and they can reach it, that's all you need, because your mirror server has a full copy.

A full copy sounds scary; that sounds huge, right, maybe hundreds of terabytes? Not at all. Compute Canada has thousands of software packages installed, and last time I checked, they were still under a terabyte of disk space with CVMFS. CVMFS does deduplication: it will never store the same binary twice. It has object storage in the back end, a bit like Git with its hashes, so every file is stored only once on disk, and it also compresses, both on disk and when sending data over the network. So Compute Canada can serve all the scientific software that researchers need across the whole of Canada with a terabyte of disk. Setting up a mirror server is not scary in terms
of the resources you need: it could be a two- or four-core VM with one or two terabytes of SSD disk, and you're good.

I'm happy to take more questions, but other than that it's sort of a wrap; it's up to you. If you're ready for the weekend, I'm totally fine with that, and I'm also happy to show you something more specific, more technical. So yeah, I would say: take a look, play around with it, maybe in a VM, maybe through the container, and if it doesn't work for you, we'd really like to hear from you, because now is the time. It's also made clear in our documentation: whenever we talk about using EESSI, there's a big fat warning that this is not ready for production. Our pilot setup is something you can play with, and there have been very few instances where it doesn't work, but please don't make this the only software you provide on your cluster today, because we don't have strong guarantees yet. We're going to move to a different domain, eessi.io, which is a bit more neutral, not HPC specific, and we're going to rebuild our CVMFS network on dedicated hardware that's securely set up, with YubiKeys, so only a couple of people can access it, and things like that. We're going through that exercise now. Once we've done that, and once we have the automation in place with the bot for the whole contribution workflow, then we'll start saying: okay, now we think you can start relying on this and assume it will work. Not yet today, but I think by the end of this year we'll be a lot closer to that, if not already there. That's an explicit goal: the first year of the MultiXscale project is about making EESSI ready for production, which means it's stable and properly set up. From that point on, we'll start expanding with more and more software, more testing, making noise about it, attracting attention, getting developers involved, and so on. Okay.
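The CI usage described earlier (source the EESSI pilot init script, load modules, run a smoke test) can be sketched as a small guarded shell step. This is only a sketch: the init-script path follows the published EESSI pilot layout, the `GROMACS` module name is an assumption, and in a real GitHub Actions job the CVMFS mount itself would be done by the dedicated action beforehand.

```shell
#!/bin/bash
# Hypothetical CI test step. The init-script path follows the EESSI pilot
# repository layout; the module name is an assumption.
set -e

EESSI_INIT="/cvmfs/pilot.eessi-hpc.org/latest/init/bash"

if [ -f "$EESSI_INIT" ]; then
    # Everything below streams in from CVMFS on first use.
    source "$EESSI_INIT"
    module load GROMACS
    gmx --version   # smoke test: the binary runs at all
else
    # e.g. on a machine where the CVMFS mount was not set up
    echo "EESSI pilot repository not mounted; skipping module test"
fi
```

On a runner without the mount, the step degrades to the skip message instead of failing, which keeps the example safe to run anywhere.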
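The site-local mirror setup discussed above ("one or maybe two more servers that you add to the list in the configuration") can be illustrated with a small CVMFS client override. This is a sketch: the file name follows CVMFS's `domain.d` convention for the pilot's `eessi-hpc.org` domain, and the mirror hostname is a placeholder.

```shell
# Sketch of /etc/cvmfs/domain.d/eessi-hpc.org.local
# (stratum1.example.org is a placeholder for your on-site Stratum 1).
# Listing the on-site mirror first makes clients prefer it; the public
# mirrors from the EESSI configuration package remain as fallbacks.
CVMFS_SERVER_URL="http://stratum1.example.org/cvmfs/@fqrn@;${CVMFS_SERVER_URL}"
```

CVMFS substitutes `@fqrn@` with the fully qualified repository name when contacting the server, so one override covers every repository in the domain.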
If there's no more questions, I'll stop the recording. All right, thank you.