Hi, my name is Maxime Boissonneault. I'm hearing echo... okay, now that's me. I'm working with Compute Canada; I've been with Compute Canada since 2012, and I'm located at Université Laval in Quebec City.

First, what is Compute Canada? Compute Canada is a little bit like XSEDE in the US. It's a consortium of universities; actually, it's a consortium of regional consortia that include universities. There are roughly 35 member institutions, which is basically all of the major universities in Canada, about 200 technical staff spread out across the various universities, and roughly 15,000 user accounts with a growth rate of 20% per year. Basically, we run supercomputers and we support academic researchers across Canada in all fields. Access is free for any academic researcher in Canada.

Our use of EasyBuild started around 2015. Before that, we had roughly 30 different hosting sites (50 different clusters, sorry), all configured differently. Software was being built separately on each cluster, with little overlap in procedures between the clusters. In 2015, there was a major new hardware deployment in Canada: five new national sites were deployed, for a total of 270,000 cores, 2,500 GPUs, and a couple of tens of petabytes of disk and tape. That includes four major clusters. Three of them we call general purpose: Cedar at Simon Fraser University, Graham at the University of Waterloo, and Béluga in Montreal, located at École de technologie supérieure. They have various CPUs (some Skylake, some Haswell, some Cascade Lake) and various GPUs (Pascal P100, Volta V100). One cluster uses Intel Omni-Path; the others use InfiniBand EDR. We also have one large parallel cluster, Niagara, which is fully non-blocking Dragonfly InfiniBand; that one is for large parallel tasks. And we have one cloud system, basically one big OpenStack cloud, and each of the main clusters has a little OpenStack partition on the side for things like portals.

When this new hardware was deployed, there was a big push to standardize our practices, and national teams were created. The team that I lead was in charge of software installation, among other things. The main goal we set out was that users should have an interface that is consistent and easy to use at all sites, and of course it should provide optimal performance. That meant a couple of things, and we identified a few tools to accomplish those goals. First, the stack needs to be on all of the clusters without effort, so that means a distribution mechanism. It needs to be independent from the OS, because those clusters are run by different teams of systems administrators and can be upgraded at different times, so we need to provide a stack that is as independent from that as possible. Of course, things should be tracked and reproducible; since we are 200 staff working together, we need an automatic way of installing software. And since we serve so many users on so much different hardware, we have a very large module stack, so we need to be able to handle that. So we use a lot of the same things that Alan presented earlier: a hierarchical module naming scheme, hooks when installing, and so forth.
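As a rough illustration of what that hierarchical module naming scheme looks like from the user's side with Lmod (the module names and versions here are just placeholders, not necessarily what our stack provides):

    # Only compilers are visible at first; loading a compiler extends MODULEPATH
    # and reveals the MPI libraries built with it, and loading an MPI library
    # reveals the applications built on top of that compiler/MPI combination.
    $ module avail              # shows e.g. gcc/9.3.0, intel/2020.1
    $ module load gcc/9.3.0
    $ module avail              # now also shows e.g. openmpi/4.0.3
    $ module load openmpi/4.0.3
    $ module avail              # now also shows the MPI-dependent applications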
For the parts that we chose: for the distribution mechanism, we use CVMFS, the CernVM File System, which I will talk about in the next slide. To get a stack that is independent from the OS, we have a compatibility layer, which we started building with Nix; for the next version of our stack, we are using Gentoo Prefix instead of Nix. For all of the scientific software layer, we use EasyBuild only, and we use Lmod with a hierarchical module naming scheme.

The way CVMFS works is that you have a build node; that's where we build the software, and we have dedicated hardware for that. Once the software is built and tested on the build node, we publish it to what's called a stratum 0. This is the master copy of the software repository. From this, a number of stratum 1s automatically fetch the latest version of what was published on the stratum 0. We have a team that manages this geographically distributed infrastructure, so we have multiple stratum 1s across Canada, and I know the team is also looking at putting some of those in the big content delivery networks out there. At each site there are usually two or more Squid proxies. The software is distributed over HTTPS, and the Squid proxies provide a partial cache at the site level, so that if there's an outage of your wide area network it keeps working. The client nodes also have a cache on the node. So it is fully resilient and there's no single point of failure: even if the stratum 0 disappears, there's a full copy of the software on multiple stratum 1s.

For the software that we install on that stack: the red box here is what's installed by a systems team. That's the OS kernel, the daemons, the drivers, anything privileged that is not distributed by our stack. Then, in the stack that we distribute, we have some libraries that are also in the OS, because they are needed to compile an MPI, for example; you need the InfiniBand library or the Omni-Path library. These can be overridden at the site level through the library path or some other mechanism. Then, in the compatibility layer, we install everything down to the libc: make, binutils, the autotools, all of that is installed in there. It's basically like a big bundle module, which we don't install with EasyBuild; we provide it with Nix or Gentoo. The green layer is mostly deprecated: because we have two systems, we can choose to install things in Nix or Gentoo or in EasyBuild. We used to install some things in Nix, but we ran into issues, so now we install all of those with EasyBuild directly. On top of that is the EasyBuild layer. It supports multiple CPU architectures; we have some legacy sites that run on hardware that is only capable of SSE3, so we compile for all of those CPU architectures.

To date, we have roughly 600 to 800 different scientific applications. That counts OpenFOAM, say, as one; it's not one per version. If we include the permutations of software version, compiler, toolchain and architecture, we have over 6,000 different builds. On this graph, you can see the deployment of our first AVX-512 cluster, roughly in March last year (was it last year or the year before?), but we were able to recompile everything for AVX-512 in a matter of days because we use EasyBuild, and that's the bump that you see here.
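To make that distribution chain concrete, here is a minimal sketch of what the CVMFS client side can look like on a node; the repository name, proxy hostnames and stratum 1 URLs are placeholders, not our actual ones:

    # /etc/cvmfs/default.local : minimal client configuration (placeholder values)
    CVMFS_REPOSITORIES=soft.example.org
    CVMFS_QUOTA_LIMIT=20000            # size of the local node cache, in MB
    # two site Squid proxies, with a direct fallback if both are unreachable:
    CVMFS_HTTP_PROXY="http://squid1.site.example.org:3128|http://squid2.site.example.org:3128;DIRECT"

    # /etc/cvmfs/config.d/soft.example.org.conf : where to find the stratum 1 replicas
    CVMFS_SERVER_URL="http://s1-east.example.org/cvmfs/@fqrn@;http://s1-west.example.org/cvmfs/@fqrn@"
    # (the repository's public key also has to be installed, e.g. under /etc/cvmfs/keys/)

    # then set up autofs and check that the repository mounts:
    $ sudo cvmfs_config setup
    $ cvmfs_config probe soft.example.org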
We also have a specific way of handling Python, which I will talk about shortly, and this is the bump when Python 3.8 was introduced.

So, a few design choices and EasyBuild features that we use. Because of the compatibility layer, we filter out a lot of the dependencies: M4, CMake, et cetera are simply ignored by our configuration, because it uses the versions in the compatibility layer. We also have different toolchains from upstream. All of the sites in Canada were using combinations of Intel or GCC, OpenMPI, MKL or CUDA; there was not a lot of Intel MPI or OpenBLAS. So we are not quite using foss nor intel, which are the two primary toolchains in EasyBuild upstream. That means we are using, or abusing you could say, the --try-toolchain, --try-software-version and the recently introduced --try-update-deps options in EasyBuild to reuse the upstream easyconfigs while installing with variations. We do have a custom module naming scheme, which is basically the hierarchical naming scheme but all in lower case; we don't want our users to have to figure out the capitalization.

There was a question: how do you pin the SIMD level for a given application and compute system? Basically, when we compile for the different CPU architectures, all we enable on GCC is -march=<something>, which I don't know off the top of my head. We have those scripted in an eb wrapper script that sets the correct --optarch in EasyBuild for each architecture, and we let EasyBuild handle the rest (a rough sketch of such a wrapper is included below). We completely ignore version suffixes, so our module names are strictly name/version, nothing else. If we do need different variants of a package, we change the module name instead. I posted something in Slack earlier: we have fftw and we have fftw-mpi, and that's how we handle it. The toolchains are mostly hidden. We don't use LD_LIBRARY_PATH, because in the compatibility layer we have a linker wrapper that automatically adds a RUNPATH or RPATH to the compiled binaries, so LD_LIBRARY_PATH is not necessary. And I think that's it; I added a few more notes in the written version of the tutorial. We do use hooks more and more to avoid having different easyconfigs, because of the same maintenance issue that Alan mentioned.

Question: does that mean you run AVX-512 binaries if the system supports it? So, the way our stack is built by default... can you see my terminal? Yes? Okay. When the user logs in, we detect the highest CPU instruction set that is supported on that node (I need to increase the font size), and an environment variable is set accordingly; that variable controls the branch of the module tree that the user will see. On this node, it detected AVX2. We can also override that. For example, our Cedar cluster has AVX2 but also AVX-512 nodes. By default it is set to AVX2, because that works on all of the nodes, but we provide special modules, the arch modules, and users can switch, for example by loading the avx512 one. If I load that, because of the hierarchy it will reload a bunch of modules, and if I now look at which mpiexec, for example, this is the AVX-512 branch of the module tree. I see there was a question: how do you make sure the software is optimized for the hardware? Well, yes, we do build it multiple times, once per architecture.
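As a rough sketch of what such a wrapper can look like (the environment variable name and the exact compiler flags below are placeholders; the real wrapper does more than this):

    #!/bin/bash
    # Thin wrapper around 'eb' that picks --optarch for the target CPU
    # architecture, taken here from a hypothetical TARGET_ARCH variable.
    case "${TARGET_ARCH:-avx2}" in
      sse3)   OPTARCH='GCC:march=nocona;Intel:msse3' ;;
      avx)    OPTARCH='GCC:march=sandybridge;Intel:xAVX' ;;
      avx2)   OPTARCH='GCC:march=haswell;Intel:xCORE-AVX2' ;;
      avx512) OPTARCH='GCC:march=skylake-avx512;Intel:xCORE-AVX512' ;;
      *) echo "unknown architecture: $TARGET_ARCH" >&2; exit 1 ;;
    esac
    # Everything else is passed through to EasyBuild unchanged.
    exec eb --optarch="$OPTARCH" "$@"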
That was for our older stack. The very new stack, the 2021 one, I think is still hidden, so let me look at the hidden modules. For the 2020 stack, which is based on Gentoo Prefix rather than Nix for the compatibility layer, we also enabled, for the Intel branch, fat binaries for AVX2 and AVX-512, so that the same binary picks at run time whichever instruction set is supported. That doesn't work for GCC; GCC doesn't support fat binaries, I think, unless that has changed in the most recent versions, but we do it in the 2020 stack. Otherwise, we basically choose a good default and we let users switch if they are more experienced. I also added some things about this in the written notes.

Whenever we install a Python package as a module, we install it so that it works with any version of Python. For example, we do have a scipy-stack module; I do have slides on that, which I think I forgot to copy. Oh, no, sorry, I thought I was done with Python. So let me cover the hooks; thanks, Kenneth. Because we support Omni-Path and InfiniBand and Lustre and GPFS, our OpenMPI is compiled with basically every option possible. That means quite a bit of customization compared to upstream EasyBuild, so we inject some configuration options through hooks. We also inject some custom code in the modules for the compiler and the MPI in order to support installations in the user's home directory: because we use a hierarchy, loading those MPI or GCC modules changes the module path, and it will also pick up module paths in the user's home directory if they are present. Because we redistribute our stack, and some parts of the Intel compiler we are not allowed to redistribute, we need to split the installation into two different CVMFS repositories, one which is private to Compute Canada and one which is open; we do that with hooks too. We also strip down the Python modules: our Python module is bare, with very minimal extensions, and we use a different mechanism instead. That's true, I have extra slides on that. We do install Python wrappers as modules, for example PyQt we install with Qt, and OpenCV-Python we install with OpenCV, and we use multi_deps, which was actually contributed by Compute Canada, to handle that. But we are not installing most Python packages as modules; instead we use what's called Python wheels, and I have a slide on that. We also don't support Anaconda at all; on our wiki we basically tell users not to install Anaconda.

So what are Python wheels? They are basically binary packages that you can compile against your own stack. For example, we have a Python wheel for NumPy which is linked against the version of MKL that we provide with the stack; same thing for h5py, which is compiled against HDF5. We then direct users to create a virtual environment and pip install whatever they need, and we let pip handle all of the Python dependency handling. That also has the advantage that users are not so dependent on us for updating software: they install what they need in their environment and it stays there. And you can use this stack too, because we built it to be portable. It's publicly documented on our wiki. It takes about five minutes to get it running on a VM or on a cluster; on a cluster it will take you a bit more time because you need to set up the different proxies.
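To give an idea of what that wheel-based workflow looks like for a user (the module version and package names here are just examples, and the wheelhouse location is assumed to be exposed to pip through the python module's configuration):

    # load the bare Python module and create a per-project virtual environment
    $ module load python/3.8
    $ virtualenv --no-download ~/env      # use the bundled pip/setuptools, don't fetch them
    $ source ~/env/bin/activate

    # install from the locally built wheels instead of PyPI;
    # --no-index keeps pip off the internet, so you get e.g. the NumPy wheel
    # that was linked against the BLAS provided by the stack
    (env) $ pip install --no-index numpy h5py
    (env) $ python -c "import numpy; numpy.__config__.show()"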
But otherwise, in a VM, it's a matter of minutes: you install a few packages and a few configuration files, and you have the whole stack available. It runs on Ubuntu, Red Hat, CentOS 6 or 7 if you want to use the Gentoo stack. We've tested it on Fedora and openSUSE, even on Windows if you have the Windows Subsystem for Linux. And yes, I think I'm probably out of time, and that's all of the slides I had.

Thank you very much, Maxime. My pleasure. Any more questions? I don't see any additional questions popping up. We do have, hopefully, some time for questions at the very end. If anyone has questions, please raise your hand and we can get back to them, and I'll try to zip through the rest of the tutorial.