Okay, so our next speaker is Bart Oldeman, and he's going to tell us about their setup at Compute Canada. Welcome. Thank you. So welcome everybody. My name is Bart Oldeman, and I work as an HPC analyst at the McGill HPC Centre at McGill University, which collaborates within Calcul Québec, and then Compute Canada, which is basically the collaboration of all academic supercomputing centres in Canada. So what I'm going to present is how we deal with software on our supercomputers. We combine various technologies, software solutions that have been covered already before, particularly in Kenneth's talk. And they're all open source, so it fits really well at FOSDEM: all four of these things, CVMFS, Nix, Lmod, and EasyBuild, are actually all present on GitHub. So what's going on in Canada is a bit of an upheaval of our supercomputer infrastructure. That's kind of the motivation: we're moving from a situation where pretty much every research university had its own little compute cluster, and some had bigger ones, to a more national setup where there are basically four bigger systems and one cloud system. The four bigger systems are Cedar at Simon Fraser University, Graham at the University of Waterloo in Ontario, Niagara at the University of Toronto, and a new system in Quebec, which is called Béluga. Just to give you an idea of the scale: the Cedar system near Vancouver has about 900 compute nodes, going to be extended with another 625 nodes, and also about 500 GPUs. That together gives about 30,000 compute cores to work with. The Graham cluster is quite similar; it has a few more compute cores, about 35,000, and fewer GPUs. And the new system in Toronto will have 60,000 cores. In the HPC world these are not top systems in terms of the Top500: you won't find them in the top 10, but you will find them somewhere in the top 100.
So what we're trying to present to our users is that when they log into any of these new clusters, they see a similar software environment. Pretty much every supercomputer in the world is running Linux by now, so you'd think, okay, they're all running Linux, they all get a similar interface. Well, in practice that's not the case. Whenever you go to another supercomputer, you'll find very many minor differences in how to operate them; they're not very portable between each other. We're trying to avoid that by distributing the software that's used on our clusters, combining four different solutions. One is a distributed file system called CVMFS. One is a package manager called Nix, to have a base software setup that's independent of the underlying Linux distribution. We have EasyBuild to install scientific packages. And then there's a module tool called Lmod, which I'll get to in a minute. The background is really that if you log into an HPC cluster, you will find that it runs, in almost all cases, an enterprise Linux distribution. In most cases that's either CentOS or Red Hat Enterprise Linux. Sometimes you may see a SUSE cluster, even rarer is an Ubuntu cluster, but they typically don't run Fedora. One of the troubles with that is that they tend to be relative dinosaurs in the open source world. There are very many clusters that run Red Hat 6, which has a 2.6 Linux kernel, GCC 4.4, glibc 2.12, Python 2.6. So you really think, ah, I'm in a time machine, I'm going ten years back. But that's just the way they work, mostly for vendor support: vendors will say, I have a driver for this particular system, and they typically support all the enterprise distributions. And also, Fedora has two new releases a year; you don't want to upgrade the OS on your cluster twice a year, it's just way too much work.
So nowadays, production clusters that are a couple of years old tend to have CentOS or Red Hat 6; the newer ones tend to have 7. But users, some of whom may run Linux at home or at the office, want something newer: they may run Fedora 27 or some Ubuntu, you name it. So they try to run their software, which is not included as an RPM or whatever in the underlying Linux distribution, and they read the documentation. The documentation has a little section about Linux. They go to that section and it says, oh, if you want to run this on Linux, it's simple: just do sudo apt-get install python2.7-dev to get Python 2.7. Because, as I mentioned, Python 2.7 is not installed by default on Red Hat Enterprise 6. So they try that command. They get a little lecture about sudo, "We trust you have received the usual lecture", et cetera, it boils down to three things. Then it asks them for a password. Well, they have a password, so they type it. And then it says: apt-get: command not found. Ah, why is apt-get not there? Oh, I'm running Red Hat, this is not Ubuntu. Okay, but Red Hat has yum installed. Oh, okay: sorry, user, you are not allowed to execute this. Sometimes they get something meaner: you are not allowed to do this, this incident will be reported. It depends on the version of sudo; I don't know which one does what. So they send us an email. Either they say, can I please have root access on your cluster? Or they say, I'm so sorry for typing sudo, now this incident will be reported, please don't report me to the FBI and the CIA. So what we do to alleviate this problem is modules. What does a module do? We install many commonly used software packages in a location somewhere on the shared file system, which is not /usr/bin or some other OS-controlled location, and we describe each one in a module file. These module tools have been around for a good 20 years or so, and you'll find them on pretty much any HPC cluster.
So what you do, and this is the old Tcl modulefile syntax, is write a little file that you put in a central location, and you specify where the root of the software is, like the prefix that you used with configure. In this case it's for Python 2.7.9; this is an existing module from our old system. You put a few lines in that modulefile, and then the module command, module load python/2.7.9, will modify these four environment variables, MANPATH, PATH, LD_LIBRARY_PATH, and CPATH, so that once users load the module, they can execute that software by just typing python, and they get Python 2.7.9 instead of Python 2.6.6. The same goes for CPATH and LD_LIBRARY_PATH; this way they can link against Python's development libraries. A nice thing about modules is that they can be unloaded: module unload says, hey, the Python module added these paths, so now that I unload it, they are removed from the path, and I'm back in my original situation. It also handles multiple versions: you can have Python 3.6.2 installed this way as well, and then module load python/3.6.2 will take precedence over the older Python. Now, in the past, modules were mostly installed by hand, and that has some "advantages", as you can see on the right. This is a nice little comic strip I got from Kenneth. Kenneth, who was up here earlier, and other people (he wasn't the one who started it) basically came up with a solution to this question: why does every cluster in the world write its own module files and install software by hand? There's a lot of duplicated effort. Let's work together and get an automated solution that basically takes a recipe, installs the software, creates the module file, and you're all set to go. Then you can contribute the recipe back and others can use that knowledge.
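The kind of Tcl modulefile described above is just a handful of prepend-path lines around the install prefix. A minimal sketch for the Python 2.7.9 example (the install prefix here is hypothetical, not our real path):

```tcl
#%Module1.0
## Minimal Tcl modulefile for python/2.7.9 (prefix is hypothetical)
set root /opt/software/python/2.7.9

prepend-path PATH            $root/bin
prepend-path MANPATH         $root/share/man
prepend-path LD_LIBRARY_PATH $root/lib
prepend-path CPATH           $root/include
```

`module load python/2.7.9` applies these prepends; `module unload python/2.7.9` reverses exactly the same lines, which is why unloading restores the original environment.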
However, the problem is then that if you adopt that, you are no longer invaluable, so it may not be the best for your job security. So at Compute Canada we took that a step further: we're not only installing a few scientific packages via modules, we basically replace the whole user-space stack. We have the bottom layer, which is basically CentOS 6 or 7: the OS, kernel, daemons, anything that's privileged. We don't touch those. We basically replace anything in user space that is not privileged, so it doesn't create too many security issues either, and we install many base packages: GNU libc, autotools, make, bash, cat, ls, awk, grep, find, you name it. They're all installed using Nix, the package manager, and then accessed via a specific Nix profile, some sort of prefix with a symlink forest behind it, that we put over here in the CVMFS shared file system location. There are a few custom profiles for when there are multiple versions of the same tool that we want to make accessible. And then we have many things that we install via EasyBuild: usually scientific applications, high-level applications that depend on MPI and run in parallel. We have to bootstrap Nix because it has a non-standard prefix; otherwise it would start with /nix. So we cannot use any of the binary cache that Nix provides, and we have to build everything from source. The advantage is that we can have multiple architectures: not all of our clusters have the same architecture, and for optimal performance we have multiple trees depending on the architecture. That is not the case for the Nix layer; those layers are architecture-independent. They're all x86-64, but we don't have any non-x86 clusters and we don't plan on that either. The distributed file system is something that comes from CERN; it originates in high-energy physics. They have a lot of software to process their collision experiments, which are then treated in data centers around the world.
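The Nix profile mentioned above is essentially a symlink forest: one stable prefix whose bin/ entries point into versioned store paths, so activating software means following symlinks, not copying files. A toy sketch of the mechanism (paths are made up; this is not real Nix, just the idea):

```shell
# Sketch of a symlink-forest profile (toy paths, not a real Nix store)
store=$(mktemp -d)
profile=$(mktemp -d)
mkdir -p "$store/hello-1.0/bin" "$profile/bin"
# a "package" installed into the store under a versioned directory
printf '#!/bin/sh\necho hello from the store\n' > "$store/hello-1.0/bin/hello"
chmod +x "$store/hello-1.0/bin/hello"
# the profile's bin/ contains only symlinks into the store
ln -s "$store/hello-1.0/bin/hello" "$profile/bin/hello"
# putting the single profile directory on PATH exposes every linked package
PATH="$profile/bin:$PATH"
hello   # prints: hello from the store
```

Switching a package version is then just repointing one symlink in the profile, which is also why leaked store paths plus garbage collection (mentioned later in the talk) can break things: the symlink target disappears.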
They've been used to distributing software for quite a while already, and they use a distributed file system that works on top of FUSE, file systems in user space. It works quite well, and there's a lot of redundancy built into the distribution layer, with multiple caches. The idea is: we build our software on a build node, then we push it to a central distribution node, which is called the stratum 0. That in turn replicates to a set of stratum 1 nodes, of which in Canada we have two or three; they in turn get replicated on a Squid proxy node that is local to the cluster, and then the client node can mount the file system. Even if some of these stratum 1s go down, you still have access to your files because of the multiple layers of caching. What you basically need to mount these file systems is a public key; once you have the public key, you can mount the file systems and you can use the software. There's a special restricted repository too, with some commercial things, where people need a very specific, what we call LDAP structure, a very specific UID mapping. The other tool is Nix. It was briefly covered in Kenneth's talk at 9 a.m. This is how we can actually provide a newer user space than what's provided by CentOS. We're currently tracking the September 2016 release; at some point we'll upgrade it, but the point is that we can upgrade it: it's controlled by us instead of by whatever Red Hat does. So we have a fairly new glibc, GCC, and coreutils stack, and people will see all the newest bells and whistles, for the most part, in the default stack. That's in the PATH when people log in. The next tool we use is EasyBuild, for the more scientifically oriented modules, and Lmod, a module tool much like what I presented earlier: Environment Modules is the older solution, and Lmod is a re-implementation of modules in Lua, which is why it's called Lmod.
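A CVMFS client needs little more than the repository's public key and two configuration lines; a sketch of the client side, assuming the repository is named soft.computecanada.ca (the proxy address below is a placeholder, not a real host):

```
# /etc/cvmfs/default.local (sketch; proxy address is a placeholder)
CVMFS_REPOSITORIES=soft.computecanada.ca
CVMFS_HTTP_PROXY="http://squid.example.org:3128"

# With the repository's public key installed under /etc/cvmfs/keys/,
# autofs mounts the repository on first access:
#   ls /cvmfs/soft.computecanada.ca
```

The proxy line is what points clients at the cluster-local Squid node described above, so ordinary file access never has to leave the site unless the cache misses.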
Lmod doesn't have a shiny logo, but it comes from TACC, the Texas Advanced Computing Center; it's Robert McLay who works on it there, so I just put the TACC logo here. We use Lmod for the software hierarchy. That's nice because there are multiple implementations of MPI for doing parallel programming across the nodes of a cluster, and if you don't use a module hierarchy, you get very many modules. For instance, for a popular parallel package called GROMACS, you might have a GROMACS for Open MPI, a GROMACS for MVAPICH2, a GROMACS for Intel MPI, and they all show up in your module listing. With one module for every MPI flavor, and multiple versions of each MPI flavor, it quickly gets out of hand; that's why we have this hierarchy. The same goes for compilers and multiple compiler versions. Say you have two different compilers, three compiler versions, and three different MPIs: you get three times three times two, 18 different modules for the same software package, and you don't want them all visible by default. So conceptually, what we do is fork some git repositories ourselves. We have the nixpkgs repository for Nix, and then EasyBuild has three components: framework, easyblocks, and easyconfigs. The framework is the high-level Python machinery of EasyBuild; an easyblock is basically a Python script that says how to compile something; and easyconfigs are recipes that say which parameters to use to compile it, configure parameters, where to put it, these little descriptions. Then in our team, we figure out whether something goes into Nix or EasyBuild. In general, if things are not performance-critical and are more base, more boring dependencies, typically also provided via RPMs or deb packages in Linux distributions, we install them via Nix. If it's something that is performance-critical, or depends on MPI, we put it in EasyBuild.
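The hierarchy works by letting each compiler and MPI module extend MODULEPATH, so at any moment only the builds compatible with what you have loaded are visible. Schematically (names, versions, and paths here are illustrative, not our exact tree):

```
modules/Core/gcc/5.4.lua              <- always visible; loading it prepends
                                         modules/Compiler/gcc5.4 to MODULEPATH
modules/Compiler/gcc5.4/openmpi/2.1.lua
                                      <- visible once gcc is loaded; loading it
                                         prepends modules/MPI/gcc5.4/openmpi2.1
modules/MPI/gcc5.4/openmpi2.1/gromacs/2016.lua
                                      <- the one GROMACS build that matches the
                                         loaded compiler and MPI, instead of 18
```

Swapping the compiler or MPI module swaps the MODULEPATH branch, and Lmod automatically reloads dependent modules from the matching branch.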
Then we build it on the build node, using Nix or, for EasyBuild, the eb utility. We test it in a testing and development repository, and if everything goes well, we push it to production, and all users will see it after about 15 minutes. Here's again an idea of the difference: there's a gray area between Nix and EasyBuild. EasyBuild focuses on HPC, Nix focuses on general software, and there's some overlap. These are all packages for which EasyBuild provides recipes, but we found them a bit too low-level, and we typically don't expose them as modules to the users. For something like pkg-config, we say okay, it's provided by default, the latest version is typically fine, and we don't really need to provide multiple versions of pkg-config. There are some things we may want to move; gnuplot is one where we may want to provide multiple versions. Things like automake unfortunately sometimes need multiple versions because of compatibility issues, but even there we can create a module around a specific Nix version. Mostly, EasyBuild provides recipes for those because the Red Hat development RPMs were too old, and EasyBuild works around that by providing modules; we ourselves work around it by using Nix. Another example of packaging that we do, slightly outside of this, is that we provide Python wheels. Often people want to do pip install some-package and ask us how to do this. Some people insist on using Anaconda, and we accommodate that too, but this is just the standard Python setup: in order to get optimized Python packages like NumPy and TensorFlow, we provide our own wheelhouse, so that when people do pip install some-package in a virtual environment, the package gets downloaded from this particular wheelhouse instead of going out to the worldwide web and downloading some random binary that may not work so well on our cluster. So, some statistics.
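A wheelhouse like the one described can be wired in through pip's configuration, so that pip looks at the local, optimized wheels first; a sketch of such a pip.conf (the wheelhouse paths are placeholders, not our real locations):

```
# pip.conf (sketch; wheelhouse paths are placeholders)
[global]
find-links =
    /cvmfs/soft.example.ca/wheelhouse/avx2
    /cvmfs/soft.example.ca/wheelhouse/generic
```

Adding `no-index = true` under `[global]` would restrict pip to the wheelhouse entirely, so nothing is ever fetched from PyPI; without it, the wheelhouse is simply preferred when a matching wheel exists.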
So at the moment we have a monotonically increasing list of modules; right now it's about 500 software packages and versions that we make available to users in the scientific domain, and there are many more builds for the other architectures. Sometimes we need to install something for multiple architectures, and we do that mostly automatically. To see what scientific tools we install, this gives an idea of the distribution. There's a huge amount of bioinformatics software out there, just many, many different tools; we install them mostly using EasyBuild. At the moment we have about 145, and there are probably another 145 coming. You see that other domains typically have fewer applications that many people use: in chemistry and molecular dynamics you have GROMACS, NAMD, and a couple of others. We also keep track of module loads, so we can say, okay, who's using what software? This is a Grafana interface where we can basically see whether people use a certain module; if they're no longer doing it, maybe it's time to retire that module, or we can say, okay, maybe that module is too old, is anybody still using it? Okay, time's up. So we have a couple of challenges, mostly related to the non-standard path. If you have any questions I can tell you more, but we do a lot of workarounds for that, including a huge module file just to make sure that the software really doesn't look in /usr. The other challenge is to hide the Nix store paths, with all their hashes, from the users, because if those leak into the environment of users who don't use Nix themselves, and we then do a garbage collect, stuff stops working. That's the challenge of combining Nix with something that is not Nix: internally it's all super consistent, but when you combine it with something else, funny things may happen. So I have a lot of people to thank.
My colleagues at Compute Canada: Maxime Boissonneault, who leads the national research support team, the Nix experts that we have, some of my local colleagues, and the people who maintain EasyBuild, in particular at the University of Ghent, and Robert McLay also. Thank you. Thank you. I think we can have one quick question. Yeah. What do you use to maintain the Linux installations on the compute nodes? The Linux installations on the compute nodes are typically provisioned; I think we're using Puppet at the moment. They used to use xCAT, an IBM solution, on older clusters. So these are typically provisioned, all automated: you provide a fixed recipe with a bunch of post-scripts that alter the installation, so you end up with an identical installation on all the compute nodes. Well, we basically log it any time anybody loads a module, and we parse the logs. One question in the back. Yes, we do. Can you please repeat the question? Yes: the question is, do you have optimized Python packages? Yes, we have a generic wheelhouse and an AVX2-optimized wheelhouse, et cetera. Thank you very much. Thanks.