I'll start by thanking Peter and Kenneth, who actually implemented all the back end for this work. What I did, and what I'm presenting here, is just the deployment and the testing of the framework. So big thanks to Kenneth and Peter, and also to the whole team at CSCS who work on this project.

We do claim that we have reproducible HPC software installations, though some are saying that we do not really provide that. We do think that we can achieve reproducibility, with one caveat: I think we do have problems with bootstrapping sometimes, when we move from one operating system to another. That is where Docker and containers can come in, but after that I can assure you that we can have reproducibility with EasyBuild, and that's why we chose EasyBuild and why we are presenting it.

So this is the outline of the talk. I will start with the background. I think if you're here you already know how hard installing software on HPC systems is, so I won't spend much time explaining all the troubles we ran into. Then I will talk about the implementation of Cray support in the EasyBuild framework, which is the work of Kenneth and Peter. In the last part I will present two use cases, the two deployments on our main systems at CSCS. With EasyBuild we have many other systems, but we don't have time to talk about them here, so I will focus on two systems that are Cray-based and on how we managed to solve the installation problem with EasyBuild. At the end I will also show some integration we now have with GitHub for continuous integration, for testing the software builds, and also for archiving the recipes.

The problem of building software on HPC systems, as I already mentioned, is I think well known to you. We had the same problem on Cray systems: we have many requests from users, and they require software that depends on specific versions. So you cannot just do a plain package-manager install of the software they need.
They really depend on specific versions, and these versions need to coexist. This is a huge problem for the teams maintaining the software. And the people using the HPC systems are usually scientists, not developers or sysadmins. They don't have the background for installing software, so they need help from people who have that background and experience in installing software on these systems. Also, the people developing scientific software are often not trained in computer science or software development, so when they produce new packages they often don't use the build tools correctly.

Just to name a few of the problems we see with scientific software that we don't see in, let's say, common software outside the scientific world: we have incomplete build procedures, where there is no configure or install step and you have to tweak the files manually. Then, this is my favorite: interactive installation scripts. This is really the thing you shouldn't do when you're packaging software. You should provide a way for people to install it automatically, without answering questions like "What is your favorite color?" or "Install it here or not?" So please don't do that. Automation is not a new thing, and we should keep in mind that there are people working full time on installing software. The other problems are missing documentation and dependency hell. These are slides from Kenneth that I'm just quoting here. On dependency hell: we see many software packages that have up to 40 or 50 dependencies for a single piece of software, and then if you need to upgrade, good luck.

So this is the big picture. We have a problem: the number of requests from HPC users is always growing, and unfortunately the quality of the packaging of this software is not improving. There are solutions out there for installing regular software, but not specifically for HPC. The speaker before me just mentioned Spack.
And we also have EasyBuild, but the other tools don't really focus on HPC. The impact of this lack of tools is that when researchers request new software, they spend a lot of time waiting, and we as HPC staff also spend a lot of time trying to fix things.

The other main problem, which has already been mentioned here, is the very limited collaboration among HPC sites. Everyone does their own builds on their own site, so people are solving the same problem everywhere. This is not good, in the sense that we don't have a common tool or a forum for sharing. Some sites do work together, but there is no common ground for describing a build, so people are basically redoing the same work at every site. This is also true on Cray systems. So the Cray system is just a new case of the HPC software build problem for EasyBuild: we have the same problem on Cray.

Just one slide on EasyBuild for those who do not know it yet: it is a framework for installing software, specifically scientific software. It's not meant for every kind of software, so it is focused. The idea is to bring together the people who have this experience in HPC and to collect their know-how in a single place so that others can reuse it. That's the advantage: when someone spends hours and hours preparing a build recipe, someone else can just take this recipe, benefit from all the time an expert spent on it, and adapt it to their local case.

It is written in Python. It started at Ghent University in 2009 and has been open source since 2012. Now we have a community, and we have a stable version that is released every two weeks. From my experience, we can always use the new version. I won't say we have had zero regressions so far, but every time we had one it was very small and fixed very quickly.
So it is something you can really use in production, even on large-scale systems. You have my word on this as group lead of scientific computing support at CSCS: we have more than 500 users, and we have had EasyBuild in production for more than a year.

Many well-known scientific software packages are already included. You might not find everything. Every package manager gives figures for how many packages it has, but what you need to do is go there and see how many of your packages you can find. So I encourage you to go to the website and check whether your software is there; then you will have an idea of whether it's useful for your use case or not.

The main features of EasyBuild: you have autonomous building and installing of software. You also have logging, so you don't need to care about saving the output; you will find the installation logs somewhere and you don't need to worry about that. You have archiving of the build specifications: every time you build from a recipe, a copy of this recipe is kept somewhere, so you know you can redo it. This is one step towards achieving reproducibility. It's highly configurable, on the command line or through configuration files, either your own or site-wide. It is dynamically extendable: the recipes are there, but you can write your own, so you can extend existing classes and write just the parts that are specific to your software. It is tested and it is actively developed, which you can see on GitHub.

Here is just a graph to show that the community is healthy: for the moment it is growing. I think the moment it stops growing, you start showing just the numbers and not the graphs, so for the moment we can show the graphs. You can also look at the mailing list and see that it's very responsive, and the same on GitHub, so usually people's problems don't remain open.
Typically people do find a solution when they need to contact the community.

Just one slide on this, and I will try to avoid getting too technical, because I think it's better to discuss the ideas here. I will quickly present the EasyBuild terminology. The EasyBuild framework is the part that takes care of downloading the packages, installing them and creating the module files, and it provides all the functionality that is common to, let's say, all builds, for all software. Then we have easyblocks, which can be specific to one piece of software, but mostly they are generic: for applications that use configure and make, for example, you have one easyblock, ConfigureMake. Depending on the application you might need a specific one as well, but mostly you only write the easyconfig file, which is a recipe that is not generic: it contains the versions. That's one of the key ideas of EasyBuild: the versions of the software are everywhere in a recipe, so you know that if someone else takes this recipe and tries to rebuild, they are going to use exactly the same versions. Other tools are more open and more flexible, but then there is no guarantee that you're going to use the same versions and achieve reproducibility.
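
To make the idea of pinned versions a bit more concrete, here is a minimal sketch of what an easyconfig can look like. Easyconfig files use Python syntax, and the parameter names below (`easyblock`, `name`, `version`, `toolchain`, `sources`, `builddependencies`, `moduleclass`) are standard easyconfig parameters. The concrete version numbers and the build dependency are illustrative assumptions, not the recipe shown on the slide, and the helpers `parse_easyconfig` and `unpinned` are hypothetical: they are not part of EasyBuild and only show that every component in the recipe carries an explicit version.

```python
# Sketch only: an easyconfig is a set of plain Python assignments.
# All version numbers and the build dependency are illustrative assumptions.
EASYCONFIG = """
easyblock = 'ConfigureMake'   # generic easyblock: configure, make, make install

name = 'GMP'
version = '5.0.5'

homepage = 'https://gmplib.org/'
description = 'GNU multiple precision arithmetic library'

# the toolchain (compiler, MPI, math libraries) is pinned by name and version
toolchain = {'name': 'goolf', 'version': '1.4.10'}

source_urls = ['https://ftp.gnu.org/gnu/gmp']
sources = ['%(namelower)s-%(version)s.tar.bz2']

# every dependency is listed with an explicit version as well
builddependencies = [('M4', '1.4.17')]

moduleclass = 'math'
"""


def parse_easyconfig(text):
    """Evaluate the recipe text into a dict (hypothetical helper;
    EasyBuild has its own, much richer parser)."""
    params = {}
    exec(text, {}, params)
    return params


def unpinned(params):
    """Return the components that lack an explicit version (hypothetical check)."""
    missing = []
    if not params.get('version'):
        missing.append(params.get('name', '<unknown>'))
    if not params.get('toolchain', {}).get('version'):
        missing.append('toolchain')
    deps = params.get('builddependencies', []) + params.get('dependencies', [])
    for dep in deps:
        if len(dep) < 2 or not dep[1]:
            missing.append(dep[0])
    return missing


cfg = parse_easyconfig(EASYCONFIG)
print(cfg['name'], cfg['version'], cfg['toolchain'])
print('unpinned components:', unpinned(cfg))
```

In real use the recipe text would live in its own `.eb` file and be handed to the `eb` command, for example with `--robot` so that dependencies are resolved and built too; the Python wrapper around it here exists only to make the sketch runnable on its own.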
The last concept is the toolchain, which is very important: it is the base for any easyconfig file. It's the compiler together with the basic libraries that are used, typically MPI, BLAS, LAPACK, ScaLAPACK, the basic things we see used in HPC. These are grouped together and called the toolchain, the base for building the software.

I'll move to the next part now, which is the implementation for Cray: what was missing in order to use EasyBuild on Cray systems, and why. This is a typical example of an easyconfig file and what we have inside it for a build. This one is for a software package called GMP. It relies on this toolchain, which has GCC, OpenMPI, OpenBLAS and so on, and it uses the ConfigureMake easyblock. Typically with EasyBuild you rebuild everything from scratch, including the whole toolchain you need, because we don't want to use what is already available on the system, since then we might not achieve reproducibility.

The difference on Cray is that the programming environment is already there. Since it is provided by Cray, in this case we do want to reuse the existing software, because it has optimized versions of the scientific libraries and also an MPI that is optimized for the interconnect. So on Cray we do want to reuse these libraries, these compilers and so on. So we, or actually Peter and Kenneth, created the easyblock and the toolchains for Cray, for the GNU compilers, Intel and the others, mapping the programming environments available on Cray to EasyBuild, so that we can use a Cray toolchain like any other toolchain that already existed in EasyBuild. The three main features that had to be implemented are the support for external module files, the definition of the toolchains, and the custom easyblock. I want to thank Peter and Kenneth again, but I won't go into the details of this implementation because we
don't have time here.

The support for external module files is a key feature for the Cray support. Before, EasyBuild was building everything from scratch and creating the module files itself. On Cray we need to reuse the existing module files, so there is support for a file where you map the existing modules in a way that EasyBuild can read. This is needed for Cray, but it can also be used elsewhere if you already have modules that you want to reuse.

I will speed up a little here because of the time. The easyblock is specific to the Cray toolchains; you can go to GitHub if you want to see the details. Here I just want to mention that we have one toolchain for each programming environment available on Cray, and that we automatically map the variables you need for building software on Cray, because there we have the compiler wrappers: we don't use GCC or the other compilers directly, we use the Cray wrappers.

I'll move now to the last part, which I would say is more interesting from my side, because it's where we managed to use all this infrastructure to deploy software in production. Of the two use cases I'm going to present, the first is the machine for MeteoSwiss, the weather forecast service of Switzerland. They have a production system of two cabinets, for production and failover, and it is a very GPU-dense system with 8 K80s per node, so 16 GPUs per node. This is a Cray CS-Storm. I explained what we have in the Cray programming environment; this machine is already different from the rest. It is a new series for which they provide only partial support for the programming environment: they provide only the Cray programming environment, and not GNU and Intel. So in this case we had to rebuild our software from scratch. Even the GCC they provided was not able to compile AVX instructions, so they give you the hardware and they don't give you a working compiler. So we opened a bug, and meanwhile
they took a long time to give an answer, so in the meantime we rebuilt everything with EasyBuild. It took us a couple of weeks, and the software stack has now been in production for more than a year.

I'll move on to our main use case, Piz Daint, which is our flagship system. It's a GPU-based system with two partitions: here is our largest partition, with the Tesla P100 Pascal GPUs, and then we have the Broadwell partition. According to the TOP500 it is the 8th fastest supercomputer in the world, and according to the Green500 it is the second most efficient supercomputer with respect to energy consumption.

This is the list of software that you can find for Cray in the stock EasyBuild repository, meaning that we have already contributed it back. We also have our own GitHub repository where we keep our recipes, and here is the list in case you want to look for your software. I just want to mention that people can also open pull requests on this GitHub repository and contribute back. For Piz Daint specifically, we also have automatic checking of the recipes using the GitHub pull request builder plugin, so we check everything before merging to master. The last thing is that we have autonomous deployment of the software on the system.

My final comment is that proprietary and open source software can coexist. We had the case where we used open source in order to better exploit the system we had available, and in this case we had the best of both worlds: we have an optimized software stack, and we have support both from the community and from the vendor. It also minimizes the risk of vendor lock-in, meaning that we have an alternative. That was the case when we had a problem with the software shipped by the vendor: we managed to rebuild everything with open source, and in that case we actually don't need their software stack anymore. So it's the best of both worlds, as I said.

OK, one slide
with links, which will be available on the web, and then we have time for questions. Thank you very much.

Thank you. Are there any questions? We have time for a couple of questions.

Regarding vendor lock-in, how many Cray-provided applications did you replace?

Regarding vendor lock-in and how many Cray-provided applications we replaced: for the CS-Storm series we replaced the whole software stack, meaning that we count on them only for the operating system. We replaced everything from MPI and GCC up to all the things that MeteoSwiss needed for post-processing the data. So I would say at least 20 software packages here: netCDF, HDF5, all of this came from EasyBuild. We didn't have to change much; all the defaults worked for them. So I would say 20 packages in this case. In the future, maybe for the XC series, we might start looking into using only open source as well. We know that other sites have dropped the whole software stack from Cray and use the Intel compilers and MPI, and with EasyBuild we could think about doing the same in the future. So now we have that possibility.

More questions?

I have a question: how is this being received by Cray itself? Because it seems like we are giving Cray the signal that they're not doing a good enough job, and that you have to put something on top to actually make it feasible.

For the CS-Storm series, the feeling I have is that they don't really care. It's not important for them. It's the low end, not in terms of hardware but in terms of the support they provide, and they don't really care. That's my feeling. As for the XC, not only because of the efforts on EasyBuild but also from the complaints, since we have opened tickets that took so much time to resolve, I think they realize they are not doing a good job, but they don't give us any feedback. But they know, because we have people from Cray inside, they know that we are
using their software less and less. Every time we have an error, before or at the same time as we open a ticket, we build the software with EasyBuild. If we continue like this, we might not need their software anymore, and if they don't realize that, it's bad for them. But then, it's the free market.

OK, that's it. Thank you.