Okay, good morning everyone. Thank you for coming. This is the HPC, Big Data and Data Science devroom, organized by myself, Kenneth, and Roman, who just arrived. Our first speaker for today is Kenneth, and he's going to tell us how to install software for scientists on a multi-user HPC system.

All right. Thank you, Vaz. First of all, thank you all for being here this early. I'm sure you had too much beer last evening, so thanks a lot for coming. What I want to do this morning is compare five different tools to get scientific software installed on supercomputers, so these multi-user high-performance computing systems.

First of all, I want to look at the title: installing software for scientists. This can be quite challenging. Scientific software has its quirks. It's very different from normal FOSS software, where you can just run configure, make, make install, and it has a man page and everything is nicely documented. That's not the case with scientific software. Good software engineering practices are rather rare in scientific software, in my experience, so the installation procedure is very non-standard. You have to figure things out from scratch for almost every separate application. Dependency hell happens here, too.

So these are scientists writing software. There's a quote, I don't know where I got it from: "If we knew what we were doing, it wouldn't be called research." That's what scientists do. They mostly care about the science and don't really care that much about the software. They just want to get the science done, and they need to write software to get that done. So most of these people are not trained software engineers or system administrators.
They just figure things out by themselves, mostly, and if it somehow works, they're already quite happy. On the flip side, not every scientist develops their own software. Sometimes they just want to use software that other scientists developed, and it should be easily accessible to them. They don't want to install or compile it themselves. They barely know what a compiler is. They just want to get the thing working.

Then the supercomputing aspect: multi-user high-performance computing systems are very different from a normal laptop in several ways, both good and bad. The main thing about supercomputers is that you get a lot of performance and a lot of parallelism out of them; they can do a lot at the same time. But they are also different in bad ways. They're not that easy to use, especially not for scientists. Most scientists have never seen a terminal up close, and then you give them this black window with only letters and you tell them they don't need their mouse anymore. Yeah, people get confused.

There's also a wide variety of users: biologists, chemists, people doing history studies of old newspapers, and everything in between. So you need a lot of very different software. You also need multiple versions, because some people want to use OpenFOAM and some people want to use OpenFOAM-Extend, and there are lots of variants of the same thing that differ in ways that are important to them. So you have to have everything installed. And typically, and there are other ways of doing this, but on a lot of systems, once you install software it stays there as long as the system is alive. Scientists write papers, and two years later they revisit the paper, or they end up writing a PhD that includes that paper, and they want to redo some of the runs, hopefully with the same software or at least as close as possible.

And of course performance is very important on supercomputers. If you buy this big
expensive hardware with a really good, expensive network, you also want to actually use it properly and get the most out of the system. For individual scientists, it's good that their runs are 10% faster with a little bit of effort, but for the administrators of the system 10% is huge, because that's 10% more science, right? That's really what you want to aim for.

Some disclaimers here. I'm doing a comparison between five different tools, but I'm certainly biased: I'm the lead developer of one of them. I tried to make this a very objective comparison. I really tried hard to do that, and I hope you'll agree with me that that sort of worked. I'm very familiar with EasyBuild, the other four not so much. But I spent hours in the last couple of weeks playing with these tools, reading the documentation, and installing software with them to try and familiarize myself with them. That was a lot of the time I spent since last Christmas. There's certainly personal bias still there. I tried to minimize it, and I sent my draft presentation to the people behind these tools, like Todd, who's giving the next talk, on Spack, to give some feedback, and Ludovic and the Guix community. So from all these people I got some feedback, and basically all tools are covered. So I'm not going to tell any serious lies anywhere, unless they missed it in the draft. That was my attempt to make it very objective.

I only have 20 minutes, so we have to rush a bit. I'll do 30-second introductions of each, and I'll do them alphabetically.

Conda is a tool that works on Linux, macOS and Windows, so it's cross-platform, and it's implemented in Python. This is really targeted towards end users, towards scientists. So it's very easy to install software with Conda.
Just do conda install something, and it goes and does its magic. Basically, what it does is pull in binary, pre-built packages from the Anaconda cloud, or whatever they call it. This used to be Python-specific, but they now have support for C, C++, Fortran, R, basically anything out there. So it's no longer Python-specific. The package recipes themselves are written in YAML, so a YAML configuration format together with a build script, which is a shell or a .bat script depending on the OS. You build packages with conda build, which is typically not what users do; this is what the people who prepare the packages do. The recipes are hosted on GitHub and all that stuff. They support about 3,500 different software packages, and a lot of these are scientific software.

EasyBuild is pretty different compared to Conda. It mostly only works on Linux, so it's very focused on supercomputers. It sort of works on macOS, but not really; we never test there, and it's definitely not a target platform. It also works on Cray supercomputers; these are different enough from Linux that it's worth mentioning. It's also implemented in Python. This one is mainly targeted at support teams of supercomputers, so people who get install requests from scientists, have to install the software, and then give it back to the scientists. So not people installing software for themselves, and that's an important difference. EasyBuild builds everything from source.
There are no binary packages that it pulls from somewhere. One of the key things about EasyBuild is trying to get good performance out of the installation, so it targets the hardware that you're installing the software for, and you'll see that coming back later in the talk. It has these easyconfig files as recipes, and it has Python modules, Python scripts, that have all the logic to install the software. What else? It generates module files, because that's being used on pretty much all supercomputers out there. People know this environment, and it makes it relatively easy for the scientists to access the software. We have reasonably good support for site-specific customizations to installations and keeping track of that, and today we support a little bit over 2,000 different software packages.

Then Nix. This is again very different from the other two. It's cross-platform, at least on Linux and macOS, but no Windows. It's implemented in C++ and the custom Nix DSL. They have the tagline "the purely functional package manager", so it tries to provide very reproducible builds of software. There's an operating system, NixOS, that uses the Nix package manager as well, so don't confuse these two; they are two totally different things. You can use Nix on a traditional Linux system; you don't need to use NixOS if you don't want to. There's a very strong focus on reproducibility of compilations, even down to bitwise reproducibility, so getting the exact same binary on two different systems through Nix if you ask it to do the same installation. They do pretty crazy stuff, like resetting timestamps, to try and get to this bitwise reproducibility. It supports atomic upgrades of packages and rollbacks. By default it also downloads binary packages, and if it can't find binary packages it will build from source.
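The binary-first, source-fallback behaviour described for Nix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `install`, `binary_cache`, `build_from_source`, `fake_build` and `store` are hypothetical names and are not part of Nix or any of the other tools, and the package names are made up.

```python
from typing import Callable, Dict


def install(
    pkg: str,
    binary_cache: Dict[str, bytes],
    build_from_source: Callable[[str], bytes],
    store: Dict[str, bytes],
) -> str:
    """Install pkg into store and report whether a binary or a source build was used."""
    if pkg in binary_cache:
        # A pre-built package exists: fetch it (fast, but generically built).
        store[pkg] = binary_cache[pkg]
        return "binary"
    # No pre-built package available: fall back to compiling it (slow).
    store[pkg] = build_from_source(pkg)
    return "source"


def fake_build(pkg: str) -> bytes:
    """Hypothetical stand-in for a real compilation step."""
    return f"built {pkg}".encode()


cache = {"fftw-3.3.7": b"prebuilt fftw"}
store: Dict[str, bytes] = {}
print(install("fftw-3.3.7", cache, fake_build, store))
print(install("mytool-0.1", cache, fake_build, store))
```

A tool that only installs pre-built binaries would correspond to dropping the fallback branch and failing instead.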
That's unlike Conda, which will only install binaries, while Nix will build from source if it has to. It has multi-user support as well. It uses the Nix DSL for the package recipes, and it has over 13,000 different software packages supported, and that's not counting the 12,000 Haskell packages. So this is huge. Of course a lot of this is not scientific software. This is way broader than scientific software; Nix was not initially targeted at scientific software at all.

Then there's Guix, or Guix-HPC. There's a very good blog on using Guix in an HPC context. It's mostly Ludovic, Ricardo and Piotr who write the posts; very good blog posts. They're trying to explain how Guix can be used in this context, where it has a lot of features that really make sense. This is GNU/Linux only, implemented in a combination of Scheme and C++. In my view, this is mostly targeted at system administrators and experienced end users. I don't think you would give this to, let's say, a normal scientist. Maybe in the future you could. The design is fairly similar to Nix. It also has the GuixSD distribution, and it works on GNU Hurd, if you want to use that. There's a very strong focus on reproducibility. It only supports free software; that's actually not really true, but it has some focus on that. It has the same upgrades and rollbacks as Nix does, and over 6,500 packages.

And then Spack is the last one in the row. It's also sort of cross-platform: Linux, macOS and Cray. It's also implemented in Python. In my view, and I know Todd disagrees with this, it's mostly targeted towards software developers of big scientific software. They need to juggle lots of dependencies, and it has a lot of flexibility for that. It's similar to EasyBuild in some ways, but different in many other ways. One of the main differences is the flexibility it gives you to juggle dependencies.
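As a rough illustration of that flexibility: on the Spack command line a version is pinned with `@`, a compiler with `%`, and a dependency with `^`. The helper below is hypothetical (`make_spec` is not part of Spack) and only assembles such a spec string; the version numbers are made up for illustration, and nothing is actually installed.

```python
from typing import Dict, Optional


def make_spec(
    name: str,
    version: Optional[str] = None,
    compiler: Optional[str] = None,
    deps: Optional[Dict[str, str]] = None,
) -> str:
    """Assemble a Spack-style spec string (illustrative helper, not Spack's API)."""
    spec = name
    if version:
        spec += f"@{version}"        # pin the package version
    if compiler:
        spec += f" %{compiler}"      # pin the compiler (and its version)
    for dep_name, dep_version in (deps or {}).items():
        spec += f" ^{dep_name}@{dep_version}"  # lock one dependency's version
    return spec


spec = make_spec("mpileaks", "1.0", "gcc@7.3.0", {"callpath": "1.0.2"})
# Only print the command; everything not pinned here is left for the tool to resolve.
print("spack install " + spec)
```

Anything left unspecified (other dependencies, build variants) is exactly the part the tool is expected to work out by itself.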
You can do very cool stuff like this: you can tell it spack install mpileaks, with a particular version, with a particular compiler, and with this dependency locked to this particular version, and then everything else that it needs Spack will figure out by itself. That's certainly quite different from what EasyBuild does.

All right, so the project comparison. I want to fill in this table, basically, and compare all these aspects. I have this star rating system, and the colors are pretty straightforward.

So, releases. I'll speed up a little bit here to avoid running out of time. Conda has had many releases since 2012. You can quite easily install it with a shell script, so even a scientist who doesn't know how to use a terminal can do this. It has self-update support, and it has very few dependencies; it just ships with whatever it needs. You don't need sudo for installing or using Conda, so that's important as well.

EasyBuild has had releases since 2012, after being developed in-house for three years. It has a bootstrap script to install it, or you can just pip install easybuild. It has self-update support, and it really relies on having environment modules there; it doesn't work without them. You need to have Python, you need setuptools today, and to get started you need a C++ compiler on your system as well. Also here, no sudo is needed for installing or using EasyBuild, so normal users can install software without asking admins.

Nix has been around for a very long time, since 2004, and stable since 2012, so 1.0. It has an install script to install the binary release, or if you want to you can build Nix from source. You have to get this build daemon running, because this is the thing that actually does the installations, and this needs to run as root.
So here you do need sudo to install Nix, but not to run it: a normal user can talk to Nix and get stuff installed, and the installation is just done by the daemon, so the user is talking to the daemon. Self-update support is there as well. And dependencies, well, not really, unless you want to build it from source; then the stuff it needs has to be there.

Guix is more recent than Nix. There's no 1.0 yet, so it's still in beta, but it does a lot of the stuff that Nix does too. This is actually not true anymore: it says here that no installation script is available, but they fixed that thanks to seeing the draft of my talk. So that triggered that, that's good. It has self-update support and some minimal dependencies.

And then Spack: a handful of releases since 2014, not stable yet. The installation here is a little bit different. You just git clone the GitHub repository and then you can get going straight away, so you don't really need to install it. That is why I gave this a lower rating, because it feels a bit off, certainly if you give this to a scientist, who will say: what is this Git thing? It feels a bit off to me; I know Todd disagrees here. For updates, you pull in updates from the GitHub repository. And it has some basic dependencies.

So, documentation. Comparing the documentation, all of these have really good documentation. I gave EasyBuild the lowest rating here because I know where the gaps in the documentation are that have to be filled in. I'm sure there are gaps for the others as well, but I can't tell as an outsider. So I tried to make this fair here.

So, configuration. Once you get the thing installed, you may need to configure it. With Conda, there's basically no configuration at all.
You just install it and that's it. That's very good. With EasyBuild you really have to do some configuration. The out-of-the-box defaults don't make sense for most people, but we can't come up with a better default, or at least I don't see a way. You really need to tell it some things before it can get going. Guix and Nix I gave a lower rating, mainly because of the daemon that you have to get running as root, and you need to create build users, since it relies on build users with different permissions. And also because both of these are locked to a specific location for the software, /gnu/store and /nix/store. You can change it, but you're going to suffer if you change it, so typically people don't. And then Spack has very little configuration that you really need to do, but if you want to, you get a lot of options, so that's very good.

Basic usage. Conda is very straightforward: you install something, you do this source activate of a script, and then it's ready to go. This is creating the environment, and this is the actual installation. You can easily install other software versions, and build packages yourself if you really have to. In terms of basic usage, I'm not making any distinction. Everybody's getting three stars, because it depends on what you want to do.

With EasyBuild you search, you get this easyconfig file you can use, you do --robot to make it install the dependencies, and then you can load the module. Installing other software versions is just changing easyconfigs, or maybe writing some Python code yourself; it depends.

With Nix it's fairly similar. You can search for what it has, and use nix-env to install things. It links it in your profile.
It's ready to go, or if you really want to, you can build your own software or software versions. Guix is pretty much the same, just different commands. And then with Spack you get all this flexibility of juggling dependencies and all that stuff, and you can do spack load, or you can load the modules that it generates. To install different software versions, you don't even have to dive into the Python yourself. You can just tell it to install a new version, and it will go and do what it can to make that happen.

Time to result. So how long does it take to install software with these tools? All the ones that do binary packages by default are very fast: seconds. With Spack and EasyBuild you're building from source, so that's a big difference. With EasyBuild you typically have to build the compiler first before you can get to the software, while Spack can pick up the compiler in your OS. So it takes seconds to install this FFTW with the binary tools; with EasyBuild and Spack it will take something like an hour, depending on whether you pick up the system compiler with Spack or not.

Performance. I'll speed up a little bit here, because I'm running out of time and I basically want to get to this slide. This is the performance of an FFTW library with the five tools. This is in seconds, so lower is better.
As you see, if you install binary packages, these are generically built, so you're not really using the hardware very well in this setup. Spack and EasyBuild compile from source, so these can do a lot better; they can target the hardware. Today Spack doesn't do this out of the box, which is why it's pretty much at the same level as the binary ones. It builds generically by default, but you can easily tell it to do differently. EasyBuild really targets the hardware you are on, so that's why it's getting better performance.

Now, I'm sorry, Todd, for this, but when I was testing this I ran into an issue with Spack. Because of a bug, Spack was building with -O0, so it was quite slow. This was fixed in hours by Todd, and there was a quick release. They were just not aware of this, but it was something I ran into while testing. So this is definitely not the norm for Spack, don't get me wrong.

There's a lot of stuff we didn't cover. I won't go into this too much. Each of these has its key features that are very important, so definitely take a look and try to deep dive into all of these before you make a selection.

This is the summary of the table. So who won? Well, it depends. It depends on what you want to do, it depends on who you are, it depends on how much experience you have yourself, and which of these three you are. It really depends on the use case, your experience, your profile, what you actually want to do. Do you care about performance, or quick installations, and so on?

And then one cool thing with this talk: because I sent a draft to a lot of people, it started making waves, and stuff happened before I actually did the talk. Spack had a bug fix release because I ran into the -O0 issue, which was fixed in hours. So that was really good. I figured I was doing something wrong, and it turned out I found a bug in a different project.
So that was cool, and it was fixed really quickly. There's an easy installation script for Guix now. It was sort of there, but not very clean, and they finished it because I was complaining about the installation, which was very manual. So that's good. And there's a very good blog post by Ludovic, who's giving a talk later today as well, on the performance aspect of Guix and the possibilities they have to get rid of this performance issue. Definitely read the blog post. It's very good. It has a lot of technical references to all the things you can do.

Other build tools, which I'm not going to cover here: I didn't cover most of these because they have a lot less focus on scientific software. I haven't seen people using these on supercomputers to install software for other people. And I didn't cover Singularity and Docker, so these container things for HPC, which is now happening as well. If you're interested in that, you should definitely take a look at these two projects, especially Singularity.

That's it. Any questions?

Yes. Oh wait, you should take the mic for the questions. Where's the mic? It's for the recording, so you should take the mic, or I can repeat the question. Go ahead.

The performance slide. I'll take the one without the red bar to keep it fair. These are not using AVX, which is why they're getting bad performance. They build generic packages so that you can install them on non-AVX hardware. The same goes for Spack, because it doesn't do it by default, but you can tell it to. Because here you're compiling from source, you can actually tell it to target your hardware, and then it will generate AVX and it will be at the same level as EasyBuild. It just doesn't do it out of the box. So I was showing you what you get by default. It's basically a warning slide: be careful. You can do a lot better with Spack very easily, but you need to be aware.
If you're just assuming it, it's not going to happen. And for this aspect you should read Ludovic's blog post, because there are ways to get around this. You can still ship binary packages and get good performance. They're just not there yet. It's not happening right now, but you can.

Sorry? EasyBuild, right. So if there's something new, like Intel Skylake, which has AVX-512, we don't really need to change EasyBuild for this, because the compiler does it. We just tell the compiler to build for this thing, and if the compiler can do it, then it's fine. So EasyBuild is just telling FFTW to build for AVX by default; that's the only difference. You can do the same thing with Spack. Here it's harder, because you have to rebuild the packages, which is, let's say, not that common, certainly in the Guix case. But you should talk offline with Ludovic about this, because they really know how to fix this, but it's not there yet today.

All right. So why didn't I cover DBM only supporting certain Linux distributions? That's not really there. I mean, the things that work, work for basically any distribution. There are certainly details there, yeah, but in general each of these tools should work on any Linux distro.

You mean CPU architectures? I know some people are using EasyBuild on ARM, for example. I don't know about Spack. Spack has good support on ARM and Power. Yeah. Then this was on Power? ARM. So Oak Ridge has been using Spack on ARM, and that works really well. It's Python, so Python runs anywhere, as long as you have a compiler. There are some details, yeah, but as long as your compiler can handle your architecture, you're fine. No? Well, if you disagree we can talk about this, but on a high level there shouldn't be an issue. There are certainly details: if you try EasyBuild or Spack out of the box on something sexy like ARM or the new Power, you may see more issues than on x86. That's true. But in principle it should work.

All right.
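The point made in these answers, that a generic binary has to avoid AVX while a source build can target the machine it runs on, can be illustrated with a small Python sketch. It assumes the x86 Linux convention of a `flags` line in /proc/cpuinfo; `parse_cpu_flags` and `simd_compiler_flag` are hypothetical helpers, and none of the five tools is claimed to detect hardware this way (as said above, EasyBuild simply tells the compiler what to build for).

```python
from typing import Optional, Set

# Instruction-set extensions as named in /proc/cpuinfo, newest first,
# mapped to the corresponding GCC option.
SIMD_LEVELS = [
    ("avx512f", "-mavx512f"),
    ("avx2", "-mavx2"),
    ("avx", "-mavx"),
]


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the CPU feature flags from /proc/cpuinfo-style text (x86 Linux layout assumed)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def simd_compiler_flag(flags: Set[str]) -> Optional[str]:
    """Pick the GCC option for the newest supported extension, or None for a generic build."""
    for cpu_flag, gcc_option in SIMD_LEVELS:
        if cpu_flag in flags:
            return gcc_option
    return None


# Made-up sample input; on a real x86 Linux machine the text would come from /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2\n"
print(simd_compiler_flag(parse_cpu_flags(sample)))
```

A binary package built without any of these options runs everywhere, which is the trade-off behind the slower generic results on the performance slide.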
Yeah, last question.

Not necessarily as admin. For many of these, the user can do it themselves. For most of these, and also for Nix and Guix once the admin has provided Nix or Guix on the system, users can install their own software. Not necessarily; it can be for users. It's also for admins to install software for the users, because the users don't know how to do it. Yeah, you can do a combination of both.

And I think Conda is a bad fit for providing software to other people; you wouldn't use it for that. It's for users using it themselves. The other two, Spack and EasyBuild, are more targeted towards that. Yeah, but they will do it anyway, and they don't need permission, so you cannot avoid them. Sure. Some people tell their users to use Conda, I know.

There are all kinds of details there, also for the generic aspect. With Guix and Nix there is a way out. With Conda, I'm not sure there's a way out to fix the performance. No. But with Conda, users can install anywhere: they can install in /tmp and they can install in their home directory.

Sorry to interrupt. Yeah, we should take it outside. Thank you. Thanks, everybody.