So welcome to my talk about OpenHPC. My name is Adrian Reber, I work for Red Hat. I'm in the core kernel team and I'm working on checkpoint/restore and container migration, but because I was doing checkpoint/restore in HPC before joining Red Hat, I'm now also Red Hat's representative in the OpenHPC project. And I want to give an overview here of what OpenHPC is, what it does, and why it exists.

So my agenda: first I want to give a high-level view of why and what OpenHPC is. Next I want to say why OpenHPC exists; I guess most people know the problems which OpenHPC tries to solve. Then I want to talk about what OpenHPC is again, focusing more on how the project is set up and what its goals are. Then I want to talk about what OpenHPC is once more, giving details about what it actually includes and what you can take out of it. And if there's some time at the end, I want to talk about what we are currently discussing in our meetings.

So what is OpenHPC? If you look at it at first glance, it seems to be a software repository. There are a bunch of RPMs you can easily install on your system. You can install those RPMs using YUM or Zypper, which means it supports right now CentOS 7 and SLES 12. It's for the x86_64 architecture and the AArch64 (ARM) architecture. The actual users are mostly, as expected, not on ARM but on the x86 architecture. And that's what OpenHPC is if you have a first look at it, at least what I saw.

So to the topic of why OpenHPC exists: this is probably one of the same reasons why EasyBuild also exists. In an HPC environment you have multiple compilers you need to install on your system, and you have multiple MPIs you need to install, in multiple versions. If you have three different compilers from three different vendors, and two versions of each, you already have six different compilers. And if you combine this with the MPIs, you have a lot of almost-the-same software you need to install, with almost the same steps. This is something which many HPC sites actually do today. So OpenHPC was the idea to solve those things in a collaborative way.

Back to my point of what OpenHPC is as a project and how it's set up: it's a community effort to reduce those mentioned duplications which each HPC site has to solve in some way, whatever they use to do it. When the project was formed, there was a mission and a vision statement. This is the vision statement. As a vision statement it has to be visionary; it tries to formulate that through collaboration, OpenHPC wants to make it easier to set up an HPC system, easier to administer it, and easier to get started if you're completely new to the topic. The mission statement, I think, has clearer goals: it's a collection of open source software in a repository which is easy to install, and in addition to the software it provides, it also wants to provide best practices and ideas for how you could install or manage your system.

The OpenHPC project is a Linux Foundation project. I think these are all the current members, maybe some more; it changes all the time, there are new members all the time. It's a combination of industry partners, who are providing tools, software, and hardware for HPC systems, as well as universities, all bringing their knowledge together to formulate what OpenHPC wants to do.
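[Editor's note: as a minimal sketch of the first-glance "software repository" view described above, getting started on CentOS 7 could look roughly like this; the package names follow OpenHPC's "-ohpc" naming convention but are assumptions here, and the exact names and the repository-release RPM should be taken from the install guide.]

```bash
# A minimal sketch, assuming CentOS 7 on x86_64. Package names are
# illustrative; the ohpc-release RPM that defines the repository normally
# comes from build.openhpc.community (see the install guide for the URL).
yum install ohpc-release                          # repository definition
yum install ohpc-base lmod-ohpc                   # base meta-package plus the Lmod module system
yum install gnu-compilers-ohpc openmpi-gnu-ohpc   # one example compiler/MPI pairing
```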
And I think the governing structure of OpenHPC is similar to what a lot of Linux Foundation projects are doing: we have a governing board and we have a technical steering committee, which has a meeting each week or every second week, depending on whether it's necessary. In the TSC meeting we discuss what we want to change and what we need to change, and this is the committee where we try to find out how to move forward. Maybe this is interesting for some of you: membership is free for academic partners, so if you're interested in joining OpenHPC you can talk to us, and we're happy to collaborate with any more partners who are interested in OpenHPC. This is the list of the current TSC members; from a lot of those institutions or companies there's someone involved somehow in OpenHPC.

One important thing which I think needs to be pointed out is that OpenHPC doesn't force on you what you have to install or how you have to install your system; it's about building blocks. You can pick and choose what you want and install the software the way you want it on your system. There are certain parts which you have to do if you want to use OpenHPC, but it tries to be as open and free as possible, so you can adapt it to your local needs. You can install the binary RPMs, you can modify those RPMs for your local needs, and you can use any of the documentation in any way you want. It doesn't force a special setup on you, and this is one of the important parts of OpenHPC, I think.

Some information about the project history: the first discussions about OpenHPC were held at ISC 2015, and the first release of OpenHPC, which is probably somewhere here, was at Supercomputing 2015. Since then there have been continuous new releases every quarter, and each release contains more content than the previous one. Currently the latest release, which came out at Supercomputing 2018, is 1.3.6, and we are currently working on 1.3.7. This is the number of components. It grows slowly; it's not a large number of software packages we have, but from our point of view it's a basic software set which makes sense to get an HPC system running. This is an overview of how much changed between the releases; changed means added or updated, and it's most of the time around 30 percent, I would say.

Some statistics: how many visitors per month the build system gets, and the unique visitors per month we see accessing the package repository. You can see our users usually switch to the new version as soon as there's a new version. In the latest statistics, for some reason a large number of people started using an old release; we are not sure yet why this happens. Here are more visitor statistics, and this is the data downloaded from our build system.

And then back to my topic: what OpenHPC is, now in detail. Like I mentioned, it's a software repository and it includes different packages. The basis of it all is Lmod, which we are using for the environment modules. The reason why we are using the Lua-based implementation is that, as far as I know, it is developed at TACC, the Texas Advanced Computing Center, and TACC is part of OpenHPC; that's the connection between Lmod and OpenHPC. It has advantages for the OpenHPC project because we have really close contact with the developers of Lmod. So there's a close exchange, and they can easily help us fix things or provide new features which OpenHPC needs for the software management.
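[Editor's note: to illustrate what that Lmod-based environment looks like to a user on an installed system; module names and versions here are examples, not a guaranteed layout.]

```bash
# Illustrative Lmod session; the actual module names depend on what is installed.
module avail              # list the environment modules provided by the ohpc packages
module load gnu openmpi   # activate a compiler and a matching MPI stack
module list               # confirm what is loaded
mpicc --version           # the MPI compiler wrapper is now on $PATH
```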
Then it includes provisioning tools, like Warewulf, and there's xCAT. Then there's monitoring: Nagios, Ganglia. Then we have resource managers: PBS Pro, Slurm. Compilers, of course: GCC, LLVM. There's support for the Intel compilers; we don't distribute the Intel compiler, of course, you have to have it on your system already, but if you have it, OpenHPC will detect it, and all packages which OpenHPC provides are also available compiled with the Intel compiler. So if you have the Intel compiler installed locally, you get the complete OpenHPC stack compiled with the Intel compiler. The same goes for the ARM HPC compiler, which will probably be part of the next release, 1.3.7; that one you also have to buy from ARM directly.

Then we have the usual MPIs, and then we have Spack for more software and EasyBuild for more software. This is one point where we know we cannot provide the same amount of software that EasyBuild and Spack can, so we stop at some level. We want to provide a certain amount of packages which people can use to install a kind of minimal system, or at least an HPC system which gets you running, and on top of it there can be other tools which complete the full software stack, which we cannot provide in our group with the necessary bug fixes, or make sure it actually works well enough; for that our community is not big enough with the people involved right now. We have a few container runtimes included, Charliecloud and Singularity. Then there are file system clients; the Lustre client and BeeGFS clients are part of it. And there are lots of the usual libraries which you see on an HPC system.

But it's important to know that OpenHPC is not just a software repository. The mission and vision statements include that it wants to not just provide software but also best practices, and OpenHPC comes with, in my view, excellent documentation for an open source project. This was one of the points when I first looked at it: it's really documented completely. By documentation I mean the documentation of how you would set up your cluster, and the documentation exists for every combination of provisioner, resource manager, operating system, and CPU architecture; for all those combinations you get a different document. And when I say including recipes: part of the documentation is all the commands you have to type into your shell to get your cluster running, to install the software, and to provision all the clients, and the recipes are basically just all those shell commands put into one large shell script. You can run it, and hopefully you have a working HPC system at the end.
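[Editor's note: to give a feel for what such a recipe script contains, here is a hypothetical excerpt in that spirit; the package names, services, and commands are illustrative stand-ins, not lines copied from an actual OpenHPC recipe.]

```bash
# Hypothetical excerpt in the spirit of an OpenHPC recipe: the install guide's
# shell commands collected into one script that can be replayed end to end.
yum -y install ohpc-base ohpc-warewulf ohpc-slurm-server  # meta-package names illustrative
systemctl enable --now mariadb httpd                      # services the Warewulf provisioner uses
wwsh vnfs list                                            # inspect node images known to Warewulf
```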
And one thing I like about the project is that with every release, all those documentation steps are actually tested. We have a bare-metal cluster where we install all our software packages and the operating system exactly as described in our documentation, to make sure the documentation actually works. So everything is tested to make sure it gets you a running HPC system. And once you have it installed, you get the same interface independent of the operating system and the architecture; users will always see the same module names and files, so they don't have to change any paths if they're using an OpenHPC system.

Then I want to talk about some upcoming changes. The next release will be 1.3.7; it will be available somewhere in the next three months. It will include the usual updates of all the packages which can be updated, and the goal is to include the ARM HPC compiler, so that all packages are rebuilt with the compiler from ARM.

What we just started to discuss last week in our meeting is support for new operating system releases. There's SLES 12 and SLES 15 available, which we want to support, or are thinking about supporting. Then there's the RHEL 8 beta available, so we also started to look into what changes are necessary to get OpenHPC running on RHEL 8, and maybe CentOS 8 at some point. The important thing here is that in our release numbering this would mean going to a 1.4 release. At some point we changed our release numbering and said we will stay on 1.3.x as long as users can just upgrade the packages without any reinstallation; going to new operating system releases would mean we need a new major release. Just a number wouldn't be so bad, but the problem is that at that point we would have to support something like four operating system releases, and right now we are not sure our community can handle four. We already have enough to do making sure we can actually test the two currently supported versions, and if we have four, we are not sure how to really do it. So this, as I said, would require double the required testing. This is currently being discussed, and when we switch and how we switch, we don't know yet.

How much time do I have? Okay. So when I tried to build OpenHPC on the RHEL 8 beta, there were two interesting things which may make life for users, or for people providing software for the system, a bit more interesting than it used to be. There's a new thing called annobin, which provides annotations in the binaries. The problem is that it's a GCC plugin and the default RPM optimization flags enable it. If you install a second GCC, or a second compiler like we do, and you recompile all your RPMs with your own compiler, which doesn't have the annobin plugin, it gets complicated to disable it. This is a thing where I'm currently not sure how to handle it, whether we should just disable it or provide our own annobin plugin for our GCC.

The other big thing is Python 3. We have a lot of packages which still depend on Python 2, and we are not yet sure how far RHEL 8 and CentOS 8 will support Python 2, if at all; we don't know yet. So we currently disabled all packages which require Python 2, and this is a lot: at least when I tried it, Singularity was not working, it was not possible to build PBS Pro, and it was not possible to build Slurm. So a lot of our low-level packages which are necessary to build a complete stack could not be built. We're hoping that the upstream projects make the move to Python 3, so we can support almost the same software stack as we used to. This will be one of the interesting things we somehow need to solve.
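[Editor's note: coming back to the annobin issue above, one possible knob, as a sketch, is the RHEL 8 macro behind those default optimization flags; whether switching it off is the right long-term answer is exactly what the speaker says is still undecided, and the exact macro mechanics should be treated as an assumption.]

```bash
# Sketch: rebuild a package with annobin annotations switched off, for a
# second GCC that does not ship the annobin plugin. _annotated_build is the
# RHEL 8 macro that enables the plugin; treat the exact knob as an assumption.
rpmbuild --undefine _annotated_build --rebuild some-package-ohpc.src.rpm
```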
And another thing which I find interesting: we are using RPM coloring. The problem is, if you install additional software using RPM like we do (I'm missing a slide here), what could happen is that OpenHPC installs the same software as the operating system: the same MPI, or the same GCC, or, for example where I saw it first, OpenBLAS. We are installing the same package built with a different compiler, but the result for RPM will be the same, because the shared object name is the same in the operating system package as in the OpenHPC package. When RPM, or rather YUM, does its dependency resolution, it can install one or the other package. If you just install everything, it's okay. But if you install the OpenHPC stack, so you have OpenBLAS for example from OpenHPC, and you then install some package from the operating system which requires OpenBLAS, the dependency resolution will say: I already have that shared object symbol in my RPM database, I don't need to install the package again. But if you then actually run the software, it will not find the library, because you have to do module load first, and the operating system doesn't know anything about module load. So what we do is append a string, "ohpc", to all our symbols in the RPM database, to make sure that our shared objects are different from the operating system's objects.

So this is an overview of where you can find us and what we do. We work on GitHub; you can look at our build system, which is where you can download all our packages; and you can have a look at our CI testing to see what we test, like I described. There's a wiki, and there's a mailing list where we discuss things. And at this point I'm already at the end, and thank you for your time.

I have a small question. When you say Singularity was not compiling on RHEL 8, do you mean the 2.6 version?

Some version; the version at that time, I don't know.

Okay, because the new Singularity version is not Python dependent.

Oh, it was some build dependency, I don't know.

I guess you were trying to build last year's version.

Yeah, could be. I just built what was in the OpenHPC repository, and it didn't work.

Do you have much of an idea of how many people actually use OpenHPC?

From the statistics, a few thousand IPs. Beyond that, we don't know.

Do people get in touch with problems, say as regards the documentation and things like that? I mean, you say everything is heavily documented; do people say this is not quite correct?

This happens sometimes, but it's not very often. The most feedback we actually get is at ISC and SC, where people come by and say: we use it, it's great. That's kind of the interesting thing: we do not get very much negative feedback, so it seems like it solves the problem for some people.

How do you integrate EasyBuild in this OpenHPC environment? Do you do some special things to make it easier to use?

He probably knows better what can happen; he's actually doing the EasyBuild package.

So there's an existing spec file to create a package for EasyBuild, which I try to update every time there's a new release, and I still have to do it for the latest release. So you basically get an RPM for EasyBuild that installs EasyBuild and installs a module for EasyBuild. There's a metadata file in the EasyBuild release for OpenHPC that tells EasyBuild about the OpenHPC packages, so you can make EasyBuild use, for example, the OpenMPI installed through OpenHPC, through one of the packages it provides. That probably needs an update for the latest OpenHPC releases, but you can let EasyBuild leverage the packages you can install through OpenHPC, so it knows about them already. Maybe it needs a bit of an update, but all the mechanisms are there, so you don't have to reinstall OpenMPI if you already have it through OpenHPC. So there's a bit of integration there, and there's probably more that can be done.
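[Editor's note: a concrete sketch of that integration. EasyBuild's --external-modules-metadata option points it at a file describing modules that were installed outside of EasyBuild; the file and easyconfig names below are placeholders.]

```bash
# Sketch: tell EasyBuild about OpenHPC-provided packages so it treats, e.g.,
# OpenMPI as an external module instead of rebuilding it. Names are placeholders.
eb --external-modules-metadata=openhpc-metadata.cfg some-application.eb
```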
This is one of the things I wanted to discuss with Adrian this week, but it hasn't happened yet. But there are other ways: maybe the OpenHPC project can leverage EasyBuild to generate packages or update packages. It could be an option.

To me this last bit makes a lot of sense, because you have EasyBuild, or you have Spack, which can generate this huge amount of packages, ideally optimized. So I think there's a little bit of overlap between OpenHPC and the libraries that OpenHPC provides; why not get rid of that part and leverage the tools that are specialized in that, like EasyBuild or Spack?

Yeah, that would be ideal, if those projects somehow could work closer together, so that we do not do the same steps again. I don't know.

To me, I think it kind of makes sense, maybe within the OpenHPC community, to focus less on those libraries and maybe focus more on: okay, for this particular infrastructure, what do you want to use? You want to use EasyBuild, you want to use Spack, and from there on start picking up the libraries that you want to install, and use those tools for it. I think it would be beneficial for both the OpenHPC users and the Spack and EasyBuild communities.

So, does EasyBuild have any binary packages, or is everything provided as source, compiled from source?

It has support to install things as binaries as well; it can do anything. It even has some support to create RPMs, so this could be something that could be leveraged in OpenHPC, something we can look at together at some point.

Does anyone else have questions? I have a couple; I'll ask one or two. One thing you mentioned I'm a bit surprised by: you said that for RHEL 8 it's not sure yet whether Python 2 will still be there?

I don't know. In the beta it's there, and I've seen blog posts by people at Red Hat that say we're going to have to keep Python 2, at least initially. Basically, I know as much as you do.

I personally wouldn't expect it to be there, because it's end of life this year. It doesn't make sense. It's hard to believe that it will still be part of RHEL 8.

On the other hand, you also said you ran into lots of problems because you don't have Python 2 installed, so maybe that's a reason why Red Hat says: we have to, at least initially.

Another thing I've noticed: Lmod is like the base for OpenHPC, all the packages come with a module so people can do module load to activate the package, but the modules that are included in OpenHPC are all in the TCL syntax. Is there a good reason for that? Is it to keep it compatible with other module tools, in case people want to switch?

No, that wouldn't work, because they already include statements which are specific to Lmod.

Yes, like automatically loading dependencies and things like this.

I actually don't know why it's still in the old format; maybe it's just what people are used to.

I think it's worthwhile looking into switching to Lua, because there's an overhead when the modules are in TCL: Lmod does a translation of TCL to Lua before it even loads the modules, making it a bit slower. It doesn't make a lot of sense if you're tied to Lmod anyway.
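[Editor's note: for context on that TCL-versus-Lua point, here is a minimal sketch of what a native Lua modulefile for Lmod looks like, written out from the shell; paths and names are illustrative, and as discussed, OpenHPC's shipped modulefiles are still TCL.]

```bash
# Sketch: a minimal Lua modulefile for Lmod. Paths and names are examples only.
mkdir -p /opt/modulefiles/gnu
cat > /opt/modulefiles/gnu/8.3.0.lua <<'EOF'
help([[GNU compiler suite (illustrative modulefile)]])
prepend_path("PATH", "/opt/ohpc/pub/compiler/gcc/8.3.0/bin")
depends_on("ohpc-base")  -- Lmod-specific: auto-loads/unloads the dependency
EOF
```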
Maybe one more: you mentioned that your documentation is tested before doing the actual release, to make sure everything is still fine. Is that automated? That's interesting.

That's on Jenkins, on the CI infrastructure; you can see it there, and all the testing is part of the OpenHPC repository, so you can see how it installs the systems and then boots them up.

How does that work in the documentation? Are all the commands that have to be run in a certain order marked somehow so that they can be extracted, or is it just a bunch of scripts that are inlined in the documentation? How does that work?

I never actually looked, but I think it's like it was said in the first step: it's all in LaTeX, and a shell script is generated from the documents. And it's modular, so whether you do the installation for PBS Pro or Slurm, all the parts of the document above the scheduler level come from the same source; it's written only once and then put together so that it fits.

Okay, cool. Any other questions? Okay, thank you very much, Adrian.