Welcome, everyone, to my talk about OpenHPC. My name is Adrian Reber. I'm part of Red Hat's kernel team, and I'm involved in OpenHPC on Red Hat's behalf because I used to run HPC systems before I joined Red Hat. So I had the chance to get involved again in my old field, and when Red Hat joined in 2016 I said yes, I'm interested in being part of OpenHPC. The last time OpenHPC was presented here at FOSDEM was in 2016, so I thought this year might be a good point to explain once more what OpenHPC is and what has happened in the last three years. That's what I want to do here: give an update.

My agenda: first I want to give a high-level view of what OpenHPC is. This is basically what I saw when I first looked at the project in 2016, my first impression of what it provides. Then I want to explain why the OpenHPC project was created and why it exists. After that I want to talk about what OpenHPC is in terms of project management: how the project is set up, who is involved, and some details about its usage. Then I want to talk about what the project actually delivers, what its users can actually use. And the last point I want to mention is what will happen next in OpenHPC and the things we are currently discussing.

So what is OpenHPC? When I first looked at OpenHPC, I saw that it's a software repository. You can download RPMs through YUM and Zypper, which already tells you the platforms: it's for CentOS 7 and SLES 12, the two distributions OpenHPC currently supports, on the x86_64 and AArch64 architectures. That is the first thing you see when you look at OpenHPC.

Now I want to talk about why OpenHPC exists. This comes down to how HPC systems are usually set up. HPC systems have somewhat different software requirements, in terms of what users expect to be there and what is usually made available to them. One thing you usually see on an HPC system is multiple compilers, from different vendors, and each of them in multiple versions, because every user needs a specific version, so you try to provide all of them. The same goes for MPI: there are different implementations of the standard, each in different versions, and different people need different implementations in different versions. And this goes on and on up the software stack: the libraries are also needed in multiple versions, and so on. One small example: if you have three compilers, each in two versions, you already have six compiler variants you want to install. Multiply that by three MPI implementations, in two versions each, and you are already trying to provide 36 MPI builds against the different compilers. This is the point where most HPC sites use some kind of automation, because you cannot handle this by hand.
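To make that arithmetic concrete, here is a minimal Python sketch that enumerates such a toolchain matrix. The compiler and MPI names and versions are purely illustrative, not OpenHPC's actual package set:

```python
from itertools import product

# Illustrative toolchain matrix: 3 compilers x 2 versions each,
# and 3 MPI implementations x 2 versions each.
compilers = [("gcc", "7.3.0"), ("gcc", "8.2.0"),
             ("intel", "17.0"), ("intel", "18.0"),
             ("llvm", "5.0"), ("llvm", "6.0")]
mpis = [("openmpi", "1.10"), ("openmpi", "3.1"),
        ("mvapich2", "2.2"), ("mvapich2", "2.3"),
        ("mpich", "3.2"), ("mpich", "3.3")]

# Every MPI library is built per compiler, so the number of MPI
# builds is the product of both lists.
builds = [f"{mpi}/{mver}-{comp}-{cver}"
          for (comp, cver), (mpi, mver) in product(compilers, mpis)]

print(len(builds))   # 6 compiler variants x 6 MPI variants = 36 builds
print(builds[0])     # e.g. openmpi/1.10-gcc-7.3.0
```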
This is a common setup for many HPC sites, so everyone is doing it in some way, and that is one of the reasons the OpenHPC project was created. Its overall goal is to reduce this duplication of work through a community effort: everyone comes together and contributes ideas and, in the case of OpenHPC, even software packages you can install to easily get an initial system running.

The project has a vision and a mission statement. This is the vision statement; it sounds pretty visionary, and I think the goal is to make everything better and easier, focusing on setting up and running an HPC system. The mission statement is clearer to me, more concrete: the project wants to provide a software repository, which is what I saw at first glance, and it wants to provide best practices, so that people who know how to run HPC systems write down how you could do it if you are setting one up for the first time or want to change something.

This is the list of the current project members. It's a combination of industry partners, universities, and some of the big labs in the US, all working on the things described in the mission and vision statements. It's a Linux Foundation project, so I guess it's set up like many Linux Foundation projects: there is a governing board where discussions happen, and the main technical discussions happen in the technical steering committee (TSC). The TSC meets once a week, and the call is open, so if you're interested in what OpenHPC does, or if you want to get involved somehow, you can join this weekly call; it's documented on the website. Another thing: if you as an institution, a university for example, want to join the project, membership is free for academic partners. So if you're interested in the OpenHPC project, we welcome any further input, help, whatever you want to contribute.
This is the list of the current TSC members. It's a long list; not every member organization is represented, but from a lot of them someone is part of the TSC. There are different roles: we have maintainers and people looking after the test cases. The people on the TSC are voted in by the members and serve for one year; elections are each June. We also always have a project leader, currently Karl Schulz from the University of Texas.

One thing the OpenHPC project tries to do is not force anything on you. It's a big pile of things it offers, and you can pick and choose and build exactly what you want. You're not forced to use anything we provide: there are always alternatives you can choose, you can rebuild things for your own needs, or you can use the binary RPMs we provide. So it gives you a lot of tools which can help you in whichever way you need.

A bit about the project's history: the first discussion about an HPC community project happened in June 2015 at the International Supercomputing Conference, and the first release came at Supercomputing 2015 in November. Since then we have done a release roughly every quarter, updating the software each time. The latest release, from November, is 1.3.6, and we are currently working on the next one, 1.3.7.

Now I have a few diagrams about usage. This one shows the number of components over time: it's not a really big number, but it is slowly growing, as we add new things that people ask for or submit to the project and try to get included. The next diagram shows how many components change between releases, so basically which packages are updated: the minor releases only carry small changes, but in the regular releases around 30% of the packages are updated. Then we have a diagram about our users: the number of users also grows slowly over the years. And this one shows what people are downloading from us: in the beginning they downloaded 1.0, and they usually switch to a new release pretty soon, although some people stay on the older releases. The only thing we're not sure about is what's happening here: in the last month a lot of people started to access 1.2. We don't know why yet; maybe it stops again next month, or we have to find out why people are still interested in 1.2. And another diagram about visitors of the build servers, so the other thing was...
so that's diagrams, I guess it's important this is about what people are downloading and now some details about what OpenHPC is in actually is so I already mentioned it's a software repository and it provides RPMs and the base of all our things is LMOT it's an environment module based implementation in Lua and the reason why we're using LMOT is that the Texas Advanced Computing Center tech is part of OpenHPC and they develop LMOT and so we have a very close connection with LMOT developers which is good for us because we can get things fixed, change we can test things for them so it's a close collaboration which works pretty good I would say then above the environment modules level we have different provisioning tools so the provisioning tools are used to set up your cluster it installs your compute nodes and boots them and it's possible to run your nodes stateless and stateful so werewolf is part of OpenHPC and then the other one is XCAD which is basically providing same things so this is a point where you already can choose which tool you want to use to set up your cluster then we have monitoring tools like Nios and Ganglia we have different resource managers which are distributing the jobs on your cluster there's PBS Bro is part of OpenHPC then we have also Slurm which is part of OpenHPC this is again a point where you can choose if you want to have this resource manager or the other one then on top of it there are of course compilers, GCC then we have LLVM another thing which OpenHPC does it does not provide the Intel compiler but if you have an Intel compiler installed in your system and it provides all libraries and softwares which can make use of the optimizations already compiled for the Intel compiler and the same will happen at some point with the ARM HPC compiler so it's working in a similar way, hopefully soon and then we have MPIs, OpenMPI, MbarPitch more MPIs as our software stack is not really big we provide spec for further software installation and easy build so you get a basic installation of a system with compilers and a few libraries but if you need to install more packages you can use easy build or spec to install software on top of your HPC system then we have container run times Charlie Cloud, Singularity there are file system clients like Lastre and BGFS and additional libraries but it's not just the software repository OpenHPC also includes in my point of view excellent documentation and what we call recipes so we have documentation, how to set up a cluster for each of the combinations we have Provisioners, Resource Manager, OS and CPU architecture for each of those combinations we have a documentation how to set up the system and the recipe is basically we're pulling out the commands you should type to set up your system in the recipe so you can just run the recipe and have hopefully a running system and it also goes to port the recipes or at the same time as the recipes supports something like Ansible playbooks to install the system and not just long shell scripts and I think there's later a talk about Ansible in combination with OpenHPC even here at FOSSTEM later in this room and the thing with these documentations they are each completely tested with each release so we have a test system where we do bare metal install of clusters using InfiniBand and Ethernet only based clusters on ARM and on Intel CPUs on SLAS and on CentOS and we test all the software we provide and all the installation steps for each release so we make sure that the documentation we provide 
Once you have all of this installed and running, you get a user interface based on environment modules, and it looks the same independent of the operating system and independent of the architecture. Users don't have to change any paths or module names; OpenHPC systems always come with the same paths and the same environment module names.

I have about three minutes left, so just a bit about upcoming changes in OpenHPC. The next release will be 1.3.7; it will be released within the next three months, so probably in three months rather than tomorrow. It will include the usual updates of packages which have been released upstream; we will try to include them. The goal is also to include a rebuild of our repository using the Arm HPC compiler. And a week or two ago we started discussions about supporting new operating system releases: SLES 15 exists and we have requests to enable it, and there was the RHEL 8 beta release, and we are discussing how to handle this. In our release numbering this would mean switching to a 1.4 release, because within 1.3 you can upgrade all the time, and you probably cannot upgrade from 1.3 to 1.4. Doing that would probably mean dropping support for CentOS 7 and SLES 12, because right now the community does not think we can support four operating system releases. The main problem is the testing: we build all our packages from a single source, so the building is not really the problem, but making sure that the result we provide actually works requires testing. This is what we are currently discussing: how to handle it correctly, and when OpenHPC should switch to SLES 15 and RHEL 8, because in the beginning people are usually not so interested in trying new operating systems on their HPC systems. We have to find the right point in time and decide how we deal with the old releases and the new ones.

I have a short overview here with links to the project: our home page, the CI system, and the GitHub page with all the spec files for the packages. You can look at our CI infrastructure to see the current state of our packages and how the testing goes. And with that I am already at the end. Thanks for being here, thanks for listening. Are there any questions?

The question was: what is the best way to start helping? If you have already fixed a problem, submit a pull request; I am pretty sure we will include it. If you think a package is missing, we have a submission process, I listed it somewhere here, the component submission. So if you think something is missing, you can go through the component submission, and new things can be included with every release, every few months. Or, I don't know, have a look at the repository; basically, if something is broken, you fix it, we will include it.

I am not sure what this question is about. Integration testing, as in whether there is integration testing between the components, was it that?
So, we do install testing, and for most of the libraries, I would say, we have test code which we run. But again, it is test code: we make sure that the provided library interfaces are working, that the test code runs, and that it can do MPI communication using the nodes in our test cluster. So in a certain sense we try to make sure that it works, but it is not a complicated computation that we run on our test system. We make sure that the applications and the libraries are working, but I don't know at what point you can say they are 100% really tested.
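As an illustration of the kind of smoke test described here, a minimal MPI communication check might look like the following. This uses mpi4py purely as an assumption for brevity; the project's actual test suite is separate and built with the native compiler and MPI toolchains:

```python
# Minimal MPI smoke test: verifies that ranks can communicate at all,
# not that any real computation is correct. Run with, for example:
#   mpirun -np 4 python3 smoke_test.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Every rank contributes its rank number; the allreduce result must
# equal 0 + 1 + ... + (size - 1) on all ranks if communication works.
total = comm.allreduce(rank, op=MPI.SUM)
assert total == size * (size - 1) // 2

if rank == 0:
    print(f"MPI smoke test passed on {size} ranks")
```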