Next up is the part on hierarchical module naming schemes, which may be a bit more challenging if you've never heard of them. One warning: if you're using Singularity to run the container image, you will run into trouble. And because of the permission errors we had before, I suspect you may run into issues in the Docker container image as well, due to the mapping of the user outside and inside the container. If that's the case, we'll just walk through the tutorial text and mostly skip the actual demo; the text should be rather self-explanatory, hopefully. We're running a little late with the tutorial. I'm planning to stick to the schedule we have and maybe spend a little less time on the later parts: contributing back and the comparison with other tools are things we can probably zip through faster than planned. In any case, we had 45 minutes planned for Q&A at the end, and we may not need all of that time. So let's look at hierarchical module naming schemes. As I mentioned at the start, EasyBuild uses a default naming scheme to generate modules, which is EasyBuildMNS. You can ask which naming schemes it knows about using the --avail-module-naming-schemes command line option. You will see the standard naming scheme, a couple of others which we will not cover here, and a hierarchical module naming scheme, which is the one we will focus on in this part of the tutorial. First of all, let's look at the difference between a flat and a hierarchical module naming scheme. The default naming scheme is a flat naming scheme. This means that all the module files are available for loading directly, as long as they are in view of the modules tool, so you can do module load on any of the existing module files. And each of these module names uniquely identifies an installation.
When we load a particular module, for example, we know which version of the software it is for and which toolchain it was installed with, and we don't have to load any other modules first. In a hierarchical module naming scheme, things are very different. A typical hierarchical module naming scheme has three levels: a core level, a compiler level, and an MPI level, as in the example here. In the core level, you have modules that were installed using the system toolchain: not a toolchain controlled through EasyBuild, but the system compiler and system libraries. We try to limit the number of installations we do this way to keep installations reproducible, so we try to control the build environment, and in particular the compiler used for the installation, as much as possible. The number of modules you have here should therefore be limited. The second level in a typical module hierarchy is the compiler level. These are all modules that were installed with a compiler-only toolchain, for example using GCC 9.3 as a toolchain. In this case, we see two MPI libraries, OpenMPI and MPICH. The third level of this typical hierarchy is the MPI level, which holds any installations that have at least a compiler and an MPI library in the toolchain that was used. In this case, we have three installations here: FFTW, ScaLAPACK and HDF5. All three need both a compiler and an MPI library to be available during the installation, which is why they are located here. So any software that needs at least a compiler and MPI will be located here, and any software that only needs a compiler to install will be located in the middle level of the hierarchy.
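To make the flat versus hierarchical distinction concrete, here is a minimal Python sketch of the three levels described above. It is a toy model, not EasyBuild's actual naming code: the `Install` class, both functions, the simplified flat name, and the version numbers (HDF5 1.10.6, GCC 9.3.0, OpenMPI 4.0.3) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Install:
    """One software installation (hypothetical helper type)."""
    name: str
    version: str
    compiler: Optional[Tuple[str, str]] = None  # (name, version); None = system toolchain
    mpi: Optional[Tuple[str, str]] = None       # (name, version); None = no MPI in toolchain


def flat_name(inst: Install) -> str:
    """Flat scheme (simplified): the toolchain is spelled out in the module name."""
    parts = [inst.version]
    for component in (inst.compiler, inst.mpi):
        if component is not None:
            parts.append("-".join(component))
    return f"{inst.name}/" + "-".join(parts)


def hierarchical_location(inst: Install) -> Tuple[str, str]:
    """Hierarchical scheme: the toolchain is implied by the directory, the name stays short."""
    short = f"{inst.name}/{inst.version}"
    if inst.compiler is None:
        return "Core", short
    compiler_dir = "/".join(inst.compiler)
    if inst.mpi is None:
        return f"Compiler/{compiler_dir}", short
    return f"MPI/{compiler_dir}/{'/'.join(inst.mpi)}", short


# Illustrative versions only, not taken from the tutorial.
hdf5 = Install("HDF5", "1.10.6", ("GCC", "9.3.0"), ("OpenMPI", "4.0.3"))
print(flat_name(hdf5))
print(hierarchical_location(hdf5))
```

The point of the sketch is only that the same installation gets a long, self-describing name in the flat case and a short name plus a meaningful location in the hierarchical case.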
One key thing here is that some of the modules in the lower levels of the hierarchy, the core and compiler levels, are what I call gateway modules. Whenever you load one of these modules, it opens up a part of the next level in the hierarchy. In this case, loading the GCC module opens a new box of modules that lives in the compiler level, and only one additional module becomes available, at least in what is shown here, which is the OpenMPI module. In the same way, the OpenMPI module is a gateway module to the MPI level of the hierarchy (the lower or the higher level, depending on how you look at it), where a bunch of other software becomes available. Remember, these three all need a particular compiler and MPI library to be loaded. The text here explains this in detail, but it basically boils down to what I said. The key characteristic of a module hierarchy is that when you log into the system or start a new session, not all the module files that exist are directly available for loading. Some of them are hidden from plain view, and you may need to load gateway modules first before you can access them. That's really the key part of a module hierarchy. So why would you organize modules like this? First of all, this is not trivial to do. You have to pay close attention when you generate or create modules yourself to see where they belong in the module hierarchy and which modules should be gateway modules that open up additional parts of the hierarchy. So what's the advantage? Well, the module names are a lot shorter, because by the time you see an FFTW module, you already know which compiler and which MPI library were used to install it: you had to load those modules to reach FFTW.
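The gateway idea can be sketched in a few lines of Python. This is a toy model of how loading a gateway module brings another directory into view; it is not how Lmod is implemented, and the `Session` class, the `HIERARCHY` table, and all version numbers are assumptions made for illustration.

```python
# Each directory maps module names to the directory they open up
# (None means the module is not a gateway). Versions are illustrative.
HIERARCHY = {
    "Core": {"GCC/9.3.0": "Compiler/GCC/9.3.0"},
    "Compiler/GCC/9.3.0": {"OpenMPI/4.0.3": "MPI/GCC/9.3.0/OpenMPI/4.0.3"},
    "MPI/GCC/9.3.0/OpenMPI/4.0.3": {
        "FFTW/3.3.8": None,
        "HDF5/1.10.6": None,
        "ScaLAPACK/2.1.0": None,
    },
}


class Session:
    """A shell session that starts with only the Core level in view."""

    def __init__(self, hierarchy, start="Core"):
        self.hierarchy = hierarchy
        self.visible_dirs = [start]
        self.loaded = []

    def available(self):
        """Roughly what 'module avail' would list: modules in visible directories."""
        return sorted(m for d in self.visible_dirs for m in self.hierarchy[d])

    def load(self, module):
        """Load a visible module; a gateway module extends the visible directories."""
        for directory in self.visible_dirs:
            if module in self.hierarchy[directory]:
                self.loaded.append(module)
                opens = self.hierarchy[directory][module]
                if opens is not None and opens not in self.visible_dirs:
                    self.visible_dirs.append(opens)
                return
        raise LookupError(f"{module} is not visible yet; load its gateway modules first")


session = Session(HIERARCHY)
print(session.available())   # only the Core level
session.load("GCC/9.3.0")    # gateway to the compiler level
session.load("OpenMPI/4.0.3")  # gateway to the MPI level
print(session.available())   # now includes FFTW, HDF5 and ScaLAPACK
```

Trying to load HDF5 in a fresh session raises an error in this model, which mirrors the experience of a module that exists but is not yet in view.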
That means just naming the module FFTW/<version> is good enough. We don't have to mention the toolchain because it's implicitly encoded in the location of the module file. So we get short module names, which is very good and looks very clean. The list of available modules is also a lot less overwhelming, since we have fewer modules at the start. We're not confronted with hundreds or even thousands of module files when we run module avail, so it's a more focused view on the available software. In addition, we can only load compatible modules together. It doesn't really make sense to have two MPI libraries active at the same time, for example software built with OpenMPI and software built with MPICH, because only one MPI library will win. If you try to combine software like this, unless you really know what you're doing, you may run into trouble really quickly. The same goes for compilers: you really want to have only one major compiler active at a time. There are some disadvantages too, however. Having fewer modules visible from the start is a positive point because it is less overwhelming, but it can also be confusing: people don't see the modules, so they may not be aware that they're actually there, and they need a different way to locate them. The gateway modules may also have little meaning to users. If you tell a bioinformatician that they have to load a compiler module first and then an MPI module, they may not know what a compiler or an MPI library is, or why it's relevant to them at all. So at least in the classical hierarchy, the gateway modules may be more confusing than they should be. To summarize: short module names are good. In a hierarchy, we can get away with just <software>/<version> for a lot of the modules.
We don't have to encode the toolchain in the module names themselves because that's implied by the location of the module file. We have fewer module files available, so module avail is less overwhelming. We can only load compatible modules together, which helps prevent people from shooting themselves in the foot by just loading a bunch of modules together and assuming it's all going to work out. The downside is that not all existing modules are visible, so if you rely only on module avail, you may have trouble locating the software you need. Lmod does provide a solution for this. It has a separate command called module spider, which searches the whole module hierarchy rather than showing only the modules you can load directly. So module spider is like a search tool for module hierarchies. The semantics of the gateway modules may not be clear to people either, but as long as they're using module spider, it will tell them which gateway modules they have to load before they can reach other modules. I have a quick example here; let's see if this works without too much trouble. We'll prepare the environment first with a module purge and a module unuse $MODULEPATH, because we don't want to have any modules in view at all. After that, module avail should show nothing, or even give an error because MODULEPATH is not set, which is fine. One thing we have to be very careful about is not mixing modules from a flat and a hierarchical naming scheme. The modules we have in /easybuild/modules, the pre-installed software stack, use a flat naming scheme, so we should not mix them with modules from a hierarchical naming scheme.
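As a rough illustration of what a search across the whole hierarchy involves, the Python sketch below looks up a module anywhere in a small hierarchy and reports which gateway modules have to be loaded first. It is a toy model of the idea behind module spider, not Lmod's implementation; the `spider` function, the `HIERARCHY` table, and the version numbers are hypothetical.

```python
# Directory -> {module name: directory it opens up, or None if not a gateway}.
# Module versions are illustrative assumptions.
HIERARCHY = {
    "Core": {"GCC/9.3.0": "Compiler/GCC/9.3.0"},
    "Compiler/GCC/9.3.0": {"OpenMPI/4.0.3": "MPI/GCC/9.3.0/OpenMPI/4.0.3"},
    "MPI/GCC/9.3.0/OpenMPI/4.0.3": {"HDF5/1.10.6": None, "FFTW/3.3.8": None},
}


def spider(hierarchy, wanted):
    """Return the gateway modules to load (in order) before 'wanted' is loadable.

    Returns an empty list if the module is directly loadable,
    and None if it does not exist anywhere in the hierarchy.
    """
    # Reverse lookup: which module, in which directory, opens each directory?
    opened_by = {}
    for directory, modules in hierarchy.items():
        for module, opens in modules.items():
            if opens is not None:
                opened_by[opens] = (directory, module)

    for directory, modules in hierarchy.items():
        if wanted in modules:
            chain = []
            # Walk back up towards the top level, collecting gateways.
            while directory in opened_by:
                directory, gateway = opened_by[directory]
                chain.append(gateway)
            return list(reversed(chain))
    return None


print(spider(HIERARCHY, "HDF5/1.10.6"))
```

Unlike a plain listing of what is currently in view, the search covers every directory, which is why it can find modules that module avail would not show.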
That is going to cause trouble because, for example, a GCC module in a flat naming scheme just makes the GCC installation active, and that is not the same module file as in a hierarchy, where GCC is a gateway module that opens up other parts of the module hierarchy. So we have to make sure we only use modules that were generated in a module hierarchy in this case. This is exactly why we do the module purge and the module unuse $MODULEPATH: we start with a clean slate, with nothing in view at all. This is also why we installed EasyBuild through pip rather than the bootstrap mechanism, so we have an installation that is not hidden behind a module we have to load first, because that complicates things a little in this case. You can certainly make it work. EasyBuild would be a module that lives in the core level of the hierarchy, together with the software installed with the system toolchain, because EasyBuild doesn't require a compiler to install. So you can install EasyBuild in a module hierarchy and it's not really an issue, but in this case it's a bit confusing. That's why we recommend using pip to install EasyBuild, at least for this part of the tutorial. We've already done that, and we can make sure EasyBuild is still here by checking with eb --version, which works fine. Now we're going to configure EasyBuild to install a hierarchy of modules, and here we have to be a little careful. We're going to use the same prefix as before, $HOME/easybuild; that doesn't change. The build path is not terribly important here, but we set it as we did before. The next one is one of the key settings: we tell EasyBuild that software should be installed in /easybuild/software. What we're doing here is telling EasyBuild about the pre-installed software stack.
Software is already installed there and we want EasyBuild to be fully aware of that. We also tell EasyBuild to use the hierarchical module naming scheme rather than the standard one, and we give it a separate location to generate module files in. We want to generate these in our home directory and not in /easybuild. We already have modules for the software in /easybuild, but with a flat naming scheme. We want to generate a different view on the installed software by regenerating the modules using the hierarchical module naming scheme. So let's copy-paste this and check with eb --show-config that it looks okay. The key parts are that the modules go in hmns in our home directory, that EasyBuild knows about the software installed in /easybuild, and that we're using the hierarchical module naming scheme. So this looks okay, and the text here explains the output of --show-config. What we're going to do now is generate modules for HDF5 in a module hierarchy. The good part is that HDF5 is already installed in /easybuild/software, so we won't have to install the software itself. We're just going to generate the modules using this different, hierarchical module naming scheme. All we need to do is run eb, give it the name of the HDF5 easyconfig file, make sure that dependency resolution is enabled with --robot, and only generate the modules with --module-only. We don't do the installations themselves because they are already there in /easybuild/software. When we run this, we should see EasyBuild zipping through the whole software stack one by one, and it seems to be working well: we're not running into permission errors. You will see that the installation steps are mostly being skipped because of the --module-only option, so we see a lot of skipped, skipped, skipped here. Let me scroll up a bit to find a good example.
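The configuration described above can be summarized as a handful of EasyBuild settings. The Python sketch below only assembles them as environment variables and builds the command line; it does not run EasyBuild. The build path under /tmp, the example home directory, and the easyconfig file name are assumptions for illustration (the session does not spell them out), and the helper function names are hypothetical.

```python
def hmns_settings(home, user):
    """EasyBuild settings for regenerating modules with a hierarchical scheme."""
    return {
        # Same prefix as before.
        "EASYBUILD_PREFIX": f"{home}/easybuild",
        # Assumption: the build path value is not stated in the session.
        "EASYBUILD_BUILDPATH": f"/tmp/{user}",
        # Point EasyBuild at the pre-installed software stack.
        "EASYBUILD_INSTALLPATH_SOFTWARE": "/easybuild/software",
        # Use the hierarchical naming scheme instead of the default one.
        "EASYBUILD_MODULE_NAMING_SCHEME": "HierarchicalMNS",
        # Generate the new module files in a separate location in the home directory.
        "EASYBUILD_INSTALLPATH_MODULES": f"{home}/hmns/modules",
    }


def as_exports(settings):
    """Render the settings as shell 'export' lines."""
    return [f"export {key}={value}" for key, value in settings.items()]


def modules_only_command(easyconfig):
    """Command line that resolves dependencies and only (re)generates module files."""
    return ["eb", easyconfig, "--robot", "--module-only"]


for line in as_exports(hmns_settings("/home/example", "example")):
    print(line)
# Hypothetical easyconfig file name, used only to show the shape of the command.
print(" ".join(modules_only_command("HDF5-1.10.6-gompi-2020a.eb")))
```

Keeping the software install path and the module install path separate is what makes it possible to get a second, hierarchical view on software that is already installed.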
For this zlib installation, for example, we can see that the actual installation steps are being skipped, and the only parts that are really being done are the sanity check and the creation of the module. The sanity check is still done to make sure that EasyBuild can find the libraries, or the files and directories, that are part of the installation, and maybe even run a couple of short commands to make sure the installation is functional. Only when that passes does it generate the module file. Zipping through all of these shouldn't take long: in the example here it took nine seconds, but in AWS it may take a minute or two because of the limited resources we have. It's already doing OpenMPI, the toolchain, and now HDF5 itself. Here we see the full module name for HDF5, the visible one that we will show to users, and its location in the hierarchy: we're in the MPI level, for OpenMPI built with GCC. That gives us 37 modules generated in the module hierarchy. Now let's take a look at how that looks. To load the HDF5 module, we have to use these modules in the hierarchy. Let's check where they were generated: in hmns in our home directory, we have a modules directory, then all, and there we see the three levels of the hierarchy: Compiler, Core, and MPI. The top of the hierarchy, the starting point, is Core, so this is where we point the module use command: module use $HOME/hmns/modules/all/Core. At the start, only the core modules should be available. If we now run module avail, we see a limited number of modules with very short module names, only software/version, so that's quite clean. Most of these are actually build dependencies for GCC, which stay there because they may come in useful for building other software; they are not automatically cleaned up. But the main one we're interested in here is this GCC module.
This will be our gateway module. How can we tell? What we're looking for is HDF5, so let's run module spider, the special command that Lmod gives us for working with module hierarchies. This gives us some output: it says it found one HDF5 module for this version, and if you want to load it, these are the gateway modules you have to load first: the GCC module, and then the OpenMPI module that becomes accessible once GCC is loaded. So let's try that and load our first gateway module. If we check module avail again, we see a whole bunch of additional modules becoming available. Next to the ones we had before, we now have the ones that were installed with a particular compiler. In this case, they are split across two different directories, one where GCCcore was the compiler and one where GCC was the compiler. This is somewhat of a technical detail, but basically GCCcore is the actual compiler, which is also used as a base compiler for the Intel compilers, as I mentioned before, while GCC is a combination of GCCcore and binutils built with GCCcore, which is located here somewhere... here, this binutils. So the actual GCCcore module and the binutils built with that GCCcore together form this GCC module, and this is our gateway module. Here we see the OpenMPI installation that was installed with this particular compiler, so it's in the compiler level of the module hierarchy. And as module spider told us before, this is our next gateway module. We don't have HDF5 available at this point, so we cannot load it yet. We load OpenMPI, the version was 4.0.3, and then check for HDF5 again. Now we see that HDF5 is available for loading, so we can go ahead and load it and get access to the commands it provides. If we run module list, you will see that things look very clean, with very short module names.
That helps a lot in a module hierarchy, having things look as clean as they do here. So this was me zipping through the whole example here, which the tutorial text also takes step by step. We load GCC and check the available modules. We see the second gateway module, OpenMPI, and load that one. If we then check module avail, we can see the HDF5 module, which we can load and start using, and we get very short module names. There's a small exercise here that you can try: install an additional software package in the module hierarchy and see if you can get that working. The idea is to start with a clean slate first: do a module purge and hide all modules available up to this point, so you start from scratch. Then you can do an exercise similar to what we did for HDF5. Maybe take a minute or two to try this exercise yourself, and if there are any questions about module hierarchies, please let me know. The key point here is that EasyBuild is well aware of module hierarchies. If this is how you want to install your software, or at least how you want to organize your module files, EasyBuild can do it for you fully automatically. You don't have to generate these modules yourself and make sure the hierarchy is fully functional. And if you want to, you have full control over the module hierarchy as well: how things look and what goes where. Maybe you don't want three levels but four or five, and that's all possible through a custom module naming scheme. Let's see if somebody is actually working on this exercise. It's called... just exercise, the HMNS exercise. Please let me know whether this worked for you or not. Most people are up for skipping this, though it looks like some people may still be working on it. We won't stall too long here because we're getting a bit short on time.
The idea is that you first make sure you have EasyBuild properly configured, as we showed in the example. Then it basically boils down to giving this easyconfig file to EasyBuild and enabling --robot and --module-only. Before we do this, we can check which modules are missing with --missing, which works even without --robot. EasyBuild tells us that 15 modules are still missing, and it even shows us where they will go in the hierarchy. Then we run with --robot and --module-only to generate the missing modules in the hierarchy, and EasyBuild does this for us. We expect this SciPy-bundle to go into the MPI level of the module hierarchy, because it's installed with a full toolchain, which includes both a compiler and MPI. So it ends up in the same location as HDF5, and to figure out where it is, we can use module spider. Here it's doing FFTW, then Python, which is still missing, then pybind11, which is a dependency of SciPy-bundle, then ScaLAPACK, which is part of the full toolchain, and then SciPy-bundle itself. That generated all 15 modules. We can run the module use command again to start from the top of the hierarchy, and then check with module spider where we can find the SciPy-bundle module. It says it found one SciPy-bundle module, with a version that is specific to Python, so here we do still have the Python part in the module name. In this case, Lmod wants us to ask for a specific version of the SciPy-bundle module before it tells us about the gateway modules. So we run module spider for this specific version, and that gives us the same answer as for HDF5, because that's all we have in this environment. Only once we load these two modules do we see the SciPy-bundle module available for loading, and we can load it. Then we're asked to run this command to check whether it is working.
This just prints the pandas version that's included in this bundle of scientific Python packages. That wraps up this part.
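The check for missing modules combined with dependency resolution, as used in the exercise, can be pictured as a walk over the dependency graph that reports what still has no module, in an order where dependencies come first. The Python sketch below is a toy model under that assumption, not EasyBuild's actual resolution logic; the function name and the small dependency graph are hypothetical and much shorter than the real list of 15 modules.

```python
def missing_in_build_order(deps, target, existing):
    """List modules that still need generating, dependencies before dependents."""
    order = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        # Handle dependencies first so they end up earlier in the list.
        for dep in deps.get(name, []):
            visit(dep)
        if name not in existing:
            order.append(name)

    visit(target)
    return order


# Hypothetical, heavily simplified dependency graph (names without versions).
DEPS = {
    "SciPy-bundle": ["FFTW", "Python", "pybind11", "ScaLAPACK"],
    "pybind11": ["Python"],
    "Python": ["GCC"],
    "FFTW": ["OpenMPI"],
    "ScaLAPACK": ["OpenMPI"],
    "OpenMPI": ["GCC"],
}
# Modules already generated earlier in the HDF5 example.
EXISTING = {"GCC", "OpenMPI"}

print(missing_in_build_order(DEPS, "SciPy-bundle", EXISTING))
```

Modules that already exist are skipped, which matches the observation that only the modules not generated during the HDF5 example still had to be created.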