kept this cluster inside of a container, so at the time we could say that we were running containers on a cluster inside a container. This cluster was later moved to the new building. In total it has 3,300 cores, 15 terabytes of RAM and about 94 teraflops. That's not much today, but in 2013 it was enough for our purposes.

The second cluster is named Salomon. The name was chosen the same way as Anselm: it is the name of the same coal mine, whose name changed over time. We upgraded this cluster to CentOS 7.6 as well. The amount of software there is a little higher, but not significantly. It has about 1,000 nodes based on Intel Haswell. Half of the nodes are without accelerators, with two Haswell CPUs per node, 24 cores in total; the second half has two Intel Xeon Phi accelerators per node. A curiosity is that at the time this was the biggest installation of Intel Xeon Phi in Europe. We also have one fat node with 120 cores and more than 3 terabytes of RAM. You can read the totals here; I don't think they are that interesting. Maybe more interesting is that this supercomputer was ranked 40th in the world and 14th in Europe in 2015.

This is our next, small cluster. It has a working codename, but we don't know the final name yet. It will be a small cluster of about 200 nodes based on Intel Cascade Lake, and we should have it somewhere around April 2019, so in three months maybe.

I'm part of the Supercomputing Services department. We have 10 system administrators and we do almost everything that is needed: from the applications, which is mostly my work, through the network, to our own information system that we develop ourselves. We do user support, and recently we started doing the hardware maintenance of our cluster too, because the cluster is pretty old now and we don't have anyone to do it for us.
So we are doing the hardware maintenance too, and we have two infrastructure admins who handle the cooling and heating, because we heat our building with the supercomputer, we reuse its waste heat, plus things like power and air conditioning.

So let's move to the EasyBuild stuff. As I said, we are using CentOS 7.6; for modules we are using Lmod 7.7.7. We don't have a particular reason for that version, we just like the number and it works, so why change it. The EasyBuild version is always the latest. We dug around a little and found that we started using EasyBuild on 30 January 2015, so today it is exactly four years; we should celebrate. Our default toolchain, although it is not strictly a default, is GCC 6.3.0. The reason is that this is the most used toolchain on our clusters and plenty of modules are compatible with this GCC version, so we can combine them easily and we don't need one piece of software in 10 or 20 different toolchains. And we like dummy. I know that you hate it, but we like it, and on CentOS 7.6 we don't have problems with the dummy toolchain, because most scientific applications are compatible with CentOS 7.6 and we have no reason to use a newer toolchain or a newer GCC everywhere. For some high-end scientific software we do use a newer GCC, because the performance optimization can be better with it, but for things like graphical interfaces and graphical libraries we just use dummy, because it is enough and we can share one dummy module across many, many applications.

We have our own easyconfigs and easyblocks; I will say something about them later. The software is on shared storage, but the storage is shared only per cluster, so with two clusters we have separate storage for each of them. The reason is mostly the different CPUs: we have Sandy Bridge with AVX and Haswell with AVX2.
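That instruction-set difference is the crux: a binary built with AVX2 instructions would crash with an illegal-instruction error on the AVX-only Sandy Bridge nodes. A hedged sketch of picking per-cluster compiler flags from the CPU flag list; the function and the mapping are illustrative, not the actual IT4I tooling:

```shell
# Illustrative sketch: choose a GCC -march value from the CPU flag list,
# e.g. the "flags" line of /proc/cpuinfo (not the actual IT4I scripts).
pick_march() {
    cpu_flags=$1
    case " $cpu_flags " in
        *" avx2 "*) echo "haswell" ;;      # Salomon-class nodes
        *" avx "*)  echo "sandybridge" ;;  # Anselm-class nodes
        *)          echo "x86-64" ;;       # generic fallback
    esac
}

pick_march "fpu sse sse2 avx"    # prints: sandybridge
```

EasyBuild can then be pointed at the matching flags on each cluster, for example via its `--optarch` option.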
So we need to compile everything again, and that's why we don't have storage shared across all clusters, only per cluster. We are using the flat naming scheme, partly because we have bioinformatics users, and everything from Lmod is grabbed automatically: the module matrix on our website is generated automatically, so when we add new software it is immediately on our web. And the last thing, which for us is the most important: EasyBuild is a good servant but a bad master. It is a pretty good tool and it can do a lot of things, but sometimes we just need the dummy toolchain, or we need to edit something inside EasyBuild so it works the way we need, so we have to bend it a little sometimes.

About upgrading: we upgraded from CentOS 6.9 to 7.6, and it was a software disaster, because almost everything stopped working. The main reason was not that we use the dummy toolchain; that was actually the smallest problem. The biggest problem was glibc: the C library version changed, and plenty of software looks up the exact glibc version, it is baked into the binary, so we had to rebuild everything on our clusters. So we decided to mark every piece of software from before the upgrade with a tag, C6, and then rebuild what we needed. After a rebuild the module file was generated again and the tag simply disappeared. You can see it here: some old versions of OpenMPI we didn't rebuild, because nobody wants them, so for now they are still marked as old CentOS 6 software, and if nobody tells us they need them, we will delete them. So it is also a way to get rid of software we don't want. It was pretty hard, though: two people reinstalled maybe 50% of the software on the cluster in about two weeks; without EasyBuild we couldn't have done that.

[Audience] One thing you could do here is hide the old broken modules; you just tell Lmod not to show them.
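For reference, hiding a module version in Lmod can be done with a `.modulerc` file placed next to the module files; a hedged sketch, where the module name, version and path are illustrative:

```tcl
#%Module
# Placed as e.g. .../modules/all/OpenMPI/.modulerc (path illustrative);
# Lmod then omits this version from the default `module avail` listing.
hide-version OpenMPI/1.8.8
```

Hidden entries can still be revealed with `module --show_hidden avail`.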
Yeah, we could, but our users know how to show hidden modules, that is one thing we realized, and the second thing is that we don't need to hide them. Our users know, and we have it posted everywhere, that we moved from CentOS 6.9 to 7.6, and what we need to know is just whether somebody still needs that software. If nobody does, then after half a year, maybe a year, we will just delete it and we will not ask our users. Yeah, I think you're right. Every time we want to delete software that we think is too old, we mark it as obsolete and we show an O in the listing, but users keep using it, they don't care that the obsolete flag is there, and then when we delete it we get a ticket: please install this software, it stopped working. So I think this will be the same story, but I have no idea how to find out whether someone still needs this old software or not.

So, back to easyconfigs. We have our own public repository with 2,900 easyconfigs that differ from upstream, for 700 different pieces of software, and we have 187 pieces of software that are not upstream at all. I will get to why later; it is because of some changes we made to how we distribute the easyconfigs. Mostly we try to keep things as simple as we can: we have a git repository with one branch for every supercomputer, we install the software on one supercomputer, and we have scripts that merge these changes into master and then propagate them to the other branches. The reason for this is that one piece of software can be installed in different ways on different clusters: there are different architectures and there can be a different operating system, so we can have different versions of the same-named easyconfig file. I don't know if it is a good idea, but it has been working for more than three years now, so we got used to it, and it is simple in comparison to continuous integration with Jenkins
and similar tooling. Maybe once we have four or five clusters it will become unusable, but right now it is okay for us.

Now, easyblocks. We have our own easyblocks, but mostly with only small changes: we modified an easyblock in some emergency cases where something stopped working and we needed to install an application or tweak the installation process a little. Nothing huge like our own easyblocks written from scratch; we are not able to do such big things. Here is a sample of one modified easyblock. I think it is absolutely unreadable right now, but what we did: we needed to install MATLAB, for example, but we had, and still have, two license keys, with separate solvers in each key, and we had to install the solvers from both keys. OK, this is old; maybe you can do it now.

[Audience] This exact issue was fixed in the latest release.

Yeah, this is a year old or so, maybe more. So we just copy-pasted part of the code, and we reused an unused variable from the default easyblock class in the easyconfig, because we were so lazy. Yeah, we are lazy. So that's it about easyblocks; you can see this mess in our git repository, but I don't recommend using it.

And then we made our own toolchain. We made Py, which is for Python packages. We had the problem that with every upgrade of Python we had to reinstall all the packages into that Python, and because users kept requesting "I need this package, and I need that package", we installed additional packages at runtime, not via the easyconfig, so we lost track of which packages were inside. So we tried to make our own toolchain, Py, based on GCCcore 6.3.0, which includes Python 2.7 or 3.6 depending on the version. We have a base version of Python with only, I think, two
or three basic packages inside, setuptools or something like that, and then every package is installed as a separate module. The main thing about this is that we can change the Python module without reinstalling all the Python packages. We have already done that a few times and it seems to be in working condition; nobody has reported an issue in a year.

[Audience] So is the issue here that every time somebody asks for a new Python package, you have to redo the whole Python installation? Is that the problem?

Not exactly. Mostly somebody would just load the Python module and do a pip install.

[Audience] Right.

Yeah, and we lost track of what was in there. We had, you know, 20 versions of Python with different packages inside each one, so when we upgraded Python we got tickets to add this and that package to the new Python too. So this was something of an experiment, because I was curious whether it could work, since the PYTHONPATH gets long: if we load 20 or 30 packages, it is pretty long. But we have had no issue with it, and we can have several versions of the same Python package, so we can trace the dependencies and other things. Again, I don't know if this is a clever thing, but it is working, so maybe you can think about whether it is a good idea or not.

[Audience] I was going to say something, I forgot; it'll come back to me. ... I don't think I understand what you're doing with the Python. You just have directories with all the Python modules in them, and when a user does `module load Python/2.7.15`, you create a whole bunch of Python paths to find all these different modules?

Not exactly. We are using the Python package modules the same way as the other Lmod modules: a Python package is installed the same way as any other scientific software, so we take the source code and compile it using EasyBuild.

[Audience] So they end up being their own
modules, like Qt4 for Python?

Yes, it is one module per package.

[Audience] OK. I'm thinking about, you know, the opposite design, where you would just have one big directory with all the Python packages.

Yes, but in that case we can't have several versions of one package; this way we can combine whatever we want.

[Audience] One thing you have to be aware of here is that the bigger you make the PYTHONPATH, the slower startup becomes, because if somebody does `import pymol` and PyMOL is at the very end of the PYTHONPATH, Python will run through everything else, hitting the file system. It has an effect on startup time.

Yeah, I thought that would definitely be a problem, but as I said, no issue has been reported so far. The file system we use for this is plain NFS storage, and the caching in the system works pretty well: after the first iteration of searching through the PYTHONPATH, the second iteration is much quicker, and so on. Our nodes have 64 GB of RAM at minimum, and on Salomon we have 128 GB, so there is plenty of room for the file system cache.

Because we also missed some functions in EasyBuild, we made a wrapper for EasyBuild and added those functions directly on top, so it is easier for us to use. For example, after installing new software we mostly need to rebuild the Lmod cache, because almost everything is cached so that searching through the software is quicker, and that rebuild normally runs only once every 10 or 15 minutes; with the wrapper we don't need to wait, we can rebuild the cache instantly. We also slightly modified the EasyBuild upgrade step, "install latest EB release", because we have some modifications of our own, the links and module files are a little different, and we have this wrapper, so we have to deploy these things as well. So we wrap the installation process and then continue
with our post-install steps. The next thing is the containerize option: it does not load the Singularity module, and we don't have Singularity installed in the system itself, we have modules for Singularity in different versions, so we added that. And when we detect a successful build, we can rebuild the Lmod cache automatically, so it's easier.

[Audience] How is this implemented? Is this a shell wrapper around eb?

It is a shell wrapper, yes.

[Audience] Because since when did we implement hooks, EasyBuild 3.6, something like that? You could actually do this in Python and have it at least cleaner.

Yeah, there are maybe two reasons for this. One is that this is maybe two or three years old, and the second is that we are not so good at Python. We can write a script in bash, we are mostly system administrators, not programmers, so we don't know Python that well. Maybe we could do these hooks, but yes, this is pretty old and it just works.

Then we were curious what the users do with EasyBuild, because EasyBuild on our clusters is accessible to users and preconfigured so that every user can build software with it, and that software is stored in their home directory on the cluster. We were curious whether anybody uses it, and how, so we again use our wrapper to spy on our users, and we send this information to Mattermost, which is just a messaging application. The good thing about this is that I can spy on my own EasyBuild builds too, and with notifications on my mobile phone I know when my build is complete or has failed, so I know whether I can celebrate or not. And what we found: yes, our users are using EasyBuild a little bit. It is not a success every time, but when it is, we can steal the easyconfig and use it. We are just a small team and we don't have time for everything, so getting help from somebody from
outside this way is perfect. Then we have some other bash scripts that help us use EasyBuild. The first is connected with our easyconfigs, mostly with how we distribute the easyconfigs across the clusters: this script does everything for us, and we have everything in git, so we have a history of module files, easyconfigs, EasyBuild settings and everything, and if something gets screwed up we can go to the git repository and use an older commit. ebssh just connects to all clusters and builds some software in advance.

There should be a picture here. We know that EasyBuild can generate a dependency graph for us, so we use that; this is just a small wrapper that renders the graph on the cluster, copies the graph file to our computer and opens it, so it is not that interesting. The interesting thing is that we can do the same with our module files: another small bash script can make something like a map of our module files and generate a dependency graph from them, that is, from the state that is currently on the cluster, because easyconfigs can change over time and the software could have been installed in a different way. Yes, I know I can grab the easyconfig from the install folder and do it with EasyBuild, but this is way simpler for us.

[Audience] That's kind of interesting for this module-removal thing, right? A reverse-lookup mechanism for the software installed on the system.

For the modules; I will get to that. But the script is not so clever, it is just some graphs and such, and it only works if the software was installed by EasyBuild, because then the syntax is exactly the same in every module file and I can grab the dependencies from there. On our cluster we have maybe two or three pieces of software installed by hand; everything else is installed with EasyBuild, and even for the hand-installed ones the module file is generated by EasyBuild, because
we are too lazy. So I can still parse these module files. We are not sure whether we can share it publicly; this script can make a big mess on your cluster, and I don't know if it is clever to distribute a script with something like a remote command in it. Maybe it's safe, but we already have everything in git, it is just not public, so maybe we will make it public.

And the last thing I would like to show is that we have a script to remove modules together with their dependencies on other modules. Here is LLVM: in case I would like to remove the LLVM module, I can use this script. It checks the dependencies, it checks every piece of software in the dependency tree to see whether it is independent or whether some other software depends on it, so I know which software I cannot remove because some other application depends on it. In this case I know that LLVM is okay, and zlib and ncurses I can remove too. The script finds the root directories of those applications and the module files and just deletes them. That's all; it is a simple script. The interesting part is the finding of the dependencies.

[Audience] I think this only removes the modules; it leaves the installation directories there?

No, no, it removes the root directory of the application too. Maybe the better idea would be to move the application somewhere else first, yeah. But sometimes we just have to try some software with a new toolchain, so we build the whole toolchain, every dependency, maybe 20 packages, and then we realize we don't need it because it doesn't work the way we need. So we would have 20 packages, 30 files, to delete by hand; this script can do it almost at once.

[Audience] So suppose the zlib module is not used by anything else; it would also be removed here?

Yeah, yeah.

[Audience] This is exactly what a removal feature in EasyBuild should be doing
right.

Yeah, we are hoping this will get into EasyBuild.

[Audience] Showing you what it considered, showing you what is okay to remove, and then asking for confirmation.

Yeah, and I have a force parameter there too, so you can use force and then there are no questions.

[Audience] So the problem here is, well, now it's not green, but suppose zlib is green...

It is green on my screen.

[Audience] No, I mean LLVM is green. So assume zlib is green as well: the script can only decide based on the information it sees, right, the current module tree. If the library is linked in some other way, around the module system...

...then I would remove it anyway, yeah.

[Audience] Maybe users have built stuff on top of your modules, and then you're going to break their stuff.

Yeah, so that's the reason we don't share this script publicly: you can make a mess on the cluster with it. But if you do it manually, you have the same issue. Mostly we use it when we are trying a new piece of software or a new toolchain and we build plenty of packages and then need to remove them; in that case it is safe to remove, because the packages are new and almost nobody knows about them yet. But yeah, if it is an old package, you can do some real damage.

[Audience] One question: will this only work with a flat naming scheme, or would it work in a hierarchical one too? You use flat, right?

Yeah, we are using flat, and I'm not sure right now; maybe it works, and if not, it should be simple to adapt it to hierarchical. It is untested, but I am grabbing the paths directly from the modules, so even if they were in other folders I could still grab the paths and use them. I don't know for sure, because I tested it only on our clusters, but the modification to make it work somewhere else should be simple.

So thank you for your attention; that's all from me for now. You can ask questions. Thank you.
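The reverse-dependency check that the removal script performs can be sketched roughly as follows; the module names and the dependency table are illustrative, not taken from the actual script:

```shell
# Illustrative dependency table: "<module> <dependency>" pairs, as would be
# parsed out of EasyBuild-generated module files (contents are made up).
deps='LLVM zlib
LLVM ncurses
Python zlib'

# is_removable DEP MODULE: prints "yes" if DEP is needed only by MODULE,
# i.e. nothing else would break if both were deleted together.
is_removable() {
    dep=$1; removed=$2
    others=$(printf '%s\n' "$deps" |
        awk -v d="$dep" -v r="$removed" '$2 == d && $1 != r { print $1 }')
    [ -z "$others" ] && echo yes || echo no
}

is_removable ncurses LLVM   # prints: yes (only LLVM needs ncurses)
is_removable zlib LLVM      # prints: no  (Python still needs zlib)
```

The real script additionally walks the whole dependency tree from the module files and then deletes both the module files and the applications' root directories.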