Hello, everyone. So today I'm going to be talking a bit about how we do Anaconda installations on Triton. When you log in to our HPC cluster and run module load anaconda, you get these pretty all-encompassing environments that contain hundreds of packages, and we have a continuous integration system that builds these for us automatically. I'll talk about the finished product in a bit.

So what's the purpose of this talk? It's a follow-up to my previous talk about Spack installations using the continuous integration system. Here I'll try to explain what we are trying to solve with the build system and what kinds of problems come up. The goal is to automate the installation and update of Anaconda environments. The current iteration of the build system has been in use for about two years; it's constantly evolving, but it's in production.

When you start building a conda environment, the first thing you need to decide is which installer you want to use. Conda comes in two flavors. There's the Anaconda setup, which contains lots of pre-installed packages, a whole setup; that's what normal users usually use. And there's Miniconda, which is a minimal installation of Python and conda, used especially in containers and in automation. Miniconda is what we use: Anaconda works for many cases, but when you need to solve these kinds of bigger environments (we'll talk about solving later), Miniconda is better at it.

By the way, just a note: I will basically be saying the same thing that is on the slides, so you can either follow the slides or look at me. Some slides have quite a bit of text, so don't get alarmed; you don't need to read it all, you can just follow my voice.

Okay, so we want to use Miniconda for the reasons coming up. How do we install it? That's pretty straightforward. You download the latest installer from the Anaconda site, and then you can install it without any interaction using plain bash. The -b flag makes it a batch installer, so it doesn't ask any questions; it just installs, in this case into a prefix of our choosing. This is relatively easy to automate: our continuous integration system downloads and caches the installer, checks that the MD5 or SHA-256 checksum matches, and uses it to build a base environment for us.
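As a concrete sketch, the non-interactive install looks roughly like this (the URL is the generic "latest" installer; the checksum should be compared against the value published on the download page):

```bash
# Download the latest Miniconda installer.
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Verify the checksum against the value published by Anaconda.
sha256sum Miniconda3-latest-Linux-x86_64.sh

# -b = batch (non-interactive) mode, -p = installation prefix.
bash Miniconda3-latest-Linux-x86_64.sh -b -p "$HOME/miniconda"
```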
Now, there are two ways you can activate this base environment. The first, which is what you mainly use when you install Anaconda for yourself, is to source the activation script. There is even this conda init thing that basically hardcodes this conda installation to be your main conda environment. This is very nice, it provides all kinds of base additions to the conda command, but it's hard to reverse if you're loading the environment from modules; it's very hard to undo what conda does to your shell.

The other way is simply to prepend the binary directory of the conda installation to the PATH variable. That makes everything work, Python will work, conda will work, but it doesn't provide all of the fancy shell functionality. Our chosen solution is the environment variable method, because it's easy to enable and disable through modules: you set the PATH variable, you unset the PATH variable. Getting the first method to work with Lmod, which is what we use for environment modules, would be really nice, but currently there's no working implementation for that, so we go with the second option.

Okay, now we have a base environment installed and enabled. Do we want to install packages into separate environments, or into the base environment itself? In normal use, people get the Anaconda they want and then create different environments containing different packages, using the base installation; the base installation basically just provides the conda tool and a Python that can be used to create more environments. The alternative is to install the packages into the base environment itself.

In our case, because we are creating a shared environment that will be used by multiple users, there's no point in creating one base environment and then multiple environments on top of it. It's really the base environment that people will be using, especially when we activate it with the previously mentioned environment variable method. Normally, when you create environments, there is package reuse: the other environments can reuse packages from the base environment. In our case, because we create this monolith of an environment, users who create environments on top of it rarely share the packages of the main environment, so it's not worth the effort. Besides, if we set up the base environment to be this big environment, users can still create their own environments using the tools in it. So every time we build a new environment, we install everything into the base environment.

Okay, so now we're getting to actually installing stuff into the environment. We have a base environment: a minimal Miniconda setup with conda, pip, Python, basically minimal tools. This should be easy, right? Just go through a for loop and install one package at a time, conda install pandas and so on. Well, it's not that simple, unfortunately. That approach doesn't work with hundreds of packages, because the order of the install operations matters and a single install command that goes somehow wrong can mess up the whole thing. The reason is that conda has a solver in it (I'll talk a bit about the solver in a second) that tries to solve the newly created environment, and it will easily switch dependencies underneath the earlier installs; it will basically try to rewrite the whole environment if you run one install command at a time. So nowadays, when we install these environments, we run one single install command that contains all of our packages.
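To make the contrast concrete, here is a minimal sketch (the package lists are just illustrative):

```bash
# Fragile: one package at a time. Every command re-solves the whole
# environment and may swap dependencies underneath the earlier installs.
for pkg in numpy scipy pandas; do
    conda install --yes "$pkg"
done

# What our builder does instead: a single command, a single solve,
# covering the whole package list at once.
conda install --yes numpy scipy pandas
```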
The install command ends up absurdly long, but that doesn't matter, because it's generated by the automated tool; no human needs to write it. We run a single conda install command for all of our packages.

So, about the solving. This is not only related to our installation setup; it might be interesting to you in general. How does conda determine what packages to install when you run a conda install command? When you tell conda something like conda install pandas, you're telling conda: I want the new package environment you are going to create to satisfy this constraint; pandas needs to be installed, and I don't care which version. You can give multiple of these constraints to the system. Conda also has a kind of priority ordering, for example that higher version numbers are better than lower ones, so there is a priority between candidate packages. But basically, when you say conda install pandas, you create a constraint that after the installation, this condition should be satisfied.

Based on the channels given on the command line and in the configuration files, and on the already installed packages, conda creates a kind of index of facts about the available packages. So you have this huge space of available packages, and you try to narrow your installation down to the best solution among that huge pile: which environment would be best according to the constraints. Conda takes the index of available packages as facts, adds the constraints that the environment must satisfy, and puts them into a SAT solver; this is the boolean satisfiability problem. Basically, you get a huge amount of facts, you put some constraints on them, and you need to find an optimal solution that satisfies them. It's an NP-hard problem, so it's very hard to compute. There are solvers that try to do it, and conda has its own solver that, given the constraints and the index, tries to find a solution satisfying the constraints. For more info, you can check this blog post by the Anaconda people on how conda actually works internally.
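In practice, each argument of the install command is one such constraint, and conda's match-spec syntax lets you leave versions open or restrict them (the versions here are just examples):

```bash
# Three constraints for the solver: any pandas, numpy at least 1.20,
# and python pinned to the 3.9 series.
conda install --yes "pandas" "numpy>=1.20" "python=3.9"
```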
Okay, this all sounds good. You might think: why don't we just use an environment YAML that locks certain versions, instead of handing conda a huge pile of loose packages? Well, previously we used more of an environment-YAML kind of approach where we locked certain versions. In theory, this makes it easier for the SAT solver, because there are more constraints and it can prune the index more heavily; the space of packages it needs to search isn't that big. In practice, this creates problems, as the solver will often fail due to conflicting constraints. If you add a new package to the list, it might conflict, not directly with the other packages you have locked to certain versions, but their dependencies, for example, might conflict. So you get a huge cascade of conflicts if you try to pin the packages yourself.

Of course, when you're doing, say, research, and you have an environment that you have managed to solve, basically you have run conda install commands in succession and got some end product, then to reach that same end product you would have to go through basically the same install commands, because otherwise you will get a different solution in this wide space of packages. That's why people usually export and share their exact environment, so that others can reproduce it. But in our case that's not what we want: we want the most recent versions, and we just want the installation to go through. This is also why we don't start from the Anaconda installer. If we started from Anaconda, as we previously did, we would immediately lock a huge number of packages to the versions that are already installed, and conda will try its hardest not to touch those packages. What happens is that some package in there conflicts with some requirement of ours, the solve fails, and it never recovers; it never finds a good-enough solution, so it grinds on for days until it eventually fails because it couldn't find a good-enough environment. So we try to give the SAT solver as free a problem as possible.

Okay. This also brings up another thing about conda. Conda has a bad tendency of being, how should I say it, stateless: everything that happens during an installation is tied to whatever you actually wrote on the command line. Conda re-reads the configuration every time you run a command: it checks various configuration files, some system paths, your home directory, and then overrides those with the arguments you give on the command line. So if you're doing multiple install commands in a row, and in one install command you give a list of channels and in the next one you don't, then, because those channels haven't been written into any configuration file, conda doesn't care that you previously used them; the next install command will not use the channels you specified before. It doesn't store these settings in any kind of history.

When you're running things normally, as an admin or otherwise, you might have some settings in, for example, your ~/.condarc. So in our build system, the builder always removes that file, checks that nothing like it is lying around, and writes a global .condarc into the installation prefix itself, with the channels in a certain order. That makes certain that all of the install commands are done the same way, with exactly the channels we have requested.
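A rough sketch of forcing such a configuration (the channels are examples; note that conda config --add prepends, so the channel added last ends up with the highest priority):

```bash
# Remove any per-user configuration that could leak into the build.
rm -f ~/.condarc

# Write the channel setup into the installation's own .condarc;
# --system targets the conda installation prefix, so every install
# command against this installation sees the same channels in the
# same order.
conda config --system --add channels defaults
conda config --system --add channels conda-forge
conda config --system --set channel_priority strict
```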
This kind of configuration management might be useful for you as well. The conda config command has options for this, something like a system-wide configuration. Conda is pretty obtuse about it; there is documentation on the order in which the conda configuration is determined, but it's very technical. Just to make everything as clear as possible for ourselves, we force this one configuration file and use it for every conda install command, so there's no risk of the channels changing.

Okay, so now we have a good configuration: we have our huge list of packages, and we have our configuration file, so we know which channels we want to use to install those packages. Now we just run a massive conda install command, and everything should work. Well, there's the problem that the solver in conda isn't that efficient. In our case it takes up to 30 minutes to do the solve: with hundreds of packages, the space of packages and the number of constraints is so large that it takes a huge amount of time to run.

In the winter I was battling with this, and then Richard pointed me to this Mamba tool (I think it was Richard). Basically, some people had encountered the same thing before and decided to do something about it, and they created Mamba. Mamba is basically a drop-in replacement for conda: if you would run some conda command, you just write mamba instead. Certain commands Mamba runs itself, and certain commands it passes through to conda. What it does is run installations very much faster than conda: it does the dependency solving with libsolv, a fast C++ library that is also used by system package managers like DNF, and it is drastically faster than conda. It takes less than a minute for the full environment, maybe 20 or 30 seconds to solve the whole thing, whereas conda takes like 30 minutes. So if you're ever installing your own environment and you end up in the situation where you see "Solving environment..." with that rotating spinner in your command line, you know it's a conda problem and you should probably switch to Mamba at that point. It's an amazing tool, I really can't hype it enough, and it's available in conda-forge.

So what we do is first install Mamba (good question: yes, of course our system can install things with plain conda as well, but we don't use it anymore) and then install everything else with it. That's how the installation goes.
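The bootstrap is just two steps; a minimal sketch (the package list is illustrative):

```bash
# Install Mamba into the base environment from conda-forge...
conda install --yes -n base -c conda-forge mamba

# ...and from then on use it as a drop-in replacement for conda.
mamba install --yes numpy scipy pandas
```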
Okay, let's have a quick demo of how we do the installations. Let's see, just a second, I have to bring this up. Yeah, so here's our configuration file, a bit bigger, basically for our Python stuff. I have created this new environment here. There's this meta-language in our configuration system where you can specify what kind of environment you want, which installer, which Python version, and which channels.

And then we have these collections. Because we usually have multiple versions of the same environment, and we update the list of packages over time, instead of having every environment carry its own huge list of packages we simply say: install everything in this collection of packages. This collection here in the middle, for example, contains the conda package numpy and the pip package scipy, so the tool can of course install pip packages as well. And here is, for example, our triton packages list: here are the pip packages, and here are the conda packages. There's quite a lot of them, and these are only the top-level packages; they of course bring a huge pile of dependencies and requirements with them. We also have other environments: my colleague here has created this new imaging environment for certain research groups that use certain different packages, and we're trying to make it so that if a research group wants a certain environment, we can custom-create it for them.

Okay, now that we have this, this is all in git, in this repository on GitHub, and we have this new stuff here. So what we do is commit this, let's put this here, and push it. Let's hope it works, I haven't tested this. It will start a build in our builder system. This is our build system; there are also builds for the Singularity images and the Spack installations, but now the anaconda builder will start. Here's the start: it syncs the configuration and then starts installing the actual environment. In here, it's a bit small, but we can see that it first tries to recreate all of the previously installed environments, then it creates the base environment, and then it uses Mamba to solve the whole environment; here we see it installing numpy, for example. Then it runs the pip install commands, and afterwards it syncs everything over to Triton so we can test it out (here it's already copying the files). If it looks good, we can then put it into the production branch of our software.

Okay, so that's how we basically do things. There are a few additional interesting things here. How do we handle updating an installation? Let's say we want to install a new package. We need to know whether the configuration of the environment has changed; we do that by taking a hash of the whole configuration dictionary. If the hash has changed, we know the configuration has changed, and the builder will update the environment.

Our hope with incremental updates is that a single new package shouldn't change the whole environment. When we do updates, we have a mostly solved environment and we only need to add a few packages; we don't want to break the previously working environment, so that our users don't notice anything changing, so that NumPy doesn't change in the background or something like that. In these cases we take the environment-export route: the builder exports the environment YAML, creates a new environment, installs exactly the same builds of the same packages there, and then runs the same conda install command, but this time with --freeze-installed.
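A sketch of that update route (environment and package names are illustrative):

```bash
# Export the currently working environment with its exact builds...
conda env export -n base > previous-env.yml

# ...recreate it as a fresh environment from the export...
mamba env create -n new-base -f previous-env.yml

# ...and add the new package without letting the solver change what is
# already installed; if this solve fails, the old environment is intact.
mamba install --yes -n new-base --freeze-installed some-new-package
```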
The --freeze-installed flag forces the already-installed packages to stay as they are; conda isn't allowed to remove things in the background. If the build with the new packages goes through, the new environment replaces the old one, so we can add a few packages there every now and then. This is pretty fast, because the previously installed packages are stored in a shared package cache among the builders. Earlier I said that we don't really get package reuse while the system is in production on Triton, but we do reuse the packages when we are updating the installations, at least the big ones.

I also alluded to the fact that some packages are only available through pip. After the conda installation, the dependency structure is a bit harder to manage in pip, because pip packages are not involved in the SAT process. Basically, after we have installed all of the conda stuff, we just put the pip stuff on top of it, and during an update we try to force the pip packages to be the same as they previously were. But they are harder to manage, because there aren't necessarily definite builds there.

Okay, a few more interesting things. GPU-enabled, or CUDA-enabled, packages such as TensorFlow or PyTorch usually require the cudatoolkit package. TensorFlow and PyTorch come from different channels, and the required cudatoolkit version might differ between the packages; conda will often try to resolve this by making one of them the CPU-enabled version, and we don't want that to happen. So we often need to go through what the most recent GPU-enabled versions of TensorFlow and PyTorch are, check which cudatoolkit version both of them are happy with, and then specify those in the environments. For more information, I wrote a small article about how to compile CUDA code with conda environments; there's also more information there on how conda manages these cudatoolkits and how they relate to the deep learning packages.
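As an illustration of that pinning, something along these lines (the channels and version numbers are purely illustrative; the real ones have to be checked against the current GPU builds of both frameworks):

```bash
# Pin cudatoolkit explicitly so that neither framework silently falls
# back to a CPU-only build during the solve.
mamba install --yes -c pytorch -c conda-forge \
    pytorch tensorflow "cudatoolkit=11.3"
```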
So, overview. Let's quickly go through the overview so we have some time for questions as well. We create these collections of packages that the environment should contain. We define the build configuration for the environment, so basically which installer we want to use, which channels, and so forth. Then we build the environment, and we deploy the environment and the modules. Let's do a quick check over here: if I run this module use command, it makes the modules from our development branch available, and I can load this fgci example here. Yeah, so I can load this, and here's our conda environment. So we make the modules available, and we can see that some packages, SciPy for example, came from pip, and the rest came from conda.

So we deploy the environments, and we then add new packages incrementally by adding them to the collections; this keeps the previous environment intact. Then at some point we freeze the old environment in place, we say that there will be no more updates to this environment, and we create a new environment release, where we let the SAT solver go wild and create the newest versions of every package, with some constraints of course, but basically we reinstall the whole thing. We do this every so often when we feel like it, usually when some of the bigger software like PyTorch or TensorFlow has updated.

There are a few problems with this whole thing. The science-build-rules code that we use to run these installations is pretty scripty at times; you can find the installation rules we use in this repository on GitHub, and they could be improved a lot. The documentation is sometimes not up to date, because the builder moves fast: we add new features and we have things to build with the builder, so we don't necessarily document it enough, or I don't document it enough. And the conda shell functions do not work during module loading; it would be great to enable those, but that's a minor grievance.

Possible solutions: well, we should streamline the code a bit, and we are working on that. We are adding new people to the project so that we get new eyes on how we do things, and that also forces us to document things better. About the environment activation, there are some interesting things: Richard has at least opened some issues about these kinds of reversible conda-activation approaches, and we have to look into that.

Okay, let's see if there are any questions. We can probably stop the recording at this point.