Hello everyone, welcome to this next talk, which will be about a tool called micropipenv. That's a tool we developed and it serves one purpose: it acts as a single tool that installs Python packages into your environment, and it complements pipenv, Poetry and pip-tools, all the package installers in the Python universe. My name is Fridolin, and let's have a look at what we have.

First I will talk about Python packaging in general, then we will take a look at the existing formats and tools for resolving and installing dependencies, so I will mention tools such as pip, pip-tools, pipenv and Poetry. I think these are the main tools for installing Python packages in the Python ecosystem. Then I will introduce micropipenv. We will discuss why micropipenv was developed, why there was the intention to create it, and I will talk about how to use it, why to use it, and how you can use it in Python containers run inside an OpenShift cluster. At the end I will talk about best practices: how to install dependencies inside containers and how you can manage your applications so that they are maintainable and you gain the benefits of proper maintenance of your projects.

First on our list is Python packaging. Each time I talk about Python packaging I see this picture. It comes from XKCD and it shows how complex Python packaging is: issues can appear on different levels, there are a bunch of tools you can use, and you have to configure them properly so as not to fall into dependency traps or dependency hell. With micropipenv we did not introduce a new node into this dependency graph, hopefully; we rather tried to reduce the complexity in Python packaging.

If we take a look at the modules that get installed, these can be found on the Python Package Index, the publicly hosted service managed by the Python Packaging Authority (PyPA), and you can find it on pypi.org. At the beginning it was really an index: a page that linked other pages where you could go, download artifacts onto your computer and manually extract them. Tools such as easy_install were developed to install them. This evolved over time, and the current implementation is called Warehouse. The picture here is from the Monty Python sketch called "Cheese Shop", and that's exactly how PyPI was nicknamed. The sketch is about a cheese shop where no cheese is present: a guy walks in and wants to buy cheese, but there is nothing to buy, and that's exactly how PyPI looked at first. Nowadays it's really a service that serves its purpose, and you can find hundreds of thousands of open-source packages there. Just to mention: it's called PyPI. Some people pronounce it "PyPy", which is not correct; the pronunciation should really be "Py-P-I", the package index.

If you would like to host your own artifacts you can still do so, and many people do: they host their own Python package index and install dependencies from there. The only thing you need to follow is the standard, PEP 503. As an example, we run our own index where you can find TensorFlow builds optimized for the AVX2 instruction set, so if you do machine learning and use TensorFlow, feel free to visit our index and use our builds of TensorFlow to gain performance.
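Pointing pip at a self-hosted PEP 503 index is a one-flag change. Here is a minimal sketch; the index URL is a placeholder standing in for whatever index you host (the talk does not spell out the URL of our TensorFlow index):

```console
# Install from a self-hosted PEP 503 ("simple") index instead of pypi.org.
# https://example.com/simple/ is a placeholder; use your own index URL.
pip install --index-url https://example.com/simple/ tensorflow

# Or keep pypi.org as the primary source and consult the custom index as well:
pip install --extra-index-url https://example.com/simple/ tensorflow
```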
Okay, so the next step is to go briefly through the existing tools and formats available for resolving and installing Python dependencies.

The first one is pip. I probably don't need to introduce it in the Python track. It's really simple: pip install whatever package you would like to have in your environment; in this case we are installing micropipenv. pip stands for "pip installs packages", and previously it was called pyinstall; it was renamed because the "install" felt too redundant, and now we have just pip. pip is the tool recommended by the Python Packaging Authority for installing packages, and it does its job pretty well: it downloads the artifact, extracts it, runs the build process if there are extensions, and delivers the package to your environment.

However, it does not manage lock files. Lock files are pretty good to have, because if you're developing an application, you state in the lock file all the dependencies the application is supposed to use, and you have one file that states every artifact you need to install on the Python layer to run your application. pip is also not good at providing information about installed artifacts: if you would like to know where a package was installed from, you cannot find out. So if you install, for example, requests from PyPI, you cannot later tell that it was installed from PyPI. You can also end up with a broken environment when you use pip; this is mostly caused by the resolver, as multiple installation runs can break your environment. There is a new feature, the advanced resolver: you can enable it using --use-feature=2020-resolver, and this resolver, implemented using backtracking, tries to address issues seen with older pip releases. As stated, pip on its own is not directly suitable for Python applications that you develop and push into a cluster, so we will try to find another solution.

So let's take a look at pip-tools. As stated before, pip does not manage any lock file, and this is something pip-tools tries to address. You manage a requirements.in file that states all your direct dependencies, and with pip-tools you run pip-compile, which resolves these direct dependencies into requirements.txt; then you have two files, requirements.in and requirements.txt. These two files you can push to your git repository, and every developer cooperating with you pulls the repository and runs pip-sync inside a virtual environment to install the dependencies stated in the requirements.txt file, already resolved. You have a kind of reproducible environment, and as stated, you need to manage two files, requirements.in and requirements.txt; by convention you can also state your development dependencies in a separate dev-requirements file. Sadly, pip-tools does not enable hashes by default. This is something good to enable, so if you are using pip-tools, turn on hash generation, which is performed by pip-compile --generate-hashes. And as said, pip-tools does not manage any virtual environment, so you need to do that on your own. The workflow then looks like this: you create a virtual environment, you activate it, you install some dependencies and commit files like requirements.in and requirements.txt; then your co-worker can pull the changes, create his or her own environment and perform pip-sync to install the same dependencies as you had.
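Put together, the workflow just described looks roughly like this; a minimal sketch using the real pip-tools commands, with requests standing in as an example dependency:

```console
# Create and activate a virtual environment (pip-tools does not do this for you).
python3 -m venv venv && . venv/bin/activate
pip install pip-tools

# State only direct dependencies in requirements.in.
echo "requests" > requirements.in

# Resolve them into a fully pinned requirements.txt, with hashes enabled.
pip-compile --generate-hashes requirements.in

# Commit requirements.in and requirements.txt; a co-worker then reproduces the
# environment with pip-sync, or with pip's hash-checking mode:
pip-sync requirements.txt
pip install --require-hashes -r requirements.txt
```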
This is quite verbose, and that gave birth to another tool called pipenv. With pipenv, people tried to address this verbosity: pipenv manages the environment for you, so it handles creating the virtual environment, automatically syncing packages and updating those packages, and everything is stated in two files. The first one is Pipfile and the second one is Pipfile.lock. In Pipfile one states the direct dependencies with some additional configuration like Python version requirements, source configuration and so on. pipenv then takes this Pipfile and resolves the dependencies, so it actually implements a resolver, and it writes the resolved software stack into Pipfile.lock, which is a file in JSON format. So it manages the virtual environment for you. It also has some neat features like deploy: when you are deploying your application, pipenv can ensure that you're running the proper Python version, that your direct dependencies did not change, and so on. A good thing is that you can ship Pipfile and Pipfile.lock with your application, and then anyone can download them and reproduce the Python environment you had. It's also worth stating that pipenv is the tool recommended by the Python Packaging Authority for managing lock files, and Pipfile.lock is, let's say, now the standard for lock files. So we have quite an easy workflow: we run pipenv install and then we deploy the application. Sadly, one of the disadvantages of pipenv is that it's not that verbose; it hides a lot of information, and when you are installing dependencies and pipenv fails, it often does not produce logs that would be helpful, so you need to dig into the issues and find out from the logs what went wrong.
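To make that workflow concrete, a minimal sketch of the pipenv commands described above, with requests as a stand-in dependency:

```console
pip install pipenv

# Records requests in Pipfile, resolves the full stack and writes Pipfile.lock;
# the virtual environment is created and managed by pipenv itself.
pipenv install requests

# On deployment: fail if Pipfile.lock is out of date with Pipfile or the
# required Python version does not match.
pipenv install --deploy
```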
Last but not least among the tools available out there is Poetry, a community effort to address issues in Python packaging. It uses two files, pyproject.toml and poetry.lock, which are very similar to the pipenv files but not compatible with them. Poetry manages these files for you, and it also manages the whole lifecycle of your application: it acts more like a tool that also enables you to publish your source code on PyPI, handle the release lifecycle and so on, so it assumes that your project is actually a package, not just an application. It also uses non-standard version specifiers, which is kind of sad, and some metadata, such as package descriptions, are written into these files by Poetry.

These were the tools available out there, and we decided to implement our own: micropipenv. micropipenv on its own is a very lightweight wrapper around pip: it has something like 900 lines of code, and if we also count comments you end up with 1,200 lines, so it's really a lightweight addition to pip. It uses pip's internal logic, and everything is implemented in one single file. micropipenv has two dependencies: one is pip, as it relies on pip's internal logic, and the other is toml or pytoml, an optional dependency used for parsing Pipfile or the Poetry-specific files, which are written in the TOML language. Sadly, TOML support is not in the Python standard library yet, but there are discussions to include it, so hopefully we will get it there. As I stated, it's one single Python file, and the simplicity behind the implementation is that it does not use a resolver: it reuses the implementation available inside pip and installs the packages already stated in the lock file formats, so there is no need to implement any resolver. With this, it is a complementary tool: it does not compete with the tools I've discussed, but rather creates a compatibility layer on top of all of them.

How does it look if we want to visualize it? We can see micropipenv as one layer that uses pip under the hood, and then optionally also toml; the input is any requirements file as produced by pip, pip-tools, pipenv or Poetry. It automatically detects which lock file format is used and installs the dependencies. Why is that beneficial? If you imagine running micropipenv in an OpenShift S2I build process, users can use whatever tool for managing dependencies they like; micropipenv automatically detects which dependency format is used and installs the dependencies transparently, and the only thing present in the image is micropipenv itself, pip and toml. So it's a really lightweight addition to the Python S2I build process and can make many users happy.

micropipenv was designed for containerized applications, as you saw; it was designed for the OpenShift Python S2I build process, but it's not limited to this use case. Originally we had issues with pipenv, as the community was not that active, and the release process behind pipenv is quite difficult and long when it comes to the manpower required to go through the whole release process. pipenv also bundles a lot of vendored dependencies that we didn't want to ship in the Python S2I containers, and we really wanted to introduce a common base for installing dependencies, so we introduced one tool that works with all the open-source tools for managing and installing Python dependencies. As micropipenv is very lightweight, it also reduced the builder container size: the Python S2I image was reduced by almost 13 megabytes. With micropipenv we also took care of logs, as we want applications to be easy to debug: if there are issues during installation, we wanted verbose, but not overly verbose, logs so people can see what's going wrong with their application. And we also wanted to reduce the maintenance burden of maintaining pipenv itself with all its bundled dependencies.
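Before the installation details, here is a minimal sketch of what that compatibility layer means in practice: the same command regardless of which tool produced the lock file (the --deploy option is the one described in the talk as mimicking pipenv):

```console
# In an application directory, micropipenv detects whichever lock file is
# present (Pipfile.lock, poetry.lock or requirements.txt) and installs from it.
micropipenv install

# Mimics pipenv's --deploy: verify the lock file before installing.
micropipenv install --deploy
```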
You can install micropipenv using pip: issue pip install micropipenv, and you can state the optional extras, that is toml. Then you just run micropipenv install; optionally you can pass the --deploy flag, which mimics pipenv's --deploy and is used as an option to the install command. I stated that micropipenv is a single file, so if you do not want to install micropipenv into your environment for any reason but have access to the internet, you can simply issue a curl command that downloads the Python script, pipe it into the Python interpreter, pass install and optionally pip's --user option, and everything happens automatically; you don't really need to install micropipenv at all. The prerequisite is to have pip on your system, which I believe all Python people have.

micropipenv is also available in Fedora: you can install it using dnf install micropipenv. Here I would like to thank Lumír, who packaged and maintains micropipenv in Fedora. If you want to give micropipenv a try in containerized environments, you can use the Python S2I images: you need to set the ENABLE_MICROPIPENV environment variable, which activates micropipenv, already preinstalled in the Fedora-based Python 3 S2I container images. This feature is available only for Python 3 container images, as micropipenv supports only Python 3, and if you are using Python 2 you should consider switching to Python 3 anyway. Again I would like to thank Lumír from the Python maintenance team for the cooperation in making this happen. Hopefully micropipenv will also become available in the RHEL and UBI based container images. Another way to try micropipenv is to use Thoth's enhanced Python S2I container images: these are container images enhanced by the Thoth team, and they ship micropipenv preinstalled, so you don't need to set any environment variable. You can find UBI 8 Python 3.6 container images as well as Python 3.8 on UBI 8, so feel free to pull these images and use containers with micropipenv shipped.
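A recap of those installation paths as a sketch; the raw-file URL follows the usual GitHub raw-content pattern for the thoth-station/micropipenv repository, so treat it as illustrative and check the project README for the canonical one, and the S2I example application URL is a placeholder:

```console
# From PyPI, with the optional TOML extra for Pipfile/poetry.lock support:
pip install "micropipenv[toml]"

# Single-file usage without installing: pipe the script straight into Python.
curl https://raw.githubusercontent.com/thoth-station/micropipenv/master/micropipenv.py \
  | python3 - install --user

# On Fedora:
sudo dnf install micropipenv

# In a Fedora-based Python 3 S2I build on OpenShift, activate the preinstalled
# micropipenv via the environment variable mentioned above:
oc new-app python:3.8~https://github.com/example/my-app --build-env ENABLE_MICROPIPENV=1
```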
As stated, at the end I would like to mention a few best practices. If you are developing a Python application, really use lock files and ship these lock files with your application. Consider pinning all the dependencies you use to specific versions: that can reduce your maintenance costs later on, and it gives you the power of reproducible installations. If you come back to your project one year later, you can still see which packages were present and which packages you installed to run your application; even though these packages may have been deleted from PyPI, or those versions may no longer be available, you will still be able to track down what software you used on the Python level. Having these lock files also gives you the power to do integrity checks and provenance checks, so you are sure you are using the right artifact coming from the right Python index that you configured. So feel free to use pip-tools, which manages requirements.in and requirements.txt, but don't forget to enable hashes as I did in the presentation; or use Pipfile.lock as used by pipenv; or, if you wish, you can also use Poetry and the Poetry-specific lock files.

I mentioned Project Thoth a few times, so at the end I will briefly discuss this project. micropipenv was born in Project Thoth; the project lives in the AICoE, which stands for AI Center of Excellence, and we are a team in the Office of the CTO. We are trying to make Python a better world to code in, to develop applications, and, as you saw, to push these Python applications through the OpenShift build process and run them inside your cluster. If you would like to follow our updates, you can do so: we have a home page, we have a separate organization on GitHub where we post almost all our source code, and we also have Twitter for updates on what we do. We build a recommendation engine for Python applications and an advanced Python resolver that can help you manage your applications more sanely and deliver better software when it comes to quality. Don't forget to subscribe to our YouTube channel, where we post updates and where you can also follow our scrum demos. That would be all from me; if you have any questions, feel free to ask. I will post this presentation in the link, and it will also be available in the thoth-station organization on GitHub, in the talks repository.

Awesome, thank you so much. We do have a couple of questions. I'm going to assume the question about the hashes was answered during the best practices, but if you want some elaboration, by all means feel free to comment in chat. The other big question was: does micropipenv have integration with Miniconda 3 or venv?

No. If you want to use micropipenv, you need to manage your virtual environment on your own; that's something that is already done for you in the case of an OpenShift container.

Okay. And I guess the hashes question was about your own application: the question was what the advantages are of using generated hashes, but it sounds like you covered a fair amount of that in the best practices, talking about locking so you can reproduce the environment at a later point.

Okay, yes. Excellent. I don't see any other questions. Thank you so much for this; it was a very interesting whirlwind tour of a lot of tools that I hadn't really played with just yet, and seeing that progression was a really cool way of presenting it. So, awesome, thank you so much; I really appreciate you coming out for this.