For the next session, we have with us Fridolin, who will be talking to us about the improvements in the OpenShift Python S2I. If you have any questions related to the session, you can post them in the chat box on the right side of your screen. So let's get started.

Hello, everyone. Welcome to this talk about Thoth's improvements to the OpenShift Python S2I build process. This recording was created for DevConf.US, the virtual event for 2020. I hope you are staying safe. My name is Fridolin, and welcome to this presentation.

Let's move on to the agenda and take a look at what we will discuss in this talk. First, we will look at what S2I is and how the OpenShift S2I process works, and I will explain how to turn your Git repository into a container running inside an OpenShift cluster. Then I will list existing solutions for the OpenShift Python S2I, and we will move on to Thoth. I will explain the basics of Thoth and Thoth's mission, and following that, why Thoth needed its own OpenShift Python S2I. Then we will go through Thoth's additions to the Python S2I: what is baked in, what it offers to you, and how you can use it. At the end, I will link some URLs to existing resources, to our web page, and also to the organization on Quay where you can pull container images that are Thoth S2I enabled.

So let's start. What is S2I? S2I stands for Source-to-Image. According to OpenShift's documentation, Source-to-Image is a tool for building reproducible Docker-formatted container images. It produces ready-to-run images by injecting application source into a container image and assembling a new image. The image incorporates the base image, the so-called builder, and the built source, and it is ready to use with the docker run command. S2I supports incremental builds, which reuse previously downloaded dependencies, previously built artifacts, and so on. That is based on OpenShift's documentation, but let's look at it more in depth.

S2I is a way of building container images that are subsequently run in an OpenShift cluster. It is a build process that can turn your Git repository into a container image. You just need to follow some very easy conventions, and you can turn any application living in a Git repository into a container; in the worst case, you need to add a few more things, but it's quite straightforward. These built container images are then deployed and run inside an OpenShift cluster with native support, you can very easily scale applications that are built using S2I, and you can benefit from cluster deployment and everything that comes with it.

So imagine you are a developer. What do you do? You create a Python application and push it to a Git repository. If the OpenShift cluster is configured for it, it can build a container image from the sources present in the repository, either automatically, based on triggers coming from the Git repository, or you can start the build process manually. The build process runs inside the OpenShift cluster and results in a container image that is subsequently pushed into a container registry, which can be, for example, OpenShift's internal container registry. Then you can deploy your application manually, or it can be deployed automatically based on triggers.
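To make this developer flow concrete, here is a minimal sketch using the oc client; the repository URL and application name are placeholders for illustration, and the python:3.8 image stream tag is assumed to be available in the cluster:

    # Create an application from a Git repository using a Python S2I builder image
    oc new-app python:3.8~https://github.com/example/my-flask-app

    # Follow the S2I build that turns the sources into a container image
    oc logs -f bc/my-flask-app

    # Expose the service so the deployed application is reachable
    oc expose svc/my-flask-app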
So once the image is pushed into the container registry, the application is automatically deployed and is available to users. That is, at a high level, the overview of the OpenShift S2I build process.

There are already existing solutions. If you want to use Fedora-based S2I container images, you can do so: you can pull them from registry.fedoraproject.org, and there are container images available with different Python interpreter versions and with different Fedora releases as the base operating system, or base container image. You can also use UBI; UBI stands for Universal Base Image, and these container images are available in Red Hat's container catalog. Again, you can find versions running different Python interpreters. Here you can see listed a Python interpreter in version 3.6 and a Python interpreter in version 3.8, both running on UBI 8, and you can also find older versions like UBI 7, or container images running the legacy Python 2.7, if I remember correctly.

Besides that, there are Thoth's S2I container images. These container images are automatically built and available on Quay, so you can pull them from these links. You can find a more up-to-date listing of these container images in the s2i-thoth repository in the thoth-station organization. We will focus on Thoth's S2I container images from now on.

So why Thoth's addition to the Python S2I? Before we get to that, let me explain what Thoth is. Thoth is a project in the AI Center of Excellence, the AICoE, in Red Hat's Office of the CTO. Our goal, or our mission, is to provide tooling, and also a platform, that makes OpenShift a better platform to run AI and ML workflows. As you can imagine, Python is the driving force for AI/ML applications these days, and so we use Python as the programming language to analyze various machine learning applications, but also as the programming language to develop the platform supporting our mission.

We know that software stacks depend on a lot of components, and software is changing all the time. There are different layers: Python libraries; native dependencies provided by the operating system, for example in the form of RPM packages; and below that the kernel space and the hardware available to run AI or machine learning workloads, or software stacks in general. So there is a vast number of platforms and runtime environments available to run your application and deliver your goals with it. That is the highest-level scope of Thoth.

While we were building Thoth, we used OpenShift's S2I build process. We were happy with it, and we delivered all our components using it. Later on, we moved to Tekton tasks and Tekton builds, so everything I will talk about also applies to Tekton; the build process itself is not specific to OpenShift's S2I per se, and you can use container images like Thoth's S2I container images to build software in Tekton as well.
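As a side note, these builder images can also be used outside of an OpenShift cluster with the standalone s2i CLI; a minimal sketch, using the UBI 8 Python 3.6 builder mentioned above and a placeholder repository URL:

    # Build a container image locally from a Git repository and a builder image
    s2i build https://github.com/example/my-flask-app \
        registry.access.redhat.com/ubi8/python-36 my-flask-app:latest

    # Run the resulting image with podman (or docker)
    podman run --rm -p 8080:8080 my-flask-app:latest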
We used Pipenv to pin software stacks. When we were developing components in Thoth, we created a list of the packages that should be installed, considering the direct dependencies of our applications. The lock file created by Pipenv is a fully pinned-down software stack, listing all the packages, including transitive dependencies, in the specific versions that should be installed in order to run the application. And this is something you should really do: use software like Pipenv to guarantee that your application states all its packages in pinned form, with exact versions. Pipenv also produces output with digests of the artifacts to be installed, and with this you basically guarantee that your application is always built with the same software. Whether you build it today or one year later, you at least know what was present during the build when you were developing your application. Tooling in the build process can then guarantee the provenance of the packages installed into the container image, as well as the integrity of the installed artifacts.

However, after a while, we experienced issues with Pipenv, mainly the fact that the community was inactive at that time and did not release new versions of Pipenv. That made the community quite angry; you can find posts asking whether the project is dead on the Pipenv issue tracker, and it was not nice to see. We patched Pipenv a few times to make it work for our components, in order to keep installing software into our components, and we produced our own fork, thoth-pipenv. We tried to push the patches we maintained upstream; some of them were accepted, and we continued using thoth-pipenv, our own fork. After a while, the community around Pipenv released a new version, but there was quite a long period when we had to maintain our own patches and our own patched build process, because the build process that used Pipenv downloaded Pipenv from PyPI during the build, so we had to adjust that.

While we were experiencing these issues, we realized: okay, we don't have Pipenv, but do we really need Pipenv when deploying Python applications? If you take a look at Pipenv, it's quite a beast. It has a lot of vendored dependencies, and if you install it, it eats up to 33 megabytes of disk space; this was tested on Fedora 31 using Python 3.6. So it's quite a large project. That is not bad in itself, because it serves its purpose, but for containerized deployments it does matter, because 33 megabytes in the base container image that is used as a builder image matters. So we were thinking: let's try to reduce that size and ship only the relevant parts that are needed.

Also, take a look at the build process: if you follow the recommendation that every piece of software should use a pinned-down software stack, then the build process accepts an already resolved software stack. That already resolved stack is stated inside Pipfile.lock, with all the packages resolved, so while installing these packages nothing needs to be resolved. The build process just needs to install software that is already described in a well-documented and well-known format, Pipfile.lock. So we observed that there is no need to implement a resolver in the tool, and that led to the introduction of a new tool called micropipenv.
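Before looking at micropipenv itself, here is a sketch of the two-phase workflow just described, using Pipenv; the Flask version is only illustrative:

    # Developer machine: declare a direct dependency and resolve the full stack
    pipenv install flask==1.0.1

    # Write Pipfile.lock with all transitive dependencies pinned, including digests
    pipenv lock

    # Build process: install exactly what Pipfile.lock states, no resolution needed
    pipenv install --deploy --ignore-pipfile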
micropipenv is a lightweight wrapper for pip that supports requirements.txt, Pipenv, and Poetry lock files, or converts them to a pip-tools compatible output. As mentioned, this was all designed for containerized Python applications, but the ideas are not limited to containerized applications. You can find micropipenv in the thoth-station organization, in the micropipenv repository, and you can also find it on PyPI; the project is called micropipenv.

The main benefit of using micropipenv is that there are no more vendored dependencies. micropipenv is really lightweight: it has just one optional dependency, and that is toml, for parsing TOML files like Pipfile, poetry.lock, or pyproject.toml. As stated before, micropipenv is a lightweight tool: everything is written in a single Python script of a bit more than 800 lines of code, or about 1,200 lines including license headers. But even though it's lightweight, it supports Pipfile and Pipfile.lock files, poetry.lock files, pip-tools style requirements.txt files, requirements.in files, and also raw requirements.txt files, as one would use, for example, with a setup.py script. So one can say it's one single script to rule them all.

Introducing one common base tool that can install dependencies from all these files also gave us an easy-to-follow installation log. Pipenv is quite silent about what's going on, and if it fails, the log is sometimes not easy to follow and not very readable. micropipenv gave the log some structure: the logs are now easy to read for people, but also easy to parse by machines, which enabled build-breaker analysis in project Thoth.
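A quick sketch of driving micropipenv from the command line; the subcommands and --method values follow micropipenv's README, but treat the exact flags as an assumption and check the documentation:

    # Install from Pipfile.lock (used by default when present)
    micropipenv install

    # Install from poetry.lock / pyproject.toml instead
    micropipenv install --method poetry

    # Install from a pip-tools style requirements.txt
    micropipenv install --method requirements

    # Convert a Pipfile.lock into requirements.txt-style output
    micropipenv requirements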
Okay, so let's take a look at the demo; we have switched to a terminal. I'm in a directory that holds the contents of a Git repository available on GitHub, in fridex's repository s2i-example-micropipenv, and there are a bunch of files I will talk about. I think the most important file here is openshift.yaml, the manifest file that states how to build the application and how to deploy it. You can find a route, a service, and a deployment config in there. You can find two image streams: one is the Thoth S2I UBI 8 Python 3.6 builder container image, an S2I-enabled image running UBI 8 with a Python 3.6 interpreter; the other is the s2i-example-micropipenv image stream that holds the content of the built container image, the result of the build process. Besides that, there is the build config: it uses the Thoth S2I Python 3.6 builder container image as described below, and the resulting container image is pushed into s2i-example-micropipenv. There you can also see additional configuration options that are supplied to the build process; you can find more information in the README file.

Besides the openshift.yaml file, there is app.py, the Python script that is run: a simple Flask application that simply says hello from Thoth. There are also lock files. You can find a Pipfile.lock as produced by Pipenv based on a Pipfile that states just the direct dependency, in this case Flask in version 1.0.1. You can find a poetry.lock file, as produced by Poetry based on the dependencies stated in pyproject.toml, in this case Flask in version 1.0.3. There are also two requirements files: a requirements.in file that states the direct dependencies, in this case Flask 1.1.1, and the corresponding requirements.txt file that holds all the dependencies that need to be installed for Flask 1.1.1, together with digests. And there is an unpinned requirements.txt variant that holds just Flask 0.12.1, so there is no fully resolved software stack; it's just Flask that gets installed. So let's see how micropipenv behaves for these different formats.

Let's process openshift.yaml; by doing so, we should see all the objects created inside the OpenShift cluster, and we should see a build being triggered. Once we switch to OpenShift's web console, we can see the s2i-example-micropipenv build running inside the cluster. Here you can see the logs of the build process. It takes the repository available on my GitHub profile, so that's the content we just went through. By default, micropipenv uses the Pipfile.lock present in the repository, and this Pipfile.lock holds all the information for installing the dependencies. Here you can see the log produced during the installation of these dependencies.

In the repository that I've cloned, I also created other directories that contain copies of these files. Here we have files produced by pip-tools: the requirements.txt file produced by pip-tools, the requirements.in file, and the app.py file for running the application. Now I will start another build, but this time from the current directory, pip-tools, which holds the pip-tools specific files for installing and deploying the application. We should see another build running inside the cluster when I switch to OpenShift's web console. The application sources are retrieved, and shortly we see a log: after pip is updated, we can see that the installation process is using micropipenv, and here is the corresponding Pipfile.lock that was created out of the requirements.txt file, with all the packages present in there. In other words, the Pipfile.lock here is just another format describing the dependencies as they were stated in the requirements.txt file. As you can see, the build is successful, and the resulting container image is pushed into OpenShift's container registry.

Another directory I prepared uses the files as used by Poetry. Here you can see the poetry.lock file, pyproject.toml, and our well-known app.py. Again, I trigger the build from the local directory and switch to OpenShift's web console. Once the artifacts are uploaded, the third build is triggered, and we can already see it in the web console. If I click on it and check the logs, they are very similar to the logs we've already seen: again, a Pipfile.lock is printed, so I see all the packages in pinned form, in the well-known format produced by Pipenv, all with digests. So the integrity and provenance of the packages is guaranteed. Once the installation process is done, the resulting container image is pushed to the registry running inside the cluster.

Now I will use the plain requirements.txt files. This time I don't have all the dependencies locked, and again I start the build process from this local directory. The requirements.txt file holds just one version of Flask, but Flask also has transitive dependencies that will be installed and that are required to run a Flask application.
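For reference, the commands behind these steps look roughly like this; the object and directory names follow the demo's openshift.yaml and repository layout:

    # Create the route, service, image streams, build config, and deployment config
    oc process -f openshift.yaml | oc apply -f -

    # Trigger a build from a local directory, e.g. the pip-tools variant
    oc start-build s2i-example-micropipenv --from-dir=pip-tools --follow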
Here I can see the fourth build triggered inside the cluster, and in OpenShift's web console I can already see it. If I click on the logs, I see that the application starts to build, but there is one large warning that the provenance and integrity of the installed packages cannot be checked: the provided requirements.txt file is not fully locked, and not all dependencies are pinned to specific versions with digests of the artifacts to be installed. The packages are installed anyway; in this case, pip's resolver algorithm is used to resolve the software stack, and at the end I can see the output of the pip freeze command with all the packages that were installed, but without digests and without information about the Python package indices used. Once the build is successful, the resulting container image is again pushed to OpenShift's container registry, ready to be deployed. As you can see, this installation process has drawbacks and is not a recommended way of installing Python packages.

So that was the demo of micropipenv and the Thoth S2I installation process. As you've already seen, micropipenv was designed for containerized Python applications, but it's not limited to them. If you have a use case where micropipenv can be useful, feel free to use it. The project is available on PyPI under the name micropipenv, and you can also find it in the thoth-station organization, in the repository called micropipenv. You can also install it using DNF: thanks to Lumír, who packaged it as an RPM, you can install it with dnf install micropipenv. If you would like the version available on PyPI, install it with pip install micropipenv.

Another addition to the S2I build process is a tool called Thamos. Thamos is a command-line interface for communicating with Thoth. You can find this CLI tool again in the thoth-station organization, this time in the thamos repository. Thamos is an interactive tool for communicating with Thoth and obtaining recommendations. It detects your build-time and runtime environment and submits this information as an input vector, together with the information required to resolve software stacks, meaning the direct dependencies of your application, to Thoth, which does server-side resolution. Check the talk called 'Reinforcement learning based dependency resolution' for more information on this resolution process.

In a nutshell, this is how the build process works. The S2I build process detects where the build is happening: what type of CPU the environment offers, together with information about the Python interpreter version, and so on. This information is sent, together with the direct dependencies, to the Thoth API, which triggers a component called adviser that computes recommendations. It resolves software stacks given Thoth's knowledge, and once the software stack is resolved, the recommendations are available to the S2I build process, which then installs the recommended software into the container image; the resulting container image is subsequently pushed to the container image registry available in OpenShift, as was seen in the demo.

Thamos can also be used from your local machine. You can install it from PyPI using pip install thamos, generate a configuration file for your Git repository using thamos config, and then issue thamos advise to obtain recommendations on the software stack you should use.
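A sketch of that local workflow; the configuration file name and exact prompts may differ between Thamos versions:

    # Install the Thoth CLI
    pip install thamos

    # Generate a configuration file for the repository, recording the detected
    # hardware and software environment
    thamos config

    # Ask Thoth's server-side resolver for a recommended, fully pinned stack
    thamos advise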
You can follow the documentation available at thoth-station.ninja; it will take you through the parts of Thamos and how to configure it in order to use Thoth. Thoth's Python S2I is available on Quay, so you can find the listing of all the available container images in the s2i-thoth repository in the thoth-station organization. Some of the container images are listed here: you can find UBI 8 based container images running Python 3.6 or Python 3.8, a Fedora 32 container image running Python 3.8, and a container image based on Fedora 31 running Python 3.7. If you wish to use these container images in a build process, you can follow the demo that I showed you, or the documentation present in our thoth-station organization in the integration section.

You can find more information on Thoth's home page at thoth-station.ninja, and all the sources are available on GitHub in our thoth-station organization. Thoth is a project in the AICoE, in the Office of the CTO. You can find us on GitHub, as stated before, you can follow us on Twitter for any new updates, and you can also subscribe to our YouTube channel if you wish to see more about the project, the demos we do, and the periodic scrum sessions we publish there. So feel free to subscribe and follow us. Thank you for your attention.

I think there are no questions in the chat box right now, so let's wind up the discussion. I would like to thank Fridolin for this wonderful session on the OpenShift Python S2I and for sharing the resources in the chat box about the Thoth S2I that Red Hat is currently working on.