Hello everybody. Have you ever wondered what it means to maintain an RPM package in Fedora, CentOS, CentOS Stream, RHEL or any other distribution? If so, I will try to answer some of your questions today. I'm a little bit afraid that I will open more questions than I actually answer, but we have plenty of time after the talk, so we can discuss it. My name is Lumír. I'm from the Python Maintenance Team at Red Hat, and we maintain Python packages, quite a lot of them. So without further ado: the agenda is kind of packed. I'm not sure I will manage it in 25 minutes, so I'll try to go quickly, but we have plenty of time after that for questions. So let's start.

This is probably a well-known bell curve describing the normal distribution of basically anything. If we have it for the users of a piece of software, the average user is somewhere in the middle, and the axis here is the software's age: on the right side, from your point of view, it's too new, not well adopted, not well tested, and so on; on the left side it's too old, absolutely full of flaws, problems, and security bugs. My wild guess is that a lot of you are somewhere here in the middle, on something which is still supported but well tested, not too new, not too old, and that's fine. The problem for us, RPM package maintainers, especially in Fedora and Red Hat Enterprise Linux, is that we are on both ends, at both extremes. No software is too new for us, and I will show you some examples; and no software is too old for us, and I'll show you some examples of that too.

Basically, the flow of the code looks something like this; if you really don't know what it means to maintain an RPM package, this might simplify it. Somebody creates a piece of software, a library, an application, whatever, and on the other hand there are users who want to use those applications and libraries. In the middle is a magic man or woman: the package maintainer. Usually, when you want to install some software, you see it on the internet and think, oh, this is a very nice application, I really want to install it. If you use a Linux distribution, your first step will probably be dnf search, yum search, apt search, something like that, and you really wish for the application to be packaged in that distribution, because it means you can install it and somebody else takes care of all the stuff around it: compilation, testing, everything. And that's who we are in Python maintenance, and the same goes for other teams as well. We take the code from upstream, create RPM packages, and deliver those packages downstream to our users, customers, and so on. I won't go much into that detail today; I won't be telling you about the commits and pushes and PRs and updates here and there. What I would like to show you is what challenges you might expect if you want to be an RPM package maintainer.

So, I will use a couple of terms here; the list is short. When I say upstream development, I mean the development on GitHub or wherever the upstream code lives, the people actually writing the applications or libraries. When I say downstream development, I mean the development in the Linux distribution, the packagers who are packaging the stuff, writing spec files and so on. When I say CVE, I mean a security flaw, a security vulnerability. And EOL is a shortcut for end-of-life: software without any more upstream support. So, let's take a look.
If you want to be a package maintainer, what are your responsibilities? You should try, at least your best, to keep your package functioning, which usually means that it should work, of course. It should be buildable from sources: the process you describe, how to build it, how to compile it, how to test it, and how to create a package from it, should still be repeatable after some time. It should be installable; the fact that you have an RPM package doesn't mean it will work on your machine, right? So it should be installable, and it should just work. It should be up to date, so you should update it regularly. You should fix issues, and potentially CVEs; that's something I will describe in deeper detail later. You should also try to limit the impact on other packages, which is something we'll also focus on a little bit later. And you should try to limit the impact on the users, which is even harder than limiting the impact on other packages. But the best thing you can do as an RPM package maintainer is to stay invisible. If everything works, nobody knows about you: people want to install software, they install it, and it works. Nobody cares who maintains it. But if something breaks, then it's your responsibility.

Let's start with a very, very simple example: the package named ioping. That's a very simple utility; it can measure the latency of your hard drive the same way the classic ping measures the latency of the network. That's it. It's written in C, or C++ if I'm not mistaken. In C, yeah. And it needs only two dependencies to build, gcc and make, because it uses makefiles and needs to be compiled. And it needs only one dependency to run, glibc, which is the core of all Linux distributions, nothing you should be worried about. No packages need ioping to build, and no packages need ioping to run, which means this is a so-called leaf package. If you imagine the tree of dependencies between RPM packages, this one would be at the very bottom: nothing depends on it, which makes the maintenance situation kind of easy for you, because the probability that something breaks for you is very low. Well, if anybody moves gcc from version 13 to 20, just like that, it might break for you, of course. But other than that, the probability that something breaks for you is limited, because you have a very limited set of dependencies. And the probability that you will break something for anybody else is also limited, because no other packages depend on that package directly. Only if you break it might you expect some bug reports from users, and that's something which is really hard to measure, because we don't know how many users are using the specific components, the specific packages.
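If you're curious whether your own package is a leaf like that, here's a minimal sketch of how you might check it, assuming a Fedora-ish system with dnf available; the little helper function is just for illustration, not an official tool:

```python
import subprocess

def reverse_deps(package: str) -> list[str]:
    """Ask dnf which packages require the given one."""
    result = subprocess.run(
        ["dnf", "repoquery", "--whatrequires", package],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]

# An empty result means nothing depends on it: a leaf package.
print(reverse_deps("ioping") or "leaf package: no reverse dependencies")
```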
So what could go wrong here? As I mentioned, somebody can find a bug in the application; that might happen, of course, or a security vulnerability. The problem might be upstream, in the code of ioping itself, or the problem might be downstream: you forgot to package some files, or manual pages, or documentation, whatever. Or the problem might be on the left side of the previous image: somebody updates or changes something and it breaks your package, or you change or update something and it breaks other packages or the expectations of users. In that case, what do you do as an RPM package maintainer?

Well, if the problem is upstream, the best way is to fix it, test it in Fedora, and propose the fix upstream. And we do that a lot. That should break the first kind of expectation, that RPM maintainers just take the code, package it into RPMs, and give it to users. That's usually not the whole story, and I will describe it in a little more detail later. The thing is that in Fedora we usually ship the latest and greatest software, right? That means we are testing the software with the latest possible versions of its dependencies, and that means something usually breaks, especially in the case of Python. So we are the first ones who actually see the problem. So it's not just about reporting the problem, which is the second best thing you can do: just report it, wait for upstream to fix it, and after that happens, take the patch or the new release, update, and it's done. Usually it doesn't work like that. The best way you can help as a package maintainer is to prepare a fix. You can test it in Fedora, where the original problem appeared, so the environment you want to test it in is basically already prepared for you. The second best thing is to report the problem and wait. That might also be valuable, especially if you understand the package and can provide all the necessary details: the versions of dependencies, configuration flags, compilation flags, and all the details around that might be really helpful for upstream. And by doing that, you might actually make upstream fix the problem very early for others. You uncover a problem, you fix it, you report it, and that might mean that no other users in the middle of the bell curve will ever know the problem was there at all. Or you can do nothing, which is bad.

If you have broken dependencies, the first option is the same as with a bug or a security vulnerability: you need to take your software and adapt it to the newer version of your build, test, or runtime dependencies, which might be a little bit hard. Another thing: if you are very lucky and somebody is moving a package to another major version, like switching from GCC 10 to GCC 20, for example, they might be aware that you depend on GCC 10 and create a compat package for you. That means you can switch the build dependency and say: okay, for now I'm fine. I'll have to take a look at this, because GCC 10 won't be there forever, but right now I'm safe, users can still use the latest version of ioping, and I have time to think about it. But honestly, that doesn't happen very often, so usually you have to port your package to the latest version of its dependencies. And of course, if the problem is downstream, in your package, then it's your responsibility, right? That might sound like a lot of work, doesn't it? That's nothing. We are not there yet.

Let's move to the other side of the problem, because we have cookies. Oh, not actually cookies, but we have a lot of stickers, so after the talk I will give you some stickers. Let me step aside for a while and describe in more detail what we do in Python maintenance, especially in Fedora. Fedora loves Python, almost all of them. Python 2.7, not that much, but still. And we would like Fedora to be the best distribution for Python developers, period.
Which means that we maintain multiple Python interpreters at the same time, continuously, some of them mainly because we have to do the same in RHEL. That means that if you want to develop a Python application for RHEL 8, or RHEL 7, or RHEL 6, or RHEL 9 and the future 10, you can use Fedora for that. You don't need a virtual machine or anything virtual with the old distributions; you can take Fedora, take whichever Python you want, create a virtual environment for it, write an application, test it, and it's all great. If you are on the other side of the bell curve, as we sometimes are, then you might want to test your application with the latest and greatest version of Python, which is also possible. We have Python 3.12, which is now in its first beta, but we have had it in Fedora since the first alpha, which means that you can really test with all the supported Pythons, and we support a lot of them much longer than upstream does. We also have some alternative interpreters, like PyPy. We had Jython too, but not anymore.

And also, we are very fast, really. It took only seven days for 3.11 alpha 1 to appear as a build in Koji, which means that if you really need the latest and greatest Python, there is absolutely no need to compile it from sources yourself. And that's really, really good. Only three days for the first beta, and only three days for 3.12 alpha 1: after it was released upstream, it appeared in the Fedora build system, and you could use it, which is quite awesome. And what's maybe more important is that this is much faster than the usual CI providers. You know you can use Python for testing on GitHub Actions, Travis, CircleCI, and whatever else, but you will wait months after the official final release of Python 3.11, or 3.12 in the future, for it to appear in CI systems like those.

You are not a Fedora user? Shame on you. But it's not a problem, because we offer all the Pythons together with tox, which is a great tool that enables you to test one application with multiple Python versions, even in parallel if you want to. All the Pythons plus tox, based on Fedora, are available as a container image on Docker Hub, and also as a GitHub Action. So you don't need to use Fedora directly; you can use it only in CI to test with all the supported Python versions.

All right. But the situation for most of the interpreters I mentioned on the previous slides is the same as for ioping, because the alternative versions are basically leaf packages, just as ioping is. Nothing should really depend on them; they are not meant to be used in production, they are meant for development and testing. But there is one, the main one, the main troublemaker, and there is only one in every Fedora release: usually the latest stable release of Python. That's the one which causes us the most trouble and the most work at the very beginning of a Fedora release. All the others are usually leaf packages for testing and development, and that's fine; that applies also to the future versions. But the main one is the troublemaker here. Let's increase the complexity of the ioping example a little bit. The dependency graph for Python 3.11, which is the main Python interpreter in Fedora 38 and Fedora 37, looks like this: we need 45 packages to build it, and we need 24 packages to use it, and by "use it" I mean the Python interpreter and its standard library,
not all the components provided by that package. That's the part we really cannot affect, right? Something might break there, and the probability on the left side of the image is much higher than for ioping, because those packages are much more complex than gcc and make, and the numbers are just much higher. But the funny part begins on the right side: 4,400 packages need Python to build, and 5,600 packages need Python to run. Like, wow. So, okay, what can we do? We could bump the version, build it in Koji, release it, and tell everybody to fix things for themselves, let's say. That's obviously something we cannot do, and that's why it generates so much work for us.

The reasons are basically two. The first one is that every new Python release usually contains some backward incompatible changes, like removed deprecated modules. Because when developers see a deprecation warning for two or three years, apparently that's not enough, so when the module disappears from the standard library, a lot of packages and a lot of test suites start to fail, right? So there are backward incompatible changes. The other problem is that Python is an interpreted language, that's true, but a little bit of it is compiled, let's say, and that compilation process creates the .pyc files, which are basically cached versions of imported modules. The catch is that for system packages, those .pyc files live in locations on disk where only root, the superuser, can write, which means we have to ship them pre-compiled inside the RPM packages. If a user imports a module and Python tries to create the .pyc file somewhere it has no permission to write, it simply won't happen, and the compilation step would be repeated again and again on every import. And because the bytecode cache is tied to the Python version, that means we need something called a mass rebuild.
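To make the .pyc part concrete, here's a tiny sketch of where the interpreter looks for the cached bytecode of a system module; the exact path is an assumption for a Fedora machine with Python 3.11:

```python
import importlib.util

# A stdlib source file as shipped by the python3 RPM (path assumed for
# illustration; it differs per distribution and Python version).
src = "/usr/lib64/python3.11/os.py"

# Where CPython expects the compiled bytecode cache for that module.
print(importlib.util.cache_from_source(src))
# When run with Python 3.11, this prints:
#   /usr/lib64/python3.11/__pycache__/os.cpython-311.pyc
#
# That __pycache__ directory is writable only by root, so the .pyc files
# are pre-compiled at build time and shipped inside the RPMs, and a new
# Python version (a new "cpython-3XX" tag) means rebuilding all of them.
```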
The mass rebuild, though, is somewhere in the middle of the update process. The first step of the update process is to write a change proposal, right? You could see those in Matthew's talk today. So: write a change proposal, describe your plan, describe the contingency plan, and so on and so forth. Then package the new Python version as a new package; we cannot just update the old one, because we want to keep maintaining the old version, so we have to create a new package. Then rebuild the thousands of dependent packages in Copr. Copr is, let's say, an alternative build system we can use for testing. During those rebuilds you will uncover a lot of bugs. Some of them you can fix, some of them you cannot. You have to report the bugs: upstream to the libraries, downstream if it's a packaging problem, or upstream to CPython itself, because we are just humans, right, and Python is just software, so it's full of bugs as well; or you fix it downstream in the Python packaging itself. It's really an incredible amount of work, and the whole process I'm describing here usually takes a whole year, because Python releases are roughly aligned with the Fedora releases. This fall we expect Fedora 39 and also Python 3.13 alpha 1, and we will have the whole year to prepare the Fedora packages for the new Python version. Then we rebuild all the packages again, this time in a side tag. A side tag is something like a branch in Git, nothing to be really worried about.

So we prepare all the packages, we know about all the problems, we try to fix them in Copr, and then we can rebuild everything again in a side tag of the development version of Fedora, Rawhide. That is actually happening right now: the rebuild in the side tag started a couple of days ago, and we are somewhere around 3,000 packages already built. Then you merge those rebuilt packages back into Rawhide, and ta-da, you have the new Python, in this case 3.12, in Rawhide. And that's awesome, right? It's an incredible amount of work, and it's thanks mostly to Tomas; you can talk to him after this. Tomas does an incredible amount of that work through the whole year, maintaining that process. It's incredible. And some numbers: 3,361 packages in Copr, almost 50,000 builds, and almost half a thousand Bugzilla reports. And those are just the bugs in Bugzilla; I'm not counting all the GitHub issues and pull requests and so forth.

So in Fedora, updates are really complex, and you have to be really careful, because you can break all the other packages, or a lot of system tools, which is the case for Python: a lot of things in the system really depend on Python itself, like the installer, Anaconda, and DNF version 4 (version 5 is no longer written in Python, but the older version is). So you have to be really careful. But the benefit of being able to update is that updates usually fix the security flaws and the bugs for you. If you cooperate with upstream, it usually doesn't take that long for a new upstream release to come out, and then you can just update it in Fedora. You don't have to backport patches, you don't have to make your own fixes and so on; you can, but you don't have to. And you can do it in a similar way to us: you can assess the impact of a change on Fedora if you want to, and if you are interested, stay here in this room and Carolina will tell you how we do that in a little more detail.

But let's move to the other side of the bell curve. What if we cannot update? What if we cannot change things that much? That's the problem we have to solve in RHEL. The promise to RHEL customers and users is that we will, or at least we will really try to, I added that part, keep their systems secure without breaking backward compatibility of the provided components. Which means that updates are usually out of the question. Usually, not every time. I would like to describe that with two security vulnerabilities we had to fix, and how we dealt with them on multiple levels, to show you that it can be really interesting, and also really complex, when it comes to fixing something in an older component of an old system for customers who depend on maximum stability.

Let's take the first one, which is a web cache poisoning in urllib, part of the standard library of Python. The problem is this: when you have a URL with a query string, it usually gets parsed somehow, by proxies or web applications and so on, and you have to pick the delimiter for the key-value pairs in it. The World Wide Web Consortium recommends using the ampersand only. Fine. But Python accepted both the semicolon and the ampersand at the same time as its default. We don't need to go much deeper into the detail; the thing is that if the application is configured differently than the proxy in front of it, the two can disagree about what the query means, and that can mess up the results cached in the proxy. That's why the vulnerability is called web cache poisoning.
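To make the ambiguity concrete, here's a minimal sketch of the behavior on an interpreter that already carries the upstream fix; the query string is just an example:

```python
from urllib.parse import parse_qsl

# With the fixed default (separator="&"), a semicolon no longer splits
# key-value pairs, so the whole thing stays one value:
print(parse_qsl("a=1;b=2"))                 # [('a', '1;b=2')]

# Applications that really need the old delimiter have to opt in
# explicitly, and they can pick only one separator at a time:
print(parse_qsl("a=1;b=2", separator=";"))  # [('a', '1'), ('b', '2')]
```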
So, what can we do? We can follow the same recommendation, right? And that's exactly what upstream did. The recommendation is to use the ampersand, so: switch the default to the ampersand. Mm, that's too rushed; we have to find some middle ground. All right then: let's switch the default to the ampersand and allow users to configure something else. That sounds great, but the separator keyword argument added to Python doesn't allow you to use more than one separator. You have to choose either the ampersand or the semicolon or whatever you want, but not multiple at the same time. And that's a problem, right?

In Fedora, we are the latest and greatest, whatever that means, so we just followed the upstream resolution. The fix was implemented upstream fairly quickly, and they did security releases, so we just updated all our components, and for the EOL interpreters, which are maintained by us and not by upstream anymore, we backported the patch. So Fedora follows upstream, the new default is the standard one, and that's what it is, deal with it.

But for RHEL, we cannot do that; that's the problem. So we basically took the patch from upstream and adapted it in a way that allows the old behavior by default, and we added a warning. If you rely on that default, which means your application is not ready for the new behavior, a warning will appear somewhere in your log, with a link to the documentation, and the documentation describes what's happening. But even then, we cannot expect our customers to change their code base, so we had to find a way for them to set the new default without actually touching their applications. So in the same patch we provided a way to set it in the Python code itself, which is kind of obvious, but also a way to set it via a configuration file in /etc, and via an environment variable. So we didn't change the default behavior; we added a warning for our customers saying: please don't rely on this. Because on the other hand, they might be using Python's old default, ampersand and semicolon, with some old proxy configured the same way, and then everything is consistent and fine. If we changed the default, it might silently break a lot of stuff, and that's a problem. So we added the warning and the configuration options, so everybody can opt in on any system: set an environment variable, add a file into /etc, whatever. We had to be very careful, and in that specific case we decided not to change the default setting.

The other CVE, and the last one I would like to talk about today, is the tarfile module directory traversal. If you know that the second part of a CVE identifier is a year, you can see this one has been waiting for a fix for a long time. Really. I tried to suggest to Petr: hey, we can wait a couple more years and then celebrate the 20th birthday of that CVE before fixing it, right? That would be a great birthday gift. But no, Petr decided to fix it this year, and we are actually in the process of doing that, so let me describe it. The thing about tar is that it was designed to back up whole systems, completely, which means it supports symlinks, hardlinks, extended metadata, some special files, relative paths, and a lot of other stuff. The problem is that if you unpack an archive from an untrusted source, it can completely break your system, really.
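To see what that means, here's a minimal sketch of the classic traversal: an archive member whose name climbs out of the extraction directory (built in memory here, so nobody actually runs it against their filesystem):

```python
import io
import tarfile

# Craft an in-memory archive with a member that escapes the target dir.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    member = tarfile.TarInfo(name="../../outside.txt")
    payload = b"I should not end up here\n"
    member.size = len(payload)
    tar.addfile(member, io.BytesIO(payload))
buf.seek(0)

# On old interpreters, a naive extractall() follows the "../" and writes
# the file outside the directory you are extracting into:
#   tarfile.open(fileobj=buf).extractall("some/target/dir")
```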
And that is not something we usually need these days, right? We are no longer backing up our systems on magnetic tapes. So that was the problem: what's the correct way to fix it? And upstream had the perfect resolution: do nothing. The documentation said that blindly extracting archives is a bad idea. You should... now I feel like I'm in church: extracting untrusted archives is a bad idea! You shouldn't do that! You should inspect the archive before you blindly extract it! All right, that's true, the documentation said that. But the problem is that the best place to fix it is still the tarfile module itself. (I'm about to be pushed off the stage. No, no, two more minutes, please.) So the best place to fix that problem is really in Python, even though the documentation says everything in Python is correct. So Petr, another colleague of mine, Petr Viktorin, decided to change it in Python, but it cannot be done in a backward incompatible way, right? So he wrote PEP 706, which is a Python Enhancement Proposal, a really long document, full of useful information if you want to read it. It added extraction filters to tarfile: 'fully_trusted', which means you know what you are doing; 'tar', somewhere in between; and 'data', which is the safest one, allowing only some of tar's capabilities. There is now a deprecation period in Python 3.12 and 3.13: the default stays the same, but there is a warning if you rely on that default, the same approach as with urllib. And then in 3.14, 'data' will become the new default, which is how these things are done in Python. In Fedora, we basically followed the same approach and backported the patches to the EOL interpreters, so now all of them behave the same way. And in RHEL, we decided to do something even stricter: we backported the patch and set the strictest filter to be the default one in the RHEL distribution. So we can finally say that after many years, the vulnerability is finally fixed. That's one backward incompatible change we had to make, and we will see what it will cost us.
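In code, on an interpreter that has the PEP 706 filters (3.12+, or an older release with the security backport), safe extraction looks something like this; the file names are just placeholders:

```python
import tarfile

with tarfile.open("archive.tar") as tar:
    # 'data' is the strictest filter: it rejects absolute paths, members
    # escaping the destination, and dangerous special files and links.
    # 'tar' is in between; 'fully_trusted' keeps the old behavior.
    tar.extractall(path="destination/", filter="data")
```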
Some conclusions, I know, I know. Being a package maintainer might mean a lot of different things for a lot of different people. If you want to maintain a leaf package with a useful utility, please do that; it won't cost you all the free time you have. But if you want to maintain something like Python in Fedora and RHEL, be prepared to sacrifice something. In Fedora, it's good to know your packages, because if you do, you can prepare patches, you can help upstream fix problems; if an upstream is no longer interested in their package, you can maintain it, you can help the whole ecosystem evolve, or at least create reasonable reports. In RHEL, you have to. You really have to understand the packages you have. You have to be able to backport patches ten years down the road to some very ancient component, open the book full of dust, and backport the patch into it. So you have to really know what you are doing, and you have to be prepared to make some tough decisions, like we did when fixing those vulnerabilities. All right, that's all. Thank you for your attention.

Yeah, I'll repeat the question: why do we have fewer packages in Copr than actually depend on Python? That's a great question, and it means you were really paying attention to the presentation, that's great. The reason is that we follow the dependencies between those packages; we don't blindly try to build everything. We first prepare the dependencies for each package, and only then do we add it. It means we weren't able to build all of them in Copr before we had to start the side tag rebuild: some of the packages weren't in a buildable state, because not enough of their dependencies were ready at that time, so we didn't add them to Copr. Thank you for your question.

The next question is whether we try to discover runtime issues with the new Python. That's a great question too, and honestly, the amount of work we have on the build side is already a lot, so there is not that much time for it. But we really believe that every single RPM package should contain tests, so we depend on the components that depend on Python having some tests, which kind of means we should be able to catch runtime problems too. But yeah, if a package has no tests, there is no way to verify it works with the new Python. Yes, yes, unit tests mostly, because during the build, in the build system, there is no access to the internet, so there's not much room for integration testing and such. But every single package should have a %check section with its unit tests, and if you rebuild Python and then rebuild the packages on top of it, we hope their test suites have coverage as good as possible, so they will uncover potential problems. And I'm sorry for that. Thank you.