So, hello everyone, my name is Lubomir, I work for Red Hat, mostly on maintaining the tools for creating composes, and I want to tell you something about how that's done, what problems we face with it, how we are trying to address those problems, and where to move from there.

So first, let's take a step back and give a very brief overview of what a compose actually is. In this case, composing means taking some content for Fedora, that's RPM packages from the Koji build system, and creating something usable from them. Because if you get 100,000 RPMs, that's not fun, you can't really do anything with them. So we need to create ISOs so that you can actually do stuff, create container images, or simply an RPM repo with some structure that you can then consume with Yum or DNF. In Fedora, the tool that is used to create these composes is called Pungi, and I happen to be one of the people trying to maintain it. And to give you some extra perspective, composes in Fedora are really almost everywhere. It's not just the Rawhide compose, which is kind of hard to miss because it regularly spams the devel list with reports. There are also other composes: behind each update push there is Pungi running, processing packages and creating the repos with updates.

So what are the actual problems that we are facing? And this is specifically for Rawhide and newly branched releases. Well, they are slow, and that's not particularly great. Historically the timings have been different: right now Rawhide takes just under five hours to finish. It used to be something like eight or nine hours only a few weeks back, and there have been situations where the compose took 15 or 16 hours. That's not really good. There are multiple issues caused by that, but the main problem is that if the compose takes too long, you can't really iterate on development. Once a new Fedora is coming together, if you need to wait eight hours just to be able to test a fix, that's just not great, because it basically gives you one fix a day. You come to work in the morning and you see, okay, the compose failed yesterday. You debug the issue, you write a patch, you build a new updated package, and then you wait until the next day, because that's how long it takes to get a compose with the updated package. And let's be honest here, one of the big issues is simply that slow things are not fun in computing. Slowness might be nice somewhere else, but when you're waiting for your job to finish, it sucks.

So, just so that we have some idea of what's taking so long, let's take an overview of what the compose actually consists of. It is one big monolithic process, but it's split into different chunks and smaller pieces. This is an overview of everything that goes into Rawhide and how Pungi is split up to create that. It goes from left to right, and where there are multiple things on top of each other, that's where there's some parallelism. It starts with a phase called init that basically just takes some data and prepares the general structure. This involves cloning the repository with compose files; the module defaults are processed here too. This is generally very fast, under a minute. And then comes the first bottleneck, and that's the phase called pkgset. This is where we talk to Koji and ask, okay, give me a list of all the latest packages, and then we go and check the headers on every single one of those RPMs, because we need to know what each RPM actually is, like where can we put it?
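To give a rough idea of what that header check involves, here is a minimal sketch using the Python rpm bindings. This is not Pungi's actual code, and the package path is made up for the example.

```python
import os
import rpm

def read_header(path):
    """Read the header of one RPM file to find out what the package is."""
    ts = rpm.TransactionSet()
    # Skip signature verification for this sketch; a real compose tool
    # would handle signature checking deliberately elsewhere.
    ts.setVSFlags(rpm._RPMVSF_NOSIGNATURES)
    fd = os.open(path, os.O_RDONLY)
    try:
        return ts.hdrFromFdno(fd)
    finally:
        os.close(fd)

# Hypothetical path on the Koji volume, just for illustration.
hdr = read_header("/mnt/koji/packages/bash/5.0/1.fc30/x86_64/bash-5.0-1.fc30.x86_64.rpm")
# Name, architecture and source RPM are the kind of fields that decide
# where the package may go in the compose.
print(hdr[rpm.RPMTAG_NAME], hdr[rpm.RPMTAG_ARCH], hdr[rpm.RPMTAG_SOURCERPM])
```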
The output of this phase is an RPM repository with the packages, one for every architecture that we are composing for, and also lists of the packages, what's actually in each repo, so that when we want to find some package we don't have to open the repo metadata and search it there. For Rawhide this takes something like 30 minutes, at least in the compose that I checked this on, which was, I think, Monday's compose.

Then multiple things start working at the same time. On one hand, the first line, the buildinstall phase, creates the installer image. The way it works is that Pungi kicks off a task in Koji and says, okay, let's run Lorax to generate all the necessary files, and then it sits back and waits for the task to finish. There's not very much in Pungi to optimize there. The OSTree path is very similar; it just kicks off Lorax with different arguments to create an OSTree installer that bundles the OSTree commit.

The middle line here, gathering and creating repos, is where some actual work happens. This is where Pungi figures out what goes where. So there's Everything, which sounds like it should be everything, but that's actually kind of a lie, because depending on how you look at it, it's either everything plus something extra, or it's just shy of everything. There are all the x86_64 packages, for example, but there are also some multilib packages, and not all 32-bit packages are there. So it's not just a matter of taking everything and shipping it somewhere; there's some extra logic. And Everything is not all that's in the compose; there are other variants. There's Workstation, there's Server, at least for now, and those are subsets of the packages. In that case we know a list of packages that we want to be there: if it's Workstation, then we probably want some particular stuff. But we need to check the dependencies and make sure that it can actually be installed. We don't want to create a repo that contains bash but not glibc, because then what do you do with it? And this is actually kind of slow: in the compose that I checked, the gather phase took altogether about one hour and ten minutes to go through all of the combinations of variant and architecture and figure out what packages should be there.

And once we have the list of what to put in there, we also need to make sure that the packages are actually in the compose structure, because it's a bunch of files on the file system and we need to get them there. So every single package is hard linked from the Koji volume into the compose. Interestingly enough, this also takes about 10 minutes. So there's a 10 minute phase where Pungi sits and does nothing but hammer the NFS with requests. It works, maybe not great, but it works.
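As a rough illustration of that hard-linking step, here is a short sketch. Both paths are invented, and the fallback to copying when a link can't cross filesystems is an assumption of the sketch rather than a claim about what Pungi does.

```python
import os
import shutil

def link_into_compose(src, dst):
    """Hard link one package from the Koji volume into the compose tree.

    Hard links only work within a single filesystem, so fall back to a
    plain copy if linking fails.
    """
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    try:
        os.link(src, dst)
    except OSError:
        shutil.copy2(src, dst)

# Both paths are made up for the example.
link_into_compose(
    "/mnt/koji/packages/bash/5.0/1.fc30/x86_64/bash-5.0-1.fc30.x86_64.rpm",
    "/mnt/compose/rawhide/Everything/x86_64/os/Packages/b/bash-5.0-1.fc30.x86_64.rpm",
)
```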
Once we know what we actually want to put in there, we can run createrepo on every single combination of variant and architecture and create the repo with the packages that we just hard linked. This, interestingly enough, also takes about 10 minutes for all of them together. So again, not much to optimize here, although there are some possibilities.

And once we have all of this, we can start building the extra artifacts that go into the compose. That's live media, container images, and what have you. The reason we need to wait with these is that in order to create bootable live media, we need the installer to finish. If Lorax fails to create a boot ISO, there's no way we can make bootable live media for any spin, any variant, anything. So that's why it has to wait. All of these phases are very similar: Pungi just kicks off tasks in Koji and waits. And this can take from a couple of minutes to an hour. It turns out that if you optimize the builders correctly, it can be much shorter. When I spoke at the beginning about the Rawhide compose going from eight hours to under five, that optimization was done by changing the S390 and PowerPC builders and making sure they are faster, and that saved a huge amount of time. That actually makes me feel kind of bad, because no matter what I do in Pungi, I will never see such speedups.

So that brings us to the end of the compose process. In the image checksum phase, we look at all the generated files and create checksums. Those are useful for people who want to download an image and check that they have the correct stuff. And it's actually separated into its own phase because we don't want to duplicate the logic in every single task that produces an image. Just computing the checksums on all of the images takes about 10 minutes, which kind of gives you an idea of how much stuff Fedora is producing.
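Conceptually the checksum phase boils down to something like this minimal sketch with hashlib. The file name and the choice of algorithms are illustrative, not necessarily what Pungi emits.

```python
import hashlib

def checksum_image(path, algorithms=("md5", "sha256")):
    """Compute several checksums of one image in a single pass over the file."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}

# Illustrative file name; one line per algorithm, in a BSD-style format
# that tools like sha256sum -c can consume.
for name, digest in checksum_image("Fedora-Everything-netinst-x86_64.iso").items():
    print("%s (Fedora-Everything-netinst-x86_64.iso) = %s" % (name.upper(), digest))
```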
And the last part is the test phase, and that's kind of simple. We just run repoclosure on the generated repos and some quick sanity checks on the images that we created. For example, if we create a bootable image, we check the file to make sure it can actually be booted, that it has all the right headers, because there have historically been bugs where we created something we claimed was a bootable image and it turned out you couldn't actually boot from it, because we forgot to run some commands. So that's not going to happen again.

So what am I working on right now to fix this? I'm focusing on the phases from the start, so I'm now working on the package set phase. As I said at the beginning, what it does is take all the packages from Koji, mash them all into one big pile, and use that as input for the following phases. This builds on an assumption that historically has been true: that there is one Koji tag with all the packages, so we can do this and it works nicely. But unfortunately, in the brave new world this is no longer true, because every single module that goes into a compose has a different tag in Koji with its actual packages. So essentially what happens in the current implementation is that we look at all those different Koji tags, pull the packages into one big pile, and then have to do some extra bookkeeping to make sure that the modular packages don't go where they are not supposed to go. This is suboptimal, let's say. So the work that is currently in progress is to split this and make Pungi aware of multiple tags, with a different package set for each tag. In that case we don't need to do the bookkeeping; we can just say, okay, this variant doesn't want modules, so don't use these generated repos. This by itself is probably not going to speed things up too much, but it will lay the groundwork so that we can actually do something more fun.

One option is to reuse more stuff from previous composes. If we have one big pile with everything, then in order to reuse something from a previous compose we have to make sure that we only reuse the stuff that didn't change, and that's difficult. If we have the stuff nicely compartmentalized into smaller chunks, then, for example, for modules we don't really need to rerun that, because if we are using the same module build in the compose, we know what the packages were. We don't need to check them again because, given how Koji works now, once the module is finished, it doesn't change. So we could just reuse the data and not look at all the details in the RPMs.

Another thing that we could do with this change is to write a separate service that would listen on the message bus. When it saw that a package was built, it would just update the repo and the list of packages, so that these would be continuously updated and almost always up to date. At that point, basically the whole phase in Pungi is just asking this thing: tell me where the repo is and what the list of packages is, and I will use that, and the compose will not have to wait for anything. Also, if this actually existed, we could split some other tasks from the compose into separately runnable things. For example, a lot of testing before a release waits for the installer to be created. But in order to create the installer, you need to have a repo with the packages. So if that repo was always there, always up to date, you could just kick off the task to generate the installer and wait the 20 minutes it takes to create it. That would be a fairly nice improvement in terms of waiting for testing.

But there are also other things that we could look into optimizing. Some of them I have actually started on, but I don't have specific benchmarks, so I will not tell you too many details. For example, one thing we could do is try to make createrepo faster. You may think that createrepo_c is already faster than the old Python createrepo, but we could make more assumptions. For example, we know that builds in Koji have NVRs, and if two NVRs are the same, we know the RPM is going to be the same, provided we get it with the same signature. So we could use this logic and not look into every single RPM: if we know what the RPMs were in the old compose, and the NVR didn't change on them, we can just reuse the data. So there might be some possibility for optimization here. It would not be a general tool, because in the wild internet you can find RPMs where this assumption doesn't hold, and it probably doesn't always hold even in Fedora; there might be cases where this is broken. But it's something to look into.
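To make that concrete, here is a hedged sketch of the reuse decision. The data structures and the signing key value are invented for the example; this is the idea from the talk, not Pungi's internals.

```python
def plan_metadata_reuse(old_packages, new_packages):
    """Split packages into 'reuse old metadata' and 'must regenerate'.

    Relies on the assumption stated above: a Koji build with the same
    NVR(A) and the same signing key is byte-for-byte the same RPM, so
    its repodata can be carried over without opening the file at all.
    """
    reuse, regenerate = {}, []
    for nvra, sigkey in new_packages.items():
        old = old_packages.get(nvra)
        if old is not None and old["sigkey"] == sigkey:
            reuse[nvra] = old["metadata"]
        else:
            regenerate.append(nvra)
    return reuse, regenerate

# Invented example data: one unchanged build, one new build.
old = {"bash-5.0-1.fc31.x86_64": {"sigkey": "deadbeef", "metadata": "<package .../>"}}
new = {"bash-5.0-1.fc31.x86_64": "deadbeef", "bash-5.0-2.fc31.x86_64": "deadbeef"}
reuse, regenerate = plan_metadata_reuse(old, new)
print(sorted(reuse), regenerate)  # reuse 5.0-1, regenerate 5.0-2
```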
Another thing that I plan to do is to look at the code paths that were historically not used that much. For example, I found out, actually while preparing this talk and looking at the timing of the compose, that figuring out what goes into the modular variant takes longer than figuring out what goes into Everything, which is much, much bigger. And this is just because the code used to create the modular variant is a different code path; it works slightly differently from Everything. And it was not optimized because historically it wasn't used that much. This is a place where someone with just a knowledge of Python could go and check all the loops and make sure that they are correct and that the work is done in the correct order. This could bring us some speedup as well. So that's basically the current status and where I plan to move from here. So expect some results in the next 10 to 15 years. And if you have any questions, I would be happy to try and answer them.

Yes, that's a very good point. As Mohan pointed out, if we know that nothing changed in a particular image, we could completely skip creating it and just reuse the previous image. And again, this is something that could be helped if we correctly tracked all the steps that go into that particular image in the compose. Right now we can't really say what changed; all we can say is, okay, the configuration for the compose didn't change. But generally in Rawhide some package changes in the day-to-day business, and we have no way of tracking what package is in what image. So we can't really make this optimization right now, but it's something that is possible and nice and would be useful.

Yeah, that's a good point. It's only a couple of minutes. Generally, yes, it would probably be an improvement, because right now the test phase is done serially: we check one image after another, and we run the repoclosure check for one repo after another, another, another. The actual creation of the repos does use a thread pool, so we have something like four createrepo_c processes working at the same time. So yeah, it would also be a possibility to change this and test each image as soon as it finishes. The other possibility is to just not care and turn these tests off.

Yes? It's all done, yes. So Pungi is consuming a bunch of packages from Koji, and it's doing that by directly accessing the Koji volume. The same volume is mounted for both Koji and Pungi. That's a good question, but I don't have an answer for you, sorry. It's NFS v3, with these kinds of weird problems; NFS v4 was a little more challenging. Yeah, to summarize for the recording: the NFS is not particularly great here. We tried doing something newer and cooler, but it didn't work.

Yes? So I had a few quick things. At the end of the images phase, the live images and image build phases, I think, and I have to check this again, but I think it waits for all those tasks to complete and then figures out what failed and what didn't. But if one of those fails and it's a required image, then it fails the compose, but only after it waits for all of the rest of them to finish. So one possible optimization here is that it could look, and if an image fails and it's a required image, it could just cancel all the other builds and fail fast. Yeah, so this is basically an implementation detail of how this works. Essentially, Pungi spawns a thread for each of the phases and they go and do their own work, but the main thread that is monitoring them waits on every single one of them in some fixed order. So if one finishes sooner, it still waits until it's its turn to be waited on. So this might be possible to fix in some way. For example, I would really like to replace the threads in this case with Python 3 asyncio, because the code isn't really doing anything, it's just waiting. Right, but I've seen cases where, you know, it doesn't fail really quickly. Yes, that's very much possible: the first task you spawn can fail immediately, but you still have to wait for everything else to finish. This is probably fixable in the code. We probably could do this.
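As a sketch of that idea, and without assuming anything about how Pungi would actually structure it, replacing join-in-order threads with asyncio could look something like this, reacting to the first failure the moment it happens:

```python
import asyncio

async def run_phase(name, seconds, fail=False):
    """Stand-in for a phase that just waits on a remote Koji task."""
    await asyncio.sleep(seconds)
    if fail:
        raise RuntimeError("%s failed" % name)
    return name

async def main():
    tasks = [
        asyncio.ensure_future(run_phase("live_images", 5)),
        asyncio.ensure_future(run_phase("image_build", 1, fail=True)),
    ]
    # Unlike joining threads in a fixed order, this returns as soon as
    # the first task raises.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    for task in pending:
        task.cancel()  # a required image failed: stop waiting for the rest
    for task in done:
        if task.exception():
            print("failing the compose early:", task.exception())

asyncio.run(main())
```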
So the request is, can we make this more visible? For starters, whenever a phase starts or ends, we send a message. So if you're interested in that, you can listen to those messages and do stuff with them, and it would also be possible to generate some summary. It comes with a caveat that, as I said with the images, the timing of a phase is not really representative of how long the individual tasks took, because if there's one really long task, but it happens to be the first one to be waited on, then every other one will look artificially inflated. Having more visibility into what takes how long is something that would be nice, but it's not just a simple diff between timestamps in the log, because of things like waiting on other phases.

Yes? Okay, so the question is what time we want to reach and what good it will bring. We don't have a set time; as fast as possible sounds good. But the main issue we have with the long composes right now is that it takes too long from building a fixed package to seeing some artifact that you can test it with. If the fix didn't work, you need to do another fix, and it's blocking all the QA.

Yes? I guess I'll point out there's a continual push, I think on all our parts, to try to get resources beyond just our infrastructure, to do a very small subset of speculative composes outside our infrastructure that people can access, to find out if changes work before we engage in a whole compose. Not necessarily making the composes block on that, but being able to spin many of those over the course of the day in addition to our normal composes. Of course that brings up problems: how do you keep the storage synchronized if you're doing something in a public cloud? How does it get access to the same packages? How do we keep those in sync so that we're not having to do a harder job of tracking which changes went in or didn't go in? But those are problems that we're going to have to solve, because over time the pressure is going to keep rising. I think there are ways to shorten that loop. Another way to look at this problem is that it would be great if we enabled Fedora contributors to run composes on their own. Right now, if you don't have access to the infrastructure, you're basically out of luck. A test that catches the problem before it gets into the compose saves the whole cycle. Right, exactly. That's why, for me, it's not interesting to talk about doing that without also improving Pungi; the two go together.

There is one item on my list about creating minimal composes: the same kind of content, but very minimal, run all the time for the different main packages, so that when a build lands it can help make a decision about it. You said you'd like users to be able to run composes; I would like that to eventually be possible for these cases too. When we updated Python to 3.7 a couple of releases ago, we did it in a large side tag, and everything somehow worked. We said, yeah, it looks good, and merged it into Rawhide, and then the composes started to fail for some unrelated reason. And I asked, could that have been somehow prevented, could I have tested it? And the answer was: not yet. So is there any plan to support running composes against side tags?

So, Pungi itself supports that. The problem is that you probably won't have access to the volume with the packages. If we solve that somehow, you would be able to just take the config from production and say, okay, instead of only the f30 compose tag, I want to also use my f30 Python side tag, and Pungi will figure out what the latest packages are. So that would work.
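A hedged sketch of what that config change might look like. Pungi configuration files are plain Python, and pkgset_source and pkgset_koji_tag are real options, but the side tag name and the list form here are illustrative assumptions based on the multiple-tag work described earlier, not guaranteed syntax.

```python
# Illustrative fragment of a Pungi configuration file.
pkgset_source = "koji"
koji_profile = "koji"

# Production would have only the main tag:
# pkgset_koji_tag = "f30"

# For a speculative compose, also pull in a hypothetical side tag with
# the rebuilt packages:
pkgset_koji_tag = ["f30", "f30-python"]
```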
The only problem is the infrastructure side: we need to make sure that the packages can be consumed in some way other than direct NFS access. If you convince the infrastructure people to give you permissions, then you can do it. That's what you want to do, huh? Well, I mean, yes, that will work, but will it be performant enough to actually use? This is why it really comes down to how we are going to get to a model where our package data is not restricted to the NFS, or whether it goes somewhere that our contributors can, you know, spin up a cloud machine and access it and do what they need to do, where we can control access in a way that keeps that data secure but still gives them all the power to do other things. So that's a key problem to solve, I think. That's not necessarily one for you. Sure.

Okay, so I think we are out of time, so thank you very much for coming, and enjoy lunch.