So welcome everybody. As you can see, this talk will be about Koschei, which is a quite new continuous integration service for Fedora packages. Koschei is able to scratch-rebuild packages when their dependencies change, and this way it helps developers fix bugs much more quickly than without the service. Koschei is not a typical CI service as used in software development, and I will try to explain what Koschei is about, what problem it's trying to solve, how it was designed and how it was implemented. Koschei is not perfect yet, and I will also mention possibilities for future development. So first, what is the actual problem that Koschei is trying to solve? Buildability as a measure of software quality. I will first explain what I mean by buildability. At a given time, a package can either be successfully built in the distribution or not — whether it builds successfully is what I call buildability here. The state in which a package doesn't build is commonly called FTBFS, which stands for "fails to build from source", and I will be using this term throughout the presentation. Buildability is often the only criterion used for releasing a given package. Sure, we have a quality assurance team, we have tests, users can also test updates, but for most packages, as soon as the package is built it's being released, and there is no other testing. Packages are recommended to run their unit tests as part of the build process, so checking buildability also means running all the unit tests and verifying whether the upstream tests pass or not. We have a constantly growing number of packages. Nowadays it's not only the main Fedora distribution, but also a number of side projects such as Copr. Software collections are also becoming more popular, which means that with not many more people we have a lot more packages to take care of — and therefore less time for looking at any given package. And that means we need better tools.
A big problem is that people are often not aware that their packages don't build or have some problems. In the case of less popular packages, no one will report bugs, and the only time a package maintainer can find out that his or her package doesn't build is the time of the mass rebuild. But at that time it may be difficult to fix the bugs. Elapsed time really increases the cost of fixing bugs. Obviously, it's much easier to fix a bug as soon as you find it. If you find a typo while you are working on some piece of code, you can fix it very easily, immediately. If it's one hour later, it's still easy, but if it's half a year later, you may spend a considerable amount of time refamiliarizing yourself with the code. Also, more bugs appear over time, which makes it more difficult to understand what the cause is, because different bugs can depend on each other and hide each other. Sometimes, to fix one bug, you first need to fix five or ten different ones. All of this makes it very important to fix bugs as soon as they appear — fixing bugs quickly means fixing more bugs. And that's an important goal of Fedora as I understand it: improving quality all the time. So that's the problem Koschei is trying to solve. It's some subset of all the problems we have, of course; we can't fix everything. And how is Koschei trying to solve this? Koschei is continuously monitoring package buildability: for the packages it tracks, it knows at all times whether a given package builds or is FTBFS. Koschei is also informing interested parties about FTBFS, so anyone interested in the FTBFS state of a given package can subscribe and get real-time notifications via FMN, either on IRC or via email. But telling people that their package is broken is often not enough. People are busy and don't have time to figure out what a given problem is... [Audience comment, partly inaudible.]
[Audience:] If you have it, you can give info that this package fails to build with those patches. And the question is — the problem is, a lot of maintainers think it's okay: "it builds for me". If it builds on primary, as a provenpackager I would just push the fix; if it breaks something for the maintainer, they will have time to respond. — Sure, but that's a different problem. Koschei cannot tell maintainers that their packages don't work, and if a maintainer is not interested in the package, there is not much to do besides committing the patch as a provenpackager, as you said. But Koschei is also trying to provide more information about a given failure. Very often a failure to build occurs after a dependency change — that's a common reason, because what else can change? There can be a random failure, changes somewhere in the build system environment like the kernel version or build configuration, or changes in the package or its dependencies. The most common of these is a dependency change. I've considered a few options for where the packages should be rebuilt. As you know, we have a lot of packages — about 17,000 currently, if I remember correctly — and it's not obvious where they should all be rebuilt. The first idea that comes to mind is having some dedicated hardware for that purpose, but that needs a lot of money and ongoing maintenance. So I've been looking at other solutions, such as letting individual maintainers rebuild packages on their own machines, perhaps joined in some peer-to-peer network, or using Fedora Koji, Copr or a cloud — existing solutions that could be reused for this purpose. Each option has, of course, its own pros and cons, and the one that was chosen is Fedora Koji. Koji is an existing solution, so nothing new needs to be deployed. It has a lot of spare resources — Koji has, I think, over a hundred builders; I'm not sure, maybe it's a bit less.
But all of them are idle most of the time; they're sitting there doing nothing. Koschei is just using these spare resources that would otherwise be wasted. Koji is also the canonical Fedora build environment, so by using Koji, Koschei builds in exactly the same environment as official builds. This minimizes any discrepancies that could occur if we were using, for example, Copr — there are no cases where a package builds in Koji but not in Copr, or vice versa. Koji is maintained by Fedora infrastructure, so Koschei developers don't need to do much there either. And there is no need to transfer much data across the network: if we were using some data center in Europe, as I considered initially, we'd need to transfer a lot of RPMs across the ocean, which would probably cost more and would just be slow. So that is the choice — Koji was chosen because it's the best place to rebuild the packages, in my opinion. The next question, after where to build packages, is how they should be rebuilt — what should be the criteria for running rebuilds? The first thing that comes to mind is rebuilding everything periodically. Given our resources, how often could we rebuild the packages? Probably not more often than once a week. That's much better than the twice a year we get with mass rebuilds, but it's still too long: learning that you broke something a week ago doesn't help much with fixing the bug. It would be much better if Koschei rebuilt packages more often. So maybe more important packages could be rebuilt more often: some packages have higher importance to the distribution than others, and we could rebuild them, for example, nightly, and the others weekly. But not many packages could be rebuilt this way, and there is the question of how to prioritize the packages. There could be some manually curated lists, but it's not perfect.
Rebuilding after a dependency change is another option, but rebuilding everything after every dependency change would require a huge amount of resources, beyond what Koji can currently offer. So there was a need for some other solution that would combine all these considerations. Koschei rebuilds dependent packages — if a package that others depend on is updated, Koschei will rebuild the packages that depend on it — but that's not the only factor; I will talk about this in more detail. So Koschei, as it is, is a tool for continuously scratch-rebuilding packages using the Fedora build infrastructure, Koji. That's how I would define Koschei. As for the name: it's always difficult to come up with a new name for a project, so this one was found by just digging in the dictionary — it came from "Koschei continuous integration". Actually, Koschei is a character from Slavic folk beliefs, and all Koschei developers are Slavs, so it fits naturally. Now a bit more about the design — how Koschei was designed to work in practice. There is a set of packages. It could be all Fedora packages — the aim was to eventually track all of them — but currently it's just a subset, about 22%. Koschei reports buildability via fedmsg and a web interface: a package's buildability state change is broadcast via fedmsg, and details can be seen through the web interface. Koschei constantly monitors the Koji resources it uses, so that it doesn't overload them — overloading Koji would be bad, and Koschei takes steps to prevent that. It also prioritizes rebuilds according to different criteria; I'll talk about them in more detail in a moment. The priority consists of several things. First is time since the last rebuild: the more time has elapsed, the higher the chance of a given package being rebuilt by Koschei. Then there are dependency changes: Koschei knows about each package's BuildRequires, including all the transitive build dependencies.
Distances are taken into account: if something is a direct dependency of your package, then its update has a bigger effect on the priority, and if it's a very distant dependency, then a change to it is not considered as important. There is also the previous state: if a package is currently failing — known to be FTBFS — it has a higher priority for being rebuilt. The reason is that people don't have much time to look at failures, and by rebuilding these packages sooner, Koschei minimizes the time developers need to spend looking at failing packages — it's just about saving people's time. There's also package importance: some packages can be marked as more or less important, and they will be rebuilt more or less often accordingly. There is also the possibility to force a package rebuild by setting a temporarily high priority. This is useful for manually forcing a rebuild: if a developer is interested in knowing the latest, most up-to-date status, they can just force the rebuild themselves. And there is the possibility of writing custom plugins, where people could provide different rebuild criteria for different types of packages — for example, Perl packages could get their own criteria from a custom plugin. No plugins are used yet, but the possibility is there. Koschei has a database, and it tracks a number of things. Some of them are just cached from other services like Koji or the package database, but it holds a lot of useful information as well: it has the list of all the packages; packages can be grouped, so one package can belong to zero or more groups (I'll talk about groups a bit later); it also knows about the real builds — the official builds done in Koji, not just the scratch builds — and it tracks all the dependencies and dependency changes as well. Now an architectural overview. Koschei has a front-end and four back-end services, and at the central place there is a database.
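Going back to the priority factors for a moment, the scoring described above could be sketched like this. Note that the weights and the combining formula here are my own illustrative guesses, not Koschei's actual configuration values:

```python
# All weights are illustrative guesses, not Koschei's actual constants.
TIME_WEIGHT = 1.0     # points per hour since the last rebuild
FAILED_BONUS = 100.0  # failing (FTBFS) packages are rebuilt sooner
DEP_BASE = 30.0       # value of a change in a direct dependency

def dependency_change_score(distances):
    """Each dependency change contributes less the farther away the
    changed package sits in the transitive BuildRequires chain
    (distance 1 = direct dependency)."""
    return sum(DEP_BASE / d for d in distances if d >= 1)

def priority(hours_since_build, dep_change_distances, failing,
             static_priority=0.0, manual_priority=0.0):
    """Combine the factors from the talk: elapsed time, dependency
    changes weighted by distance, current FTBFS state, per-package
    importance, and a temporary manual bump to force a rebuild."""
    score = TIME_WEIGHT * hours_since_build
    score += dependency_change_score(dep_change_distances)
    if failing:
        score += FAILED_BONUS
    return score + static_priority + manual_priority
```

With a scheme like this, a failing package whose direct dependency just changed far outranks a healthy package with only a distant change, which matches the behavior described above.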
The Koschei back-end talks to the Koji hub, kojipkgs, the package database, and fedmsg. Users get notifications from fedmsg, and they can visit the front-end to get further information. The Koschei scheduler is one of the back-end services; it takes care of scheduling package rebuilds. It uses priority scheduling: it maintains a priority queue of all packages, and when the conditions for rebuilding a package are met, it is submitted to Koji. Rebuilds are done from existing Koji SRPMs, not from the Git repository. This means that if a packager commits some unfinished work to Git and it doesn't build, Koschei will not see that. This allows ongoing development in the master branch in the SCM without causing FTBFS in Koschei. It is also faster, because we skip the step of rebuilding the SRPM from the SCM. All the builds have very, very low priority on Koji, so they can never compete with normal builds, with scratch builds submitted by packagers, or even with mass-rebuild builds. There are a number of conditions that must be met for the scheduler to submit a rebuild. First, the build group must be installable. The build group, also known as the buildsys-build group, is the set of packages that are always installed in the minimal buildroot; if it is not installable, Koschei will not submit any packages for rebuilding. Package monitoring must be enabled, of course — the packager has an option to enable or disable this. The package cannot be blocked in Koji. All the build dependencies must be installable, which means there should never be dependency problems reported by scratch builds from Koschei — except for some minor differences between DNF and yum, but I'll mention that a bit later. The priority of the package must be high enough, above some value set in the configuration, and the Koji load must be low enough — and by Koji load I mean, first, the task load.
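The submission conditions listed above could be sketched as a single predicate. The `Package` record, its field names, and the threshold value here are hypothetical, chosen just to illustrate the checks:

```python
from dataclasses import dataclass

@dataclass
class Package:
    # Hypothetical record -- field names are illustrative, not Koschei's schema.
    monitored: bool          # packager has enabled Koschei monitoring
    blocked: bool            # package is blocked in Koji
    deps_installable: bool   # resolver found all BuildRequires installable
    priority: float          # current scheduler priority

# Illustrative threshold; the real value lives in Koschei's configuration.
PRIORITY_THRESHOLD = 30.0

def can_submit(pkg: Package, build_group_installable: bool,
               koji_load: float, max_load: float = 0.5) -> bool:
    """All conditions must hold before a scratch rebuild is submitted."""
    return (build_group_installable        # minimal buildroot resolvable
            and pkg.monitored
            and not pkg.blocked
            and pkg.deps_installable
            and pkg.priority >= PRIORITY_THRESHOLD
            and koji_load < max_load)      # don't overload Koji
```

If any one check fails, the package simply stays in the priority queue and is reconsidered later.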
There is a question of how Koschei gets this value. All the builders in Koji are assigned to channels; Koschei only looks at the default channel. It totally ignores disabled hosts, and hosts that are not ready are considered fully loaded. It also computes per-architecture load: if, for example, at a given point all the ARM builders are free — no load at all — but the x86_64 builders are loaded at 60%, then the load considered by Koschei is 60%, and since the threshold is about 50%, no rebuilds would be scheduled in this case. There is also a limit on concurrent Koji tasks — currently 30 — so the number of running Koschei tasks can never exceed 30. Rebuilds are done on the primary Koji instance, so they are only for primary architectures. For noarch packages it's one random architecture; besides that, there is the possibility of an architecture override — by default a package is rebuilt on all architectures, but it can be restricted to just one. Another service, Koschei polling, periodically checks the statuses of all the builds, registers real builds done in Koji, and checks for packages blocked there, so they can be disabled. It also periodically resynchronizes user ACLs from the package database; user ACLs are used to display the packages of a given user. The Koschei resolver, the third back-end service, analyzes dependency changes. It gets the latest repo data from kojipkgs and resolves the build group and the dependencies of all the packages. The hawkey library is used for dependency resolution — the same library used by DNF — which means that dependency resolution is done the same way DNF does it. That is different from Koji, which is still using yum, and therefore Koschei can report some dependency problems which are not seen on Koji, due to differences between DNF and yum.
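Coming back to the load computation for a moment, it could be sketched roughly like this. The host field names are loosely modeled on what Koji's `listHosts` XML-RPC call returns, but treat the exact fields and the averaging as assumptions:

```python
def overall_koji_load(hosts):
    """Compute the load figure used for throttling rebuilds.

    `hosts` is a list of dicts with 'arch', 'enabled', 'ready',
    'task_load' and 'capacity' keys (an assumption loosely based on
    Koji's listHosts call). Disabled hosts are ignored entirely;
    hosts that are not ready count as fully loaded. The result is
    the *maximum* of the per-architecture average loads, so one
    saturated architecture blocks scheduling even if others are idle.
    """
    per_arch = {}
    for host in hosts:
        if not host['enabled']:
            continue  # disabled hosts are invisible to the computation
        load = 1.0 if not host['ready'] else host['task_load'] / host['capacity']
        per_arch.setdefault(host['arch'], []).append(load)
    if not per_arch:
        return 1.0  # no usable hosts: behave as fully loaded
    return max(sum(loads) / len(loads) for loads in per_arch.values())
```

This reproduces the example from the talk: idle ARM builders plus x86_64 loaded at 60% yields an overall load of 0.6, above the roughly 50% threshold, so no rebuilds would be scheduled.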
The resolver service also updates priorities: if there is a dependency change, it will update the priority accordingly, and after a package is rebuilt it will reset the priority to its initial value, so it can start growing again. The last back-end service is the Koschei watcher. It just listens for fedmsg messages: task state changes, for when a scratch build finishes; "repo done", for when there is a new repo to be processed for dependency changes; and build tag messages, which are used for detecting new real builds — whenever there is a new official build in Koji, Koschei will notice it and consider it as its own build, so if a package is failing but someone fixes it, as soon as the real build finishes, Koschei knows about it and marks the package as fixed. It also listens for various package DB messages, which are used for updating package information and ACLs. There is also a web front-end; I will show some screenshots in a bit. The web front-end lets you see all the information about the packages. As for the implementation: from the beginning, Koschei was designed to be a Fedora service, so it was implemented with respect to the recommended best practices for Fedora infrastructure, and because of that, Koschei is implemented in Python. It uses PostgreSQL as the database, with the SQLAlchemy ORM for database access. It's written in a modular way and uses systemd units for each service; systemd services allow for easy service management, like stopping one service while keeping the others running. And as I said before, dependency resolution is done using the hawkey library. So this is the main page of Koschei. You can see the list of packages with their state; in this example, all the packages are OK. The columns can be sorted by clicking on the name — currently packages are sorted by name, but it's possible to sort them by state, for example, to see all the failing packages first. You can also see, in the upper right corner, the login button, so you can log in through OpenID.
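Going back to the watcher for a moment, its role amounts to dispatching on fedmsg topics. A minimal sketch of that dispatch, where the topic suffixes loosely match the fedmsg topics mentioned above and the handler bodies are placeholders rather than Koschei's actual code:

```python
def handle_message(topic, msg, handlers):
    """Route an incoming fedmsg to a handler by topic suffix; messages
    Koschei doesn't care about are silently ignored."""
    for suffix, handler in handlers.items():
        if topic.endswith(suffix):
            return handler(msg)
    return None

# Placeholder handlers -- in the real service these update the database.
handlers = {
    'buildsys.task.state.change': lambda m: f"scratch build {m['id']} finished",
    'buildsys.repo.done': lambda m: "new repo: resolve dependency changes",
    'buildsys.tag': lambda m: "real build tagged: mark package as fixed",
    'pkgdb.acl.update': lambda m: "refresh package ACLs",
}
```

For example, a message on `org.fedoraproject.prod.buildsys.repo.done` would trigger the dependency-change resolution step.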
There is a search form, and you can search for packages. Group tabs allow you to see all the packages belonging to a given group. Then there are package details: each package has a page where you can see its current status. In this example there is an architecture override, so this particular package, jflex, will be rebuilt just on x86_64; anyone can change the override. In this example the package is part of the groups Java and Maven, so if anyone goes to the Maven group, the package will be shown there. There are links to different external services, such as Koji, where you can see all the builds for a given package. This is useful when you want to see whether a given package already has an FTBFS bug reported or not, to look through the logs for what could have caused the package to FTBFS, to check who the owner of the package is, and so on. There is one interesting link: "File new FTBFS bug". When you click it, you are redirected to a pre-filled bug form that you can just verify and submit. This way you can quickly file FTBFS bugs that contain useful information. There is a list of tasks for the package. Each task lists all the dependencies, together with the old and new versions of all the changed dependencies, and there is also a field showing new dependencies since the last build. Each build has details: you can see which architectures failed, which succeeded, and which dependencies changed, again in more detail. The current state is that Koschei is implemented and running in Fedora infrastructure. The code is hosted on GitHub — the link will be at the end. It's packaged in Fedora and EPEL 7. We have FMN filters in production. Some statistics — by "currently" I mean about a week ago, when I last checked: Koschei had 3,700 packages, which is about 22% of all Fedora packages. Most of them were OK, so they were building fine. There were a few packages with unresolved dependencies.
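Returning to the "File new FTBFS bug" link: a pre-filled bug URL of that kind could be constructed roughly like this. The field names follow Bugzilla's standard `enter_bug.cgi` parameters, but the summary and comment templates here are my own guesses, not Koschei's actual ones:

```python
from urllib.parse import urlencode

def ftbfs_bug_url(package, version, build_log_url):
    """Build a pre-filled Bugzilla 'enter bug' URL (sketch).

    The query parameters are standard enter_bug.cgi fields; the text
    templates are hypothetical, for illustration only."""
    params = {
        'product': 'Fedora',
        'component': package,
        'version': version,
        'short_desc': f'{package}: FTBFS in {version}',
        'comment': f'{package} fails to build from source.\n'
                   f'Build log: {build_log_url}',
    }
    return 'https://bugzilla.redhat.com/enter_bug.cgi?' + urlencode(params)
```

Opening such a URL lands the reporter on a bug form with the summary and description already filled in, so filing the bug is just a matter of verifying and submitting.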
They are mostly due to differences between yum and DNF — for example, if a package has a BuildRequires on /bin/csh, that's resolved correctly by yum, but not by DNF. There are a number of packages that failed to build. Interestingly, Koschei submits about one build every 36 seconds, which amounts to about 100 builds per hour, and at this rate it cannot saturate Koji. It would be possible to increase this five times or even more, and it should be. [Audience:] You said you didn't have the resources to rebuild everything — do you know roughly what it would take? — To rebuild all Fedora packages after all dependency changes? No idea, but there are heavily depended-on packages such as rpm or glibc. For example, rpm had a number of updates in the past two weeks, and after each rpm change we would need to rebuild all the packages. If rpm were updated five times per day, for example, we would need to rebuild hundreds of thousands of packages per day. [Audience:] After such a change, do you rebuild every single package? — No, currently we just increase the priority. [Audience:] That's what I mean about the dependencies — packages don't depend directly on rpm; they only depend on it through the build process. — But rpm is in the build group, so it's a build dependency of all the packages — a dependency of everything. The packages just wait in the queue, and they will be rebuilt later. [Audience:] Yeah, but the goal is... — That was one of the options I considered, but the final choice was to use Fedora Koji. Currently it's not possible, and it's not even planned; maybe another service could work like that, but not Koschei. Koschei has a hard dependency on Koji — it depends on Koji's architecture and RPC calls — so it's not possible to use anything else at this time. In the future, we're planning a tighter integration with PackageDB, so users — I mean package maintainers — will be able to enable Koschei monitoring directly in PackageDB. Most of that code is already implemented.
We are looking into some user interface improvements — adding new queries to see, for example, the latest failures, or the packages that fail most often, and so on. We would like to add a remote interface or a command-line interface, so users don't need to rely on web browsers. Another feature I would like to see is removing scratch-build results immediately. Currently Koschei submits a lot of builds, and Koji keeps all the results for a number of weeks. I would like to be able to tell Koji to remove the RPMs resulting from a scratch build immediately, because no one uses those RPMs anyway, just the build logs. Koschei could also emit more fedmsg events. We are thinking about supporting non-Rawhide tags such as EPEL, and about automatic bug filing, but that's under consideration. [Audience question about the Koji change.] Exactly — I would like a Boolean flag in Koji's XML-RPC that, once set, would tell Koji to remove all the build results except the logs. Of course, it could also be an external job, but that would just use resources. [Audience:] One thing about the priority — do you look at the size of the package? — No, that's not considered currently. [Audience:] Well, not necessarily the size, but how long it takes to build? — I'm manually looking at some packages that need a lot of time to build and assigning them to just one architecture — x86_64, currently the fastest. Building Eclipse on x86_64 takes half an hour, but on ARM it's 20 hours. Okay, so there are some links; the presentation will be shared later on the mailing list if you want to see the slides. I don't think we have more time for questions — we're just out of time — but until someone tells me to leave, I'm open to any questions, and if you want to discuss something, I'm available after the talk. Thank you for coming.