So hi everybody, the church beside me started ringing the bells, so I guess it's four o'clock and we can start. I've seen the schedule for today, I've seen the talks in the parallel tracks, so I'm glad you've chosen this talk about Fedora Copr, and I believe I can tell you some interesting stories from behind the scenes of maintaining the Fedora Copr instance. So hello, I'm Pavel, I'm an engineer at Red Hat working on the Copr code, and Miroslav is my colleague and manager at Red Hat. Yes, and if you have any questions during the presentation, don't hesitate to write them in the chat and I will try to answer them.

At the beginning, some statistics. Currently we have about 3,000 active users who maintain about 19,000 projects, and this eats about 13 terabytes of data in terms of packages, metadata and repositories on the Copr backend side. A few years back we started with, I think, a four-terabyte volume in AWS EC2 and we slowly increased it. We have a nice how-to for increasing the volume, let's say on a monthly basis, without interruption, and nowadays we are at 16 terabytes. The fun part is that this is the maximum size for that volume, so in the next months we will have to migrate somewhere else. We do backups using automatic weekly snapshots. During the last quarter we were doing approximately 8,000 builds a day; the previous quarter was slightly higher, about 12,000 builds a day, maybe it relates to the Fedora release cycle or something. In peaks it's about 50 gigabytes of package data a day.

That's the reason why we are doing a lot of aggressive cleanups. You probably know that we keep only one version of each package in one project, or rather in one chroot. We have several RFEs for automatic project removals, a maximum number of builds in one repository, and so on, because otherwise we would eat all the storage very quickly. We also needed a tool to better understand how folks are using the storage, so there is a new page you can look at, with some interesting data about how much storage individual architectures, users, projects and so on are eating. I would like to point out that last year was really the year of Packit, at least in terms of storage, because they multiplied their storage use about five times during 2021.

Another interesting chart shows how important the Fedora end-of-life policy is for us. You know that we always send an email and give you another half a year to preserve the data, and after that we remove it. The first arrow on this slide, the red one, marks the point when we announced the end of life. Each point in that line is one week, and within the first two weeks you can see that we removed the duplicates; then the consumption stays roughly flat for the rest of the half-year preservation period. The green arrow marks the time when the data is removed. Some data still stays, about 30 gigabytes, because some folks opt in to preserve their data longer for whatever reason, and it's good that they can do that. A similar thing is happening with Fedora 34, that's the line above it and the second red arrow. And maybe even more interesting is the black arrow in the slide.
Back then, when we were staring at this graph going up and up, we thought something very interesting was happening with Fedora 35, because it was growing so high and so quickly; it even started to catch up with the Fedora Rawhide storage consumption. That seemed weird, so we started debugging it and found a bug in the pruning mechanism. We fixed it and saved about two terabytes of data, which basically saved us. The reason I'm showing this to you is that it's really important for us to keep an eye on the storage consumption here.

The top peak, at least the one I remember, was the time when we did something like 50,000 builds a day and we were still not at the limit. Most of that traffic was generated by us, by the Fedora Python team, and by a couple of other teams in those days. Copr is designed so that the throughput cannot be eaten by two or three people or teams; we have limits, and Copr should stay usable even in such peaks. Even back then we had a limit of about 25 concurrent builds in one project. Nowadays the quota is a bit higher: one user can run 45 builds in parallel, and 35 in one sandbox, which is basically one project, plus some other security policies. So yes, you can run 45 builds at one time, but you have to split them across at least two projects. Currently we can run up to 300 builders in parallel, and that is a somewhat theoretical number; we can go up and down. We have a flexible allocation mechanism, so typically we have something around 100 builders, but it can scale up to that maximum when needed.

Historically there was a Fedora-infrastructure-provided OpenStack, and we had only x86_64 and PowerPC little-endian boxes there. Users wanted other architectures, so we started experimenting and implemented emulated builds using Mock's forcearch feature, and then other people came saying it's not enough, the builds don't work, and they need native builders. So at this moment we run most of the x86_64 builders on our own hardware in the Fedora lab and fall back to AWS; s390x is supported natively in the IBM Cloud; PowerPC little-endian is again mostly in the Fedora lab, two Power8 machines and one Power9 machine, with a fallback to the Oregon State University OpenStack; and aarch64 is in AWS only. The 32-bit ARM architecture stays emulated for now. With this slide I'd like to thank all the cloud providers that give us computational power for community purposes.

Historically it was quite easy: we had one simple playbook starting a homogeneous set of builders in the OpenStack in Fedora infrastructure. But later we had to move to a different lab, the OpenStack was decommissioned, we started spreading across several clouds, and we maintain several hypervisors, so we needed something to keep the maintenance of the builders sane. That's why the resource-allocation client/server architecture, the resalloc server, was created. It lets us keep the configuration simple, in terms of shell scripts that just start and stop VMs in pre-configured pools, and we can add more pools as needed. This gives us flexibility; for example, we don't start too many AWS machines when they are not needed. It used to be quite hard to show this to you, but recently Silvie and Jakub implemented a nice page where you can see how many resources are currently started in Copr, per pool.
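To illustrate that pool idea, here is a minimal Python sketch. It is not the actual resalloc code; the class, the script paths and the states are made up for illustration, but it shows the shape of a pool that drives VMs through start/stop scripts and never hands a used machine to a second user.

    # Hypothetical sketch of a builder pool driven by start/stop shell scripts,
    # in the spirit of the resalloc idea described above (not its real API).
    import subprocess

    class Pool:
        def __init__(self, name, max_vms, cmd_new, cmd_delete):
            self.name = name              # e.g. "aws_aarch64"
            self.max_vms = max_vms        # upper limit, e.g. 30 builders
            self.cmd_new = cmd_new        # script that boots one VM, prints its ID
            self.cmd_delete = cmd_delete  # script that removes one VM
            self.vms = {}                 # vm_id -> "ready" | "taken"

        def ensure_capacity(self, wanted_ready):
            """Start new VMs until enough are ready, never exceeding max_vms."""
            ready = sum(1 for s in self.vms.values() if s == "ready")
            while ready < wanted_ready and len(self.vms) < self.max_vms:
                out = subprocess.check_output([self.cmd_new, self.name], text=True)
                self.vms[out.strip()] = "ready"
                ready += 1

        def take(self):
            """Hand a ready VM to exactly one build; it is never reused."""
            for vm_id, state in self.vms.items():
                if state == "ready":
                    self.vms[vm_id] = "taken"
                    return vm_id
            return None

        def release(self, vm_id):
            """After the build, delete the VM instead of giving it to another user."""
            subprocess.check_call([self.cmd_delete, vm_id])
            del self.vms[vm_id]

In the real deployment there is one such pool per cloud and architecture, and monitoring decides how many ready machines each pool should keep.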
Here is one screenshot. As I said, the aarch64 architecture currently runs only in AWS, and the pool looks like this: we can start up to 30 builders when we need them. Currently there are 17 builders up and running; five of them are ready to be taken, so if you come to Copr and want a builder, you will get one of these; 12 of them are working on something; one is starting and one is currently being deleted. Something like that. The point is to do zero babysitting: unless you write to us, or some monitoring shouts at us, we don't touch it, it just does its job. This software is decoupled from Copr, so if you face similar issues and need something like that, you can take a look and use it.

So we need to start VMs, and stop them, very frequently. The reason is security. We cannot simply let users share builders, because building RPMs is done in Mock, Mock installs RPMs, and for that you need root. Since we give users root powers, they could break the builder, poison it, and affect other users; therefore we never give the VM you are using to other users, and vice versa. And because we need to start them so frequently, we cannot simply kickstart and install them, that would take too long. We cannot even use the pre-built Fedora Cloud images, because we need special packages and configuration there, and we need to update the packages frequently. So we periodically prepare our own golden images, and this is currently a pain: for multiple clouds we have to do multiple different things. The majority of the images are done with virt-sysprep, but the problem is that our hypervisors mostly run stable RHEL 8, and as you probably know, Fedora Cloud migrated to Btrfs, so Fedora 35 and newer images are based on the Btrfs filesystem, and the RHEL 8 kernel doesn't support Btrfs. It is not an impossible problem, but it gives us headaches. At this moment we are poking at Image Builder; the problem is that Image Builder doesn't support Fedora on s390x and ppc64le. I mean, Image Builder supports Fedora and it supports those architectures, just not this particular combination, so we have been stuck for several months waiting for that to be supported.
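As a rough sketch of that golden-image step, assuming the libguestfs tools (virt-customize and virt-sysprep) are available; the image paths and the package list are illustrative examples, not Copr's real configuration.

    # Illustrative sketch: refresh a "golden" builder image from a base cloud image.
    # Paths and the package list are made-up examples.
    import shutil
    import subprocess

    BASE_IMAGE = "Fedora-Cloud-Base.qcow2"       # pristine upstream cloud image
    GOLDEN_IMAGE = "copr-builder-golden.qcow2"   # image the builder VMs boot from

    def refresh_golden_image():
        # Work on a copy so the pristine base image stays untouched.
        shutil.copyfile(BASE_IMAGE, GOLDEN_IMAGE)

        # Install the builder packages and update everything inside the image.
        subprocess.check_call([
            "virt-customize", "-a", GOLDEN_IMAGE,
            "--install", "mock",   # example package set only
            "--update",
        ])

        # Reset machine-specific state (SSH host keys, machine-id, logs, ...)
        # so every VM booted from the image starts clean.
        subprocess.check_call(["virt-sysprep", "-a", GOLDEN_IMAGE])

    if __name__ == "__main__":
        refresh_golden_image()

The Btrfs problem mentioned above hits exactly this step, because the libguestfs tools rely on the hypervisor's kernel being able to read the guest filesystem.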
Many of the peaks in the traffic are generated by a few people in a few very large projects. Here is a nice link to a blog post about that. We tried to rebuild all the RubyGems packages into RPMs in one project; there is a blog post about that too, feel free to take a look. The point is, we can parallelize a lot: we can start 35 builders at one moment, and most of the packages build very quickly, within a minute or so, which is super nice. But at the end of each build you need to put the RPMs into the repository and regenerate the metadata, and this cannot be parallelized: 35 builders come in, try to update the metadata, and wait for a lock. And there is a lot of I/O in this, because createrepo_c needs to actually read the RPMs to construct the metadata.

Since the beginning we have been using the --update option for createrepo_c, simply to avoid re-reading the RPMs that are already in the metadata, but that was not enough, so we started using --skip-stat. The funny thing is that the official documentation basically warns you to use this option only if you really know what you are doing; but we do, we have complete knowledge of which RPMs were modified, removed or added, so we can afford it. But guess what, it was not enough again, so we brought in the --recycle-pkglist option, so we don't have to traverse all the RPMs in the repository; there are hundreds of thousands of them, and even that generates a lot of I/O. With that option we can rely on the set of RPMs from the previous version of the metadata. Good, but still not enough: the CPU consumption of createrepo_c is very big, and it doesn't make much sense to recalculate everything for every worker; if there are 35 of them, why recalculate it 35 times? So we implemented the so-called batched createrepo: one of the waiting workers is promoted to be the leader and processes the jobs for all the other workers waiting on the lock, and once it's done, all of them are unblocked. That improved the throughput a lot, but it was again not enough; after the RubyGems repository grew, createrepo_c was taking something like half an hour to incrementally update the repository. Then last month Daniel from the Satellite team came along and was finally able to optimize away 85 percent of that time, so since the new release we should be able to regenerate the largest repository in Copr in about five minutes. That's much better than before; it could still be better, but it's cool.
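As a rough illustration of the incremental createrepo_c options mentioned above, a call might look like the sketch below; the repository path is just an example, and the exact flag set Copr uses may differ.

    # Sketch: incrementally update repository metadata with createrepo_c,
    # reusing as much of the previous metadata as possible.
    # The path below is illustrative, not Copr's real layout.
    import subprocess

    REPO_DIR = "/var/lib/copr/results/owner/project/fedora-rawhide-x86_64"

    def update_repo_metadata(repo_dir: str) -> None:
        subprocess.check_call([
            "createrepo_c",
            "--update",            # keep metadata for RPMs that did not change
            "--skip-stat",         # trust filenames, skip stat() of every RPM
            "--recycle-pkglist",   # reuse the package list from the old metadata
            repo_dir,
        ])

    if __name__ == "__main__":
        update_repo_metadata(REPO_DIR)

The batched createrepo described above (one worker acting as the leader for the others) happens on the Copr backend around calls like this one and is not shown here.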
Another problem around repository metadata is appstream-builder. It takes a lot of time; not that it is badly written, but it doesn't know how to do incremental updates. So we ended up with official instructions saying: if you maintain a too-large Copr repository, turn this tool off and don't provide AppStream metadata, because it is simply too slow and you would face build failures. And even DNF is kind of slow. I was kind of lying to you on the previous slide that builds in those repositories are fast; we definitely cannot do them in one minute, even though the build itself takes like 15 seconds. The extreme example is that large repository: DNF needs to load the metadata, and even on my really fast i7 laptop it takes like five or six minutes to load the metadata from the XML files, cache it and load it into memory. And in Copr we have the Mock bootstrap feature on by default, so DNF needs to be called twice.

So that's two six-minute blocks just for loading the metadata; then the build itself goes fast, installing the dependencies is fast, building the package is super fast, and then you need to do the createrepo task, which takes five minutes. In total we add maybe 20 minutes around a 15-second RPM build. That's not that bad and not that good; at the end of the day we can run 35 concurrent builds in one project, so it aggregates to more than one build per minute. It could be better.

Historically we had problems with hanging builds, some looping test suite or something waiting for input, and we had to babysit the builds: hey, this one has been running longer than we would expect, should we kill it, will the user be affected? So we implemented timeouts. The issue was which timeout to pick, and there is no good answer. Eventually we took kernel builds, which are done in Copr very frequently; on average they took about four hours, so we made the default timeout five hours. That might seem like ages for your builds, or it might be too low, and it is configurable per build: if you build something that takes five minutes and you have a broken spec file that sometimes hangs, feel free to lower the value; if you build, for example, the Blink web engine, Chromium or something like that, feel free to prolong it. The maximum value is 48 hours, that is, two days.

Okay, fun facts. Do you know what the string on these three lines means? The hint is RubyGems. Correct, Mirek is right, it is the package name of one Ruby gem. As I said, we tried to rebuild all RubyGems as RPMs in one Copr project, and we did it in phases; as long as the gems had a good license, we tried them. This one was the longest package name and it caused us a lot of trouble: the build was triggered, it kept getting new and new resources and retries, and we were not able to even terminate it; we had to go straight into the database and remove it, because it was a huge headache. This is fixed now, so you don't have to try it, but yeah, that was fun.

There is also a new feature, build batches: you can chain builds of groups of packages and put a dependency between the batches, so the first set is built first and the dependent batch is built after that. This is heavily used by the Fedora LLVM team; they use it for daily snapshots, and their batches are actually not able to finish within 24 hours, so nowadays basically any time you go to the Fedora Copr front page you see some LLVM batches running there, because they are building the daily snapshot for today and for the previous day, and maybe even older ones.
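If you want to try the batch chaining yourself, a small Python wrapper around copr-cli could look roughly like this; the project name and SRPM URLs are made up, and the flag names are quoted from memory, so check copr-cli build --help before relying on them.

    # Sketch: chain two batches of builds so the second batch starts only after
    # the first one has finished.  Assumes copr-cli prints "Created builds: <id>".
    import re
    import subprocess

    PROJECT = "myself/llvm-snapshots-demo"   # hypothetical project

    def submit(srpm_url, after_build_id=None, with_build_id=None):
        """Submit one build without waiting and return its build ID."""
        cmd = ["copr-cli", "build", PROJECT, srpm_url, "--nowait"]
        if after_build_id:
            cmd += ["--after-build-id", str(after_build_id)]   # new batch after that one
        if with_build_id:
            cmd += ["--with-build-id", str(with_build_id)]     # join an existing batch
        out = subprocess.check_output(cmd, text=True)
        match = re.search(r"Created builds?:\s*(\d+)", out)
        if not match:
            raise RuntimeError("unexpected copr-cli output: " + out)
        return int(match.group(1))

    # First batch: two builds grouped together.
    first = submit("https://example.com/srpms/llvm.src.rpm")
    submit("https://example.com/srpms/clang.src.rpm", with_build_id=first)

    # Second batch: starts only after the whole first batch has finished.
    submit("https://example.com/srpms/llvm-tools.src.rpm", after_build_id=first)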
Fedora Copr uses the obs-sign software for signatures, and by default it uses the SHA-1 hash algorithm. You may know that SHA-1 was distrusted in Red Hat Enterprise Linux 9, so eventually we found out that we were signing all the packages in Copr with a signature that RHEL 9 would not accept. We migrated Copr to SHA-256, and all new packages get the new signature, but we didn't want to re-sign all 13 terabytes of packages, that would take too long, so we re-signed only the packages for Red Hat Enterprise Linux 9 and EPEL 9, to allow them to be installed, and that's it.

Users often ask us for more powerful builders, for tasks like building Chromium and so on, and basically we could do that. Of course we couldn't start as many builders as we have now if they were all beefier, but we could have two types of builders; we just need some tweaks in the web UI to allow users to specify the appropriate tags. Folks also often want packages in a repository to be rebuilt automatically when a dependency is rebuilt: the foo package depends on bar, bar changes, I want the foo package rebuilt automatically. This is to be discussed with the Koschei team, and we will hopefully find some way to implement it. And a very frequent question is: can I build in Fedora Copr something proprietary, or something that cannot be shared with the public? The answer is no, first because of the legal issues, and second because the computational power is given to us for community purposes. But there is a pull request at this link you can look at; in the future it should hopefully be quite easy to start your own Copr instance if you want that.

So that would be it. I'm really glad I could give this talk, and happy building, unless you have any questions. I'm really into packaging and build systems, so feel free to grab me in the Fedora build-systems channel on Libera.Chat, I will be happy to talk to you.

We had several questions in the Q&A and tried to answer them during the session. Are there any other questions we can answer live here? If you have something, you can write it in the chat; I'm not sure whether you can ask directly. David Duncan says he will likely be asking about packaging. Okay, so here is one question: what are folks' favorite text-based fast tracks to a first Copr build? Favorite text-based, so, do we have something text-based, a command-line interface, something easy for a first Copr build? Pavel?
Yes, there is a Copr client, the package is called copr-cli, and using that, just run it on the command line and it will show you all the help you need. It is intuitive: copr-cli create will create a project, copr-cli build will submit a build, and you can pass it a source RPM or a link to a source RPM, or you can build directly from PyPI or RubyGems. And it is well documented.

Regarding a larger filesystem, there is a plan to do some research about GlusterFS, or maybe some RAID solution; we don't know yet. For the record, I'll repeat the question: what have we considered for extending the filesystem? I answered it in the chat: the Elastic File System from AWS, which can grow indefinitely; Pulp, which has an S3 backend and can also grow indefinitely; or just using GlusterFS, spinning up virtual machines and spreading the bricks across several machines. We just have to consider the pros and cons. We also have some other features in the queue, so we have to weigh those as well, not just the storage itself. With EFS the other thing to consider is the price, because AWS bills more for frequently accessed files, and when we prune the repositories we access a lot of files, so we would have to check what our billing would look like. We actually get the capacity from AWS for free, they sponsor us, but we still have to watch it so the bill doesn't skyrocket.

There are more questions in the Q&A; I think I can go through them quickly, because we are out of time. OBS versus Copr: there is a nice set of blog posts by Mirek; if you search for "Miroslav Suchý OBS" you will find the answers there. Do you provision AWS machines on demand, or are those VMs always available? Yes, that's the resalloc server: we allocate as many as are currently needed; we keep a few pre-allocated, but just a few, and we allocate more as the demand goes up. Would it be better if you could do builds with XFS? That one was answered by Mirek. And we have one last question that isn't answered yet: what about the fast track for contributing things that don't have a source RPM? So, if something doesn't have a source RPM, it has to be created somehow. For that we have the, how do we call it, the custom source method, where you can provide any script that creates the source RPM at build time; that's one of the supported methods. You just provide the script, we execute it, it creates the source RPM, we store it in our internal DistGit and build from that. Just provide the script in the web UI and you can do it very easily. And as long as your package is on PyPI, you can use the pyp2spec method, which generates the spec file for you for free, so all you need to know is basically the name of the package on PyPI.

We have already run out of time and people are leaving, so whoever is still here, thank you for your time. If you have any questions about Copr, don't hesitate to chat with us; on our Copr pages you will find links to where you can reach us, IRC, the mailing list and so on. That's it. Enjoy the rest of the conference and happy building. Bye bye.
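To make that fast track concrete, here is a minimal Python sketch using the python-copr API (the copr package) instead of copr-cli. It assumes you already have an API token in ~/.config/copr; the project name, chroot and PyPI package are just examples, and the method names should be checked against the python-copr documentation.

    # Sketch: create a project and submit a first build from PyPI via python-copr.
    # The API token is read from ~/.config/copr (see the API page on the Copr frontend).
    from copr.v3 import Client

    client = Client.create_from_config_file()

    # Create a small project with one Fedora chroot (example values).
    client.project_proxy.add(
        ownername=client.config["username"],
        projectname="my-first-project",
        chroots=["fedora-rawhide-x86_64"],
    )

    # Build a package straight from PyPI, the same idea as copr-cli build above.
    build = client.build_proxy.create_from_pypi(
        ownername=client.config["username"],
        projectname="my-first-project",
        pypi_package_name="requests",
    )
    print("submitted build", build.id)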