Welcome to the fall 2017 distributions devroom. We have Matthieu Huin and Haïkel Guémar here to talk to us about RDO's continuous packaging platform.

Good afternoon, everybody. OK, so I'm Matthieu Huin. There's not much time, so we'll go fast on some things. Hopefully we'll have time for questions later; if not, I think I'm the only guy with a red cap around here, so you can find me easily. Let's get the basics out of the way first. If you don't know what RDO is, it stands for the RPM Distribution of OpenStack. And if you don't know what OpenStack is, it's a collection of software that lets you install and administrate cloud infrastructures on conventional hardware. RDO is also a big community: users of OpenStack, package maintainers, administrators, and so on. Its goal in life is to help you install your own OpenStack infrastructure. The maintainers look after about 250 packages at the moment, and that number is growing.

So, what do we need to build a package? At a very high level: first, you need the sources of the software you want to package. We will call that the upstream code in the remainder of the presentation. The next component of the package is called a spec file (or spec files, if you have several). Basically, it contains the build steps: how to build the software from the sources, how to install it, how to remove it, the dependencies it needs to work on your system, and eventually the patches needed to adapt it to your system. From that, we can infer that the job of a packager is pretty much the same as the job of an open-source developer. A packager will need some form of version control.
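To make the spec file idea concrete, here is a minimal, hypothetical example; the project name, dependencies, and paths are invented for illustration, and real spec files (like those in RDO's distgit) carry many more details:

```spec
# Hypothetical spec file; every name and version here is made up.
Name:           example-service
Version:        1.2.0
Release:        1%{?dist}
Summary:        Example service packaged as an RPM
License:        ASL 2.0
URL:            https://example.org/example-service
Source0:        %{url}/archive/%{version}.tar.gz
# A downstream-only fix carried as a patch on top of the sources
Patch0:         0001-fix-config-path.patch

BuildRequires:  python3-devel
Requires:       python3-requests

%description
Shows the sections a packager maintains: build steps, install steps,
dependencies, and patches.

%prep
%autosetup -p1

%build
%py3_build

%install
%py3_install

%files
%license LICENSE
%{python3_sitelib}/*

%changelog
* Mon Jan 02 2017 Jane Packager <jane@example.org> - 1.2.0-1
- Update to 1.2.0
```

The %prep, %build, and %install sections are exactly the "building steps" mentioned above, and the Patch0 line is where flattened downstream patches get referenced.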
In the Fedora and CentOS universe, we call the spec file repository the distgit repository. It has to be on a public platform, because it has to be open to the community and to reviews. You also want to make sure that the contributions you get are of the utmost quality, so you need automated testing and validation; basically, you need continuous integration. Your testing environment has to be controlled, so that the tests are reproducible and you have high confidence in their results. You also need a smart merging strategy: if a lot of contributions come in at the same time, you have to make sure that when you merge them, you don't end up with master in a state you don't really know.

But when you're a packager, you also have a specific constraint, which is upstream. You have to follow what's going on upstream, make sure your spec files evolve when upstream evolves, and release as early as possible after a change occurs upstream, especially if the change is a vulnerability or security fix. And when you package OpenStack, it's even more complex, due to the sheer size of the project. OpenStack represented more than 600 projects in the last cycle. There's a release every six months; the last cycle was called Newton. So over the last six months there were more than 600 projects and about 240 commits a day, which is humongous. For Nova alone, which is the compute component, we had more than 1,700 commits, about 10 commits a day. During the stabilization phase, which is about one month before the release, it ramped up to about 280 commits a day. So what does this mean for OpenStack packagers in the RDO community? It means you have to make sure the spec file is validated against each change, because if you wait for 1,700 commits to pile up, you're going to have a bad time. You can't do that at the last minute.
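The "smart merging strategy" point is worth a tiny illustration. This is the idea behind Zuul-style gating: two changes can each pass CI in isolation yet break master once both are merged, so the gate tests every change against the future state of master before it lands. A toy sketch (illustrative only, not RDO code; the "CI check" here is just an invariant on a shared config):

```python
# Toy model of gated merging: each "change" edits a shared config, and
# CI checks an invariant on the *merged* state before anything lands.

def ci_passes(config):
    # Invariant: at most one component may own the port.
    return len(config.get("port_owners", [])) <= 1

def apply_change(config, component):
    # Return a new config with the change applied (no mutation).
    merged = {k: list(v) for k, v in config.items()}
    merged["port_owners"] = merged.get("port_owners", []) + [component]
    return merged

master = {"port_owners": []}

# Each change passes CI when tested against master in isolation...
assert ci_passes(apply_change(master, "nova"))
assert ci_passes(apply_change(master, "neutron"))

# ...but the gate tests each change against the future state of master,
# so the second one is rejected instead of silently breaking master.
gated = master
for component in ["nova", "neutron"]:
    candidate = apply_change(gated, component)
    if ci_passes(candidate):
        gated = candidate                 # merge it
    else:
        print("rejected:", component)     # a CI -1 in Gerrit terms

print(gated["port_owners"])
```

In Software Factory this role is played by Zuul's gate pipeline, which builds the candidate merged state and runs the real test jobs against it before Gerrit merges anything.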
Also, you need to be aware of the strong dependencies between the software in the OpenStack suite: if Nova changes, it's going to impact all the other projects. Nova is the compute component of OpenStack, so a change in Nova affects everything, and you have to retest all your other packages to make sure they still work. The test platform must also be able to absorb huge spikes in activity.

So now that we know what the problems are, let's look at the tools we have at hand to build an RPM factory. First, let's look at what the RDO community came up with. I'm going to be fast on this; if you have questions, we can discuss after the presentation. The first tool I want to talk about is called Delorean, like the car in Back to the Future, because it basically builds packages from the future. It watches what's happening upstream, and when there's a change, it tries to build a package using the new upstream code. It also acts as a trunk RPM repository, because it uses the latest source code from upstream. If you want to know more, there's an article linked in the presentation. We're also going to use a CLI, a command line interface called rdopkg, which helps with the basic tasks maintainers need, especially flattening the patch chains that have to be included in the spec file. We'll talk about that more when Haïkel covers the use cases.

Next, let's talk about community tools. The CentOS and Fedora communities use a tool called Koji. Koji is basically a client-server architecture that lets you build and store RPM packages. You can run your own instance of Koji, but you can also use the Community Build System, which is an instance of Koji set up by the CentOS community. You have to set up coordination with the CentOS community first, but then you can use it to build and test your packages.
And the last piece of software we're going to use is called Software Factory. This is basically a software forge, which includes a CI/CD platform. Its specificity is that it's inspired by what OpenStack does for its own CI, which we call infra. So it uses basically the same tools: tools you might know, like Gerrit for code review and Jenkins for job automation, but also some tools that are less well known because they're developed by the OpenStack infrastructure team, like NodePool, a service that provides job nodes on demand, and Zuul, which acts as a gating and job orchestration manager.

I'll go fast on the features Software Factory provides; it's really cool for development in general, not just packaging. You have code hosting and review through Gerrit. You have job orchestration through Zuul and Jenkins, as I said. Zuul also has a nice feature, which is cross-project dependency management: when it builds a test environment, it takes those dependencies into account by itself. NodePool lets you use job nodes and slaves on demand, whenever you need them. There's smart gating as well, so I'll go faster. The configuration is managed as code, so you can apply a CI workflow to the platform itself. And you have a flexible workflow, which is why we're going to use it for RDO.

So with all that (there's not much time, so we have to go fast) we managed to create an RPM factory. Basically, upstream changes are taken care of by Delorean, the tests and the workflow are managed by Software Factory, and all the building is taken care of by CBS, the CentOS Community Build System. This is a workflow overview. I believe a presentation has to include a very complex workflow diagram that we're not going to explain, because it means we're doing some serious business.
But basically, what I want you to remember from that is that we're going to use three different kinds of repositories. We work with upstream, with the RDO distgit, which is the spec files repository, and also with a repository for the patches that we're going to include in the distgit. And I'll let Haïkel talk to you about the use cases we cover with this architecture.

Thank you, Matthieu. OK, I'm Haïkel, the weird guy sitting here. I'm here because I'm the RDO release wrangler, so I'm the first user, and I'm here to explain the use cases. We have three main use cases. The first one is packaging the master branch, as Matthieu said. We take the sources from upstream, for instance the Nova sources, and then we take the packaging sources, and we use Delorean: it takes both repositories, tries to generate an archive of the sources, and then generates the package using the spec file. It also has some magic to generate proper versioning and things like that. After that, there are two possibilities. Either the package builds and everything is fine (well, mostly, because there can be hidden changes that aren't caught, but it's our job as packagers to fix that), or it fails. If it fails, it's usually because the sources have changed, or we have missing dependencies, or things like that, so we need human intervention here. We create a placeholder review in Gerrit, as a way to say: hey, maintainer, we need your attention on this point; here are the build logs showing what the problem is, please fix it. And the thing is, since we're using Gerrit, it's public, so anyone can see the failure and fix it, and the maintainer can review the patch and merge it. So it also helps lower the entry barrier to packaging. Next step: we're also tracking stable branches. Rather than waiting for upstream to make a release, we just say: hey, let's track the stable branches.
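That "magic to generate proper versioning" boils down to deriving a unique, monotonically increasing version from git metadata, so that every upstream commit can become its own RPM. A rough sketch of the idea, with hypothetical helper names (the actual Delorean scheme differs in its details):

```python
from datetime import datetime, timezone

def snapshot_release(commit_hash: str, commit_ts: int, base: str = "0.1") -> str:
    """Build an RPM Release string for a git snapshot.

    Encoding the commit date before the abbreviated hash means newer
    commits always sort later, since RPM compares release segments
    left to right.
    """
    date = datetime.fromtimestamp(commit_ts, tz=timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{base}.{date}.{commit_hash[:7]}"

def nvr(name: str, version: str, commit_hash: str, commit_ts: int) -> str:
    # Version would come from the last upstream tag, Release from the snapshot.
    return f"{name}-{version}-{snapshot_release(commit_hash, commit_ts)}"

# A hypothetical Nova commit from November 2016:
print(nvr("openstack-nova", "14.0.0", "9f8e7d6c5b4a", 1478563200))
# → openstack-nova-14.0.0-0.1.20161108000000.9f8e7d6
```

The point is that no human has to bump the version for each of Nova's ten daily commits: the snapshot metadata does it, and the packages still upgrade cleanly in order.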
We do as we do with the master branch and package every commit, and see if anything changes. Usually it just works for stable branches, at least for OpenStack. But the thing is, we run CI, so we can detect CI failures early and fix them. Usually it's that we have to change dependencies, or a dependency was updated and broke things. So what does it do? Two things. Since we share branches between stable branch tracking and actual releases, it detects when we change the name and version manually. It will then do a scratch build on the CentOS build system, so we can see if it builds. If that succeeds and the maintainers approve the change, it does the final build for you, and the result gets exported to the CentOS repositories. The thing is, we get to control who has access to the build system this way, and it helps lower the entry barrier, because you don't have to grant build system access to everyone. Currently, for the CentOS Cloud SIG, only two or three people have access to the build system, but many more people are doing builds through Gerrit. And it's the same thing here: if it fails, you still need the maintainer to come and fix the patch, and it doesn't get merged.

The third use case is distro changes. For instance, a user reports a bug with systemd services, which are something that is not always tested upstream; it's a distro-specific change. You make the change, and as in the previous use case, it gets built in the CentOS build system the same way. There is also another category of distro-specific changes, which are downstream patches. As in the earlier use cases, we try to limit the number of downstream-only patches. What we do is keep a repository tracking the upstream sources, and we manage the downstream patches as open reviews in Gerrit. If you know Gerrit: usually, when you add a patch, it gets merged, but in this specific case, it doesn't get merged.
This allows us to track the history of the downstream patches across releases, because we keep updating the upstream tracking repository, and our patches are just a set of reviews. Gerrit will detect if a patch needs to be rebased, or if there are rebase issues. When a clean rebase is possible, Gerrit can do it automatically, if you configure it for automatic rebase. If you don't, or there's a failure, you just retrieve the repository and the review code (the rdopkg tool simplifies that for you), do your manual rebase, and update the existing reviews. So it simplifies maintaining downstream patches. As a packager for two distributions, CentOS and Fedora, I've had experience managing downstream patches, and the biggest issue is collaborative work across patches: different packagers have used different tools while sometimes working on the same package. Here we have only one tool, Gerrit, period. So it makes things simpler, OK?

So, return on experience. For RDO, we're in the Ocata cycle, Ocata being the next OpenStack release, which will happen around February 2017. So we have numbers for the previous cycle, which is Newton. We had about 800 commits from 70 contributors. My team at Red Hat is about seven people, and not all of them do packaging, so that shows we have a lot of people outside the RDO team helping with the packaging: mostly people outside Red Hat, plus a few Red Hatters. We also caught, through Delorean on the master branch, about 230 build failures. That's about one and a half build failures per day, but we don't have build failures every day; they come mostly early in the cycle, or just before the release candidates. We can go two weeks with no build failure, and then just before a release candidate get 30 build failures at once. But it allows us to detect build issues early.
For instance, and I'm coming to the point: the RDO Newton packages were available in the CentOS repositories 10 hours after the upstream GA announcement. Why? Because when the announcement was made, all our packaging was ready. We had CI jobs already running against that code, because it had been frozen at some point. All we had to do was push the final tags in the packaging repos, run the CI jobs, generate the final repositories, and update the documentation and our own announcement. That's all. That's quite fast, because the usual process for a distribution is: wait for the announcement, retrieve the tarball, try to build the thing, and if it fails, fix it. Just imagine doing that for OpenStack, which is about 400 packages, not even counting the dependencies; it would take at least two weeks to roll a release, and two weeks is already very tight. So it's quite a success to release faster, and with higher quality, thanks to the CI jobs running behind the scenes.

The return on experience is that we have automated the distribution pipeline, so it's continuous delivery in practice. We follow the OpenStack pipeline and integrate with the CentOS release pipeline. There are still a few things to automate, but it's mostly there. We leverage collaborative work through Gerrit and reviewing; it has many advantages over GitHub pull requests, and I think that's covered in the same article Matthieu pointed to earlier.

OK, a very different topic. One of the issues when you start a new RPM distribution, or any packaging distribution, is building a community of contributors. Let's be honest: most OpenStack developers don't care about packaging. They don't want to learn about packaging; they don't want to learn a new process. So we use a process that is very similar to the OpenStack one: they use Gerrit, we use Gerrit, we share the same concepts.
And also, anyone who is not familiar with packaging can start ramping up, because they can see through the reviews how people work and what they change in the spec files. It's kind of a self-documenting process, so it makes things simpler, and you get transparency, peer reviewing, and faster onboarding.

So if you want to join us, or if you want to implement that kind of process, you can look at softwarefactory-project.io, which is the basis of the RPM Factory project. It is used by Software Factory itself; by DistributedCI, which is a distributed CI for RPM distributions; and by Skydive, which is a network analysis engine. And if you want to see a live instance, there's review.rdoproject.org, which hosts three projects: RDO from the CentOS Cloud SIG, the Delorean project, and OpsTools, which is another CentOS SIG providing operational tooling. And maybe your project too; maybe you want to build your own instance. If you want to keep in touch: softwarefactory-project.io, or rdoproject.org. On Freenode, you have the Software Factory channel if you want to roll your own instance, give feedback, or get help, and if you want to know about OpenStack packaging, we have the RDO channel. There are also the mailing lists of both projects. So feel free to ping us: you can find Matthieu on IRC, and I'm number80 on Freenode. Or catch us in the lobby. Thank you for attending.

All right. Before we take any questions, since we made it pretty fast, which is good, I just want to add something I didn't have time to mention about the way the workflow is implemented. If we look at the workflow diagram, the important thing that Software Factory adds is that all the CI tests are run before any actual merging.
So that's a very different approach from what CI usually does, because CI tests are usually launched once some code has already landed in the code base. And because we are an open source project and might have a lot of contributions, we want to make sure that contributions are valid before we actually merge them: not just peer review, but also automated validation. So, do you have any questions?

Audience question: is this workflow documented on your web page? It's in the RDO documentation, yes.

Audience question: for the downstream patches, have you considered using git-upstream? For managing the patches? Uh, no. We have our own tooling to flatten the patches into the packaging distgit, because the distgit only stores the flattened patch files; we don't need the whole patch history inside that repository, we just need a tool that produces the flattened patches. So rdopkg connects to Gerrit, sees all the open reviews, and checks whether CI is passing. It also checks the votes, so you can have open patch reviews, but if they don't get the right votes, they are not included. You can even remove a patch very quickly: I just change my vote, and it gets removed automatically for you. OK, another question? That will be the last one.

Audience question: do you run CI against every single upstream change? We're trying. The thing is, we don't have enough capacity to run that, but we retrieve changes from upstream every five minutes, so it's pretty close. Sometimes three commits get merged at the same time and we just test them all together; that's something we're working on, but I don't think we've ever had more than three commits tested at once in practice. And by the way, I see one of the RPM Factory contributors here:
Sebastian, so you can give him a round of applause. Thank you.
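One last point from the Q&A is easy to sketch: the selection rdopkg performs, where only open Gerrit reviews that pass CI and carry the right votes get flattened into the distgit patches. This uses a hypothetical data model and hypothetical vote thresholds, not the actual rdopkg internals:

```python
# Each open Gerrit review modelled as a dict; a patch makes it into the
# flattened series only if CI passed and a maintainer approved it.

def included_patches(reviews):
    return [
        r["subject"]
        for r in reviews
        if r["status"] == "open"
        and r["ci_verified"] >= 1     # Verified +1 from CI
        and r["code_review"] >= 2     # Code-Review +2 from a maintainer
    ]

open_reviews = [
    {"subject": "Fix systemd unit ordering", "status": "open",
     "ci_verified": 1, "code_review": 2},
    {"subject": "Experimental backport", "status": "open",
     "ci_verified": 1, "code_review": 0},    # no approval: left out
    {"subject": "Broken rebase", "status": "open",
     "ci_verified": -1, "code_review": 2},   # CI failed: left out
]

print(included_patches(open_reviews))
# → ['Fix systemd unit ordering']
```

Dropping your Code-Review vote on a patch is then enough to pull it out of the next flattened series, which matches the "I just change my vote and it gets removed" behaviour described in the answer above.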