Good afternoon everybody. In this talk, I will explain what Zuul is and how it integrates with Pagure. My name is Fabien Boucher, I'm working for Red Hat in the production chain infrastructure team of the OpenStack group. I mainly focus on the Software Factory project, which is a software development forge based on Zuul. But first, to give you a bit of context about this talk: my team and I are Zuul operators, so we have experienced a lot of the innovative features that Zuul brings to software development. We decided to share more about it, and one year ago we started to think about Zuul and Fedora. We quickly figured out that Zuul should be able to integrate with Pagure, so we decided to write a Pagure driver for Zuul. Then we started a proof of concept to experiment with the driver, in the context of RPM packaging. So in this talk, I will explain what Zuul is and how it integrates with Pagure, I will show some examples from the proof of concept, and then I will share my feeling about how Fedora could benefit from Zuul. OK, but first let's talk a bit about the gate. The tagline of Zuul is "stop merging broken code". The fact is, keeping a code branch sane is hard for many reasons. By sane I mean that the latest version of the code continues to pass the test suite of the repository or the project. Indeed, validating a patch can require complex integration tests that may also involve artifact building or even multiple nodes to run. And the problem is even worse when there are multiple pull requests to go through. Merging is a human process, and problems like miscommunication can lead to a broken master situation, where the branch no longer passes the tests. To mitigate that, a simple gating strategy is usually adopted. Pull requests à la GitHub or à la Pagure are implementations of the pull request workflow: a code maintainer can accept or reject a pull request based on review. Also, to help the maintainer decide whether a pull request should be accepted or rejected, we can attach automatic testing to the pull request. And obviously, if the pull request does not pass the CI tests, the maintainer should not merge it. But this simple gating strategy is flawed. Indeed, some time can pass between the moment a pull request is approved and the moment it is merged. During that time the target branch might have changed significantly enough that merging the pull request would break the code. In that case the test suite of the project or repository no longer passes, and it can take a significant amount of time to detect and fix the issue. The problem is even worse for a project composed of multiple repositories: testing and merging a change may involve pull requests on multiple repositories, and that requires human coordination to avoid breaking the project. So what should we do to improve the gating workflow? First, approved patches, that is patches that are going to be merged (when I say patch you can think pull request), must always be tested on the latest version of the target branch, especially just before the pull request is merged. Also, multi-repository support and dependencies: the system must provide a solution for developers to specify dependencies across repositories, but also to define dependencies between pull requests. The system must also be able to preserve the approval and dependency order when testing and merging pull requests. Finally, the gating process must be automated.
That means that no human should be involved in testing and merging pull requests: the system must implement a gating workflow and handle it by itself. Then the system must be able to scale, to handle more projects, more patches and more CI. So let's talk a bit about OpenStack. Given the specificities of the OpenStack project, I mean a lot of patch submissions, really long functional tests and dependent repositories, the OpenStack community needed to build an innovative CI and gating system. To do so they built Zuul, and here are the main concepts they implemented in it. Multi-repository and patch dependency management: the system allows developers to define dependencies across repositories in the configuration, but also to use a specific keyword, Depends-On, in the commit message, that helps developers specify dependencies across pull requests. CI and parallel code gating: Zuul is a job scheduler and job executor, which means it is really good at executing CI jobs, any CI jobs, but it is also a powerful gating system. When a pull request is approved, Zuul will rebase the pull request on top of the target branch, make sure the test suite passes, and then merge the pull request on the target branch if the tests pass. Also, Zuul, by design, is aware of every pull request approved on the system, which means it is able to orchestrate the testing and merging process. Scaling: Zuul is based on a microservices architecture, so it scales well. And it is compatible with different code review systems, like Gerrit, GitHub, and recently Pagure. On the right side of the slide, you have some statistics from the OpenStack CI, where Zuul manages the CI. It manages around 1,500 Git repositories, runs about 10,000 jobs per hour, and around 10,000 patches are merged every month. It's worth mentioning that Zuul is generic: it is not tied to OpenStack, so it can be used in any project. There are other interesting concepts and features in Zuul, so I will go over all the bullets. Zuul implements event-driven pipelines: Zuul reacts to code review system events, especially events on pull requests, and it provides a way to trigger jobs based on specific events. Zuul is CI as code: indeed, 99% of the CI configuration for a project is stored in Git repositories, and Zuul will validate and gate those repositories, so the CI configuration itself. If you want to change the CI configuration, you open a pull request; Zuul will look at the pull request, load the CI configuration change, and use it to run the jobs for that pull request. That's really great, because it avoids the need, as with other systems, to merge a CI change just to see its effect and eventually break the CI, and it also avoids the need for a CI staging system. Zuul jobs are in fact Ansible playbooks, which means you don't need any agent on the test node, only SSH plus Python. As it is Ansible, it also gives access to the Ansible ecosystem, and if you have existing Ansible roles that run in your CI, you can reuse them directly with Zuul. Zuul supports job inheritance, job dependencies, and job chaining with artifact sharing. Zuul also supports multi-node jobs: in fact, when you define a job, you attach a nodeset definition, and in this nodeset definition you can specify one or multiple hosts. Zuul will flatten this nodeset definition into the Ansible inventory of the job.
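To make the Depends-On keyword concrete, here is the kind of commit message footer a developer would add on a pull request; the change title, the repository name and the URL are purely illustrative:

```
Add TLS support to the client

This change needs the helper introduced in the library repository.

Depends-On: https://pagure.io/my-lib/pull-request/12
```

And as a minimal sketch of a job attached to a multi-node nodeset, assuming a Nodepool label named cloud-fedora like the one used later in this talk, the configuration could look roughly like this (the names and the playbook path are illustrative, not taken from a real configuration):

```yaml
# A named nodeset with two hosts; Zuul flattens it into the job's Ansible inventory.
- nodeset:
    name: two-fedora-nodes
    nodes:
      - name: controller
        label: cloud-fedora
      - name: worker
        label: cloud-fedora

# A job that runs its playbook against that nodeset.
- job:
    name: multinode-example
    description: Example multi-node job
    run: playbooks/multinode-example.yaml
    nodeset: two-fedora-nodes
```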
So then if you have multiple nodes, it is really easy to spread the tasks with Ansible across the nodes. Zuul relies on a resource lifecycle manager called Nodepool. Nodepool provides a pool of test nodes that Zuul uses when running the jobs. Each time a job starts, Nodepool provides a fresh node, and when the job finishes, it destroys that node. This produces reproducible job environments and helps a lot to eliminate CI flakiness. Zuul also provides a secret management system that is really simple: secrets are stored in Git repositories along with the CI configuration, but encrypted, and when they are needed, Zuul exposes the specific secrets to the specific job. And finally, Zuul is multi-tenant: there is a strong isolation between tenants, so you can use only one Zuul instance for multiple projects. So now let's talk about the event pipelines. As I said, Zuul reacts to code review events, especially on pull requests. When a pull request is created or updated, it enters the check pipeline, and when a pull request is approved, so prior to being merged, it enters the gate pipeline. On the left, you see the check pipeline. Each box is in fact a change, a pull request, and each bar, some in red, some in green, some in blue, is in fact a job. Here in the gate pipeline, we see a box expanded, and we see all the jobs running for that specific change. You can also define custom pipelines, like the post pipeline that runs jobs when pull requests are merged. Also, I said before that Zuul supports repository dependencies and patch dependencies, pull request dependencies. So, for testing a pull request on a repository, Zuul will look at the dependent repositories, and it will check out the dependent repositories of the repository the pull request is open on into the job workspace, which makes all the sources available for testing. On the left side of the schema, you have three repositories, A, B and C, and those repositories are dependent, they are part of the same project. If a pull request is open on repository A, then Zuul will check out repositories A, B and C in the job workspace, and it will rebase the pull request on repository A. So as you can see, you have the dependent repositories and you also have the pull request in the workspace. You can also specify dependencies between pull requests. On the right side, you have a pull request open on repository A that depends on a pull request on repository C. Zuul will detect that and check out for you, in the job workspace, the three repositories, but with both patches rebased, on repository A and on repository C. And remember that in the gate pipeline, Zuul merges pull requests, so it will make sure to preserve the dependencies when merging: Zuul won't merge the pull request on A if the pull request on C is not merged. Now let's talk a bit about the gate pipeline workflow. In this pipeline, it is pull requests that have been approved that enter, to be tested prior to being merged. As you can see, pull requests enter this pipeline in the order of approval: we have a patch on Nova that was approved first, then a second patch on Nova, then a patch on Keystone, and finally a fourth patch on Nova.
Nova and Keystone are repositories; they are tested together because they are part of the same project, so they share the same gate queue. In fact, each patch is rebased on every other patch ahead of itself in the gate pipeline. That means that Nova patch 2 is rebased on Nova patch 1; Keystone patch 3 includes Nova patch 2 in its workspace, which is itself rebased on Nova patch 1; and Nova patch 4 is rebased on Nova patch 2 and also includes Keystone in its workspace. As you can see, Zuul expects that the functional tests, the jobs for those pull requests, will succeed and that all the pull requests will be merged in this specific order. By doing that, Zuul speculates on the future state of the repositories, and that helps a lot because it speeds up the gating workflow. But sometimes failures happen. Here, for example, a job testing patch 4 on Nova has failed. That can be an issue with patch 4 itself, but it is probably more an integration issue with the rest of the chain. Another example: here we have a job that fails on Keystone patch 3. In that case, Zuul will invalidate all the subsequent runs, so it will stop the jobs for Nova patch 4, then restart them, but without Keystone in the workspace, because Keystone has failed. And as the patch on Keystone has failed, Zuul won't merge that pull request; it will report the failure on the code review system, and the patch author will need to fix the incompatibility. Without Zuul, the patch on Keystone would probably have been merged, and in that case multiple developers would have been impacted by the issue, because the master state of the repositories composing the project would no longer pass the test suite. That's the broken master situation again, but at the project level. Hopefully, with Zuul, we detect the issue. For the other changes, the tests succeeded, so they are merged in the order of approval. So to sum up, Zuul is able to use this speculative merging strategy, and it allows Zuul to detect integration problems before the code is merged. Now let's talk about the integration we did between Zuul and Pagure. As I said, Zuul reacts to code review system events, but each code review system implements different interfaces, so Zuul implements drivers to interact with the different code review systems: Gerrit, GitHub, GitHub Enterprise, and recently we added the driver for Pagure. How does Zuul interact with the code review system? First, it should be able to listen to events, like pull request opened or pull request updated, but it should also be able to act on the code review system, especially on the pull request: for instance to change the pull request CI status, to comment on the pull request, or even to merge the pull request. Here is an architecture schema where you can see Pagure on the left, Zuul in the middle, and Nodepool on the right. So let's take an example where a pull request is opened on repository A. Zuul, in the middle, will receive on the Pagure event stream the event "pull request created". Then the Zuul scheduler will look at its configuration to check if there is a matching pipeline. If there is a matching pipeline, it will look whether, for repository A, there are some jobs configured to run inside that pipeline. If it is the case, it will look at the nodeset definition of the jobs and tell Nodepool to spawn the nodes. Then it will tell the Zuul executor to run the Ansible playbooks of the jobs on the nodes.
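As a rough illustration of how this wiring looks on the Zuul side, a check pipeline listening to Pagure pull request events and reporting back could be declared along these lines. This is only a sketch: the connection name my-pagure is a placeholder, and the exact event, action and reporter option names are assumptions from memory that may differ from the actual Pagure driver documentation:

```yaml
- pipeline:
    name: check
    manager: independent
    trigger:
      my-pagure:
        # React when a pull request is opened or updated (names are assumptions).
        - event: pg_pull_request
          action:
            - opened
            - changed
    success:
      my-pagure:
        # Report the CI flag and a comment back on the pull request.
        status: 'success'
        comment: true
    failure:
      my-pagure:
        status: 'failure'
        comment: true
```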
When the Ansible playbook run is finished, Nodepool will destroy the node. Then the Zuul scheduler will report the job results through the REST API of Pagure, inside the pull request on repository A: it will report a CI flag, so whether the pull request passed the CI or not, and it will also comment on the pull request to give the link to the artifacts. In the meantime, Zuul will have exported the artifacts to the log server. So now let's talk a bit about the proof of concept. We wanted to experiment with the Pagure driver of Zuul in an RPM packaging context. Artifact sharing with child jobs: in our first scenario, we wanted to be able to build a package with a parent job and share the RPM package with child jobs; the child jobs then validate the package that has been built. On the right side of the slide, you have a change box from the Zuul status page that shows the jobs in progress for pull request 5 on the python-gear repository; python-gear is in fact a dist-git repository. The first job, the rawhide RPM Koji scratch build, builds a scratch build on Koji, then retrieves the artifacts from Koji, so the RPMs, and creates a local repository on the test node. The rawhide RPM test job and the rpmlint job are child jobs. The first one looks at the dist-git to see whether there are any functional tests in it, and if there are, it takes the artifacts built by the parent and runs the functional tests on them. For these tests, we also experimented with the Standard Test Interface of Fedora. And the last job again takes the artifacts from the parent and runs rpmlint on the built packages. On the left side of the schema, you see the project definition: in the check pipeline for the python-gear repository, these are the jobs we will run, and note the use of the dependencies stanza. It means that the child jobs, the RPM test and rpmlint jobs, should not start if the parent job, the rawhide Koji scratch build, has not finished. That makes sense, because in the child jobs we are waiting for the artifact from the parent job. So let's have a look at the parent job definition. As you can see, this is YAML; in fact, every CI configuration in Zuul is YAML, the CI configuration but also the Zuul jobs, because they are Ansible. First, there are the name and description stanzas. Then you have the roles stanza, which tells Zuul to check out the repository called zuul-distro-jobs into the Ansible workspace; in that repository, we have some Ansible roles that help with RPM packaging, and those roles will be available to the run and post-run playbooks. The run and post-run playbooks are in fact the Ansible tasks that will run on the test node. There is also the use of the provides stanza, which tells Zuul that this job provides an artifact called repo. There is also a secrets stanza, which tells Zuul to expose the listed secrets to the playbooks; that makes sense here, because we are going to build on Koji, so we need to be authenticated. And then there is a nodeset definition: Zuul will flatten this definition into the Ansible inventory and make a mock-host available inside the inventory, based on a cloud-fedora node; a cloud-fedora node is in fact defined in the Nodepool configuration. Yes, in the second scenario I will use mock, but not for this one; here we build a scratch build on Koji. And finally, there are some variables that will be exposed to the jobs. So remember, we wanted to share an artifact with the child jobs.
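Put together, the project pipeline entry and the parent job described above could look roughly like the following sketch. The job names, the secret name, the playbook paths and the variables are approximations of what was shown on the slide, not the exact proof of concept configuration:

```yaml
# Project entry: the child jobs declare a dependency on the scratch build job.
- project:
    check:
      jobs:
        - rawhide-rpm-koji-scratch-build
        - rawhide-rpm-test:
            dependencies:
              - rawhide-rpm-koji-scratch-build
        - rawhide-rpmlint:
            dependencies:
              - rawhide-rpm-koji-scratch-build

# Parent job: builds a Koji scratch build and exposes the RPMs as a "repo" artifact.
- job:
    name: rawhide-rpm-koji-scratch-build
    description: Koji scratch build of the dist-git content
    roles:
      - zuul: zuul-distro-jobs          # repository holding the packaging roles
    run: playbooks/koji-scratch-build.yaml
    post-run: playbooks/fetch-artifacts.yaml
    provides:
      - repo                            # artifact exposed to the child jobs
    secrets:
      - koji_credentials                # needed to authenticate against Koji
    nodeset:
      nodes:
        - name: mock-host
          label: cloud-fedora           # label defined in the Nodepool configuration
    vars:
      koji_target: rawhide
```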
So this is via the provides stanza, the artifact called repo. But we need to tell Zuul what this artifact is. If you remember, in the job we extracted the built RPMs from Koji into a local repository. So we need, with a first task, to fetch that repository from the test node to the Zuul executor node. Then, in a second task, we use an Ansible module specific to Zuul that returns some information about the artifact to the Zuul engine, to the Zuul scheduler: in this case, an artifact called repo with a URL attribute, and that URL is going to be the URL of the artifact repository where we have the RPMs (a rough sketch of these two tasks is shown right after this passage). And how do the child jobs reuse that? Both child jobs, in their definition, require, using the requires stanza, a repo artifact. So Zuul will provide the artifact details to the jobs, in the child job inventory. This is a snippet from the inventory of a child job: you see the information about the artifact, and you also see the URL of the RPM repository. Then it is really easy to have an Ansible task that uses the information from the Ansible inventory to create a yum repository definition file. This task is executed on the test node of the child job, and the artifact repository becomes available. So the child jobs do the rest of the work, but they have the artifact from the parent job. To sum up, here we saw how easy it is to share artifacts between jobs with Zuul. So this is the second scenario: pull request dependencies and RPM build requires. Here we wanted to show the use of dependent pull request artifacts. To do so, we created a specific job that is able to build RPMs, but with the capability to handle RPM build requires. You see on the screenshot a pull request opened on the python-redis dist-git repository, and you see the use of the Depends-On keyword pointing to pull request 1 of the python-mock dist-git. In fact, python-mock is an RPM build requirement of python-redis. On the next slide we will see how we handle that. The job we built is a rawhide RPM build that uses mock to build the RPM; we no longer use Koji in that case. This job definition provides a repo artifact but also requires a repo artifact. And you see that the python-redis pull request depends on the python-mock pull request. So the built python-mock package, built in the mock root, will be exposed by Zuul to the job workspace of the rawhide RPM build job testing the pull request of python-redis. In that case, to build python-redis, mock will find the expected and dependent RPM build dependency, that is python-mock. But the python-mock pull request might itself depend on other pull requests, and in that case the build artifacts of those other pull requests would have been made available by Zuul in the job workspace of python-mock, so in the mock root, and of python-redis as well. So to sum up, in this scenario we succeeded in handling RPM dependencies pre-merge and speculatively, for runtime and build dependencies. Last scenario: we wanted to give a try to fedmsg, to run asynchronous jobs. The goal was to validate packages built by Koji outside of the pull request workflow. Unfortunately, Zuul is driven by Git and code review and is unable, as of now, to react and report on a message bus. So the challenge was to convert the messages from the bus into fake Git repositories and have Zuul run an rpmlint job for the fake repository. To do that, it was required to build some additional tooling.
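Coming back to the artifact hand-off between the parent and child jobs, the two tasks of the parent job's post-run playbook and the consuming task on the child job side could look roughly like this. Host names, paths and the repository id are illustrative, not the exact proof of concept playbooks:

```yaml
# Parent job, post-run playbook (sketch): fetch the local RPM repository back
# to the executor so it is published with the job logs.
- hosts: mock-host
  tasks:
    - name: Fetch the built RPM repository from the test node
      synchronize:
        mode: pull
        src: /var/tmp/rpm-repo/
        dest: "{{ zuul.executor.log_root }}/rpm-repo/"

# Declare the artifact to the Zuul scheduler; the relative URL ends up pointing
# at the uploaded location on the log server.
- hosts: localhost
  tasks:
    - name: Return the repo artifact to Zuul
      zuul_return:
        data:
          zuul:
            artifacts:
              - name: repo
                url: rpm-repo/

# Child job (sketch): turn the artifact URL found in the inventory into a
# yum/dnf repository definition on the test node.
- hosts: all
  tasks:
    - name: Point dnf at the repository built by the parent job
      become: true
      copy:
        dest: /etc/yum.repos.d/zuul-artifact.repo
        content: |
          [zuul-artifact]
          name=RPMs built by the parent Zuul job
          baseurl={{ (zuul.artifacts | selectattr('name', 'equalto', 'repo') | first).url }}
          gpgcheck=0
```

With that artifact mechanism covered, back to the additional tooling needed for the message bus scenario.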
So we built a Zuul gateway, which is in fact a fake code review system, and we built a fedmsg consumer and producer. The consumer reads events from the message bus and filters them; we just want to keep the events from Koji when a build succeeds. Then we store them in a backend, and finally the producer takes the events and instructs the Zuul gateway to create a fake repository and a fake pull request, which, as a side effect, makes Zuul run our job, that is an rpmlint job. Then, when the job is done, the producer reports back on the bus. In this scenario, we weren't actually able to report back, because it seems we need to be authenticated. But at least, even if it can be complex, it proved to work as expected. Finally, how could Fedora benefit from Zuul? Don't take me wrong, I don't have the in-depth knowledge of the Fedora CI to tell you that Zuul can do a better job than any other CI system. But I feel that the innovative features of Zuul could benefit Fedora in some ways, and that's why I would like to expand on some points. Multi-repository support and Depends-On: this makes CI testing by far easier; the rebase and dependency logic is already handled for you by Zuul. Any software developer can understand the gain it can provide, even for a project spread across just two repositories. But in a distribution context, we are talking about thousands of packaging repositories, and everybody knows that RPM brings a notion of dependencies, so I think Zuul can help a lot in that context. Co-gating of packages: you can specify dependencies between packaging changes across repositories and have Zuul handle them during testing, merging and publication. So in some way, it is possible to gate RPM repositories with Zuul. Cross-provider: as far as I know, most Fedora repositories are hosted on Pagure, but some of them are hosted on GitHub. So it could be good for the Fedora project to provide a solution for developers and packagers to specify pull request dependencies between Pagure and GitHub, and Zuul supports that. Zuul jobs: nothing more is needed to define a CI configuration, Zuul brings a clear and understandable framework. And finally, CI configuration as code: the CI configuration is fully defined in Git repositories, which makes it versionable and auditable. Some links: if you want to learn more, the proof of concept is documented on the Fedora wiki. If you want to experiment with Zuul, there are some quick starts: the first one is from the official Zuul project, so this is the official quick start, but there is also another way to deploy Zuul really quickly, as a sandbox, using the Software Factory project. And finally, there is a link to the Software Factory instance; Software Factory is a way to deploy Zuul and Nodepool, and this is the instance we used to work on the proof of concept. So that's it, let me know if you have any question or comment. No, there are multiple Software Factory instances that run Zuul and Nodepool. The first one is softwarefactory-project.io; this is the one my team and I operate, and we can provide a CI for anybody, in fact, so if you want, you can ask. We also have, for the RDO project, an instance of Software Factory that runs Zuul and Nodepool and that the RDO project is using. And we also have an instance internally at Red Hat, and this one can be used as well for private projects. Not ready for now. I know there is work on an OpenShift operator, but I'm not sure it is fully functional at the moment.
But this is something that will happen in the future, especially to run on OpenShift. Consider a case where we have a group of dependent pull requests in the check pipeline; they pass the check pipeline and the gate, so finally this group gets merged. When they trigger the post pipeline, can I still get information about the group, or is the post pipeline triggered only per project? In the post pipeline, the jobs for the changes will be executed in the order of merging. Yeah, but I wanted information about the group: I want my post pipeline to announce not every change individually, but the whole group of dependent patches; I want to do something with this, create one announcement for the whole group of patches we are watching. I don't think you have this information in the job inventory, so maybe this is something that can be improved at the Zuul level, but your job can request this information from the code review system to find the dependencies. Yeah. So there are two solutions: the first one is to not use a post pipeline and do it in the gate pipeline, and the other solution is a new kind of pipeline, the promote pipeline. It's something quite recent in Zuul, and at that level, the information is not just that a change was merged on a branch; we keep the information from the code review system, so we still have the information about the pull request, it's not just a commit merged on a branch. So I guess at that level, in a promote pipeline, we can do something as well, but we need to experiment with it. What if the tests are failing? No, no, with Zuul you should have stable jobs. Yeah, but there's a difference between OpenStack and the rest of the world, because OpenStack has grown to the point where a failing test means a real failure, and there's no way you can pass. But in other projects, which are just getting on board with the whole concept of CI and gating, people don't have tests that are good enough. Yeah, but that's a good occasion to improve the tests. Yes, but we still have this as a requirement: people want to have the option, like, you know, a back door to see the patch through. That is in the pipeline configuration: the jobs to be triggered in the check pipeline and the jobs to be triggered in the gate pipeline are set for a specific repository. It's not a global configuration, so that can be set per repository. So if someone does not have jobs that are stable enough for the gate pipeline, the jobs can be... Also, you can make a job optional, or rather non-voting: in that case, if the job is flaky, its result won't be considered. You still have access to the merge button, but it will introduce failures if you start to do that. Yes, I know. So, is there actually a way to control Zuul, like how you command it, run it, or say: I did a manual review and the code looks reasonable, so just merge it; do you give it some command? What is the user interface? So, on Gerrit, on GitHub and on Pagure, it's going to be a bit different; it will depend on what the code review system provides. On Gerrit, it is not called a pull request but a review, when you provide a patch on a project hosted on Gerrit. You have some labels where people, code reviewers, can set a vote on a specific label. Usually there is a code-review label where you can set +1 or +2, there is a verified label that the CI, so Zuul, will use, and there is also a third label called workflow, which is like the final approval.
So Zuul looks at that to decide whether it should take the patch and make it enter the gate pipeline to have it merged. On GitHub, there is the reviewer approval, and this is coupled with the tag system on pull requests: if a pull request is approved by a reviewer and it gets a specific tag, then Zuul will detect that, it's part of the events of the system, and it will take the pull request, put it in the gate pipeline, run the tests and have it merged. So this is the way we control Zuul, in fact. The check pipeline starts whenever there is a new pull request, but the gate pipeline starts when a person says that it is ready for the gate? Yes, and this is really important. In fact, most CI systems provide the check pipeline: when you have a pull request, you simply run jobs and you have a result. That's great, but you have the job result at a specific moment in time, with a specific state; for instance, the pull request might be based on a specific commit, and this is what the CI system will test. But in the gate pipeline, we really want to test the pull request on the very latest version of the target branch of the repository. I'm asking mainly because there might be infrastructure failures: you are downloading something from an external service and it fails because of some network problem. So how do you actually re-run the gate, or something like this? This is in the pipeline configuration, and it is really customizable. You can say that if someone types a command like "regate", then Zuul will detect that command and put the pull request directly in the gate pipeline. Or, if you want to force a recheck, the patch goes through the check pipeline again, and then again through the gate pipeline. Imagine we have a few projects, and some maintainers, or just two maintainers, decide that they actually like the idea and want to use Zuul for their pull requests, they want to get Zuul to work for them and then keep improving. Can we enable it just for one project? Yeah, exactly. We can have a global configuration with project templates: a common set of jobs that are good enough for every project, a bit like what I showed, so the scratch build on Koji, then the retrieval of the artifacts, then some validation jobs. That can be quite common, so we can propose people to use that by default. That can be done globally, but also per project, and then each project can add additional jobs: the project definition will augment, will be merged with, the common configuration. In fact, we tell Zuul in its configuration to listen to events, but only for specific projects, and in the project definition you can even say that it applies to rpm/*, so in that case it will match every project under rpm/. So, recently in Zuul they merged the capability to select the Ansible version jobs should use, so now we can use 2.8, 2.7 or 2.6. Yeah, at the job level; at the tenant or job level. Any other question? This is a question that comes up really often on the IRC channel of Zuul, so I think someone just needs to tackle the task. But yeah, I did most of the work for the Pagure driver, so if GitLab is really needed... In fact, it will really depend on the API of GitLab: if the APIs we really need for Zuul are there in the GitLab API, it will be really easy. The fact is, for Pagure, the Pagure developer is working at Red Hat, so it's quite easy to discuss and have API changes merged in Pagure; for GitLab it's maybe going to be a bit more complicated.
Yeah, so we need to look deeper into it, and it's probably possible. So thank you everybody.