Okay. Hello, everybody. I think that's time. Thank you for coming. My name is Jarek Łukow, this is Łukasz Łukasiewicz, and we are going to tell you something about how we use Zuul at Tungsten Fabric.

Let's start with some introduction. We are admins and operators of Tungsten Fabric's continuous integration and continuous build system. We started deploying Zuul v3 in November last year, so our journey with Zuul has now lasted a year. We both work at CodiLime, a Polish company based in Warsaw that offers DevOps, SDN, NFV, and cloud-native services and consulting. We employ about 200 engineers, who consume about 50,000 shots of coffee per year. In case you want to contact us after the talk, here are our email addresses.

Now some background about our project. Tungsten Fabric, which some of you may know under its former name, OpenContrail, is an SDN framework that is multi-stack in the sense that it integrates with many kinds of workload orchestrators. The first and primary one is OpenStack, but we can also provide virtual networking for Kubernetes and OpenShift, VMware, and Hyper-V, and connect to public clouds. Because we integrate with OpenStack, we face challenges similar to the OpenStack CI's; for example, we need to test the software against different versions of OpenStack. A specific of our project is that we use a great many programming languages: C and C++ (we have a kernel module for packet forwarding, for example), Go, Python, Java, and others. We have a single build of all the core components, which requires checking out 30 repos into one build tree and building them together. For this we use the Android repo tool, which checks out all the projects. Integrating it into Zuul was not trivial, because Zuul does all the code checkouts by itself, but we managed to do it. After the build we need to create Docker images, because Tungsten Fabric services are deployed as containers. We support a couple of platforms: we mostly focus on CentOS and the Red Hat family, but we also have Windows Server and, in a few places, some Ubuntu images.

Okay, so let's talk about what the build system looks like and where it came from. Our starting point, before we migrated the CI to Zuul v3: Tungsten Fabric had a continuous build system on Jenkins that was totally separate from the continuous integration system. It had a duplicated set of scripts, duplicated locations of dependencies, and a different set of build slaves, so it required a lot of manual synchronization: when something important changed in the code, the build scripts had to be synchronized between the continuous integration system and the Jenkins continuous build system. The build system worked, and worked well, but it required extra effort. So while migrating the CI to Zuul v3, we had the idea of using the same jobs that continuous integration uses for testing to power our release pipeline. We managed to do it, and right now the Tungsten Fabric release system runs entirely on Zuul. The build pipeline consists of three steps: first we compile the code and package it into RPM packages, then we build Docker images, and then we publish them to whatever registries we want. So we have a three-step daily build pipeline, and to give you some more detail, we have one Zuul job that builds and packages the software.
We upload the packages to an RPM repo hosted on Nexus. Then a second job fires; it has a dependency on the build job, so it starts after the first one succeeds. We pass the URL of the package repo to it using zuul_return, and the image-building job uses that variable to install the YUM repo in the container (there is a small sketch of this wiring below). The container step in turn uploads the images to a temporary registry, and the publish job fetches the containers and publishes them to the external sources from which you can download the images. We are not using, and not currently planning to use, the job pausing feature. The RPM repo and the temporary registry live on a static node, and we think that is a good solution for us because it reduces the load on the system: we don't need to keep a VM running, and since we have to retain the images and packages during the build anyway, one way or another we need to upload them to some static location. So we are not planning, for example, to use the VM from the first job to host the RPM repo.

Okay, so this is one swim lane of building stuff, but we build for different platforms and different versions of software, so we need several of them. For example, for one platform, in this case a particular OpenStack version, we run one pipeline; and when I say pipeline here, I mean a workflow of jobs, not a pipeline in the Zuul sense. So we have one sequence of jobs that finishes with publishing, and we run it multiple times for different versions and different platforms. We even have one quirk: we still use Jenkins under the hood to build some Windows containers. It is just a Zuul job that triggers Jenkins, waits until it finishes, then downloads the containers and publishes them to the registry. Of course, we plan to use the Nodepool Windows integration to bring this entirely into Zuul v3, but for now it is a temporary solution.

To give you the full picture, we of course use mirrors for different artifacts: RPMs, debs, a PyPI cache, Maven, and Docker images too. As for the builder VM images, we took a minimal approach: we don't install any dependencies in the Nodepool images. We use a plain operating-system image on which we only install the Zuul SSH keys; all the dependencies are installed during the build, so we are sure that all the dependency configuration lives in the source repos and not in the diskimage-builder image definitions. It was somewhat disappointing for the devs when we explained that diskimage-builder and its elements mechanism are not a way to cache builds, but for me it is important to treat the diskimage-builder definitions as an infrastructure thing. We provide images for different operating systems (CentOS, Ubuntu), but you can't submit a review with changes to the Nodepool configuration and then test how a job behaves with your change. It is not possible to change the image definition and see how your jobs run with it: you have to merge it, wait for diskimage-builder to build the image, and only then can you see how it behaves. So we do no build-artifact or dependency caching inside the images.
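Coming back to the job chain from the beginning of this part: here is a minimal sketch of how such a wiring can look in Zuul configuration, under illustrative assumptions (the job names, the Nexus URL, and the rpm_repo_url variable are made up for this example, not our exact definitions).

    # Project pipeline: each job starts only after the one it depends
    # on has succeeded.
    - project:
        nightly:
          jobs:
            - build-packages
            - build-containers:
                dependencies:
                  - build-packages
            - publish-containers:
                dependencies:
                  - build-containers

    # Final play of the build-packages run playbook: anything returned
    # under "data" (outside the zuul key) becomes a plain job variable
    # in the dependent jobs.
    - hosts: localhost
      tasks:
        - name: Hand the package repo location to the dependent jobs
          zuul_return:
            data:
              rpm_repo_url: "https://nexus.example.net/repository/tf-{{ zuul.branch }}"

The image-building job can then template rpm_repo_url into a YUM .repo file before installing the freshly built packages.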
Now you know how we configure the jobs, so it is important to say how we trigger them. One way is a scheduled trigger in Zuul: of course we use it for the daily nightly builds, and when a release is approaching we switch it to start twice a day. On every merge we trigger build pipelines for documentation, and also for the third-party RPM packages of our dependencies, so we keep a cached repo of dependencies built from RPM spec files. It is also possible to trigger a build on demand, for example when something failed: a Zuul administrator can enqueue the build using the CLI.

Okay, to make the build system a bit more useful and give users a better interface, we added some extensions. Everything I will talk about here is just things we wrote in Ansible and Python; we didn't change Zuul in any way. The first was consecutive build numbers. Our release process requires every daily build to have an incrementing build number, so for every branch that we build we keep a sequence of build numbers. We also dump exact commit information into JSON files, so that we can see what went into a build and reproduce it in case we want to test something. We dump information about artifacts, for example Docker images and RPMs, for later verification, and from the commit information we generate tables of the changes that went into the current build, plus a list of bugs.

The build number feature is written as a custom Ansible module backed by an SQL database: for every branch we store the mappings between Zuul buildset IDs and build numbers. To me it is a kind of persistent zuul_return: we are saving values returned from Zuul buildsets for later reference. We also use the build number to change the way Zuul generates log URLs, so we have a directory with all the build logs for every branch. For the commit information, we have a JSON structure that, for each project used in the nightly buildset, holds the list of changes; the commit hashes and Gerrit change numbers are there too, and if a commit references a Launchpad bug, that information is included as well. From this we can generate tables, and developers can see what happened in the last build. As I said, it is all implemented entirely in Ansible: we run post playbooks that generate the information and upload it to the log server.

Okay, so that was the build pipeline, but we actually started by implementing the jobs for a check pipeline, and we use exactly the same jobs. The jobs themselves are aware of the differences in the environment variables: for example, they detect that we are running against a ref rather than a change, or that we are running in the nightly pipeline, and they select, say, the place to upload artifacts based on what comes from the Zuul environment. We don't need a publish step in the check pipeline when testing code, so we dropped the publish job there; we perform the build using the same jobs as in the nightly pipeline, and then we run a suite of integration tests for different platforms from the same artifacts built in the container-building job.
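To illustrate that context detection: a periodic, timer-triggered item has no change attached, so zuul.change is simply undefined there, and a job can branch on that (or on the pipeline name) to pick its upload destination. A minimal sketch, with the path layout, pipeline name, and variable names invented for the example:

    # Illustrative only: pick an artifact destination from the Zuul context.
    - name: Detect a nightly (ref-triggered) run
      set_fact:
        is_nightly: "{{ zuul.pipeline == 'nightly' }}"

    - name: Select where this build's artifacts should go
      set_fact:
        upload_path: >-
          {{ ('nightly/' ~ zuul.branch)
             if zuul.change is not defined
             else ('review/' ~ zuul.change ~ '/' ~ zuul.patchset) }}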
Okay, we also benefit from having split the jobs in two, because we can perform the package build once and then use the artifacts in multiple image-building jobs, for example. Zuul also made it easier for us to integrate work done by different teams; all the components here were written by different teams. We made some changes to the build and packaging process, one of the teams prepared the tool for building containers, and the QA team prepared all the test suites, and Zuul made it easy to integrate the jobs and to synchronize the different teams to get to this point. Okay, so that was the description of the build system. I will now hand the mic to Łukasz, and he will tell you about some of our findings along the way. Thanks.

All right, so today I want to share something about reusing Zuul jobs. It is all fine, except for the fact that we wanted to reuse Zuul jobs outside of Zuul. The original idea was that we already had jobs shared between the CI and the CB system, so the majority of the work was done. We decided that perhaps we could use them in the development environment, to make things easier for our devs. So we said, okay, let's focus on that and create some sort of Zuul-agnostic playbooks and roles, completely decoupled from Zuul. We wanted to do it because Tungsten Fabric is a rather extensive project: we have a lot of repos that are tightly coupled together, and if you want a full build, you actually need to check out all of them and then run the build. We wanted to do it for the developers, so it would become a one-click solution for them, and we wanted to save ourselves time, because we had already written the jobs; and reusing stuff you have already written is really cool, because doing something twice is not fancy.

At that point we hit a dilemma: as I said, we wanted reusable playbooks that we could run outside of Zuul, but at the same time we wanted to lean on Zuul's variables, because they are just convenient. They are right there; you can use zuul.change or any other variable to tag your stuff or to generate links to the builds. And there is another dilemma, because you want very good ARA visibility, so that developers can actually debug their jobs and see exactly which step failed and what they need to do. That becomes harder if you decouple from the CI system, because you end up wanting a single shell entry point, very similar to what the OpenStack infra team did for the migration from Zuul 2.5 to 3.0. And I have to say the idea failed, because the run playbook has to do all the work we require. That means all the building and packaging has to sit inside run, and because of that we cannot properly leverage pre and post playbooks. Pre and post playbooks are very good to use in Zuul, very convenient: they allow you, for instance, to retry a job that fails at the very beginning, or to upload logs, since that is the last step you need to do. However, it is very hard to tag your playbooks in a way that lets you say: this pre-playbook is very Zuul-specific, but that pre-playbook actually installs our dependencies. Another issue is that, I believe, Ansible is too well integrated into Zuul.
What I mean by that is that it is so convenient and easy to use Zuul variables inside Ansible that you don't actually want to create an abstraction layer for your variables; you just use zuul.change, zuul.patchset, zuul.buildset, et cetera, et cetera. Another issue is that if you want to leverage pre and post playbooks, you need to know which of them to run, which implies parsing the Zuul config on your own: you have to find the job you want to run, see which parent it has, and walk every single parent to the very top of the tree to collect the pre, post, and run playbooks you have to execute. Combine that with deciding which playbooks are Zuul-related and which are not, and it is a lot of work.

So after a month, the outcome was that we decided to move all the packaging and building logic into Makefiles inside our code repos. We have the repos with the code, we have the Makefiles, we have all the build steps in there, and it becomes simple. However, since we still wanted to reuse things and not do them twice, we structured the Makefiles so we could drive them from both places in an easy way. In the CB or CI system we normally run the pre playbooks and then start the run playbook, but there is a big difference between the dev-environment entry point and the run playbook. The dev environment runs a target called "make all", which basically installs all your dependencies and everything you need and runs all the targets in one go. However, since we really, really wanted to keep the ability to split the job into steps that are visible in ARA, in CI we run something like a target list: it generates the list of available targets, we save that to a variable, and then another task includes a separate tasks file for every single target, so that each target is individually visible in ARA (the sketch below shows the idea). From the dev-environment perspective, as I said, post playbooks like the log or package upload are not really important; the results just stay on the developer's laptop, so they are not interested in that.
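Here is a minimal sketch of that trick, assuming a hypothetical "list" Makefile target that prints one target name per line; the task file names and the src_dir variable are also invented for the example:

    # Main tasks file: discover the targets, then run each one as its
    # own included task so every step shows up separately in ARA.
    - name: Ask the Makefile which targets are available
      command: make --silent list
      args:
        chdir: "{{ src_dir }}"
      register: make_targets

    - name: Run every target as an individually visible step
      include_tasks: run-target.yaml
      loop: "{{ make_targets.stdout_lines }}"
      loop_control:
        loop_var: make_target

    # run-target.yaml: one task per target, named after it.
    - name: "make {{ make_target }}"
      command: "make {{ make_target }}"
      args:
        chdir: "{{ src_dir }}"

A developer on a laptop skips all of this and just runs "make all" directly.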
Now I want to focus on testing your jobs, because your jobs are already stored in your repos as code, so in theory you should be able to test them like everything else you test in Zuul. However, some things are not easily testable in Zuul, and there is a very good reason for that: secrets. Secrets are a difficult part of Zuul, because you want a system that is as open as possible, with as little operator intervention as possible. You don't want the operator administering the system; you want the administrator to operate the infrastructure, not the system itself. Because of that, you allow secrets to be created by every single user, but you need to make sure they are not readable by every single user. So you cannot test whether a secret you want to add works before actually merging the code into your trusted repo; you can take the risk and merge it blindly, or try to do something else. We do have a few ideas for testing your jobs.

You can start by setting all of your pipelines, or the pipelines where you want to test your stuff, as post-review. I would recommend that only in environments that are not security-focused; say, a ten-person company where everybody has known each other for over fifteen years. That is a pretty good scenario for it.

Another option is a separate development environment, by which I mean standing up another Zuul, Gerrit, Nodepool, and all the other components needed to run your CI system. This has a huge disadvantage: you still need to clone your repos and actually integrate them into the new system, and you need to make sure none of your DNS names or IP addresses overlap in any way, so you don't get any mix-ups. Another thing: if you want to import your secrets, you can't, because they are encrypted with the key of a different Zuul. So this scenario may not be perfect for testing your jobs; it is, however, pretty good for testing your Zuul upgrades, and that is one of its biggest advantages. You can also run Zuul on a laptop and try mocking the Nodepool or Gerrit part, checking out the code locally and running some simple Docker slave as a static node. But again, you hit the very same problem: you need to clone your repos and set up your secrets. It is not fully viable.

Another scenario is unit testing your roles. This will not help with secrets at all, but it lets you make sure your jobs run smoothly. What I mean is that if you are writing your code and your CI jobs, you can treat them the regular way. For example, you want to change something in the Docker configuration, in the daemon.json file. You do that the regular way, in some role, and to test that role you can point it at a dummy file that is not the real daemon.json: you change the line you need, or template the file you need, and check that the exact result is there. That is something like unit testing, but for your roles. Again, it only covers roles, not playbooks and not jobs; it is only half a solution. But I believe it is a solution everyone should have in place, because most of the time you don't actually need to test secrets; you need to test your jobs, which will probably sit on top of base jobs that already have the secrets in place.
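As an illustration of that role-level testing idea, here is a sketch. The configure-docker role, the template, the registry-mirrors key, and the docker_daemon_config variable are all hypothetical; the point is only that the destination path is a variable, so a test playbook can redirect it to a scratch file.

    # roles/configure-docker/tasks/main.yaml: the destination is a
    # variable so tests can point it away from the live system.
    - name: Write the Docker daemon configuration
      template:
        src: daemon.json.j2
        dest: "{{ docker_daemon_config | default('/etc/docker/daemon.json') }}"

    # Test playbook: apply the role to a temporary file and assert
    # that the expected setting landed there.
    - hosts: localhost
      tasks:
        - name: Create a scratch file standing in for daemon.json
          tempfile:
            state: file
          register: dummy_config

        - name: Apply the role against the scratch file
          include_role:
            name: configure-docker
          vars:
            docker_daemon_config: "{{ dummy_config.path }}"

        - name: Check that the templated content is present
          command: grep -q registry-mirrors {{ dummy_config.path }}
          changed_when: false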
The last idea, or the last suggestion, is running copies or mocks of jobs, and I want to focus on this a little more. Normally, when you open a review that uses a secret, for example for uploading your logs, that review goes to a trusted config repo, and if you open it there, you simply won't be able to test the secret. However, you can try opening the review against an untrusted repo to see how it behaves. The secrets won't be picked up there, but you can replace them with variables: instead of encrypted values you provide some dummy ones, and you just test and see how things work. Be aware, though, that there are tasks a config project is allowed to run directly on the executor instead of on the worker nodes; if you require something like that, you need to change the hosts, because from an untrusted context you cannot run anything on the executor.

And the last part for today is stuff we would appreciate seeing in Zuul, although we can still live without it. The first is matching an executor with its cloud. We have two environments, and we want to expand a bit more, mostly toward the public cloud. In the current situation, Zuul executors are not aware of where they sit: they will pick up any job and connect to worker-node IP addresses all over the place. It doesn't matter to them whether it is cloud one or cloud two, a private cloud or a public cloud. But our environment sits in a private network behind a firewall, we do not use public IP addresses for our slave pool, and we didn't want to change that. Because we cannot differentiate the two clouds, we would have to run all of our executors in the private cloud, since from there everything is reachable: they can access the private cloud, and they can access the public cloud, because public IP addresses are, well, public. What we wanted instead was to be able to tell Zuul very strictly: hey buddy, your cloud is cloud one, run only on top of that cloud. It is easier for us: we don't have to establish tunnels between the two clouds, we don't have to set up extra routing, and our interaction with the infrastructure stays limited. Another case where this might pay off: in very dispersed environments, having a Zuul executor bound locally to a cloud might reduce the timeouts, ping problems, and other issues you can expect in such a setup.

The second thing is matrix build definitions. This is something we had in Zuul 2.5. If you look at the left side of the slide, this is the example configuration we have been going through today: the only difference between those three jobs is the OpenStack version. That is pretty simple; however, when you try to assign the jobs to projects, the list becomes longer and longer, and at the same time you need completely separate job definitions even though they barely differ; it is a single variable. On the right side is my idea, just a random thought about how it could look in Zuul v3, not necessarily how it will look: some sort of regex or variable matching for the job, so that we provide one variable, the job name changes accordingly, and everyone is happy.
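To make the duplication concrete: today each variant needs its own definition, plus its own entry in every project that uses it, even though only a single variable differs. A sketch with invented job names and OpenStack versions (this part is ordinary Zuul syntax; the wished-for matrix shorthand does not exist):

    # One near-identical definition per OpenStack release.
    - job:
        name: build-containers-queens
        parent: build-containers
        vars:
          openstack_version: queens

    - job:
        name: build-containers-rocky
        parent: build-containers
        vars:
          openstack_version: rocky

    - job:
        name: build-containers-stein
        parent: build-containers
        vars:
          openstack_version: stein

    # ...and each name repeated again in every project stanza that
    # attaches these jobs to a pipeline.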
Wrapping up, I hope the first and most important takeaway is that Tungsten Fabric has a cool CI/CB system. Another is that you now know a bit better how to deal with binary artifacts in Zuul, or at least how to approach them, because it has not been trivial for us. Reusing your jobs is the key; however, it is not always the key, as we proved with the dev environment: it was possible, it just took too much time and effort compared to the gains we got, and there, simple Makefiles were the key. And you can test your jobs outside of production, so you don't have to merge your code just to see how it works.

As for future plans, we want to focus on continuous upgrades of Zuul, because we actually have two environments and, frankly, we update them rarely. We want to run the build and unit-test jobs inside containers instead of VMs, so that would be a driver for Nodepool. And for the third-party packaging that Jarek already mentioned (the dependencies we have RPM spec files for and build ourselves), we want to use the supercedent pipeline manager, because when a lot of reviews come in, it will let us build all the packages we need in a very simple way.

Okay, so thank you. I want to invite Jarek back up here, so we can take all of your questions. Right, so thank you. We have some swag, first come, first served: a few power banks and a few Tungsten Fabric stickers.