We want to talk today about how we use OpenStack in our CI/CD system. This is Stefan, I am Rick, and we both work in the release engineering team.

So first, let's talk about our motivation. We had a couple of hundred servers sitting in a data center. They were configured into special-purpose environments, and changing the configuration of those environments was difficult and took a long time. In addition, because these environments were so specialized, some of them went unused for weeks or months. So we asked ourselves: how can we increase the utilization of our hardware? We decided to set up a cloud using our own product, which also gave us first-hand experience with it. With the cloud in place, we increased test coverage by being more flexible and more agile in how we set up environments; we used Heat for this. We also simplified our CI/CD architecture. Of course, we couldn't put everything into virtual environments in the cloud, so we kept some of the physical hardware for special use cases like performance, scale, SR-IOV, DPDK, and so on.

This is what our architecture looked like. We had two data centers, one in Germany and one in the US. In Germany we did all the building of the product and the mirroring, and our Jenkins was located there as well. We then set up another mirror in the US data center, and the Jenkins agents were all located in the cloud so that the communication between the agents and the Heat stacks was very efficient. But of course this isn't enough: we still have some latency and reliability issues. One idea is to co-locate Jenkins and the agents in the cloud for better reliability. Another option would be to create a second cloud environment in Europe. And because we want to test more, we would like to enable IPv6 and also start using Ironic. Stefan will tell you more about how we did it. Thank you.

Okay, so some implementation details. At the core of our CI/CD system sits something that we generally refer to as the cloud generator. As the name implies, it is a tool that generates all the configuration elements we use to deploy an overcloud. It generates the product-specific configuration for the OpenStack lifecycle manager product, but also the Heat stack that describes the virtual infrastructure used to deploy the undercloud. It basically combines two templates: a control-plane configuration template that describes how OpenStack services are spread across multiple node clusters, and a network layout template that describes how those OpenStack services communicate with one another. On top of that, there is a wide range of parameters that can be used to control things like the number of nodes in each cluster, which OpenStack features or services are turned on or off, all the way down to individual configuration options for the OpenStack services. And because we started out migrating away from bare-metal infrastructure to a virtual one, we kept support for bare-metal clusters, so we can seamlessly target both virtual and bare-metal infrastructures, which is a great thing. Next, I would like to walk you through some of the challenges that we encountered and the solutions that we found.
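Before that, to make the cloud generator idea a bit more concrete, here is a minimal sketch of how such a tool might combine a control-plane template and a network-layout template with one parameter set into a single Heat template. The template contents, parameter names, and the Jinja2-based rendering are assumptions for illustration only; they are not the actual implementation.

```python
# Hypothetical sketch of a "cloud generator": render two templates with one
# parameter set and merge them into a single Heat template.  All names and
# the Jinja2-based approach are illustrative assumptions.
import yaml
from jinja2 import Template

# Control plane: which services land on which cluster, and how many nodes
# each cluster gets (only the servers are sketched here).
CONTROL_PLANE = """
resources:
{% for i in range(controller_count) %}
  controller_{{ i }}:
    type: OS::Nova::Server
    properties: {flavor: controller, image: base_image}
{% endfor %}
{% for i in range(compute_count) %}
  compute_{{ i }}:
    type: OS::Nova::Server
    properties: {flavor: compute, image: base_image}
{% endfor %}
"""

# Network layout: the networks over which those services talk to one another.
NETWORK_LAYOUT = """
resources:
  admin_net: {type: OS::Neutron::Net}
  data_net: {type: OS::Neutron::Net}
"""

def generate(params: dict) -> str:
    control_plane = yaml.safe_load(Template(CONTROL_PLANE).render(**params))
    network_layout = yaml.safe_load(Template(NETWORK_LAYOUT).render(**params))
    template = {
        "heat_template_version": "2018-08-31",
        "resources": {**control_plane["resources"], **network_layout["resources"]},
    }
    return yaml.safe_dump(template, sort_keys=False)

# Three controllers for HA and two computes for live migration, as used for most jobs.
print(generate({"controller_count": 3, "compute_count": 2}))
```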
And of course, the biggest one is always networking, more specifically getting the virtual network to match the expectations of the lifecycle manager. The first thing we had to do is obvious: we had to disable both DHCP and port security for everything but the administrative network, because we didn't actually use those features and they were creating all sorts of communication problems. The next one is also obvious: when you need Neutron to perform VLAN routing for you, you have to use Neutron trunk ports, which is a nice feature that Neutron has. The next one might not be so obvious. The lifecycle manager needs to manage some aspects of the administrative network, more specifically allocating IP addresses, and that can come into conflict with the underlying Neutron management of the virtual network. We used a combination of fixed IP addresses and IP address pools for that, and it worked okay.

The next challenge was more complicated. We had to figure out how many resources to throw at our CI/CD system while still getting reliable CI and timely results. The problem we had to solve was essentially to maximize test coverage while at the same time minimizing resources and run time: deploying all the OpenStack services that we support in our product, including logging, metering, and monitoring, which are known to consume a lot of resources, while being able to run the full set of Tempest test cases in under 24 hours, because that was the target for releasing maintenance updates. The optimal resource split, which you can see here and which we've been using for most of our jobs, is three controllers, to exercise high availability, and two computes, to be able to exercise live migration and other nice features like that. Running Tempest in parallel with four workers, we've been able to finish everything in an average of 18 hours.

The other thing we had to fix was getting the volumes attached in the correct order, the correct order being the one the lifecycle manager expects. Nova ignores the mount-point configuration that you put into the Heat stack. cloud-init offers a solution for that, but it's quite intrusive and difficult to set up, so instead we used Heat's depends_on attribute and test resources (OS::Heat::TestResource) to enforce the order in which resources are created and to insert delays wherever necessary.

Another thing we hit right after that was reliability issues, basically timeouts everywhere. The performance of our cloud was not good enough, so we went back and looked at the problem. The first thing we did was turn off the Spectre and Meltdown mitigations to get back some of those virtualization cycles. Another thing that helped a lot was adjusting the API and RPC worker counts of the OpenStack services wherever that was an issue. But the thing that was really important was using the right type of storage for services like MariaDB and Ceph. MariaDB has really serious performance issues when it runs on top of distributed storage because of the latency, and nested Ceph requires quite a bit of tweaking to work right. For all of these problems, using storage local to the compute node was just what we needed, and that's how we solved it.
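Going back to the networking and volume-ordering points for a moment, here is a rough sketch of the kind of Heat resources this boils down to, written out as the Python structure a generator might emit. Resource names, network names, and the VLAN ID are made up, and the referenced servers, ports, and volumes are omitted; the relevant parts are port_security_enabled set to false on non-administrative ports, the OS::Neutron::Trunk resource, and the depends_on chain on the volume attachments.

```python
# Sketch of generated Heat resources; names, IDs, and the omitted referenced
# resources (storage_port, controller_0, volume_0, volume_1) are illustrative.
import yaml

resources = {
    # Non-administrative port: port security off (its subnet, not shown,
    # would also have DHCP disabled), since the lifecycle manager owns both.
    "data_port": {
        "type": "OS::Neutron::Port",
        "properties": {
            "network": {"get_param": "data_net"},
            "port_security_enabled": False,
        },
    },
    # Trunk port so Neutron delivers VLAN-tagged traffic to the right sub-port.
    "data_trunk": {
        "type": "OS::Neutron::Trunk",
        "properties": {
            "port": {"get_resource": "data_port"},
            "sub_ports": [
                {
                    "port": {"get_resource": "storage_port"},
                    "segmentation_type": "vlan",
                    "segmentation_id": 3001,
                },
            ],
        },
    },
    # Nova ignores mount-point hints, so attachments are chained with
    # depends_on to force the device order the lifecycle manager expects.
    "attach_vol_0": {
        "type": "OS::Cinder::VolumeAttachment",
        "properties": {
            "instance_uuid": {"get_resource": "controller_0"},
            "volume_id": {"get_resource": "volume_0"},
        },
    },
    "attach_vol_1": {
        "type": "OS::Cinder::VolumeAttachment",
        "depends_on": "attach_vol_0",
        "properties": {
            "instance_uuid": {"get_resource": "controller_0"},
            "volume_id": {"get_resource": "volume_1"},
        },
    },
}

print(yaml.safe_dump({"resources": resources}, sort_keys=False))
```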
So when you're hitting quota issues in your CI, what you want to do is separate your projects for CI and development workloads, and when you do that you have to share flavors and images between them. The other thing we had to do was handle the fact that load is variable across the compute nodes, and yes, we did go back to the project and fix the performance issue for that. I'm afraid I'm not going to have time to finish, so yes, that's all we could cover in 10 minutes. Thank you very much for your attention. Have a nice day.
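As a footnote to that last point about separate projects, here is a rough sketch, assuming the openstacksdk client, of how a non-public flavor could be shared between a CI project and a development project. The cloud name, project name, and flavor name are made up; image sharing via Glance image members works along similar lines.

```python
# Rough sketch with openstacksdk: grant a second project access to a
# non-public flavor so CI and development projects can share it.
# The cloud name, project name, and flavor name are illustrative.
import openstack

conn = openstack.connect(cloud="ci-cloud")  # assumed entry in clouds.yaml

# Look up the development project and the private flavor used by the CI jobs.
dev_project = conn.identity.find_project("development")
flavor = conn.compute.find_flavor("ci.controller")

# Non-public flavors are only visible to projects explicitly granted access.
conn.compute.flavor_add_tenant_access(flavor, dev_project.id)

# Images would be shared similarly: set their visibility to "shared" and add
# the other project as an image member in Glance.
```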