Hi everyone, my name is Brian Carey. I'm fairly new here; I work for Red Hat and I'm the upstream KubeVirt CI maintainer. I'm here today to talk to you about how we build KubeVirt CI with CentOS Stream.

Just for some background, for anybody who doesn't know: what is KubeVirt? KubeVirt is a virtual machine management add-on for Kubernetes. It allows users to create, run, and manage VMs in Kubernetes clusters. It's considered a production-grade hypervisor, and as far as I know it is the leading virtualization add-on for Kubernetes, so it's the leading way to run VMs in Kubernetes. Last month we had a very big milestone: KubeVirt finally reached version 1.0, so that was a big release for us.

Testing a project like KubeVirt comes with its challenges, as it integrates a large number of different projects and sub-projects, but in KubeVirt proper our testing breaks down into two main categories: unit tests and our larger E2E tests, our end-to-end tests. The end-to-end tests actually require running against valid virtual test clusters that we spin up. One of the other aims of the project is to stay as close as possible to the testing methodology used in the upstream Kubernetes project. This means we end up using their ecosystem of tools and services, like Prow, which is basically a Jenkins for orchestrating CI jobs against Kubernetes clusters.

So I can hear you thinking: okay, that all sounds great, but where does CentOS Stream come in? CentOS Stream is the solid base we use for our virtual test clusters, which are then scheduled onto larger workload clusters. These clusters are spun up for our end-to-end testing.

For the test cluster node images, there's a specific sub-project within the KubeVirt organization called kubevirtci, and this project is responsible just for building these cluster providers, as we call them. The virtual cluster providers are based on CentOS Stream Vagrant images. So we basically take the latest CentOS Stream Vagrant image, continually, as often as we can.

The building process is all done through automation, but the general outline is this: we spin up a Fedora container that has all the tooling required to start the CentOS Stream Vagrant image, and then we have a tool for orchestrating the provisioning of these images. This tool is written in Go; we call it gocli. It basically orchestrates which scripts are run where, as some scripts run during provisioning and some run at cluster runtime. The provisioning scripts mainly focus on setting up the networking and storage requirements, and the many dependencies we have for installing Kubernetes with kubeadm. These scripts are all run against the VM at provisioning time. Once they complete successfully, the VM inside the container is shut down and we commit that container as a container image, so the image is committed with the updated VM image inside. (There's a rough sketch of this flow just below.)

So okay, that's great: we have a node image, but how do we know what we built is actually a valid Kubernetes cluster? Within the kubevirtci repo, we have a number of test lanes that run against any changes pushed to the repo as pull requests. These run a subset of the KubeVirt end-to-end tests, and they also run the suite of Kubernetes conformance tests, so that we know what we're building is an actual valid Kubernetes cluster based on CentOS Stream.
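To make that build flow a bit more concrete, here is a minimal sketch of its shape. This is not the actual kubevirtci automation: the container image name and the script paths are hypothetical placeholders I've made up for illustration; only the podman subcommands themselves are real.

```sh
# Rough sketch of the provider build flow (image name and script
# paths are hypothetical, not the real kubevirtci automation).

# 1. Start a Fedora container with the VM tooling, privileged so it
#    can boot the CentOS Stream Vagrant image inside.
podman run -d --privileged --name provisioner \
    -v ./scripts:/scripts:Z \
    quay.io/example/fedora-vm-tooling:latest   # hypothetical image

# 2. Run the provisioning scripts against the VM (networking,
#    storage, kubeadm prerequisites).
podman exec provisioner /scripts/provision.sh  # hypothetical script

# 3. Shut the VM down cleanly so its disk is in a consistent state.
podman exec provisioner /scripts/shutdown-vm.sh

# 4. Commit the container, updated VM disk included, as the new
#    cluster provider node image.
podman commit provisioner quay.io/example/k8s-provider:latest
```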
Any merged changes to the kubevirtci repo lead to new cluster provider images being published to Quay, so they're always available there. When our automation picks up that there's a new image in Quay, a PR is created against kubevirt/kubevirt, KubeVirt proper, so that we can run the full suite of end-to-end tests against this image as well. So we do have some protection there against the rare issues we hit by running on the latest CentOS Stream.

This is a picture I drew up; part of the reason I wanted to do this talk was to spend some time drawing this out of my own head. This is how it looks when it all comes together. The end-to-end test pods are basically scheduled to our worker nodes; the ones below just mirror the top one, it's the same thing throughout. Within an end-to-end test pod we have a Podman instance, which spins up the node container image that we've just published. That node container in turn starts the CentOS Stream VM, the Kubernetes cluster gets to a running and healthy state following some runtime scripts, and then we install KubeVirt against it and run our test suite (there's a usage sketch of this flow a little further below). KubeVirt uses a virt-launcher pod concept for spinning up VMs, so the test VMs we spin up inside that test suite are running in nested virtualization. Until recently, the bare metal nodes were also running CentOS Stream in production.

Some of the benefits we saw from using CentOS Stream: first of all, it's an extremely stable base, so 99.9% of the time we don't hit any issues; it's just smooth sailing. It also allows us to catch potential issues earlier than we would have previously, using the old CentOS 8 model. Previously the providers were built with CentOS 8, but we've since moved to Stream 8 and Stream 9.

Over the last year we've hit really only a handful of issues. A couple of examples: we hit a couple of kernel bugs; we hit an issue with NetworkManager and some DHCP clients; and SELinux policy changes can trip us up every so often. There are components in KubeVirt that require certain privileges, and CentOS Stream can't always be aware of that, so then we talk between the projects to see what the best way forward is. Sometimes we make changes on our end, sometimes CentOS Stream does.

So overall, CentOS Stream is a very good target for our testing, as it really reflects our main downstream product, which is OpenShift Container Native Virtualization. It's very good at reflecting that environment, and it allows us to catch issues very early.

As for problems we've hit: I really struggled to come up with any major problems here. I had it in my overview for the talk, so I thought I'd better include it. Issues are very rare, as I said on the previous slide. Sometimes we get blocked, not by CentOS Stream issues as such, but by issues that we hit, and that blocks us from delivering new providers. So if a new version of Kubernetes comes out, we can sometimes be blocked from testing against the newest version of Kubernetes because we have some issue back here in CentOS Stream.
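Coming back to the test pod flow from earlier, here is the promised usage sketch of how a provider-backed ephemeral cluster gets spun up from a kubevirt/kubevirt checkout. The make targets and the KUBEVIRT_PROVIDER variable follow the kubevirt/kubevirt Makefile as I understand it, but the provider name is illustrative; check the kubevirtci docs for the providers currently published.

```sh
# Sketch: ephemeral provider-backed cluster from a kubevirt/kubevirt
# checkout (provider name illustrative).
export KUBEVIRT_PROVIDER=k8s-1.28   # selects which node image to use

make cluster-up      # pulls the node container, boots the CentOS Stream
                     # VM inside it, waits for the cluster to be healthy
make cluster-sync    # builds and deploys KubeVirt onto that cluster
make functest        # runs the end-to-end test suite against it

make cluster-down    # tears the ephemeral cluster back down
```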
The second problem is just a pet peeve of mine, because we're always testing against latest: we pick up the latest kernel in CentOS Stream, we run our automation, and the automation fails because the matching kernel modules aren't there yet. That's just a small annoyance we hit every so often. Then I have community members coming to me asking why these lanes are failing, and I'm like, just give us 20 minutes and they'll be there. This always starts up the conversation within the KubeVirt community about whether we want to pin to a certain version of CentOS Stream or not. In my opinion, we get way too much value from running against latest to pin it, and we do not want to be managing uplifts of CentOS Stream ourselves, because we will fall behind. So I generally just prefer to stay on latest. Sometimes we'll pin an individual package for a certain period of time just to get around a problem (there's a sketch of one way to do that after this section), but that's always a temporary measure. Running off latest just gives us too many benefits.

As a bonus content slide, I said I'd show you where to get CentOS Stream VM images for KubeVirt. If you have a Kubernetes cluster with KubeVirt installed somehow, and you want to run a CentOS Stream VM, we have container disk images that we build and publish in Quay as well. These container disk images are normally used for ephemeral VMs; they're very handy for CI or any kind of testing you might be doing. They're basically based on the CentOS Stream cloud images, loosely wrapped around those, so any configuration you need is the usual cloud-init config. The image registry also includes a handy YAML example to get a Stream VM up and running very quickly. I was going to attempt to do a demo of it, but I chickened out towards the end, so here's just a screenshot of the landing page instead. As you can see, the YAML is quite brief, though it's probably too small on the screen there. You have your cloud-init config down here, and you just add your own cloud-init config and then you can do whatever you want. That basic example will get something running, but it won't be much use to you: you won't be able to sign in or get into the VM or anything like that. So you add your SSH authorized keys or whatever to that cloud-init config. (There's a sketch of a manifest along those lines after this section too.)

Just to conclude: KubeVirt loves CentOS Stream, really. As I alluded to previously, it's not just kubevirtci; it's used all over KubeVirt. All of our KubeVirt artifacts are built in a CentOS Stream workspace, and KubeVirt actually relies on the CentOS Stream virtualization stack: we take libvirt and QEMU from the CentOS Stream 9 repos and build those into the virt-launcher pods I mentioned earlier. So the components that are actually starting up the VMs in the Kubernetes cluster are all the Stream 9 virtualization stack. And our production CI workloads cluster was deployed on CentOS Stream until very recently. Unfortunately, the burden of maintaining OS updates and Kubernetes updates on that cluster was too much (the updates were on me to do), so we moved to OpenShift just recently.

So then, just to finish up: thank you to the CentOS community, and I'll take any questions.
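On the package pinning mentioned above: I won't claim this is exactly how the kubevirtci provisioning scripts do it, but on a dnf-based image one common way to hold a package temporarily is the versionlock plugin. The package name and version below are made up for illustration.

```sh
# Sketch: temporarily holding a package at a known-good version on a
# dnf-based system (the NVR here is invented for illustration).
dnf install -y 'dnf-command(versionlock)'
dnf versionlock add NetworkManager-1.43.2-1.el9
# ...later, once the fix lands in the repos, drop the pin again:
dnf versionlock delete NetworkManager-1.43.2-1.el9
```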
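And on the container disk images: below is a minimal sketch of the kind of manifest described above, a KubeVirt VirtualMachine booting a CentOS Stream container disk with an SSH key injected through cloud-init. Treat the image tag, resource sizes, and user details as illustrative assumptions; the published example on the registry's landing page is the authoritative one.

```sh
# Sketch: minimal CentOS Stream VM on KubeVirt (tag, memory size, and
# user details are illustrative).
cat <<'EOF' | kubectl apply -f -
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: centos-stream-demo
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
          - name: containerdisk
            disk:
              bus: virtio
          - name: cloudinitdisk
            disk:
              bus: virtio
        resources:
          requests:
            memory: 2Gi
      volumes:
      - name: containerdisk
        containerDisk:
          image: quay.io/containerdisks/centos-stream:9
      - name: cloudinitdisk
        cloudInitNoCloud:
          userData: |
            #cloud-config
            users:
              - name: demo
                ssh_authorized_keys:
                  - ssh-ed25519 AAAA... your-key-here
EOF
```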
Q: I'm curious a little bit about the Vagrant layer. Was that a historical decision, or something you chose?

A: It was a decision made a long time ago, and it's been carried forward. The Vagrant image gives us a handy default user setup and login details, and we just use those throughout our automation, so changing it would involve a fair bit of work. But the Vagrant images have been working quite well for us.

Q: Awesome. It's good to hear from Vagrant users, for sure. This might be a basic question; I missed the beginning, unfortunately, because I was between rooms. Can you give me the elevator pitch for KubeVirt? Is it virtualizing inside virtualization?

A: No, no, no. That's only KubeVirt CI; that's only how we're running our E2E, our end-to-end tests, so the nested virtualization only happens within our test suites. KubeVirt itself is for anybody who has large Kubernetes clusters and wants to run VMs alongside pods. Large organizations that might be migrating to a Kubernetes workflow and have existing VMs can easily take a VM image and deploy it straight onto their Kubernetes cluster or OpenShift cluster, and it basically helps them transition to that Kubernetes way of working. So the VMs are treated as, well, not a container, but a place that apps go.

Q: It's not that containers are running on top of that VM?

A: You could; you could go all the way down. But no, generally the VMs would just be running inside pods, and they'd be more or less monolithic services running there.

Q: Okay. The other question I had was about the nested virtualization in your CI. Is there an environmental difference when you're doing that? Do you miss certain things because it's not quite the same environment?

A: Not really. There is a slight performance impact: when we run our test suite on bare metal it runs a lot faster, and with nested virtualization some of the tests may take longer. So we do have a flaky-test process to identify those tests and make sure those kinds of flakes get fixed.

Q: And is it running on openQA in Fedora?

A: No, no, this is all running on a big, bare metal workload cluster that we have.

Q: Okay, thanks.