Sounds good. So thanks very much for joining, and everyone stay after the video. The video is about half an hour, and then we'll have a question-and-answer session with Mike and David. Okay, I'm going to start sharing the video now. Hi, this is our dev talk. The topic of our talk is using containers and VMs on free public CIs for fun and world conquest. Just a brief intro. My name is David Davis. I'm a software engineer at Red Hat, and I work on the Pulp project. My name is Mike DePaulo. I'm a site reliability engineer at Red Hat, and I also work on the Pulp project. Okay, so a quick overview of our talk. First, we're going to talk about Pulp, which is the project we work on; it's sort of needed to understand our presentation. Next, we're going to talk about the problems we faced working on Pulp when we used hosted CI environments. After that, we're going to talk about when to use containers versus VMs. Then we're going to talk about the different solutions we tested and what our results were. And lastly, we're going to share some tips for using VMs and containers with hosted CI environments. So a quick rundown of Pulp. Pulp is a software project you use to organize packages of software into repositories. It has a plugin architecture, and we support a bunch of different content types. You can see some examples here: Python, npm, RubyGems, even Docker images. And you can create a plugin for any content type and extend Pulp to support it. Lastly, Pulp is entirely free and open source. It's on GitHub, so if you want to check it out, the link is right there. So what are hosted CI environments? These are typically free for open-source projects. They're public. They provide a way to build and test your code. And these hosted CI environments also let communities fork and test using your CI configuration — a PR uses the same CI environment as you to run your tests and so on.
So for Pulp, we use two different hosted CI environments. We have experience with both Travis and GitHub Actions. Overall, they're pretty similar. They do have different cloud backends — one uses Google, one uses Microsoft Azure. They have the same operating system for Linux. They have similar specs on RAM and disk space. Their virtual CPUs are also similar; we ran p7zip benchmarks against them, which is a good indication of CPU speed, and those were pretty similar. The one notable difference was disk performance. It seemed like GitHub Actions was almost three or four times faster in terms of both read and write than Travis. We speculate that GitHub Actions is probably using SSDs while Travis is probably using hard disks. So what's the problem with hosted CI environments? Some problems we encountered: there's a limited number of OSes or distros by default. The environment is preconfigured, and it comes with a limited set of software packages. There are also typically limited hardware resources, which can make virtualization hard. And lastly, you can't reboot or replace the kernel when using a hosted CI environment. So to solve some of these problems, you might turn to containers. So when would you use containers? Containers are pretty lightweight, so you probably want to use them whenever you can. Some cases where you would want to use them: suppose you want to test against a bunch of different distributions of Linux — containers provide a great way to do that. Sometimes you might need tools like DNF, which aren't available on all distros. This is actually a use case for us: we have an RPM plugin, and we want to test using DNF to install packages from Pulp. Of course, Ubuntu doesn't have DNF, so we spin up a Fedora container to test that. You may also want to use certain versions of software; you can put those into a container and not pollute the host OS. That's a good use case for containers.
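As a sketch of that DNF use case — with a placeholder repo URL and package name, not Pulp's actual test fixtures — the container approach on an Ubuntu CI host looks roughly like this:

```shell
# Hedged sketch: run a Fedora container on an Ubuntu CI host to test
# installing a package with dnf. The repo URL and package name below are
# placeholders for illustration only.
podman run --rm registry.fedoraproject.org/fedora:latest \
  dnf -y install --nogpgcheck \
      --repofrompath=demo,http://host.example/demo/repo/ \
      demo-package
```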
Also, we had a case where we were using a Python package that's provided as an RPM and gets installed system-wide. So we use a container for that, to install and test out that package, which we don't package ourselves. So when would you want to use VMs as opposed to containers? One of the big use cases we ran into is that we wanted to test against kernel modules, or modes that require certain kernel parameters. A couple of good examples are SELinux and AppArmor. Also, we support FIPS 140-2. This is a government standard that forbids the use of weak cryptographic algorithms. Both CentOS and RHEL actually provide a mode that you can enable with a kernel parameter, and it will actually forbid the use of some of these algorithms. Of course, you can't do that on a hosted CI unless you use virtualization. Another good example: there are certain types of projects — maybe you're working on a bootloader or a kernel module — where you would have to use VMs in order to test those things. So let's go into the nitty-gritty details of virtual machines versus containers, because this affects whether we can run them at all in these CI environments and how we have to go about running them. Like we mentioned, we prefer containers over VMs; we only use VMs when we can't use containers. Containers have a number of advantages. Because all the containers that are running share the same hardware and the same kernel, there's no performance penalty for running them. You get 100% of the host's CPU performance, and if a process needs RAM, it can get it just like any other process. Whereas with VMs, you have about a 10% CPU performance overhead on average. And on top of that, the host and each guest have separate, unshared RAM. So you could have one guest that's swapping heavily and slowing down the entire system while another guest has a gigabyte free and the host has two gigabytes free.
And there's little you can do about that if your VM is already running. Also, virtual machines require a special hardware feature, which was implemented about 15 years ago. But rarely can you run VMs on top of VMs — rarely is nesting, as it's called, compatible. That hardware feature is usually just not exposed to the guest, or to another guest. So it's important to note that both Travis and GitHub Actions are running in VMs. Technically, they're cloud instances, and they are ephemeral and have temporary state, but for the rest of the presentation we'll refer to them as VMs. Next slide. So as we mentioned, we love running containers on these virtualized CI environments, and that is our preferred approach. Containers on VMs virtually always work — there's no reason why you can't run a container on top of a virtual machine. Our preferred container runtime environment is Podman, but the environments do bundle Docker, so you can use that too. Docker does have disadvantages, like a daemon that runs as root, but since this is a temporary scratch test environment, that's not as much of an issue. Really, any other container solution nowadays will work, because these containers just use standard features in the Linux kernel to run. Now, virtualization in containers can work as well — it depends on whether the containers have privileges to run it. But that's not relevant for the rest of the presentation, because the CI environments we tested are all limited to virtual machines. Next slide. So we have good news, though. We discovered that KVM actually works on top of Travis — nested virtualization works there. We tested it on Ubuntu 18.04 and on Ubuntu 20.04. This was a surprise to us, because Travis doesn't advertise the fact that they support nested virtualization, and a recent Q&A even said it's not possible. But we ran all these commands.
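For reference, those checks look roughly like this — a sketch, assuming kvm-ok is available (it ships in Ubuntu's cpu-checker package):

```shell
# Sketch of the nested-virtualization checks. A nonzero count of vmx (Intel)
# or svm (AMD) flags means the hardware virtualization feature is exposed
# to this VM; kvm-ok gives a friendlier verdict; and /dev/kvm appears once
# the kvm kernel module is usable.
grep -c -E '(vmx|svm)' /proc/cpuinfo || echo "no hardware virt flags exposed"
command -v kvm-ok >/dev/null && kvm-ok || echo "kvm-ok not installed"
ls -l /dev/kvm 2>/dev/null || echo "/dev/kvm not present"
```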
We ran commands like kvm-ok and more detailed commands to see the capabilities of the virtual CPU. And it was there. And we've been running it successfully in our CI now. Unfortunately, this does not work on top of GitHub Actions. GitHub Actions may in the future use newer instances that do support nested hardware virtualization, but not yet. Next slide. So you may be wondering: is there anything we can run besides virtual machines and containers, especially if I want something like a virtual machine on GitHub Actions, which we prefer anyway because it's faster? We were also wondering: what did people do before hardware virtualization was introduced in 2006 or so? Are these old solutions actually a more elegant solution from a more civilized time — like a lightsaber versus a clunky blaster? Next slide. Well, the answer is, unfortunately, a resounding no. So in the Linux world, the term Plan 9 is used because it was the name of another open-source operating system. And the movie Plan 9 from Outer Space — same name, Plan 9 — refers to one of the worst movies of all time, possibly the worst. One of the reasons Plan 9 from Outer Space is so bad is its convoluted and complex plot. Basically, aliens want to invade the Earth, and rather than using something simple and straightforward and efficient, like shooting us with laser guns or propelling an asteroid at Earth, they decided to raise an army of zombies and have them do the work for them. It's convoluted and complex. So if our preferred plan — Plan A, Plan 1 — is containers, and our backup plan, Plan 2, is virtual machines, well, let's see how bad Plans 3 through 9 are, or how decent they are. Some of them actually are decent, or acceptable, as I'll explain. I don't want to gripe about all these solutions, but still. Next slide. So virtualization has been around longer than 2006.
Virtualization — well, IBM invented it in the 1960s, and then it came to x86 computers and similar platforms in the early 2000s. The early- and mid-2000s approach was software virtualization. And there still is an open-source implementation of this, VirtualBox, which by default runs in a hardware virtualization mode but can still run in software virtualization mode. Basically, the way this works is that most of the time, when the software virtualization runs the code inside the guest, it runs it as is — it just passes it directly to the CPU; it has no translation to do. But for some of the instructions — instructions that run at ring zero — it has to translate those into instructions that the host can run, or can run safely without clobbering itself and its own OS. Because of this, there is a performance hit. And in order to overcome that performance hit, these software hypervisors do a lot of tricks, including patching the binaries they're running to substitute more efficient code paths. When all these tricks work well, you get about two-thirds of the CPU performance you originally had. So we tried this on both CIs, and it does work on both. However, we soon discovered that it's even slower than the two-thirds performance; we assume that's because all that patching and the other tricks have not been maintained over time. It also has all these other limitations that they never bothered to lift in the software mode, such as only one virtual CPU and 32-bit guests only. Nobody uses 32-bit CentOS 7, and 32-bit CentOS 8 doesn't exist — those aren't the OSes you really want to test against. You want to test 64-bit CentOS 7 and 8. And then on top of this, we discovered that this feature was actually removed in VirtualBox 6.1. The command-line argument is there, but the GUI grays it out, and it throws an error basically saying: feature not supported.
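For the record, forcing that software mode on the older releases looked roughly like this — a sketch with a hypothetical VM name:

```shell
# Sketch (VirtualBox 6.0 or earlier): turn off hardware virtualization so
# the VM runs under software virtualization / binary translation.
# "testvm" is a hypothetical VM name.
VBoxManage modifyvm testvm --hwvirtex off
VBoxManage startvm testvm --type headless
```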
So the last version of VirtualBox to support this was VirtualBox 6.0. However, it is no longer supported. Combine the fact that this old version of VirtualBox is no longer getting support — it's not getting any micro updates, any patches — with the fact that it requires an out-of-tree kernel module, and that means any day now, Ubuntu is going to issue their own kernel update and you're not going to be able to build or compile that module on Ubuntu anymore. So we do not recommend this approach. Next slide. So let's go a little bit further back in time. In the early 2000s, before any open-source hypervisors existed, there was emulation, and the most mature emulator by far is QEMU. Basically, the way this works is that all the code running against the guest CPU has to be translated so it can run on the host CPU. In effect, the host is breaking down and re-implementing every instruction. So at best it has about one-tenth of the CPU performance of the host, and this all applies even if you're emulating x86 on top of x86. So it's a lot of work to emulate, and it's a big performance penalty, but it is actually very elegant and works really well. QEMU on top of GitHub Actions and Travis works exactly as intended — which is to say, at one-tenth of the CPU performance. However, you do get all the other benefits, because QEMU shares code with KVM: you have multiple virtual CPUs, 64-bit guests, and you can use all the other features and options that you would otherwise pass to KVM. And it's also very simple to configure Vagrant to use QEMU — we'll cover this later. But because it's so slow, it's basically your last resort. If you're running a small utility, like say a bootloader, it's fast enough; but if you're booting a full Linux distro, it'll be painfully slow. You don't want to wait 10 minutes or more just to boot into your OS. Next slide.
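For illustration, launching a guest under QEMU's pure-emulation backend looks roughly like this — the disk image path and sizing are placeholders:

```shell
# Sketch: boot a guest under QEMU's software-emulation (TCG) backend, i.e.
# no KVM acceleration. With nested KVM available you'd pass "-accel kvm"
# instead and get near-native speed; under TCG expect roughly a tenth of
# the host's CPU performance. disk.qcow2 is a placeholder guest image.
qemu-system-x86_64 \
  -accel tcg \
  -smp 2 \
  -m 2048 \
  -drive file=disk.qcow2,if=virtio \
  -nographic
```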
So it's worth noting that there have been other solutions over the years — various types of software virtualization — but the two mature open-source ones both require the host to run a special kernel. Therefore, we cannot run them on either GitHub Actions or Travis, because we can't replace the CI's kernel. Next slide. Okay, so now that we have our preferred solutions — any container runtime, or KVM VMs on top of Travis specifically — we now need to actually run our test code against them. As with most CI environments, you're generally writing scripts, so we need a way to programmatically access these VMs and containers to run our scripts, or individual commands, against them. We'll cover how to do that, and we'll also cover our tips for optimizing performance, because as mentioned previously, these environments have very limited resources, especially RAM — only seven or eight gigabytes. Next slide. So, on accessing containers: the first point here is very well known. There are the podman run and docker run commands that you can use to start your application. You can also just run commands individually. This is designed around the premise that your container runs only one process at a time. If you need multiple processes running, you can share file systems between the containers, or you can run the containers independently so they don't share data on disk. But say you do have a need to run a process inside a container that's already running. Say you have the daemon already running — you started it with podman run — but now you need to launch a test suite against the system that, say, looks at the local disk to verify the data is there correctly, or needs to use IPC, for example. Or whatever it may be — it's simply the most efficient way to do it for your use case. So there is another command that you may not know about, called podman exec and docker exec.
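In short, the run-then-exec pattern looks like this — the image name, container name, and commands are purely illustrative:

```shell
# Sketch: start a long-running container, then launch additional processes
# inside it with podman exec (docker exec works the same way).
podman run -d --name app registry.fedoraproject.org/fedora:latest sleep infinity
podman exec app cat /etc/os-release   # a one-off command in the running container
podman exec -it app /bin/bash         # or an interactive shell
```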
It just launches an additional process, or an interactive shell, in the container that's already running. Next slide. So, how to access virtual machines? Now this is a little bit trickier. The default behavior of a virtual machine, as used with, say, virt-manager or libvirt with a VNC connection, is a virtual keyboard, virtual monitor, and virtual mouse. That's not something you want to script against. The most elegant way to access it is to use a set of utilities called cloud-init. Now, these are meant for cloud instances. The VMs are created from images with the cloud-init utilities on them, and they expect some configuration so that an SSH client can access them. Fortunately, we can use these cloud images as libvirt KVM VMs. And I say fortunately because these KVM cloud images are very common — basically every single distro creates them. They may be listed as OpenStack format, for example, but they're perfectly usable with KVM. So you start out by running a cloud-init command called cloud-localds. This actually creates a virtual floppy disk with a config file on it — a config file with settings like: here's the IP address of the VM, and here's the SSH key I want to use to access it. And then you call the virt-install command from libvirt to install the VM. It's booted with that floppy attached, it boots up, takes the IP address and installs the SSH key, and then you can just SSH into it. You have a fixed IP address, and you can access it that way. That's the simplest way to do it. There are guides online for using cloud-init with cloud-localds with all the commands spelled out, but it's not too bad — it's only like four or five commands total. And once you have that SSH connectivity, of course, you can run individual commands against the VM, or you can copy over a script and run that script. Next slide.
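Those four or five commands look roughly like this — a sketch in which the SSH key, image path, VM sizing, and network setup are placeholders (the static-IP configuration in particular is simplified away here):

```shell
# Sketch of the cloud-init flow. The SSH key and image path are placeholders.
# 1. Write the cloud-init config files.
cat > user-data <<'EOF'
#cloud-config
users:
  - name: ci
    ssh_authorized_keys:
      - ssh-rsa AAAA...placeholder... ci@host
    sudo: ALL=(ALL) NOPASSWD:ALL
EOF
cat > meta-data <<'EOF'
instance-id: ci-vm-001
local-hostname: ci-vm
EOF

# 2. Pack them into the small seed disk that cloud-init reads at first boot
#    (cloud-localds comes from the cloud-image-utils package).
command -v cloud-localds >/dev/null \
  && cloud-localds seed.img user-data meta-data \
  || echo "cloud-localds not installed; skipping"

# 3. Boot the distro's stock cloud image with the seed disk attached.
#    (Newer virt-install versions may also want an --osinfo argument.)
command -v virt-install >/dev/null \
  && virt-install --name ci-vm --memory 2048 --vcpus 2 \
       --disk centos-cloud.qcow2 --disk seed.img,device=cdrom \
       --import --noautoconsole \
  || echo "virt-install not installed; skipping"

# 4. Then: ssh ci@<vm-ip> to run commands, or scp a script over and run it.
```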
So there is a more convenient way to create these VMs. It involves less scripting on your part, and it also has lots of cool features to make other tasks more convenient. It's called Vagrant. The premise of Vagrant is that it helps you create a development environment for your application — which can also be a test environment, of course. It has a libvirt plugin. It creates the virtual machine, it provisions it with provisioner code such as Ansible, Chef, or Puppet code, and it also facilitates all the SSH access. So the advantage is that it's very convenient to use and has some features you might like. However, it does add to the resource usage on these CI instances. It uses a different set of images, called Vagrant Cloud boxes; this may be disadvantageous or advantageous to you, depending on whether those images are to your liking. It's worth knowing that the CI that Pulp is developing actually does use this. Next slide. Previously, I mentioned that the memory on these CI instances is very limited — it's about seven gigs on each. Having little RAM is a very big deal when you're running virtual machines in particular, because if the out-of-memory killer needs to kill a process, it can kill the entire VM. Even with containers, it can still kill a process running in a container that you may rely on. So GitHub Actions actually gives you a four-gigabyte swap file to compensate for the small amount of RAM. But we recommend creating an additional swap file there as well, and we recommend creating a swap file on Travis, which doesn't have swap in the first place. You may be wondering: why swap files? Why not a swap partition, like you'd have on your hard disk on your developer system or a server?
Well, swap partitions are preferred, but swap files — despite a performance penalty — do work well, and they're the only way we can create swap space on these cloud instances, because there's no blank virtual hard disk for us to partition. GitHub Actions actually already uses /mnt for their four-gigabyte swap file; we just create another one that takes up the remainder of /mnt. That's our recommendation. And once again, GitHub Actions will have much faster swap space if you do end up using it. Next slide. Another performance tip we have, which applies equally well to containers and VM images, is to pre-build images with your application's dependencies. So why pre-build with dependencies? Well, as with most software projects, Pulp's dependencies — which in our case are a mixture of C libraries, usually from the OS's RPMs or debs, and Python dependencies — change rarely. They change every few weeks or once a month or so. In contrast, our code changes constantly. So whenever we test a pull request, we don't want to spend time waiting for it to install all the dependencies; we just want to install the current version of Pulp itself and test it. We therefore recommend using a type of CI job called a cron job. Technically, it's not the cron you're used to, but a CI job that's scheduled somewhat like cron. In those cron jobs, you build a container image or VM image, and you push it to a registry at the end. The registry could be quay.io or Docker Hub, Vagrant Cloud, or simply your own place for storing VM images at a URL. And for containers, of course, Pulp can also serve as a container image registry — we highly recommend using Pulp as well. So I'll throw that out there; a little bit of self-promotion. So with containers, this is a very well-known process.
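Before moving on — the swap-file recipe from a moment ago boils down to just a few commands. This is a small-scale sketch: the path and size are placeholders (on GitHub Actions you'd carve gigabytes out of /mnt, and enabling the swap needs root):

```shell
# Sketch: create a swap file. A tiny 64 MB file in /tmp is used here for
# illustration; on a real runner you'd want several gigabytes on /mnt.
dd if=/dev/zero of=/tmp/ci-swapfile bs=1M count=64 status=none
chmod 600 /tmp/ci-swapfile
command -v mkswap >/dev/null && mkswap /tmp/ci-swapfile \
  || echo "mkswap not available here"
# Enabling it requires root; on the CI runner this would be:
#   sudo swapon /tmp/ci-swapfile
```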
Typically, the advice is to create a Containerfile, aka Dockerfile — or use any other solution you want; it's a very flexible pipeline — and build it with the podman build command. But say you're just trying to do something quick, and you have some other reason why you need a running container with a bunch of running processes, and you want to save the current state of the container at the end. Then you can also use the podman commit command. It basically takes the container's temporary storage — any changes it made to the / filesystem — and creates a new container image based on that current state of the container. Similarly with VMs: with Vagrant, the images you use are called boxes, and if you're done making changes to your Vagrant dev environment, you just run the package command and create a box based on the changes you made at runtime — very similar to podman commit. Another option worth mentioning: if you want a robust way to create virtual machine images according to reproducible logic, Packer is an open-source utility that can do this for KVM. Next slide. So our final performance tip is for if you need to run Kubernetes. Remember, Kubernetes is container infrastructure, and it has a running daemon on a particular node that controls the rest of the nodes, even if there's only one node. This daemon is quite CPU- and memory-heavy. So we tried multiple lightweight versions of Kubernetes, and either they wouldn't work on GitHub Actions because they were actually using virtualization — that's how Minikube works, for example — or they were not as lightweight as K3s. K3s turned out to be the most lightweight. So our recommendation is that if you are testing against Kubernetes — like you're developing a manifest or an operator for your application — use K3s for this.
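A sketch of what that looks like — these flags come from the K3s install script's documented options, but verify them against the current K3s docs:

```shell
# Sketch: install K3s on a CI runner. By default it uses its bundled
# containerd; the --docker flag tells it to use the host's Docker daemon
# instead (the hosted CI images ship with Docker preinstalled).
curl -sfL https://get.k3s.io | sh -s - --docker
# Once the node is up, run your manifests or operator tests against it:
#   sudo k3s kubectl get nodes
```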
At install time, K3s will default to using a bundled containerd, but it also lets you specify Docker, and the CI environments actually preinstall Docker — you just enable that as a service in your CI config. So you can do that as well. Next slide. That's the end of our presentation. Any questions, for us or for world conquest? Thank you for an excellent presentation. Welcome back, Mike, and I'm adding David as well. So if anyone in the audience would like to ask questions, please do so. Mike and David, do you want to add any comments or thoughts? I would just like to point out, we've been developing our CI a little bit more since the presentation, and it's been a lifesaver to be able to test for SELinux in FIPS mode in one CI, which we had a requirement for. Yeah, very much, very much so. Okay, if you have any links you want to share, feel free to do that — or anyone, if you want to share contact information, or how anyone can get started, or anything like that. Let me think for a second. I could definitely share the instructions on using cloud-init; that's something we referenced. There are also multiple CI environments out there — I'll dig up the link right now — and GitHub Actions and Travis both have these getting-started guides, but you'll be adding the virtualization or container layer on top of them. Okay, well, thank you very much. We certainly appreciate the presentation, and it was very well done. Yeah, I'm sharing the link right now. Yep, and thanks, David, for pasting the link to Pulp. No problem, yeah. Yep, thanks very much. That's great. Okay, well, thanks guys. I certainly appreciate it. Yeah, thank you. You're welcome. Thank you for having us. Yep, absolutely.