Hi there, my name is Michael Henriksen, I'm a software engineer at Red Hat, and I'm here at KubeCon North America 2022 to present "From Pre-population to Disasters: Manage and Protect the State of KubeVirt VMs." Okay, this session is for people that want to learn more about KubeVirt storage, a lot of the guts and details there. Maybe you've contributed to KubeVirt core in the past and you want to do some storage-related work. Maybe you're just interested in how it all comes together. Both great reasons. So in this session, we'll talk a little bit about general KubeVirt storage architecture. We'll dig deep into a couple of specific API flows and integrations, not the complete set of flows that we support, but some new and important ones that we've been working on. And lastly, we'll talk about what's coming up, what you should be excited about in the future. And hopefully by the end of this presentation you'll be pumped and ready to make some real, meaningful contributions to KubeVirt storage. Okay, KubeVirt storage architecture. So when I am asked to describe KubeVirt in a couple of words, I typically say: KubeVirt is VMs in containers on Kubernetes. And yes, that is a mouthful and probably means nothing to 99.9% of the population, but to tech people that could mean a lot, because those are all kind of loaded terms. So I will limit the scope to what I think is relevant to this presentation. By VMs, I mean QEMU/KVM virtual machines. And I also mean VMs with persistent disks, so VMs you can shut down and start back up again, and they pick up where you left off. By containers, I mean basically nothing special here, just that containers run processes, and in this case they're running virtual machine processes, like the QEMU process. And lastly, since this is on Kubernetes, we're going to be talking about pods, and Kubernetes persistence implies persistent volume claims, so we'll be talking about that.
Okay, so this is the KubeVirt storage architecture. On the right is a persistent volume claim with a single file called disk.img. That is a virtual machine disk image, which is what QEMU/KVM virtual machines expect to boot from. They can be given kernels, and there are other options, but for KubeVirt, for the most part, you boot from a disk image. So how does that plug into the rest of the architecture here? When you start the VM, a pod gets created. The pod has a container that runs the virt-launcher process. virt-launcher communicates with libvirt, libvirt starts QEMU, which is basically your VM. And the VM boots up. It has a boot device, in this case /dev/vda, and in there is presumably a kernel and what looks like a standard Linux file system. So according to the guest VM, it just sees this device. It turns out that device is actually this disk.img file on the file system. Pretty simple. For the most part, I think file system PVCs are the most common, and this will probably be the most common storage configuration. Okay, KubeVirt also supports block devices, and it's not much different. On the right, we have a block PVC. The data on that block device is simply that disk.img from the previous slide, but written out to the block device. And that block device is made available to the guest VM as, you know, /dev/vda in this case, and it works very much the same as before. Okay, I would be remiss if I did not mention virtiofs. virtiofs is a great way to share persistent volume claims between containers and virtual machines. You cannot boot from a virtiofs-configured persistent volume claim, but it's great for sharing data, and it is all accomplished via the magic of FUSE file systems. So in this case, we have a plain old persistent volume claim on the right with some files on it.
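To make the file system PVC setup concrete, here is a minimal sketch of what a VM definition wiring a PVC-backed disk into the guest might look like. The names (my-vm, my-vm-pvc) are placeholders, not from the slides:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: my-vm            # hypothetical VM name
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio        # appears in the guest as /dev/vda
      volumes:
        - name: rootdisk
          persistentVolumeClaim:
            claimName: my-vm-pvc   # PVC holding the disk.img
```

The same shape works for a block PVC; KubeVirt detects the volume mode and hands the block device (rather than a disk.img file) to QEMU.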
It is made available to the guest VM via virtiofs: virt-launcher starts virtiofsd, and the FUSE client runs in the guest VM. The FUSE implementation is pretty simple: it just translates the calls from the guest to the client and mirrors everything. So it's a great way to share data between VMs and containers, no disk image files required, but of course you cannot boot from FUSE, so this VM has a regular boot disk as well. Okay, so we're going to get into some flows, APIs, and integrations here. Again, not the complete set of things you can do with KubeVirt, but some things we've been working on recently and want to put out there in the world for you to use and hopefully enjoy. Okay, we're skipping day zero, which I guess is planning day, because on planning day you decide to use KubeVirt, and that was a very wise decision in my book, and I'm sure a decision you will look on fondly. So let's talk about the day one operation: provisioning. You want to set up your environment; maybe you're building an infrastructure-as-a-service and you want to make it really easy to provision new virtual machines. Okay, so first we have to talk about the DataVolume API, since we're talking about persistent virtual machines with KubeVirt. If you've used KubeVirt before, you're probably familiar with the DataVolume API. It basically encapsulates two things: a PVC definition and the source of a virtual machine disk. When our data volume controller encounters a data volume, it will create a PVC and populate it with the source disk image. Often we use PVC and data volume interchangeably, and yeah, that's basically it. So if we look at this data volume definition, we see the source here on the left is a URL, and we have the PVC definition down here, a 10 gigabyte ReadWriteMany block PVC. Here are examples of some other sources. We have a registry source, which we'll see later. We have a PVC source, which we'll see later.
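A sketch of the kind of DataVolume being described, with an HTTP source and the PVC definition together in one resource. The image URL and name here are placeholders, not the actual values from the slide:

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: fedora-dv                     # hypothetical name
spec:
  source:
    http:
      url: "https://example.com/images/fedora.qcow2"  # placeholder URL
  pvc:
    accessModes:
      - ReadWriteMany
    volumeMode: Block
    resources:
      requests:
        storage: 10Gi
```

Swapping the `http` source for `registry`, `pvc` (cloning), `upload`, or `blank` gives the other source types mentioned below.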
This is what we also call data volume cloning. It is the most efficient way to populate a data volume or PVC, assuming that your storage supports snapshots or CSI cloning. The upload source allows you to upload a disk image from your laptop. The blank source basically creates a blank disk that you can mount in your virtual machine and put a file system on, whatever you want. Okay, so I'm going to briefly describe DIY golden image provisioning. This is what we initially suggested when we rolled out KubeVirt and CDI. At the beginning of the pipeline here, we have your CI/CD system pushing out disk images to a registry. Those disk images are getting imported into this golden images namespace. In this case, we have a RHEL image, a Fedora image, a bunch of images. And then the clients will create data volumes that refer to PVCs in that golden images namespace, so you get that efficient cloning advantage. And that's all pretty nice. You have this catalog of golden images, and clients can create their virtual machines from it. And that works well, kind of, the first time through, but then some challenges creep up. Okay, well, how do you keep this golden images namespace current? The CI/CD system is constantly pushing out new images, there are security fixes, and you want to make sure that all your clients get the latest image. How do you do that? And then just the mechanics of how you do that in a non-disruptive way. You can't just delete these data volumes; what if someone is using them as a source? So, some challenges there. And we came up with a solution called the DataImportCron API. You point it at a registry source, in this case this Fedora registry demo thing, you give it a schedule, in this case every hour, and then there's a new thing called a managed data source. A DataSource is kind of a symlink for a PVC. So in this case, we have a managed data source named fedora.
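The DataImportCron described above could be sketched roughly like this; the registry URL, namespace, and sizes are placeholders, not the actual demo values:

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataImportCron
metadata:
  name: fedora-image-cron           # hypothetical name
  namespace: golden-images
spec:
  schedule: "0 * * * *"             # poll the registry every hour
  managedDataSource: fedora         # DataSource kept pointing at the newest import
  template:
    spec:
      source:
        registry:
          url: "docker://quay.io/example/fedora-demo"  # placeholder registry
      storage:
        resources:
          requests:
            storage: 10Gi
```

The controller stamps out a fresh DataVolume from the template on each update and repoints the `fedora` DataSource at it.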
And then over here, the consumer of this new API refers to a sourceRef of kind DataSource in the golden images namespace. So it's not referring to the PVC directly; rather, it is referring to this managed data source. So let's see how that works in practice. Okay, the data import cron process down here is watching the container registry. It is configured to poll every hour for this Fedora image. Every time the container registry is updated, it will create a unique new data volume here and update the data source to point to the newest one. And the clients simply refer to the data source that they are interested in; in this case, we're only showing the Fedora data source. So that solves the two problems: the data import cron is responsible for polling the registry, and the data source gives us a level of indirection from the data volumes. Okay, so day two. We're going to talk about some of our data protection flows here. First, we'll deal with the VM snapshot and restore API. This is a great way to back up the state of your VM in-cluster. The API is pretty simple: you basically just give it a source VM, in this case vm1, and then in the background it will do all this stuff. It works for VMs that are running and VMs that are stopped. When VMs are running, I guess the couple of things to mention here are steps number two and four, where our snapshot code will integrate with the QEMU guest agent, if that is running in your guest VM. That means you can run user-defined backup hooks: if you're running a MySQL server, you can configure it to dump tables, whatever sort of quiescing you want to do in your application. We'll also fsfreeze all the mounted file systems. So I'm not going to go through everything here, but essentially, once your snapshot is complete, you have an application-consistent, in-cluster backup of your VM disks and your VM configuration.
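The snapshot request itself is small, roughly along these lines (the resource name is a placeholder, and the API version shown is the alpha one current at the time of the talk):

```yaml
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: snap-vm1                # hypothetical name
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: vm1                   # the source VM from the slide
```

Everything else, the guest agent freeze/thaw, the volume snapshots, capturing the VM definition, happens in the background once this is created.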
Of course, a snapshot is not very useful on its own, so we have a restore API as well. You basically just give it a target and a source snapshot name, and it will overwrite the existing VM definition and create new PVCs with the appropriate data from the backups, and you're back in business. Snapshots are a great thing if you're going to do, say, a schema change on your database: you can create a snapshot, apply your schema change, and make sure everything works; if it doesn't, you can restore back to where you were before. It's really great for protecting against things like that. But how far can you take this? Snapshots are not particularly useful in the case of catastrophic failure; if your data center is hit by lightning, your data's gone. And also, the snapshot and restore APIs are kind of a bespoke KubeVirt thing; how does any existing tool integrate with that? So those are two problems that we had to solve. Okay, part of the solution we came up with is the virtual machine export API. The export API works great in conjunction with virtual machine snapshots. So you've made a snapshot, then you can do an export. What export does is create PVCs from the snapshots in that virtual machine snapshot, then it creates a pod that has an HTTP server and serves up those disk images. You know, for a long time in KubeVirt we worked very hard on getting data into the cluster, and only just recently, with this virtual machine export API, did we get serious about getting data out of the cluster. And we're doing that by serving up disk images via HTTP servers. The HTTP servers are secured by these tokens down here. This is just a user token; it can be whatever you want, and we will also auto-generate tokens if necessary. So once the pod is running, the virtual machine export status will have, you know, certificates that can be used to securely download the data.
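The restore and export requests follow the same pattern as the snapshot. A rough sketch, with placeholder names and the alpha API versions current at the time of the talk:

```yaml
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineRestore
metadata:
  name: restore-vm1                  # hypothetical name
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: vm1
  virtualMachineSnapshotName: snap-vm1
---
apiVersion: export.kubevirt.io/v1alpha1
kind: VirtualMachineExport
metadata:
  name: export-vm1                   # hypothetical name
spec:
  tokenSecretRef: export-token       # secret holding the user-supplied token
  source:
    apiGroup: snapshot.kubevirt.io
    kind: VirtualMachineSnapshot
    name: snap-vm1
```

The restore overwrites the VM in place; the export spins up the HTTP server pod and publishes the download links and certificates in its status.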
It will also give you a directory of all the links. So if you have a snapshot that had a disk called, you know, test-vm-fedora, this is the URL you could use to download that disk image. And there are a couple of different sections here in the links: external and internal. External is set if you have a route or ingress in your cluster, so it is externally addressable; you can download this image from your laptop, from another cluster, from anywhere. Internal is just an internal URL; it will reference the pod service directly. And this is kind of a weird example of how that URL in the links section is parsed. In this case, it starts off with the export proxy hostname on my cluster, which will get you to this blue box, the ingress, here. The ingress will parse the virt-exportproxy part and pass everything else over to virt-exportproxy. And virt-exportproxy will parse the namespace, the name, and so on, to get you to the virt-export server. And then the virt-export server at the end will serve up the data for the final URI here, volumes/fedora/disk.img. So that is how this URL is routed through the system to serve up your disk image, at least in the external link case. So these are some use cases for virtual machine exports. Disaster recovery is a good one: you can create a snapshot, then export the snapshot and stream everything to an object store or to a registry. It is also a great fit for migration. If you want to migrate your VM from cluster one to cluster two, you can create the export and directly import on cluster two using those URLs. As we saw earlier, data volumes support HTTP import, so you can easily do that.
And local sharing is another potential use case. If two users want to share disk images in the cluster, you know, transfer an image from namespace A to namespace B, but they don't have permission to do the cross-namespace work, they can use this API. Okay. And then the Velero plugin. Velero, first of all, is a very popular open source Kubernetes backup and migration tool. It will back up your resource configuration, your YAMLs, to an object store. It will also back up your PVC disks in a number of different ways: if you're in a cloud provider, it can use the cloud provider snapshots, or it can use volume snapshots, or it can use Restic. Anyway, it's very configurable and very popular, and it works great with all the built-in resources in Kubernetes, like deployments and replica sets, stuff like that. But it can't understand everything, so it has this plugin architecture, and we have a KubeVirt plugin. The plugin serves a couple of different purposes. First, it helps us with the object graph. Say a user wants to use Velero and they just say, hey, back up this virtual machine. Well, backing up a virtual machine definition alone is not going to give you a complete backup. So there are accommodations in the plugin to build up the entire object graph: given a virtual machine, if it refers to a virtual machine instance type, a virtual machine preference, and data volumes, we want to build up this whole object graph and make sure that it is complete. And then there are also actions that can be performed in this plugin architecture. We want to make sure that if a virt-launcher pod, you know, one of the pods that are running our QEMU virtual machines, is encountered, that freeze/thaw guest agent integration that I talked about earlier with snapshots gets executed.
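From the user's side, a backup that exercises the plugin might be requested with a plain Velero Backup resource, something like the sketch below. The namespace and labels are placeholders; the plugin does its object graph and hook work behind the scenes:

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: vm-backup                 # hypothetical name
  namespace: velero
spec:
  includedNamespaces:
    - my-vms                      # placeholder namespace holding the VMs
  labelSelector:
    matchLabels:
      app: my-vm                  # placeholder label on the VirtualMachine
```

The point is that the user just names the VM's namespace or labels; the plugin is what expands that selection into the complete, consistent set of KubeVirt resources.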
We also want to sanitize some resources like data volumes and persistent volume claims. We add some annotations to make sure that everything gets restored correctly; basically, we don't want to repopulate a data volume that has already been populated and overwrite any data. And there are certain resources that we don't want Velero to restore. We showed that object graph, and part of that is the virt-launcher pod. We want that in the graph so that the backup hooks get called, but we don't want Velero to recreate that pod, so we want to skip it on restore, because we want our controller to create that pod. And similarly, if there's a virtual machine that owns a virtual machine instance, we want our controller to create that virtual machine instance as well. So there are some subtle rules that we have to make sure get covered so that things get ordered and restored correctly. Okay, that concludes the APIs and flows section. Now, what's coming up? One of the big things coming up is volume populator support. Volume populators are a community initiative; we built data volumes initially, years ago, because there was no volume populator support. Volume populators make it possible to define, in a standard way, what data should come with a PVC when it is initially created. And just to show an example here, we have a data volume that has an HTTP URL import. The populator equivalent will have a dataSourceRef field set that refers to cdi.kubevirt.io, which would be a new API group for that kind of import. So it basically just refers to another custom resource, and this is what that custom resource could look like. What's going on here is that the data volume has all the information for the source embedded in it, and with populators that is abstracted out into another resource that can be shared, which is kind of nice. And here's the process for population.
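The populator equivalent being described might look roughly like this: the source lives in its own custom resource, and the PVC points at it via dataSourceRef. Resource names and the URL are placeholders, and the exact kind name is an assumption based on the talk:

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: VolumeImportSource
metadata:
  name: fedora-import             # hypothetical, shareable source definition
spec:
  source:
    http:
      url: "https://example.com/images/fedora.qcow2"  # placeholder URL
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fedora-pvc                # hypothetical name
spec:
  dataSourceRef:                  # external provisioner ignores this PVC;
    apiGroup: cdi.kubevirt.io     # the populator controller takes over
    kind: VolumeImportSource
    name: fedora-import
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

Note the separation: many PVCs can reference the same import source, instead of each data volume embedding its own copy of the source details.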
The big thing here is step one: the external provisioner will ignore any PVC with dataSourceRef set. Then a set of populator controllers basically takes over to get the right data on there, and then the volume gets rebound to the PVC the user created. Since the external provisioner ignored that PVC, our controller is now responsible for binding it back up; that's what happens in step six here. So that's volume populators, the new Kubernetes-native way of populating PVCs. We'd like to have support for that for all of our existing data volume sources. Okay, another thing that we're doing pretty soon: we're going to move cdi.kubevirt.io to v1, but before we do that, we wanted to enable data volume garbage collection. As I said before, we often use data volume and PVC interchangeably, meaning the same thing, and that's kind of confusing for people. So we're now offering a garbage collection ability: similar to how Kubernetes jobs or cron jobs get garbage collected, we're going to allow data volumes to get garbage collected. Once your PVC has been populated and the data volume has nothing left to do, it will delete itself, or rather it will get deleted by our garbage collector. It is configurable on the CDI custom resource. And since we're moving to v1, we have to finally get rid of the alpha API. Here's some other stuff. We're going to be aligning to KubeVirt releases: currently, CDI is released every three weeks; we're changing that to three times a year, to be in line with KubeVirt and staggered somehow with Kubernetes. We're going to move the snapshot API to v1beta1. We're going to have a new data volume source, and probably a populator, for volume snapshots. Automatic size detection, yeah. And we'll keep expanding the VM export API to be more useful. So yeah, that's what's coming.
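Since garbage collection is configured on the CDI custom resource, the toggle might look something like the sketch below. The exact field name is my assumption, not confirmed in the talk:

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
spec:
  config:
    # Assumed field name: a TTL of 0 would garbage collect a DataVolume
    # as soon as its PVC has been successfully populated.
    dataVolumeTTLSeconds: 0
```

Check the CDI custom resource schema for the actual knob before relying on this.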
I hope that is in line with what you in the community are looking forward to, but we'd love to hear from you, and we'd love to have more contributions. Any features that you want to implement, we are happy to work with you on. This is a maintainer session, so yeah, we really want to be more active with the community on the KubeVirt storage side. So we're calling on you: let's work together to make CDI and KubeVirt even better. Any questions?