Plenty of seats left. So hello and nice to see all of you here. My name is Fabian Deutsch and I'm working for Red Hat. My background: I have been a Fedora user for the last, I don't know, 17 or 18 years, with excursions to Gentoo and all the other distributions, to Ubuntu and so on and so forth, and finally I landed at the place where you get a red hat when onboarding. The last four years I spent working on virtualization at Red Hat, on RHEV, which is the downstream product of the upstream oVirt project, a virtualization management system for VMs in the classical sense.

Today I want to talk with you (please ask questions if you have them, or we defer them to the end) about pet VMs in Kubernetes, and what the heck, how can that work? Because the question is: why would you combine these words anyway, VMs and Kubernetes? So, how many of you know Kubernetes? How many of you know containers? About the same number. And how many of you are using virtual machines? That's good, then you're in the right talk. This talk is about how you can take the pet VMs you have today and run them in Kubernetes, or whether that is possible at all, whether it's feasible, whether it makes sense or doesn't make sense from our perspective.

To get into the topic, let us first define the terms pet VM and Kubernetes, or containers. If I speak about pet VMs, then I really mean VMs which you care about. You take your MySQL database and tune it for 15 days so that it has all the specific properties you want, or you create a virtual machine that provides a desktop for your workmate, and you set up multiple monitors and USB forwarding for, I don't know, his USB camera, so he's able to do Skype. A pet VM is really something handcrafted. You spend time on modifying it, on making it work specifically for you. And you notice that you have got a pet VM when, on deleting it, you say: my God, I deleted my pet VM, and it's really an issue that it's gone. And then somebody probably tells you: well, I guess you have a backup and you can restore it.

On the other side, we have got Kubernetes. I don't want to directly compare Kubernetes and pet VMs, because one is a management system and the other is an instance, basically. But I took Kubernetes because Kubernetes cares about containers, and containers are to some degree similar to VMs in that they run a workload; they run something for you. But the assumptions about a container are different. A container is an instance of a pre-built image, and you can create a thousand gazillion instances of that image, and all of them are the same, with some minor differences like, I don't know, different MAC addresses and so on and so forth. And you know that you have got a container if you delete it and don't even notice. It is normal that containers go away; it is expected that they go away and reappear on a different host. But they serve a specific purpose; that is what they are built for.

So much for these two terms. Now, containers and VMs are both run and managed by management systems. On the one hand, you have got a container cluster, managed for example by Kubernetes, where you can run containers with all these assumptions.
I mean, Kubernetes is specifically built to run containers in the way they were created for. We have got, for example, replication controllers; all of you who raised your hand know them. Replication controllers are there to keep a specific number of containers, or pods in the Kubernetes sense, up and running across the hosts available in the cluster. If a host goes down, Kubernetes does not care so much; it just ensures that the missing containers are brought up in a different place again. (There is a small sketch of this below.)

On the other hand, we have got different cluster systems for VMs, like oVirt, where you have also got a set of hosts which you manage with a central interface, and you can instantiate your pet VM and then start tuning it, maybe using Ansible. And maybe it's even pinned to a specific host because you need specific hardware. The assumption this management system is built around is taking care that you can really manage the VM with all its details: multiple displays, device passthrough, USB passthrough, protocols optimized for viewing the content of the screen, like SPICE. All of this is built into the management system.

So both of them are management systems, each tuned towards its specific workload, and they differ in exactly that workload-specific functionality. But they also have stuff in common: the infrastructure. If you look at containers, they need storage to store your data, and they need network to expose their functionality. And you have got the same stuff on the VM side: you also need storage to store your compute results, and you also need network to expose your VM and to get access to it. So despite the differences, there are also things in common. Scheduling, for example, is something you don't see at first glance, but both management systems need to find the right spot in the cluster and decide: do I put my workload on this host or on a different one? Host lifecycle management is also something you often don't see, but you want a way to say, for example: I want to update this host, so I need to make sure that all VMs running on it are moved away, so that I can safely update the host and apply a new kernel.

That being said, the goal of KubeVirt is a management system where the infrastructure layer is shared between both workloads, so that you can run VMs and containers on the same infrastructure, because the infrastructure is so much the same. The question is how we can get there, and that is what KubeVirt is about. There have been a few approaches to getting VMs into Kubernetes, and they fell short in a specific way, which we'll get to in a moment. The main reason we want to do it is that we want to reuse the infrastructure; the side benefit is that you're able to run both workloads on the same infrastructure, which is pretty efficient. It also allows, by the way, migration from the one system to the other, or the other way around. Okay, so we did a first try to achieve this.
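[Editor's note: to make the replication-controller behaviour just described concrete, here is a minimal sketch in Go using the upstream Kubernetes API types. The name "web", the nginx image, and the replica count of three are illustrative choices, not anything from the talk.]

    package main

    import (
    	"encoding/json"
    	"fmt"

    	v1 "k8s.io/api/core/v1"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    func int32Ptr(i int32) *int32 { return &i }

    // desiredRC declares: keep three identical pods of this template running
    // somewhere in the cluster, rescheduling them if a node disappears.
    func desiredRC() *v1.ReplicationController {
    	return &v1.ReplicationController{
    		ObjectMeta: metav1.ObjectMeta{Name: "web"},
    		Spec: v1.ReplicationControllerSpec{
    			Replicas: int32Ptr(3),
    			Selector: map[string]string{"app": "web"},
    			Template: &v1.PodTemplateSpec{
    				ObjectMeta: metav1.ObjectMeta{
    					Labels: map[string]string{"app": "web"},
    				},
    				Spec: v1.PodSpec{
    					Containers: []v1.Container{
    						{Name: "web", Image: "nginx"},
    					},
    				},
    			},
    		},
    	}
    }

    func main() {
    	out, _ := json.MarshalIndent(desiredRC(), "", "  ")
    	fmt.Println(string(out)) // what you would submit to the API server
    }

The controller then continuously compares the number of pods matching the selector against spec.replicas and starts or deletes pods until the two agree, which is exactly the behaviour that has no natural equivalent for a handcrafted pet VM.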
So we approached this problem by saying: we want to use the infrastructure of Kubernetes, so how can we run VMs on it? And we put our hands on it. The first try, about, I don't know, a year back, was to replace Docker. We went to the Kubernetes source code and said: okay, let's replace the Docker calls with a QEMU command. That's the high level; in detail it looked a bit different. But we did it, and you can spawn a VM through Kubernetes that way: the kubelet, the host daemon, can be modified to launch QEMU instead of Docker and to bring up a VM instead of a container, or a pod, which is multiple containers.

But in the end it failed; it didn't look right. Why? We got Kubernetes to launch a VM, and by the way, today that's much easier to do, because there's the Container Runtime Interface, and there are several implementations like runV and virtlet which do, through a defined interface, what we did in the source code. But for our use case, where we care so much about VMs, it looked ugly, because we now had running VMs, but they were defined by the pod specification. The pod specification basically describes what the container, or the set of containers, running on top of Kubernetes looks like. And this specification was not expressive enough to describe the pet VMs we wanted to run. We had no way to specify the number of monitors to use, or which PCI device to pass through. And it's quite common to have a public network and maybe a storage network and some other networks; it's not possible to express that in the pod specification either. And the handling was an ugly implementation, because we wanted to support both workloads, containers and VMs, so we had to add many conditionals to keep both containers and VMs running. So especially the representation was an issue, because the pod is, after all, not as expressive as a pet VM.

So we took some time, thought about it, and did another try. And that's KubeVirt. KubeVirt is another effort; this time we take it seriously, and we learned from all the stuff we did wrong in the past. We position KubeVirt as an add-on to Kubernetes. There are already a couple of add-ons to Kubernetes, like SkyDNS for example, which you can deploy on Kubernetes as native Kubernetes applications, because they are self-contained and you can just use Kubernetes entities like deployments to deploy them on Kubernetes itself. KubeVirt is not about modifying Kubernetes itself, for example getting the kubelet to run VMs as a technical detail. Rather, it is three-fold. First, it is about finding the right resources in Kubernetes. If you know Kubernetes, you know there are pods, there are replication controllers, there are deployments. KubeVirt is about finding the entities to represent VMs and the whole ecosystem we have got around them, so VMs, networks, disks, and about identifying how all of these entities relate to each other. All of this is done not directly inside Kubernetes, but as an add-on, using third-party resources, if you're interested in the mechanism. But that's just the representation side. To make Kubernetes able to handle these new entities, we then also had to come up with new controllers and daemons which handle them on the cluster side and on the daemon side. I'll get to that in a moment.
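[Editor's note: to give a feeling for what such a VM entity could look like, here is a hedged sketch in Go. Everything in it is an assumption for illustration: the group kubevirt.io, the kind name, and every field such as monitors or hostPCIDevices. It is not KubeVirt's actual API, only a hint at why a pod spec is not expressive enough.]

    package main

    import (
    	"encoding/json"
    	"fmt"
    )

    // VM is a hypothetical third-party resource describing a pet VM.
    type VM struct {
    	APIVersion string   `json:"apiVersion"`
    	Kind       string   `json:"kind"`
    	Metadata   Metadata `json:"metadata"`
    	Spec       VMSpec   `json:"spec"`
    }

    type Metadata struct {
    	Name string `json:"name"`
    }

    // VMSpec carries the virtualization details a pod spec cannot express.
    type VMSpec struct {
    	MemoryMiB      int      `json:"memoryMiB"`
    	CPUs           int      `json:"cpus"`
    	Monitors       int      `json:"monitors"`       // graphical displays
    	HostPCIDevices []string `json:"hostPCIDevices"` // PCI addresses to pass through
    	Networks       []string `json:"networks"`       // logical networks, e.g. public + storage
    }

    func main() {
    	vm := VM{
    		APIVersion: "kubevirt.io/v1alpha1",
    		Kind:       "VM",
    		Metadata:   Metadata{Name: "my-pet-vm"},
    		Spec: VMSpec{
    			MemoryMiB: 4096,
    			CPUs:      2,
    			Monitors:  2,
    			Networks:  []string{"public", "storage"},
    		},
    	}
    	out, _ := json.MarshalIndent(vm, "", "  ")
    	fmt.Println(string(out))
    }

Printed as JSON, this is roughly the shape of object you would store in Kubernetes as a third-party resource.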
So, in general, it then looks like this. Is there a pointer here? Oh, cool. So you have got Kubernetes, and there we wrote additional controllers which just handle VMs. So whenever you create a VM entity, this controller... oops, that's the wrong thing. There you go, you press the wrong button and all is gone. Don't press the red button, Mr. President.

So we wrote additional controllers which handle the workload-specific logic, because, you know, in Kubernetes there are already controllers, and they are tuned towards handling pods: there is a controller handling garbage collection, there is a controller handling replication control, and they are not intended to handle VMs; they are intended for pods. So we added our controller, which uses the same approach as all the other controllers: it is reactive to the VMs created in Kubernetes, and it follows a declarative approach (there is a small sketch of this pattern below). So it feels like it is Kubernetes, but it's an add-on, added separately, because we believe we need to keep all the workload-specific stuff outside of Kubernetes; that is what we deliver as the add-on. Then we have the virt-handler, which runs alongside the kubelet on the host and which is actually responsible for bringing up the VMs. Taking this picture here: you have got your controller, which schedules a VM onto a specific host. The virt-handler sees: oh, that logical VM is scheduled on my host, so I make sure to really bring up a VM, specified with all the details which a pet VM has.

That is the broader concept, and it might look simplistic and pretty abstract for now, but it's not just abstract. We actually wrote the code, and you can even try it. I would encourage you to give it a try: look at the architecture yourself if you are a developer, or play with it if you are interested in seeing how you can interact with it. We have got a small demo prepared which you can look up in the slides afterwards.

So far, KubeVirt, I mean the code we are working on at present on GitHub, is in a state where we have VM definitions, very minimalistic VM definitions. We don't have disks yet, and we don't have networks yet, for example, which are also essential for doing something useful with VMs. We don't have that yet, but we are working on it. What you can do today, however, is deploy KubeVirt natively on your Kubernetes cluster and get a VM running. It PXE-boots and will not find anything, but you can still get a feeling for how you handle it: how you create a VM entity in Kubernetes, how you delete it, and how you see the VM going away on the host again. So there is something to try, and there are gaps, and we encourage you to contribute in case you are interested in the same area. As I said, disks, networks, and scheduling (which we only have in a primitive way today, not yet in the proper Kubernetes way) are things we want to add in the near future to make it more usable, and to give you a nice demonstration of how it looks to boot your pet VM on Kubernetes using the native formats you all know from the whole libvirt world, from Boxes, and from the management systems. So that is really the near-term future, but that is not where we stop, because we also need to take care of other virtualization-specific features.
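[Editor's note: here is the promised sketch of the reactive, declarative pattern the controllers and the virt-handler follow. It is minimal and self-contained: the event slice stands in for a real Kubernetes watch, and the names VMEvent and reconcile are made up for illustration, not taken from the KubeVirt code.]

    package main

    import "fmt"

    // VMEvent stands in for a watch event on a VM entity.
    type VMEvent struct {
    	Type string // "ADDED" or "DELETED"
    	Name string // name of the VM entity
    	Node string // host the cluster-side controller picked
    }

    // reconcile reacts to one observed event and drives the actual state
    // (here: a simple map) toward the declared state. A real handler would
    // compare against the hypervisor's state instead of printing.
    func reconcile(running map[string]bool, ev VMEvent) {
    	switch ev.Type {
    	case "ADDED":
    		if !running[ev.Name] {
    			fmt.Printf("virt-handler on %s: defining and starting VM %q\n", ev.Node, ev.Name)
    			running[ev.Name] = true
    		}
    	case "DELETED":
    		if running[ev.Name] {
    			fmt.Printf("virt-handler on %s: tearing down VM %q\n", ev.Node, ev.Name)
    			delete(running, ev.Name)
    		}
    	}
    }

    func main() {
    	running := map[string]bool{}
    	// In reality these events would arrive from a watch on the API server.
    	for _, ev := range []VMEvent{
    		{"ADDED", "my-pet-vm", "node1"},
    		{"DELETED", "my-pet-vm", "node1"},
    	} {
    		reconcile(running, ev)
    	}
    }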
So, we said we take Kubernetes as it is, and that means advanced virtualization features like live migration need to live somewhere else, and that is exactly the stuff we then need to implement in our controllers and daemons, so that they know they are running on Kubernetes and find a way to deliver, for example, live migration. With live migration, if you don't do block live migration, you need a shared volume which is accessible on both the source and the destination, and these are difficulties currently with Kubernetes. So we are not looking at bringing live migration into Kubernetes itself, but rather at getting Kubernetes to close the gaps we need, so that we can deliver live migration on top of Kubernetes.

Other stuff we want to provide is templating, so that you have VM templates. We also want to provide (oh, there are a lot of details, I could go on for a while) a host API, so that you can do, I don't know, host PCI passthrough. You know, we live in a container on a host now, so we need to respect that the host might be used by other parties as well, and we need to find different solutions than we have today.

Besides that, and that is actually another benefit of this approach, we also need to work with the Kubernetes community to improve the areas where we leverage Kubernetes extension mechanisms to provide our functionality. TPRs, for example: we are using TPRs, third-party resources, extensively (there is a small sketch of registering one below), but Kubernetes still has difficulties in these areas, so we hope to work with Kubernetes to fix the gaps we have, like doing custom renderings on the client side. These improvements are not only for us but also for other parties which use TPRs. For example, OpenShift is also using TPRs, for cron jobs if I am not mistaken; please correct me if I am wrong. So that is where I think KubeVirt gets it right: we fix the stuff in Kubernetes which is beneficial to other workloads and other parties as well, so other parties benefit from these fixes, and we provide our functionality as an add-on outside of Kubernetes.

Snapshots and volume improvements are also something we need to work on: as you know, VMs work with raw devices or block devices, while containers work with file systems. That is obviously a gap, so we need to do some work there; we are actually already in discussions about how that can be solved. These are all gaps; we see them, and that's okay. And besides that, there are also things which are not so much a technical problem but rather a conceptual problem. Like live migration, as I mentioned: from the virtualization side there are requirements like the shared volume, but in Kubernetes there is currently no concept of transferring state, which is what a live migration is, transferring state from one side to the other. Kubernetes just kills the pod and it's gone, and you don't have the time to really migrate the memory and all the necessary metadata over to the other side. So we want to work with Kubernetes and say: hey, please give us the time to complete this transition, this migration, before killing that pod. There are already some smaller knobs; they might not be sufficient, so we need to find solutions there. So it's not all easy; there are also gaps.
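[Editor's note: here is the sketch of TPR registration referred to above, in Go, against the Kubernetes 1.x ThirdPartyResource mechanism the talk describes. The group name kubevirt.io is an assumption, and the sketch assumes `kubectl proxy` is serving the API locally; a real client would authenticate against the API server directly.]

    package main

    import (
    	"bytes"
    	"fmt"
    	"net/http"
    )

    func main() {
    	// ThirdPartyResource names encode kind and API group: this registers
    	// kind "VirtualMachine" in group "kubevirt.io", so instances then live
    	// under /apis/kubevirt.io/v1alpha1/namespaces/<ns>/virtualmachines.
    	tpr := []byte(`{
    	  "apiVersion": "extensions/v1beta1",
    	  "kind": "ThirdPartyResource",
    	  "metadata": {"name": "virtual-machine.kubevirt.io"},
    	  "description": "A minimal pet VM resource (illustrative)",
    	  "versions": [{"name": "v1alpha1"}]
    	}`)

    	// Assumes `kubectl proxy` is serving the API on localhost:8001.
    	resp, err := http.Post(
    		"http://localhost:8001/apis/extensions/v1beta1/thirdpartyresources",
    		"application/json", bytes.NewReader(tpr))
    	if err != nil {
    		panic(err)
    	}
    	defer resp.Body.Close()
    	fmt.Println("API server answered:", resp.Status)
    }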
Besides that, I think all in all, at the bottom line, there are a lot of opportunities. Especially if we manage to beef up the infrastructure of Kubernetes to meet our requirements as well as the requirements of containers, because they will benefit from it too, then it's a win for both sides: for us, because we have one common layer, with Kubernetes providing the infrastructure and us on top providing the virtualization functionality, which to an outsider looks like a native Kubernetes property.

So, are there any questions? If not, then I would really like to invite you tomorrow to join our devroom in C236, where we will be hanging around drinking water and cold drinks, and speaking about KubeVirt and how some of the problems I mentioned can be solved; maybe you have got some thoughts as well on what is missing or what we could be doing. So, questions? If not, ping me on IRC or anywhere else.

Sure, so the question was if I could explain what the virt-handler does. I am going back to the slide just so you have it in front of you; there we are. The virt-handler is down here. The virt-handler is doing all the host-side logic which is necessary to launch VMs. After all, the VM is running on a host, and the VM needs to be connected to the local storage, so the virt-handler is responsible for finding the right path to the block device and attaching it to the VM. The virt-handler is also responsible for the right NIC, which you define logically in Kubernetes, by the way; we assume that Kubernetes will get multiple-network support at some point, and we are actively working on that. So if you have these logical networks, the virt-handler makes sure that the right interface of the kernel is getting attached to the right virtual NIC of the VM. So it is doing all that host stuff. NUMA, for example, is also something which is important there. And it provides, for example, the entry point for live migration, where you need direct access from host to host. Yes, please.

You get to the point; you mentioned that it's working with libvirt. Oh yes, we should highlight that. Currently, because we don't want to reinvent the wheel again and spend 90% of our time building an abstraction layer for different virtualization systems on the host, we decided to go with libvirt. So currently in our demo, libvirt is running alongside the virt-handler on the host to really do the heavy lifting for the VM. What the virt-handler does is the mapping from the logical concepts we have in Kubernetes onto the libvirt layer (there is a small sketch of that mapping below).

So, the question was: is libvirt running on the host or in a container? We put libvirt into a container. We softened it: it has more privileges, we turn off some namespaces, but it runs in a container. Why do we do that? Because we said we wanted to be a native Kubernetes application, and the red line is that you must not have to touch anything on the host. That is why we decided to ship it in a container, because now you can run a single command against Kubernetes and our demo, our work in progress, just works. You don't have to touch your host at all. And it has been tested on Ubuntu, so that's also pretty nice. Is anybody of you working on a similar approach? No? Okay. Thank you very much. I'm done.
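[Editor's note: here is the sketch of the mapping referred to in the Q&A above, from a logical VM entity down to a libvirt domain definition. The LogicalVM type and its fields are made up for illustration; the domain, name, memory, and vcpu elements are standard libvirt domain XML, though a real domain needs many more elements (disks, interfaces, graphics) than this.]

    package main

    import (
    	"encoding/xml"
    	"fmt"
    	"strconv"
    )

    // LogicalVM is the cluster-level view of the VM (an illustrative type,
    // not KubeVirt's actual API).
    type LogicalVM struct {
    	Name      string
    	MemoryMiB int
    	CPUs      int
    }

    // Memory and Domain model a tiny subset of libvirt's domain XML.
    type Memory struct {
    	Unit  string `xml:"unit,attr"`
    	Value string `xml:",chardata"`
    }

    type Domain struct {
    	XMLName xml.Name `xml:"domain"`
    	Type    string   `xml:"type,attr"`
    	Name    string   `xml:"name"`
    	Memory  Memory   `xml:"memory"`
    	VCPU    int      `xml:"vcpu"`
    }

    // toDomain is the virt-handler's job in miniature: translate the logical
    // entity into something the hypervisor layer understands.
    func toDomain(vm LogicalVM) Domain {
    	return Domain{
    		Type:   "kvm",
    		Name:   vm.Name,
    		Memory: Memory{Unit: "MiB", Value: strconv.Itoa(vm.MemoryMiB)},
    		VCPU:   vm.CPUs,
    	}
    }

    func main() {
    	out, _ := xml.MarshalIndent(toDomain(LogicalVM{"my-pet-vm", 4096, 2}), "", "  ")
    	fmt.Println(string(out)) // this XML would then be handed to libvirt
    }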