Hi everyone, it's right after lunch. Today we are going to walk you through hotplug in virtual machines. My name is Eddie, I work for Red Hat on a project called KubeVirt. And this is Andrea, also from Red Hat, working on a project called libvirt. Has anyone heard of libvirt? Oh, everyone, you are famous. Anyone heard of KubeVirt? So if you know KubeVirt, then you also know Kubernetes, so we are good.

OK, a little bit of background and context. In the beginning we just had one virtual machine and life was really, really simple: we only had to manage that one. Then we had many virtual machines on many nodes, and it was getting difficult, so we invented management, and we got projects you may already know, like oVirt and OpenStack, that manage virtual machines, among others. Then came containers, which are a kind of soft virtual machine: lighter and nicer, and you run just the application inside them. The same phenomenon happened there: we had a lot of containers, so we had to manage them as well. So the big players came along and invented Kubernetes. Kubernetes manages pods, which are the lowest-level entity there; a pod contains one or more containers, so we can loosely think of pods as containers too, just a specific packaging of them.

And then KubeVirt came along and said: the ecosystem for managing workloads in Kubernetes is very similar to the ecosystem for managing virtual machines. So let's put VMs into that ecosystem, run them inside pods, which sounds ridiculous, and combine the two. We get to reuse the scheduling and all the nice management features that Kubernetes already has for pods, and apply them to VMs. That is KubeVirt. From here we will work our way towards hotplug, but first a bit more on how a virtual machine is defined.

In KubeVirt you define a manifest, which is a specification, and the system creates the virtual machine inside a pod for you. Under the hood it is powered by the same stack that OpenStack and oVirt used in the past: KubeVirt also implements virtual machines using libvirt and QEMU. If you look at this slide, there are three levels of abstraction. We have the manifest, which is how KubeVirt sees the virtual machine. Then we have libvirt, which manages the lifecycle and is an abstraction API on top of QEMU. And we have QEMU itself, which is the application that actually emulates the virtual machine for us. This is what a virtual machine manifest looks like in KubeVirt; we will not get into the details, it is just an example (a minimal sketch follows this part). The important point is that it is declarative, which is the whole point of Kubernetes.

So, on to hotplug. Previously, taking oVirt as an example, we already had support for hotplug there. Does anyone know why we needed hotplug in the first place? Anyone? Why do we even need it? Why is it useful to hot plug something in the middle of things, like taking a physical machine and pushing a PCI card into it while it is running? Yes. So I think one of the most common cases where someone wants hotplug is networking: you want to connect to some other network on the fly, or you want to change some network parameters that you cannot change from outside the VM, in the external network. That's one option.
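To make the manifest part a bit more concrete: a minimal VirtualMachine manifest might look roughly like the sketch below. This is not the exact example from the slide; the names, the memory size, and the container disk image are just illustrative.

```yaml
# Rough sketch of a KubeVirt VirtualMachine manifest (illustrative values).
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example-vm
spec:
  running: false                  # create the object without starting the VM yet
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio       # root disk attached as a virtio device
          interfaces:
            - name: default
              masquerade: {}      # connect the VM to the pod network
        resources:
          requests:
            memory: 1Gi
      networks:
        - name: default
          pod: {}                 # backed by the regular pod network
      volumes:
        - name: rootdisk
          containerDisk:
            image: quay.io/kubevirt/cirros-container-disk-demo
```

You declare what the virtual machine should look like, and KubeVirt reconciles the rest.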
Back to the motivation: maybe you want to add more disks to your virtual machine, and you don't want any of these operations to disturb the application running in your guest. You don't want to shut the VM down and power it on again; you want to do it on the fly. Hotplug also lets you scale things later: you could start with something small and then find out that you need more, like more disks or more storage, so you hot plug things in. And this is not limited to interfaces or disks; it can be CPUs, it can be anything.

So what are our challenges with hotplug? There are a lot, and they start with Kubernetes itself. If we are talking about devices, PCI devices for example, the easiest example I can give is SR-IOV. If I want to push an SR-IOV device into the VM, I first need to move it inside the pod, so the VM can consume it. In Kubernetes there is the device plugin, a component that lets us specify a device and ask Kubernetes to move it inside the pod so it can be consumed (see the sketch after this part). In networking we also have another piece, the CNI. The CNI goes into the pod network namespace and configures it with all the needed tweaks so that the network interface exists inside the pod and can reach the node.

But here is the Kubernetes problem: the device plugin can only act when the pod starts. You cannot use it while the pod is running; once the pod is active, the device plugin cannot help you anymore. On the CNI side, in roughly the last half a year or a year, using Multus we can now hot plug network attachments into a pod while it is running. That is a new thing, but it is real hotplug. For the device plugin limitation we have a workaround, for example for SR-IOV. In KubeVirt we rely on migration: when we migrate a VM, a destination pod is created, and the device plugin can do its job on the target node. So for SR-IOV we unplug all the SR-IOV interfaces and devices at the source, migrate, and plug them back in on the target afterwards. It is a kind of hotplug workaround, using migration.

Now the KubeVirt challenge. I am not going to get into this whole diagram, but KubeVirt has a lot of components, and if you want to do one thing, you will most likely need to synchronize all of them. For example, a request comes in through a component called virt-api; from there it goes into the manifest, virt-controller reconciles that manifest and asks virt-handler to do privileged work on the node itself. For example, virt-handler needs to go inside the virt-launcher pod that you see here and do the networking setup. Then it reaches virt-launcher, which is the KubeVirt representative inside the pod that does all kinds of things, and this is where the rest of our talk lives: virt-launcher touches the domain configuration and talks with libvirt to do whatever is needed. We will focus on this part from now on. Andrea will continue. It will be quick. Well, probably not quick.

Can you hear me fine? Yes, good. So, when it comes to hotplug at the libvirt level... by the way, if you have any questions, raise your hand; we will also have time for questions later.
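As an aside, here is the device plugin part mentioned above, seen from the pod side: the device is exposed as an extended resource that the pod requests in its spec, roughly as sketched below. The resource name and image are made up for illustration; the real name depends on which device plugin is deployed. Because this request lives in the pod spec, which is fixed once the pod is created, it can only take effect when the pod starts.

```yaml
# Sketch of a pod consuming a device through a device plugin (illustrative names).
apiVersion: v1
kind: Pod
metadata:
  name: sriov-consumer
spec:
  containers:
    - name: workload
      image: registry.example.com/workload:latest
      resources:
        requests:
          example.com/sriov_vf: "1"   # extended resource advertised by the plugin
        limits:
          example.com/sriov_vf: "1"
```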
So, the problem when it gets to libvirt, the problem with PCI hotplug, is that it requires planning. This is the case for the Q35 machine type, which is the default in KubeVirt and the recommended one. You cannot just hotplug devices willy-nilly; you need to prepare for it in advance. The way this works is that you have your machine, and this part here we can consider part of the machine: nothing in it can be hotplugged. You have your root bus; it's a PCI bus, and it cannot be hotplugged. You can plug devices into it, but those devices are considered integrated devices, so you will not be able to hotplug or unplug them either. In order to get hotplug working, you need to add some additional PCI controllers called root ports. You plug those into the root bus, and they themselves cannot be plugged or unplugged, but the devices connected to them can, and at that point you have hotplug, which is what you want. So here we have two devices that can potentially be unplugged at runtime. If you want the ability to expand your virtual machine later down the line, you just create a few spare root ports, as many as you need, and then you can do hotplug.

When it comes to libvirt and how it facilitates hotplug on Q35, it does a bunch of this work for you. If you have this XML, a very simple XML that describes a single network interface, you can provide it to libvirt, and libvirt will add some other XML to it. All of the stuff in yellow is stuff that libvirt adds automatically. It's a bit complicated, so we're going to go through it step by step. The first controller is the PCIe root bus we were talking about, shown in blue. On top of that, you have one root port, and then you have your device. All of the stuff with address types is just information that libvirt needs to record the relationship between the various devices and controllers, basically the vertical lines in the diagram. So this happens automatically: you provide the device, you get the PCI controllers.

So that means hotplug just works, easy, right? No, of course that's not the case. There is a problem with this; can anyone guess? Did anyone spot the problem? Go on. Right, close. I'm going to repeat the question: he said that there is a limited number of slots that you could use for hotplug. Yes, that's correct. More precisely, more to the point, libvirt can only automatically add PCI controllers for devices that it knows about, and the devices that you are going to hotplug, by definition, libvirt cannot know about ahead of time. So it cannot automatically add the controllers for them. That's why I'm saying you need planning.

So the question is, how do we solve this? How do we convince libvirt to give us all of this PCI controller goodness without it knowing the devices in advance? The solution we came up with is using placeholder devices. Here is an example. This is a standard, very simple KubeVirt virtual machine with just one single network interface, and it results in KubeVirt generating this XML, which is the same one we have seen before. KubeVirt will also add another interface that is marked as a placeholder; you can see it here, "placeholder". When this definition is fed into libvirt, libvirt adds a bunch of controllers, resulting in this PCI topology (roughly sketched below).
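Roughly, the augmented domain XML ends up looking something like the sketch below. The controller models are the ones libvirt uses on Q35; the exact addresses are whatever libvirt picks, so the values here are illustrative.

```xml
<!-- Sketch of the resulting topology; addresses are illustrative. -->
<devices>
  <!-- The root bus: part of the machine, nothing on it can be hotplugged. -->
  <controller type='pci' index='0' model='pcie-root'/>
  <!-- Root ports plugged into the root bus; they cannot be hotplugged themselves,
       but whatever sits behind them can be. -->
  <controller type='pci' index='1' model='pcie-root-port'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
  </controller>
  <controller type='pci' index='2' model='pcie-root-port'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
  </controller>
  <!-- The real interface sits behind the first root port (bus 0x01). -->
  <interface type='ethernet'>
    <model type='virtio'/>
    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </interface>
  <!-- At this stage the second root port (bus 0x02) holds the placeholder
       interface; KubeVirt then drops the placeholder but keeps the controllers,
       leaving that slot free for hotplug. -->
</devices>
```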
You can see that there are two root ports, because libvirt realizes that it needs room for two devices. At this point, we take this definition that libvirt has augmented with additional information and we remove the placeholder, but we don't touch any of the PCI controllers. Now we have one empty slot, which is the goal we had in mind. This virtual machine can now be booted, and it has room for plugging in one device at runtime.

We have decided that four is the magic number: you are going to get four spare slots. There is no particular meaning behind this number; it's just a small number that we feel will be useful to people without being overwhelming, and we can change it later. So for now it's four. This is what we have implemented today in KubeVirt.

Before going with this route, we went through a number of approaches that we considered and ultimately decided not to follow. The first one was to ask the user to manage the controllers explicitly, which is what libvirt users have to do. That is fine for libvirt, where you need very detailed control over the PCI topology of your virtual machine, but KubeVirt is a much higher-level tool, so we felt it would be asking too much of users. Users of KubeVirt should just be concerned with how many network devices they want, not with which kind of controller those devices are going to be plugged into and all the requirements that come with that. So we rejected this idea pretty quickly.

Another approach we considered is using the PCI bridge controller, which is a PCI controller that looks a bit like a root port, but it has a number of slots in it, and all of them are capable of hotplug. It sounds like it would be a great solution for this problem. However, the slots on a PCI bridge are conventional PCI, not PCI Express, and libvirt will not use them by default on a Q35 machine type. You can convince libvirt to use them, but that basically requires allocating all of the addresses manually. So KubeVirt would have to get into the business of picking the PCI addresses for all of the devices, which is extremely complicated, and KubeVirt, understandably, does not want to do that when libvirt already has all of this logic implemented. Plus, the devices would not show up as PCI Express inside the guest. So there are a number of drawbacks, and we rejected this option as well.

Another option we considered was, instead of just saying four, to allow the user to specify exactly how many placeholders they wanted. This is actually what we implemented at first, and then we decided that most users should not have to worry about this; we didn't want them to need to worry about it. So we scrapped the interface and just hardcoded four. This is up for debate; maybe we will change our minds. We will see.

In terms of future work: this has been merged, as I mentioned, so it works today in KubeVirt. One thing we could do in the future is to extend this general concept of using placeholder devices to create PCI slots for hotplug to other kinds of PCI devices. The first example that comes to mind is disks, as the most obvious one. Today, KubeVirt implements hotplug for disks through the virtio-SCSI controller, which works fine, but it has some drawbacks as well as some advantages, so it's kind of a toss-up (a sketch of what that looks like at the libvirt level follows).
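For reference, this is roughly how the current disk hotplug approach looks at the libvirt level, assuming illustrative paths and addresses: the virtio-SCSI controller itself occupies a single PCI slot, and the disks hang off it with drive addresses rather than PCI addresses, so hotplugging a disk does not consume a spare root port.

```xml
<!-- Sketch: disks behind a virtio-SCSI controller (illustrative paths/addresses). -->
<controller type='scsi' index='0' model='virtio-scsi'>
  <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
</controller>
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/hotplugged.qcow2'/>
  <target dev='sda' bus='scsi'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
```

With virtio-blk, by contrast, each disk is its own PCI device, which is exactly where spare root ports would come into play.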
So maybe we could extend this to disks, and make it possible for the user to choose virtio-blk instead of virtio-SCSI. It's interesting; we will explore it and see what comes out of it. Another idea is to use a much larger number of PCI slots, like 32, to sort of match what you would get out of the box on the PC machine type. This sounds like a good idea, but there are some drawbacks in terms of resource usage: every PCI controller you add to your virtual machine negatively affects memory usage and the boot time of the guest operating system. So maybe four is enough and 32 would be too much. But maybe the overhead is small enough that it does not matter in the context of KubeVirt, and a number like 32 would give enough headroom that most people would never have to worry about it, without having such a big impact on performance. So it could be a good development. Again, we are going to explore this and see what comes out of it. And this is the end of the presentation. So, any questions?

So, the question is about memory ballooning. Memory ballooning is a completely different topic, because the memory balloon is a PCI device, but it is a device that you provide to the virtual machine up front, and the ballooning does not happen by plugging PCI devices in and out; you just inflate and deflate the balloon. So you create the balloon device when you define the machine, it stays there throughout the lifetime of the virtual machine as you inflate and deflate it, and you have to do none of these shenanigans.

Yes, please. So, the question is, can you change the number of PCI slots when you do a migration? Not as implemented today. Theoretically you could do it; you could have a migration hook that modifies the configuration, but you would be getting pretty deep into the inner workings of the PCI topology, at a very low level, so you are basically on your own. But you can do it.

Question: is this a limitation of KubeVirt or of libvirt? This is a limitation, I believe, of the PCI spec. You cannot hotplug root ports at the QEMU level, and as far as I understand, this is because of the way PCI works. The number of PCI controllers is detected by the guest operating system at boot time. Once you are inside the guest operating system, it is capable of detecting that new devices have been attached to an existing controller, but it cannot figure out that a new controller has appeared. So, PCI spec, as far as I know; I could be wrong about this, I am not 100% sure, but I think it is correct.

What about CPU hotplug? Again, like ballooning, a completely different topic, because CPUs are not handled by plugging in PCI devices. We are aware of some work that is happening on enabling CPU hotplug in KubeVirt, so we know it is happening, but neither of us knows the details, sorry. But you can probably search and find the open merge request or pull request, or maybe it has been merged already. I don't know. It's merged? OK. So, CPU hotplug is a thing; just not this thing.

Yes. What about making the PCI bridge PCI Express? I think that was raised at some point: why don't we do that? I don't know the full answer; there are various PCI controllers, and I am probably misremembering details, but I think the idea behind all of that is that conventional PCI and PCI Express, although they share most of their name, are actually extremely different technologies. PCI is a bus-based topology, whereas PCI Express is a point-to-point topology.
And so, what you would get by having a PCI bridge that is PCI Express would ultimately be 32 PCIe root ports in a row, and nothing more than that. In that sense, this idea reduces to the idea of having 32 root ports in every virtual machine instead of four, right? So, in a sense, we are considering it; it is just a different controller.

So, the question is whether these limitations are inherent to KubeVirt, or, since they come from libvirt, whether they are shared by any virtualization solution built on libvirt. And the answer is yes: they are shared by any virtualization solution built on libvirt and QEMU. And again, I think anything that uses PCI has this limitation, because it is not really an implementation problem in libvirt. In libvirt we are just exposing a limitation of QEMU, and my understanding is that QEMU is simply complying with the limitations of the spec. So, as long as you are on PCI, this is what you have to deal with, which is why we plan ahead of time.

I think there was a question there. The question is how you plug a new volume into a running pod, and I am not the right person to answer that, so I will pass it over to Eddie. And not even Eddie, apologies; he is mostly a network guy. So, we know it is possible to do; we just don't know the details. I know a bit about how it works at the libvirt level, but not at the pod level. Sorry. The comment from Eddie is that it is challenging.

You want me to? No? Maybe you will translate it. Sure. Right. So, basically, as I was mentioning earlier with the PCI controllers, there is an escape hatch for all of this stuff. In KubeVirt you can get at the domain definition through a hook, similar to the migration hook I mentioned; it is just a sidecar. So you can do sort of arbitrary transformations on the XML and inject your own custom things. Of course, if you break something, you get to keep all of the pieces. Ideally you will not need to do it, but the option is there in case you have no other choice. I think we might be good unless... Last call. Any last-minute questions? No. OK.