I think you can hear me. Yeah. So good morning, everyone. My name is Kashyap Chamarthy, and I work as part of Red Hat's cloud engineering team. I spend my time focusing on integrating KVM-based virtualization components into OpenStack Nova, so I split my time between multiple communities.

So what's the motivation for this talk? As most of us here know, since the beginning of this year there has been a flurry of disclosures involving side-channel attacks, and the disclosures kept coming all the way until earlier this month, the most recent one being PortSmash, all the fancy names. That one impacts processors with hyperthreading. All of this has made choosing CPU models for your virtual machines a bit more cumbersome. So this talk aims to provide some sense of clarity, hopefully, about how you go about arriving at an optimal virtual CPU configuration.

Before we proceed, a quick note on what we won't cover in this talk. We won't be touching any of the internals of side-channel attacks like Meltdown and Spectre, or how to exploit them, or the detailed performance implications involved. For those, there have been several talks throughout this year, beginning with FOSDEM and Kernel Recipes, and last month in Edinburgh at KVM Forum and Open Source Summit. I refer you to those talks if you're interested in the gory details.

OK, a bit of a refresher. Most of you here might be aware of this, but for those who are not, a refresher on the KVM-based virtualization stack. The bottom-most layer is the Linux kernel with the KVM module sitting in it. On top of that is the QEMU process, which sits alongside your other processes on the host. QEMU has its disk images associated with it, and it interacts with KVM via system calls. However, QEMU's scope is quite limited: it has a per-process view, it knows about just one QEMU instance. That's where the libvirt project comes in; it provides a hypervisor-agnostic API to manage multiple virtual machines and the lifecycle associated with them. libvirt interacts with QEMU via a protocol called QMP, the QEMU Monitor Protocol, which is a JSON-RPC-based mechanism. On top of that is OpenStack Nova, which has its own virtualization drivers (KVM, Xen, several of them); it interacts with libvirtd and launches virtual machines on your compute nodes. And then there are external tools like libguestfs, which provide a set of neat utilities to examine your disk images via libguestfs's own appliance, which is itself just a QEMU process. I should mention here that libguestfs has safety controls in place so that no two processes can write to the same disk image.

When you launch your Nova instance on a compute node, most of the heavy lifting under the hood is done by QEMU and KVM, assuming you're using virt_type set to kvm or qemu in your Nova configuration. QEMU provides a lot of emulated devices (some say too many of them), and it also has the guest memory mmap'd into it. It provides several kinds of SCSI controllers, network cards, graphic displays, and so on and so forth. KVM, the kernel module, has two vendor-specific modules, one for Intel and one for AMD, and it does a bunch of things. For example, it safely executes guest code directly on your hardware through the hardware extensions provided by processors from Intel or AMD; they all have virtualization extensions, and they work similarly. So KVM handles the switching between guest mode and host mode. KVM itself also does some emulation of devices in the kernel, things like the clock or the CPUID instruction, so that it doesn't have to do a heavyweight exit all the way out to QEMU. We'll see what that is in a second.
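As a quick sketch, assuming a Linux host with libvirt installed, inspecting this stack might look like the following; virt-host-validate ships with libvirt, and the exact module names and output will vary by machine:

    $ lsmod | grep kvm            # is KVM (and its vendor module) loaded?
    kvm_intel             241664  0
    kvm                   737280  1 kvm_intel

    $ virt-host-validate qemu     # libvirt's sanity checker for the QEMU/KVM stack
      QEMU: Checking for hardware virtualization        : PASS
      QEMU: Checking if device /dev/kvm exists          : PASS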
Since QEMU is just another process on your host, you can use your standard Linux tools like ps, taskset, kill, et cetera, to manage, examine, and inspect your Nova-based QEMU instances.

This is the classic guest execution loop involving KVM. On the right-hand side, you see QEMU. It issues system calls asking KVM to create a vCPU, gets a file descriptor back, and then runs the ioctl called KVM_RUN. At that point, KVM prepares to execute guest code directly on the hardware via the so-called guest mode introduced by the virtualization extensions from Intel or AMD. Guest code then executes happily on the hardware until a point where the hardware cannot handle a certain operation, some device emulation, say. At that point it asks: hey, can you please handle that for me, KVM? If KVM can handle it in the kernel, for example CPUID emulation, it emulates it there and then again prepares to execute guest code directly on the hardware, and that continues. However, if KVM cannot emulate a particular device, it asks QEMU to handle it: QEMU handles it, and the loop goes on. Going all the way out to QEMU is called a heavyweight exit; if the kernel itself can handle it, that's a so-called lightweight exit.

So that's the introduction. Let's see some interfaces that QEMU and libvirt provide to configure virtual machines. But before we go there, a bit of motivation for why we should look at them. The default guest CPU models, which are provided by QEMU (not libvirt, actually), are designed to work on any host CPU, but they're not really the ideal choice in a production environment. When I say "work on any host CPU", I mean you don't have to do any compatibility checks for things like live migration. But the default CPU models are really awful, because they lack several CPU instructions that are critical: for initializing entropy for your guests, for example; or more obscure flags like PCID that have become critical for performance and security thanks to Meltdown; or the AES instruction set that is critical for your TLS to be performant. Thankfully, Nova doesn't use these defaults, as we'll see in a bit.

The output you see here is from a virtual machine launched with the default qemu64 CPU model. If I traverse to the sysfs directory /sys/devices/system/cpu/vulnerabilities and grep for what mitigations are in place, you still see that it's vulnerable to a variant of Spectre. The point being: you don't want to use the default CPU models at all if you care even a little about performance and security. So always use an explicit CPU model if you know your CPUs, or use the default provided by Nova, which is host-model; we'll see in a bit what exactly that is.

What about the defaults on other architectures? We saw the default on x86. On the AArch64/ARM ecosystem, they sensibly don't provide a default CPU; the default depends on what kind of machine type you configure. Think of a machine type as a virtual chipset that provides certain devices; we'll get to the topic of machine types in a later section. There are different kinds of boards that QEMU emulates for ARM, and depending on that, the default CPU is configured. Other architectures like s390x and PPC have their own defaults based on whether you're using pure software emulation or hardware acceleration with KVM.
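Coming back to that qemu64 demo for a moment, the in-guest check looks something like this; a sketch, since the exact file names and mitigation strings depend on your kernel version:

    $ grep . /sys/devices/system/cpu/vulnerabilities/*
    /sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
    /sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: __user pointer sanitization
    /sys/devices/system/cpu/vulnerabilities/spectre_v2:Vulnerable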
OK, the simplest interface for configuring these CPUs is the command line. With QEMU, if you don't provide a -cpu option, the default is qemu64, and as we've seen, we don't want that. So we want to provide an explicit CPU model: pass the -cpu option followed by a particular named model, like IvyBridge, Haswell, whatever is supported by the QEMU binary on your host. Along with specific named CPU models, you can also control which CPU features you expose to your virtual machine. In this example you see a Skylake variant, followed by a specific set of features being turned on and off. If you want to find out which models are supported on your host, you can refer to QEMU's -cpu help output, or libvirt's command-line tool: virsh cpu-models, followed by the architecture you're running.

What about runtime interfaces? QEMU provides several runtime interfaces that libvirt uses at daemon launch time, so it can cache those capabilities; the capabilities won't change for a given QEMU version, so it makes sense to cache them at daemon start-up and not probe again and again. There's a bunch of interfaces; we won't delve into the details, that's out of scope. But as a quick example, here is one runtime interface to probe QEMU for details about the CPU models a given QEMU binary on your host offers: query-cpu-definitions. It lists all the models that QEMU supports, and things like which features are unavailable. Here you see an empty array for the unavailable-features field. What does that mean? It means the Westmere CPU model will run without any modifications, meaning you don't have to disable or enable any features to run it on the given host. However, if you see a list of features enumerated in unavailable-features, you have to disable them explicitly to be able to run that model on the host. It also provides other information, like whether the CPU model is migration-safe, meaning QEMU won't add additional CPU features behind your back in a way that would break migration compatibility.
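As a sketch of that probe: virsh qemu-monitor-command sends raw QMP to a running guest (the domain name here is hypothetical, and the output is abbreviated):

    $ virsh qemu-monitor-command --pretty instance-00000001 \
        '{"execute": "query-cpu-definitions"}'
    {
      "return": [
        {
          "name": "Westmere",
          "static": false,
          "migration-safe": true,
          "unavailable-features": []
        },
        ...
      ]
    }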
In this part, let's see what the different kinds of CPU configuration modes are, and how you go about configuring specific CPU models and sets of features. Some of you here may already know this, and I've talked to some operators who use these things.

First, host-passthrough. As the name implies, it passes through the host CPU's capabilities (the CPUID bits, which is what QEMU actually does behind the scenes) as-is to the guest. That's provided by the -cpu host command line in QEMU, but it's taken care of for you when you use Nova's host-passthrough config value; this is just to show what's going on behind the scenes. However, host-passthrough is not free of caveats. The first is that neither QEMU nor libvirt will provide a predictable CPU for your guest, so you can't rely on that. The other is that if you have a mixed set of CPUs in your environment, live migration with host-passthrough is an absolute no-go. So host-passthrough is quite performant, but if you have live migration as a strict requirement in your environment, as in most scenarios, it's not really recommended, assuming you have a mixed set of CPUs there.

When else can you use host-passthrough? If you're lucky enough to have a data center full of identical CPUs, it makes sense to use it. But bear in mind that along with identical CPUs, you need a precisely matching microcode version on your source and destination hosts to be able to live migrate, and also identical kernel versions, thanks to all the vulnerabilities that have been rolling in since the beginning of the year. It's easy to miss that. Sometimes people wonder why the microcode version even has to match precisely: you might think that if there is microcode version 57 on the source and 58 on the destination host, and both provide identical CPU features, it should be OK, right? No, because performance counters may differ between those two microcode versions. Even though the CPU features are the same, that can still impact live migratability. That's something I learned from a colleague when I was wondering whether microcode versions should match as well. So there's that.

The second kind of thing QEMU and libvirt offer is the plain named CPU models we just saw, which are vendor-specific models named after particular generations that Intel, AMD, and others release. From a typical Nova instance's QEMU log, you see the output there: the QEMU command line followed by a bunch of CPU features enabled or disabled. Named CPU models are a bit more flexible for live migration than host-passthrough, because you can custom-prepare a CPU model and flags for your environment so that you have live migration compatibility across your disparate set of hosts. QEMU comes with a set of built-in CPU models, specific named versions, and the set of CPUID flags it recognizes, and libvirt also exposes these via its own XML representation, so you can refer to those.

And the third way is this thing called host-model, which is a libvirt abstraction. It tackles a few problems. One is that it tries to provide the maximum set of CPU features from the host CPU to your guest while retaining live migration capability. It has a caveat; we'll see what that is in a minute. And the other thing host-model provides is that, assuming you have the right set of updates on your host (microcode, kernel, et cetera), it will automatically add the critical CPU flags that mitigate your guests against some of the Meltdown/Spectre variants; the spec-ctrl flag is the example there. That's a nice thing that it does. So host-model tries to provide the best of both host-passthrough and the named CPU models we saw. And for better or worse (I think for the better), it's the default of Nova, at least Nova's libvirt driver; when I say default of Nova, I'm referring to the libvirt KVM/QEMU drivers.

That's a simple example you see from a Nova guest definition. How do you enable host-model? You just set the CPU mode to host-model, and optionally you can enable or disable a specific set of features; libvirt will then translate that into a suitable named CPU model plus flags based on its own internal XML representation.
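To make that concrete, here is a sketch of what such a guest definition fragment and libvirt's expansion might look like; the model and feature names are illustrative, and you can inspect a real running guest with virsh dumpxml:

    <!-- what you ask for in the guest definition: -->
    <cpu mode='host-model'>
      <feature policy='require' name='ssbd'/>
    </cpu>

    <!-- what libvirt expands it to on a running guest (abbreviated): -->
    <cpu mode='custom' match='exact'>
      <model fallback='forbid'>Skylake-Client-IBRS</model>
      <feature policy='require' name='ssbd'/>
      ...
    </cpu>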
So what about the caveat I mentioned with host-model? With live migration, what happens is that the source guest's CPU definition is transferred as-is to the target host. Once the guest is migrated, it sees the identical CPU that it saw on the source host, regardless of the capabilities of the target, even if they are greater. But when the guest cold reboots, meaning you do an explicit stop followed by a start, it can pick up extra CPU features: if the processor on the target host is a newer version, host-model will pick up those features, because, recall, host-model tries to provide the maximum set of features from the host. And that will prevent you from migrating back to the original source host. So if live migration in both directions is an absolute requirement for you, host-model may not be the best option; if it's not a requirement, you can happily go with host-model.

What about Nova? Nova provides configuration interfaces to drive all these things we've been talking about: a set of configuration attributes for the libvirt driver. One is cpu_mode; it's straightforward, it can be any of the three modes: host-passthrough, the first one we talked about; custom, meaning a named CPU model; or host-model, the libvirt invention that provides the best of both host-passthrough and named CPU models. You can explicitly provide CPU models via the cpu_model config attribute, and extra features with the cpu_model_extra_flags config attribute. What are the possible values for those? You can refer to the help output of the different commands out there, QEMU's included, or libvirt's cpu_map.xml file. And we've also written some documentation for Nova itself, with recommendations as to what you should use in which scenario. Not really comprehensive, but good enough to get started; I refer you to that, for the Rocky release, I think that's where we introduced it.

This is a simple example from a compute node. You just set these config attributes in the libvirt section of your nova.conf. In this example we're setting a custom, named CPU model: hey, give me a fixed variant of the IvyBridge CPU, and also enable these CPU flags, to mitigate all the guests running on that compute node against the CPU flaws.
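A sketch of what that [libvirt] section of nova.conf could look like; the exact model and flag names depend on your hosts and QEMU version:

    [libvirt]
    virt_type = kvm
    cpu_mode = custom
    cpu_model = IvyBridge-IBRS
    # extra flags, e.g. to expose PCID alongside the fixed model:
    cpu_model_extra_flags = pcid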
Lastly, what aspects should we consider when configuring CPU models and features? The first thing is that if your scenario is like most others, you have a heterogeneous environment with different kinds of CPUs in your data center; here I'm just showing Intel as an example: Westmere, Broadwell, Nehalem, whatever. How do you find a compatible CPU model among those variants? This is where libvirt comes in and does the heavy lifting for us. These are the APIs that Nova uses to compare different CPU models and find a compatible model for your cloud environment given a set of host CPUs: compareCPU and baselineCPU, those are the two Nova uses today. But recently, in the 4.4 release a couple of months ago, I think, libvirt came up with new variants of these APIs. What's the reason for that? The older APIs, compareCPU and baselineCPU, don't take into account what your hypervisor, KVM and QEMU, is actually capable of. The newer ones do, hence the "hypervisor CPU" in the API names: compareHypervisorCPU and baselineHypervisorCPU take into account what your host hypervisor is actually capable of, so they're supposedly a bit smarter. I did some tests, but I want to see some more bug reports before we wire these into Nova in a future release.

As a quick example, in this XML (I won't show a lot of XML; this is the only XML you'll see) you're seeing two CPU models in a single XML file. There's a Haswell variant and a Skylake variant. For the Haswell variant we're asking to require a set of CPU features, and for the Skylake variant we're asking to disable a set of CPU features. How do you find the intersection? That's where libvirt's baselineHypervisorCPU will compute the intersecting set of features: what should be enabled explicitly, and what should be disabled. Those are the sets of features that come out enabled and disabled after computing the baseline, the intersecting set given our Skylake and Haswell variants. The point being, it's this so-called baseline CPU model that will allow live migration across your Haswell and Skylake variants. That's what Nova uses, with the older version of this API. Of course, you don't have to run this manually; I just wanted to show it because what's happening behind the scenes is sometimes instructive to see.
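A sketch of that computation with the virsh wrapper for the newer API (virsh grew a hypervisor-cpu-baseline subcommand alongside the libvirt 4.4-era APIs; the file contents and output here are illustrative and abbreviated):

    $ cat two-cpus.xml
    <cpu mode='custom'>
      <model>Haswell</model>
      <feature policy='require' name='pcid'/>
    </cpu>
    <cpu mode='custom'>
      <model>Skylake-Client</model>
      <feature policy='disable' name='hle'/>
    </cpu>

    $ virsh hypervisor-cpu-baseline two-cpus.xml --features
    <cpu mode='custom' match='exact'>
      <model fallback='forbid'>Haswell-noTSX</model>
      <feature policy='require' name='pcid'/>
      ...
    </cpu>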
So what else should we consider? We briefly talked about machine types. A machine type, to refresh your memory, is a virtual chipset that provides the default devices. It has two main goals. One is that a single QEMU binary can emulate different chipsets, for example Intel's i440FX, more popularly called the "pc" machine type, and the other one, Q35, which is slightly more recent. The other goal, the most important one, is to provide a stable guest ABI. What does that mean? It means the virtual hardware exposed to the guest remains identical regardless of whether you upgrade your host software, or even if the host hardware changed, or if you live migrate: the hardware the guest sees stays the same. That's the critical thing machine types provide. Whenever there's a new QEMU release, it comes with a new machine type matching the version of that release. Here you see the "pc" machine type, which is an alias to the latest released versioned PC machine type. I don't know if you can see it, but the chipset it models was released in 1996, so it's about 20-plus years old. I first called it "legacy", but legacy can trigger an allergic reaction in people's brains, so I used "traditional". The other one is Q35; it's the so-called more recent one, but it's still quite old: if you can see the date again there, it's 2009, so nearly 10 years old. It has a set of features. I can't do justice to machine types in one talk; there have been several talks at KVM Forum in the past. The point being, the versioned machine types provide the so-called stable guest ABI to your virtual machines.

So why talk about machine types? Why should we care? Because sometimes machine types alter CPU features. Fortunately it doesn't happen very often, but sometimes it does. And changing machine types is really guest-visible. As an analogy: it's as if you ripped random hardware out of your host and plugged in something else, except in the virtual world. So when you upgrade QEMU, assuming you're using the libvirt driver, you have to explicitly tell Nova to change the machine type. That's the point of the stable guest ABI: even though you upgrade your QEMU, and even though it's new enough to bring a new machine type version, libvirt and Nova won't gratuitously update that version for you; you have to ask for it explicitly, precisely to preserve the stable guest ABI. And once you ask Nova to change it, you need an explicit cold reboot of the guest, and only then will it pick it up. The main point to take home is that before changing machine types, you might want to evaluate your guest workloads, because the CPU features can differ in some cases. Not all, but a few.

So how do you go about updating to patched virtual CPU models? We're talking about x86 here, obviously. The first step (I've talked to some operators who already did this, so I'm sure those here would know) is to update the microcode on your hosts, plus the host kernel and the guest kernel. You can check the sysfs directory to see what mitigations you have in place after doing that. Once you're done with that, you can update libvirt and QEMU, so they pick up the new CPU models. Following that, you tell Nova: please update me to the patched variant of whatever CPU model is relevant in your environment. The -IBRS models are the ones that have the fixes for some of the Spectre or Meltdown variants; I'm drawing a blank on which one exactly, so check those. And there's guidance over there: you see a text file in the QEMU source, but you don't have to read all of it. And there's a nice document written by my colleague Daniel Berrangé; there's a rendered HTML version on his blog, there's a reference at the end. There's a bunch of detail there, so before you upgrade, you might want to refer to that. Once you tell Nova to update to the relevant CPU model, the last step, which you can't avoid, is cold rebooting the guests, the Nova instances, so that they pick up the right CPUID bits. That's the procedure at a very high level. Maybe there are some steps in between that I'm missing, but I'm sure you'll figure them out as you do them; this is the core of what you need to keep in mind.

What about the CPU flags themselves? There are multiple of them; I'm not going to read them all out loud. To mitigate some of the Spectre and Meltdown variants, there are several flags you need; actually not a whole lot of them, just a few. Some of them are built into the fixed variants of QEMU's CPU models, which will pick them up for you, so you can use those when you're using named CPU models in your compute environment. Again, for that, refer to the blog post written by QEMU upstream folks, and the other reference as well, the QEMU CPU model details in the QEMU source itself.
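As a sketch, you can ask your QEMU binary which patched model variants it ships; the binary path and the exact list vary by distribution and QEMU version:

    $ qemu-system-x86_64 -cpu help | grep -E 'IBRS|IBPB'
    x86 Nehalem-IBRS          Intel Core i7 9xx (Nehalem Core i7, IBRS update)
    x86 IvyBridge-IBRS        Intel Xeon E3-12xx v2 (Ivy Bridge, IBRS)
    x86 Skylake-Client-IBRS   Intel Core Processor (Skylake, IBRS)
    x86 EPYC-IBPB             AMD EPYC Processor (with IBPB)
    ...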
So what about the expectations from applications like Nova, and expectations from whom? From projects like QEMU and libvirt, which are doing the heavy lifting behind the scenes. Up until now, whenever there was a new CVE, QEMU added a fixed CPU model with a set of features in it, and libvirt followed along and added an equivalent CPU model in its XML representation. But that's not really working out so well, given the flurry of CVEs that have been landing since January. So QEMU and libvirt upstream have decided that they're not going to add any more new named CPU models whenever a CVE comes out. Instead, tools like Nova or oVirt or whatever management application should explicitly enable CPU features as required. Nova today takes care of this already, so no problem for us. For other management applications that haven't provided a facility to supply granular CPU features, that's considered broken. There's a more interesting thread on the libvirt and QEMU mailing lists; I just quoted from there.

So to sum up: don't go with the default CPU models; with Nova you don't even need to do anything, because it has a saner default. If you're lucky enough to have identical CPUs across your host environment, for maximum performance go with host-passthrough. But if your case is like everybody else's, with a mixed set of CPUs, choose Nova's default. And if you know your host CPUs inside out, you can compute a custom baseline; some operators already do that. Just this morning I talked to somebody on the train who computes a custom baseline for their own environment. People already do that. Before changing machine types, you want to evaluate workloads; this is not a super important one, but important enough to merit a mention, because not all machine types change CPU features, but it still deserves a mention. And finally, you want a systematic procedure to go through your updates across all the relevant components, and then you can do the final step of using the patched CPU models and flags.

I think that's pretty much it. These are the references I mentioned, some of them; I didn't update the slide with the KVM Forum links, but you can find them on the internet. So that's that. If you have any questions or comments, you're welcome. I think there's a mic over there. I'm not sure it's working, but...

[Audience] On one of the first slides, you showed that qemu64 is available for all CPUs. Actually, that's not always true, because qemu64 includes the SVM instruction, which is not supported on some Intel processors.

[Kashyap] That's a historical accident.

[Audience] And my question was: you showed that it's possible in the Nova config to add extra CPU features. Is it possible to disable them in the Nova config as well?

[Kashyap] That's a good question. No; that's a to-do item on my plate. Currently you can only enable CPU features, you can't disable them, because enabling was the more pressing requirement for Nova: with all the fixed CPU models, you have to turn on specific sets of features. But yes, that's not there; in the Nova config today you can't disable a specific set of features. There's a bug filed on that for me, so you can put me on the hook for it later.

[Audience] Yeah, okay, thanks.

[Audience] Hi, I'm Francisco. Just a quick note that this extra CPU flags...

[Kashyap] Can you please come closer?

[Audience] Oh, sorry. So I just want to note that this extra CPU flags stuff is relatively new, so you won't have it in Newton or something like that.

[Kashyap] I can't parse you. Maybe you can tell me if you can't hear me?

[Audience] Yeah, I can hear you, but it's hard to follow. Go on. Oh, so the note I have is that the extra CPU flags Nova config option is something that appeared in Ocata, so you won't have it in Newton or Liberty or some other version.

[Kashyap] We backported it all the way back, actually.
[Kashyap] At least I backported it three releases; I forget which versions. But if you're using any of the, at least the Red Hat versions, it goes all the way back, to six or seven.

[Audience] Okay, I'm using Ubuntu, so that's probably the problem I have. The other question I have is related to these machine types. I have the impression that the machine types are practically, I mean, on Ubuntu 16 it's using something like a default, this "pc". Is there a way to configure that from Nova?

[Kashyap] Yes, Nova has two ways to configure machine types. One is that you can configure it per compute node, in a config attribute; all the Nova instances launching on that particular compute node will be given the Q35 machine type, or whatever versioned machine type of PC. The other way is per image template: in Glance there's a metadata property, so you can set the machine type via a metadata property on a template image.

[Audience] Oh, makes sense. Thank you.

[Kashyap] Yeah, thanks.

[Audience] Is there a way to configure QEMU to use a specific default without having to pass it on the command line?

[Kashyap] Sorry, is there a way to what?

[Audience] Say I want a specific machine type. Is it possible to configure a specific machine type in QEMU by default, without having to pass it on the command line?

[Kashyap] The default is "pc".

[Audience] Yeah, but I have an environment where my command line is limited in size, and I'm reaching that size.

[Kashyap] QEMU has several different kinds of configurations with machine types. It's kind of hard to hear in this room, and maybe it's just me, I don't know, maybe I'm a bit deaf, but we can talk about that after. If there are any more questions, comments... Okay, thank you for your time.
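For reference, a sketch of the two ways mentioned in that answer; the option and property names below are the usual Nova/Glance ones, but check the documentation for your release, and the image name is hypothetical:

    # per compute node, in nova.conf:
    [libvirt]
    hw_machine_type = x86_64=q35

    # per image, via a Glance metadata property:
    $ openstack image set --property hw_machine_type=q35 my-template-image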