Hello. Hey, everyone. Thanks for joining my talk on using vGPUs in OpenStack. This is going to be a speed run through a lot of the things I wish I had known when we first started implementing vGPUs for our platform. At the end of the talk there will be a link to download the slides, as well as a QR code, because there's a lot of stuff in here and we'll be moving pretty quickly.

My name is Jacob. I'm the principal engineer at OpenMetal. OpenMetal is primarily an infrastructure services company, and our goal is to lower the barrier of entry for companies of any size to start using OpenStack, Ceph, and other open source cloud systems. That includes using GPUs and any other services they need.

So, quickly: what is a vGPU? It's a virtualized portion of a physical GPU, similar in spirit to a vCPU, except you can't oversubscribe vGPUs the way you can vCPUs. It allows one or more virtual machines to share a single physical GPU, and it makes it easier to manage those resources and share them between different workloads. This talk is primarily going to focus on the NVIDIA side of things, since it's the most mature and the best supported. I think AMD and Intel will be coming out with something soon, but NVIDIA is who this focuses on for now.

We're going to cover a couple of use cases and examples, and when you probably don't want to use vGPUs or they might not make sense. Then some hardware requirements, how to actually configure the BIOS to make things work, the kernel and OS side of things, setting up the host node and making sure the drivers are in place, then configuring Nova, and finally actually spinning up a VM with a vGPU.

Quickly, some use cases. Number one would be CI or other transient tasks, where you spin up an instance from a pre-built image with something that requires CUDA or OptiX, potentially run multiple workers at once, and once they're torn down, those resources are available to use again. The second example would be rendering or graphical tasks. One thing to keep in mind here is that different GPUs have different capabilities: there's a matrix on the NVIDIA website that shows which ones support rendering workloads and which ones are compute-only. At OpenMetal we primarily went with the A100, which is compute-only for now, so that's what we're mostly going to cover here.

And then, when you may not want to use vGPUs: anything that requires a large amount of VRAM, like a big machine learning training job; other long-lived applications, where it's easier to control everything inside your application; and video transcoding, since some of the vGPU profiles don't include the actual encoding cores. That's something to keep in mind if you're trying to implement this yourself.

All right, hardware setup. Obviously you'll need a GPU that has vGPU support. That's primarily the NVIDIA enterprise series, like the Ampere cards, the L-series, and some of the RTX 6000 and 8000 GPUs. The BIOS settings are pretty critical here, and this is something that caused me a lot of pain when I was trying to set things up. There's a list on the right-hand side of things we had to do, at least on our Supermicro servers: basically everything related to SR-IOV and IOMMU, as well as all the supporting options like ACS and AER. One of the issues I had was that if AER wasn't enabled, most of the stuff would still work, but I'd get very confusing errors, like the IOMMU groups not incrementing. So keep that in mind and make sure all of those options are enabled if you're going to try this.
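As a quick sanity check after the BIOS changes, these are the kinds of generic Linux commands you can run to confirm the kernel actually picked everything up; nothing here is vGPU-specific:

    # Look for DMAR (Intel) or AMD-Vi lines showing the IOMMU is enabled
    dmesg | grep -i -e dmar -e iommu

    # Once the IOMMU is active, the groups should be populated here
    ls /sys/kernel/iommu_groups/

    # And the physical GPU should show up on the PCI bus
    lspci -nn | grep -i nvidia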
All right, next is actually setting up the GPU and the driver. On the kernel side of things, you want to make sure IOMMU is enabled; this is usually done through a kernel boot flag, intel_iommu=on for Intel or amd_iommu=on for AMD. You don't have to use DKMS, but it definitely makes things easier. And then of course you need to actually install the host driver. On the NVIDIA side this is done by logging into their enterprise portal and grabbing the Linux KVM package, which includes all of the different drivers for hosts and guests. You may also need to blacklist the nouveau driver. If you're building golden images, at least for us, we had to pass the installer's --no-drm flag; it doesn't really matter for instances that aren't going to be doing rendering, but it makes our build process a lot easier. Then just make sure the associated services are enabled and running, and of course reboot so all of your IOMMU options and drivers take effect.

After that, once the machine comes back up, there's a command that ships with the driver, sriov-manage. The way we handle it for our customers is to put it inside a systemd oneshot service that runs at boot, so it's taken care of for them. What it does is spawn all of the virtual functions through SR-IOV; you can see in the screenshot all of the devices being registered and then the NVIDIA driver enabling them. Once that's done, you can check it: you should see 16 or 32 virtual functions, depending on the GPU, pop up in your PCI device list, and you'll see all of those messages in dmesg or wherever your kernel log is going. Finally, you can look through the mdev types and see what's available. This will come up later, because these nvidia-<number> names are what we're going to give Nova to actually spawn the vGPUs.

All right, we'll quickly cover the vGPU profiles. There are primarily two types: time-sliced (also called temporal or timeshared) and MIG. Time-sliced has a slight benefit in that it supports more types of workloads, rendering as well as compute and a few other things. The downside is that if you have multiple users on that GPU, it's not going to be a consistent experience for them, whereas with MIG they get dedicated resources as well as dedicated paths on the hardware. There are different profiles for each, depending on what you want to use. The right-hand side of the slide has this huge matrix, and I think that's just for the A100, so it can be pretty daunting to figure out which profile you want to use. You can see some of it through the nvidia-smi tool by listing the different profiles, or you can go to the NVIDIA site, where they have a huge PDF with all the profiles for all the different physical GPUs; it differs depending on the GPU you're using. In our example here, we're going to use the A100-3-20C profile, which corresponds to the MIG 3g.20gb slice: three compute slices and 20 gigs of memory.
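To give you an idea, here's roughly what that oneshot unit looks like. This is a sketch rather than our exact unit, and the service name it orders itself after, plus the ALL argument, may differ between driver releases:

    # /etc/systemd/system/nvidia-sriov.service (sketch)
    [Unit]
    Description=Enable SR-IOV virtual functions on NVIDIA vGPU cards
    After=nvidia-vgpu-mgr.service

    [Service]
    Type=oneshot
    # -e enables the virtual functions; ALL targets every capable GPU
    ExecStart=/usr/lib/nvidia/sriov-manage -e ALL

    [Install]
    WantedBy=multi-user.target

After a systemctl enable --now nvidia-sriov.service, the virtual functions show up as extra PCI devices in lspci, and each one advertises its supported mdev types under /sys/bus/pci/devices/<address>/mdev_supported_types/.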
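And since we went the MIG route, here's a sketch of the nvidia-smi side of that. The profile ID of 9 for the 3g.20gb slice is what we see on the A100; check the -lgip output on your own card:

    # Enable MIG mode on GPU 0 (the GPU generally needs to be idle)
    nvidia-smi -i 0 -mig 1

    # List the GPU instance profiles the card supports, with their IDs
    nvidia-smi mig -lgip

    # Create a 3g.20gb GPU instance (profile ID 9 on the A100); -C also
    # creates the default compute instance inside it
    nvidia-smi mig -cgi 9 -C

    # Verify what was created
    nvidia-smi mig -lgi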
One caveat with MIG is that it has to be set up again every time the machine starts up or the GPU is reset, so that's something to take into account. The slide also has the commands to list out the profiles and create them.

All right, and then finally, configuring Nova. This step is pretty straightforward once everything else is set up. I mentioned the mdev paths earlier; the easiest way is just to grep through there and find which types have available_instances equal to one. There's a grep one-liner you can use for that, or the mdevctl tool, which makes it a little bit easier. Basically, you want to figure out which one of those nvidia-<number> types has your vGPUs, and then you put that into the Nova configuration. Depending on which release of OpenStack you're running, Wallaby and newer use the enabled_mdev_types option, while Victoria and below use enabled_vgpu_types. So keep that in mind for your release, and of course restart nova-compute once you make those changes.

Once that's done, if Nova has detected everything correctly, you should be able to use placement to see the allocation candidates. Typically what I see is that it lists 16 different candidates, I believe one for each of the virtual functions, even though you can't actually use all of them. I'm not sure if this is a bug or not, at least when we tested it on Yoga. One way around it is to set custom traits on the resource providers, which is the example shown here: we just created a custom trait type. This is also one way you could have different vGPU types on different servers and still create instance flavors that provision in the correct place.

All right, and then finally, provisioning a VM with a vGPU. Once your flavors are done, and hopefully placement is giving you actual candidates, you can provision an instance with a vGPU resource. This is done with the resources:VGPU extra spec. I haven't tested it with multiple vGPUs, so I'm not actually sure if that works, but typically we use one. You can set this through your images, through your flavors, or just ad hoc, which is what's being shown here.

Once your VM is up, you need to install the actual guest driver. This is a special driver, similar to the host one, that you get from the NVIDIA enterprise portal. Also make sure the special daemon that handles licensing is running. I'm not going to cover licensing here, but essentially there needs to be a license for the type of workload you're running, compute et cetera, and it's just a matter of configuring a file to point at the licensing endpoint it should use. Then, inside of the guest, you create the compute instance and verify it's working, and after that you should be able to run your CUDA or OptiX or whatever workloads you need. The screenshot on the right is showing this inside the guest.

Typically, what we do at OpenMetal is pre-create these images for our customers, so they can choose that image inside of OpenStack and it'll have the driver and everything else pre-configured and ready to go.
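To pull the OpenStack pieces together in one place, here's a rough end-to-end sketch. The nvidia-474 type, the trait name, the image, and the network are placeholders for whatever your environment actually has, and the placement commands assume the osc-placement CLI plugin is installed:

    # In nova.conf on the compute node (Wallaby or newer; Victoria and
    # below use [devices]/enabled_vgpu_types instead):
    #   [devices]
    #   enabled_mdev_types = nvidia-474
    # Then restart nova-compute (the service name varies by distro):
    systemctl restart openstack-nova-compute

    # Check that placement is actually offering vGPU candidates
    openstack allocation candidate list --resource VGPU=1

    # Optionally tag the resource provider with a custom trait so
    # flavors land on the right hosts
    openstack trait create CUSTOM_A100_3_20C
    openstack resource provider trait set \
        --trait CUSTOM_A100_3_20C <provider-uuid>

    # Flavor that requests one vGPU and requires the trait
    openstack flavor create --vcpus 8 --ram 16384 --disk 40 \
        --property resources:VGPU=1 \
        --property trait:CUSTOM_A100_3_20C=required \
        vgpu.a100.3-20c

    # Boot an instance from it, then install the guest driver inside it
    openstack server create --flavor vgpu.a100.3-20c \
        --image ubuntu-cuda --network private vgpu-test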
But yeah, I think that's it. This is the link to the slide deck if you need anything from there, as well as the QR code. Thank you. I think we have a couple of minutes if anybody has any questions. Cool, thank you guys.