Hi everyone, thank you for joining us and welcome to OpenInfra Live, the OpenInfra Foundation's hour-long interactive show sharing production case studies, open source demos, industry conversations and the latest updates from the global infrastructure community. We're live here on Thursdays at 1500 UTC, streaming on YouTube, LinkedIn and Facebook. My name is Jimmy McArthur from the OpenInfra Foundation and I'll be your host today. As mentioned, this is live, and we'll be saving some time at the end of the episode for Q&A, so please drop all your questions into the comment section regardless of the platform you're on and we will answer as many as we can after the presentations. Now onto the show. We're seeing a dramatic transformation of the data center thanks to advances in AI, HPC and increasingly complex workloads. As open infrastructure adapts to incorporate everything from vGPUs to SmartNICs, accelerator management is a necessary step to help improve performance and efficiency in the data center. Today we're going to be talking about the ways the OpenStack projects Cyborg and Nova are rising to tackle these challenges, and then we're going to dive into a real-world use case. We're going to kick it off with Alex Song and Brin Zhang, both from Inspur, a Gold Member of the OpenInfra Foundation, and Li Liu with Snowflake. Let's go ahead and get started. Alex, Brin and Li, I'll turn it over to you. Okay, let's begin. Please open your camera. Okay, please start. So let's begin today's presentation. My name is Alex Song and I'm the PTL of Cyborg. I come from Inspur Group, China. Today my colleague Brin and my friend Li will together share the Cyborg project and vGPU management. The content will include the Cyborg project, GPU management by Nova, GPU management by Cyborg, and last, Nova versus Cyborg.
Cyborg is an OpenStack project that aims to provide a general-purpose management framework for acceleration resources such as GPUs, FPGAs, NVMe SSDs, SmartNICs and so on. Starting from microversion 2.72, in Ussuri, Nova supports creating servers with accelerators provisioned through the Cyborg service, which provides lifecycle management of accelerators. The picture on the right shows the relationships between Cyborg, Nova and Placement. First, Cyborg discovers accelerators such as GPUs, FPGAs and NVMe SSDs and reports the resources to Placement. Then Nova schedules devices from Placement, and Cyborg attaches and detaches devices for Nova. Next. This is the Cyborg architecture. Cyborg mainly contains three services: the Cyborg API, the Cyborg conductor and the Cyborg agent. The Cyborg API provides interfaces for communicating with other components. The Cyborg conductor connects to the database to store device info, then reports the data to Placement for the scheduler. The Cyborg agent runs on nodes that have accelerators; it periodically discovers the accelerators and sends their info to the Cyborg conductor. The picture on the right shows the service deployment: the Cyborg DB, the Cyborg conductor and the Cyborg API are deployed on the controller node, and the Cyborg agent and the Cyborg generic driver are deployed on the compute node. Next. These are the steps Cyborg takes to report devices. First we set the device drivers in the Cyborg agent group: we set the enabled drivers to, for example, the Inspur FPGA driver, the Intel FPGA driver and the NVIDIA GPU driver. Then the agent gets the device info: about once a minute, the Cyborg agent discovers accelerator info for GPUs, FPGAs, SSDs and so on. The agent then sends the device info to the Cyborg conductor, which receives it and writes the changes to the database. The Cyborg conductor sends the device info to Placement for VM scheduling. The picture on the right shows that the Inspur driver discovers the Inspur FPGA devices and the Intel driver discovers the Intel FPGA devices.
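As a rough illustration of the agent setup just described, the driver list lives in the Cyborg agent configuration. This is a minimal sketch; the exact driver entry-point names here are assumptions, not copied from the slides:

```ini
# /etc/cyborg/cyborg.conf (hypothetical sketch)
[agent]
# Drivers the cyborg-agent loads for its periodic device discovery
enabled_drivers = inspur_fpga_driver,intel_fpga_driver,nvidia_gpu_driver
```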
For NVIDIA GPUs, we have two drivers: the pGPU/vGPU driver and the MIG driver. The pGPU/vGPU driver discovers physical GPUs and virtual GPUs, and the MIG driver discovers the MIG devices. Next. This is pGPU management by Nova. First we configure nova-compute with specific PCI addresses: in the [pci] group, we set the passthrough whitelist with an address or a list of addresses. If using vendor and product ID instead, all PCI devices matching the vendor ID and product ID are added to the pool of PCI devices available for passthrough to VMs; in that case the passthrough whitelist in the [pci] group is set by vendor ID and product ID, or a list of vendor IDs and product IDs. Then we configure the Nova scheduler: in the [filter_scheduler] group, we add the PciPassthroughFilter to the enabled filters. Then we configure the Nova API: in the [pci] group, we set the alias with vendor ID, product ID, device type and a name, or a list of such aliases. Then we configure a flavor with the GPU property: openstack flavor set with the pci_passthrough:alias set to the alias name and a count of devices. Finally, we create instances with GPU devices: openstack server create (nova boot) with the prepared flavor and an instance name. Next, this is vGPU management by Nova. We enable the vGPU type for nova-compute: in the [devices] group, we enable the mdev types, specifying a vGPU type such as nvidia-35. If you want to support more than a single vGPU type, you need to provide a separate configuration section for each type: in the [devices] group we set the enabled mdev types to nvidia-35 and nvidia-36, then set the mdev_nvidia-35 section with some device addresses and the mdev_nvidia-36 section with different device addresses. The device addresses are the GPUs' PCI addresses; one mdev type can have more than one PCI address.
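A hedged sketch of the Nova configuration pieces just described; the vendor/product IDs, alias name and flavor name are illustrative placeholders, not values confirmed by the slides:

```ini
# nova.conf on the compute node (hypothetical values: 10de = NVIDIA, 1db6 = V100)
[pci]
passthrough_whitelist = { "vendor_id": "10de", "product_id": "1db6" }

[filter_scheduler]
enabled_filters = ...,PciPassthroughFilter

# nova.conf on the API node
[pci]
alias = { "vendor_id": "10de", "product_id": "1db6", "device_type": "type-PCI", "name": "v100" }
```

followed by the flavor and boot commands, roughly:

```shell
openstack flavor set gpu-flavor --property "pci_passthrough:alias"="v100:1"
openstack server create --flavor gpu-flavor --image <image> gpu-vm
```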
One PCI address belongs to only one mdev type. From the configuration, we can see that the GPU at address :85 will use the nvidia-35 vGPU type. Once that's done, we configure a flavor with the vGPU property: openstack flavor set with resources:VGPU with a count of one. We create an instance with the virtual GPU device by openstack server create with the flavor and an instance name. If we want to create an instance with a specific vGPU type, we can define a custom trait corresponding to each vGPU type: openstack trait create CUSTOM_NVIDIA_11 creates the trait, and openstack resource provider trait set with CUSTOM_NVIDIA_11 and the resource provider ID adds the corresponding trait to the resource provider matching the GPU. Then we modify the flavor to add the required trait, with openstack flavor set setting the property trait:CUSTOM_NVIDIA_11 to required. Then we can create a vGPU VM with the defined trait CUSTOM_NVIDIA_11. Next. This is MIG management by Nova. First, we slice the GPU, say an A100 80GB at PCI address :84 for example, into GPU instances of different sizes, such as 1g.10gb and 3g.40gb. Then, in [devices], we enable the mdev types for the specified GPU: we set the enabled mdev types to nvidia-699 and nvidia-700, then set mdev_nvidia-699 with some device addresses and mdev_nvidia-700 with different device addresses; these mdev types all come from one GPU device. If you have a second A100 80GB at PCI address :3b on the same host and split it into 1g.10gb and 3g.40gb, you need to provide a separate configuration section for each device, so the [devices] group will be set with nvidia-699, nvidia-700, nvidia-701 and nvidia-702, and the mdev sections map the different mdev types to the different device addresses. Creating a MIG-backed VM is similar to the vGPU case: we set up the flavor and then create a VM with that flavor, and we get a MIG-backed VM. Next is pGPU management by Cyborg. pGPU management by Cyborg is easier than with Nova.
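Pulling the MIG-by-Nova steps above into one place, here is a sketch of the [devices] configuration for a single A100 80GB; the mdev type numbers and VF addresses are assumptions for illustration (check mdev_supported_types on your own host, since the numbers vary by driver version):

```ini
# nova.conf on the compute node (hypothetical mdev types and addresses)
[devices]
enabled_mdev_types = nvidia-699,nvidia-700

# MIG-backed vGPUs are exposed via SR-IOV VFs of the physical GPU at :84
[mdev_nvidia-699]
device_addresses = 0000:84:00.4,0000:84:00.5

[mdev_nvidia-700]
device_addresses = 0000:84:00.6
```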
We just set the device driver in the [agent] group, with the enabled driver being the NVIDIA GPU driver. The NVIDIA GPU driver automatically discovers all GPUs with their different product IDs and addresses, using a command like lspci, looking for the 3D controller group with the NVIDIA vendor ID 10de. By default, the driver discovers all GPU devices. We have a plan to add disable/enable device interfaces to control whether a device is available for VMs; the link is the blueprint to complete. Next, we create a device profile. A device profile is a combination of requests for different types of accelerators used to create a VM, much like a flavor. A device profile group may look like resources:PGPU with a count of one, plus the trait CUSTOM_GPU_NVIDIA, meaning an NVIDIA device, and the trait CUSTOM_GPU_PRODUCT_ID_1DB6, which requires one NVIDIA Tesla V100 GPU. A list of pGPU resources and traits could require, say, one NVIDIA Tesla V100 and two A100 GPUs. Then we configure a flavor with the GPU property, openstack flavor set with accel:device_profile set to the profile name, and create an instance with the GPU device by the command openstack server create with the GPU flavor, an image and an instance name. The picture on the top is the openstack accelerator device profile help; you can see that a name and groups are needed for the device profile. So next, this is vGPU management by Cyborg. First we set the device driver in the [agent] group to the NVIDIA GPU driver; the vGPU discovery driver is the same NVIDIA GPU driver as for pGPU. With the command ls /sys/bus/pci/devices/<PCI address>/mdev_supported_types we can get all the vGPU types of a GPU. By default, we set the vGPU type to the first item from mdev_supported_types; it can be changed through the Cyborg API, which simultaneously kicks the periodic discovery task to report the new data. Every GPU can be set to a different vGPU type. Then we create the device profile with openstack accelerator.
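A hedged sketch of the Cyborg CLI flow just described; the profile name, trait strings and flavor name are illustrative, patterned on the slide rather than copied from it:

```shell
# Create a device profile asking for one NVIDIA V100 pGPU
openstack accelerator device profile create pgpu-dp \
  '[{"resources:PGPU": "1", "trait:CUSTOM_GPU_NVIDIA": "required", "trait:CUSTOM_GPU_PRODUCT_ID_1DB6": "required"}]'

# Point a flavor at the profile and boot an instance with it
openstack flavor set gpu-flavor --property "accel:device_profile"="pgpu-dp"
openstack server create --flavor gpu-flavor --image <image> gpu-vm
```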
The device profile create command for vgpu_dp has resources:VGPU with a count of one, the trait CUSTOM_GPU_PRODUCT_ID_1DB6_V100D_1Q, and the trait CUSTOM_GPU_NVIDIA, meaning an NVIDIA device. It means the device profile requires one NVIDIA Tesla V100 vGPU of type V100D-1Q, with 1 GB of memory. As Nova only supports allocating one vGPU per VM, from the Cyborg side we also set resources:VGPU to one, constantly. Then configuring the flavor and creating the vGPU VM is the same as for pGPU. This is MIG management by Cyborg. The MIG feature of NVIDIA GPUs is newer and in some ways better than the vGPU feature. First, set the device driver in the [agent] group; the enabled driver is the NVIDIA MIG driver. The MIG discovery driver is similar to a SmartNIC driver, since a MIG-mode GPU also has a PF and VFs. With a command like ls /sys/bus/pci/devices/<PCI address>/ piped through grep for virtfn, we can get all the VFs of the GPU. By default, we partition a GPU A100 80GB into 1g.10gb times seven, and set the vGPU type accordingly. We enumerate the supported combinations through the Cyborg API, which simultaneously kicks the periodic discovery task to report the change of data. Then create a device profile with openstack accelerator: the device profile create command for mig_dp contains resources:VGPU with a count of one, with the trait CUSTOM_GPU_PRODUCT_ID_20B7_A100D_3_40C required, and CUSTOM_GPU_NVIDIA, the NVIDIA device, also required. This device profile requires one NVIDIA Tesla A100 MIG instance with 3g and 40 GB of memory. Of course, we again set resources:VGPU to one, constantly. Then configure the flavor and create the MIG VM the same as for vGPU. So next. This is the Nova and Cyborg interaction. If we create a VM with a GPU through Cyborg, the interaction is as follows: the Nova API first calls Cyborg to fetch the device profile; the request groups in the device profile are added to the request spec. Then the Nova scheduler invokes Placement and gets a list of allocation candidates.
The Nova conductor calls the Cyborg API to create a set of unbound ARQs for the device profile, and Cyborg returns them to Nova. Nova matches each ARQ to the resource provider picked for that accelerator. The Nova compute then calls Cyborg to bind the ARQ with the host name, the device resource provider UUID and the instance UUID. This is an asynchronous call which prepares or reconfigures the device. On completion of the binding, whether successful (all ARQs bound) or failed, Cyborg calls Nova to send an event. The Nova compute manager waits for the notification, then calls Cyborg to get the resolved ARQs. The Nova libvirt driver then uses the bound ARQs returned from Cyborg and calls the code to compose the PCI passthrough devices into the VM's definition. If there is any error while the binding is needed, Nova must unbind the relevant ARQs by calling the Cyborg API; the instance will then be retried on another host, or the unbound ARQs are deleted for the instance. This is the product show of our pages. The first is the device list. The device list page displays the ID, name, resource pool, host, in-use state, type, vGPU type, the product ID, and the instance UUID bound to the accelerator. If the type is pGPU, the vGPU type is none. If the type is vGPU, the vGPU type displays the name, memory and count. So a name like V100D-1B means each vGPU has 1 GB of memory, while for MIG-backed types the name encodes the slice, such as nvidia-699 for 1g.10gb and nvidia-702 for 3g.40gb. Below that is the device vGPU type update. For GPUs that support vGPU, such as the V100, we can display the available vGPU types from mdev_supported_types. For GPUs that support MIG, for example the A100, we enumerate the supported combinations with their conditions. You can select the vGPU type and change it: the Cyborg API updates the database and notifies the Cyborg agent to report the changes. Next. This is the device profile management page. When we create a device profile, a name and groups are needed.
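Stepping back to the Nova–Cyborg interaction described above, the step where Nova matches each ARQ to the resource provider chosen by Placement can be sketched in pure Python. Everything here (class names, state strings, the allocation mapping) is a simplified model for illustration, not Cyborg's actual code:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class ARQ:
    """Simplified accelerator request: one per device in the profile."""
    uuid: str
    request_group: int               # which device-profile group it came from
    state: str = "Initial"           # Initial -> Bound / BindFailed
    device_rp_uuid: Optional[str] = None

def bind_arqs(arqs: List[ARQ], allocations: Dict[int, str]) -> bool:
    """Match each ARQ to the resource provider UUID that Placement
    picked for its request group; mark unmatched ARQs as failed.
    Returns True only if every ARQ could be bound."""
    for arq in arqs:
        rp = allocations.get(arq.request_group)
        if rp is None:
            arq.state = "BindFailed"
        else:
            arq.device_rp_uuid = rp
            arq.state = "Bound"
    return all(a.state == "Bound" for a in arqs)
```

In the real flow the binding is asynchronous and Cyborg notifies Nova via an external event; this sketch only captures the matching logic.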
The type in a group can be pGPU, vGPU or FPGA; it depends on the accelerators in your resource pool. Then we choose the vendor, product ID and numbers. We can add a group with another type of device to the device profile. For a vGPU device profile, we also need to choose the memory; the number is one, and the group length is also one, because we limit a VM to binding only one vGPU device. So next. This is Nova versus Cyborg by type of device. For pGPU devices, Nova can support creating a VM with one type of GPU, but with Cyborg we can support creating a VM with different types of GPU. For vGPUs, with Nova you have to configure the vGPU types and addresses, which is very complex, and you need to restart the compute service if you modify the configuration. With Cyborg, we don't need to configure vGPU types and addresses, and we support dynamically modifying the vGPU type without restarting the Cyborg agent service. For MIG devices, Nova manages the MIG device the same as vGPU, but with Cyborg we can also automatically discover the available VFs and support MIG without restarting the Cyborg agent. Next. This is the work to be continued; there are a lot of blueprints to complete. From Nova's side, there is vGPU support through Cyborg and Placement with the accelerator, and unified resource reporting between Nova and Cyborg. From Cyborg's side, we have the disable/enable device API, MIG management support, NVIDIA A100 vGPU management, and support for changing the vGPU type through the Cyborg API. So that's all, thanks for your attention. Thank you, Alex, Brin and Li, for that deep dive into the Cyborg project. And a reminder, as the slide says, we will have a Q&A after the next presentation, so throw your questions in the comments and we will get to them as quickly as possible. Next up, we've got Dmitriy Rabotyagov from Cleura, another Gold Member of the OpenInfra Foundation and a long-running OpenStack-powered public cloud in Sweden.
Dmitriy, talk to us about Cleura's journey down the rabbit hole with virtual GPUs, and first, did you take the red pill or the blue pill? Yes, Jimmy, I guess we took both, basically, during this journey. All right, take it away, sir. Thank you. Yeah, so as Jimmy said, my name is Dmitriy, I'm the team leader of cloud engineering at Cleura, and I'm also the project leader of OpenStack-Ansible for the last couple of cycles. So, for those who have not heard about Cleura, let me do a brief introduction. Cleura, previously named City Network, is a European cloud provider. We provide a bunch of different services, including public and compliant private clouds. We are also certified according to ISO standards in information security, resilience, environmental protection and quality assurance. So, at the moment of speaking, we manage six public regions all over the globe, including the EU, North America, Asia and even the Middle East. Next, please. So, eventually, we defined a set of criteria that should be met by the final implementation. Having proper and safe multi-tenancy is quite crucial for us, since GPU workloads can be obtained and released quite dynamically; each time, we must be absolutely sure that reusing the same GPU is safe for our clients. Everyone loves to save money on things, and operations are no exception, so it's always beneficial to take operational costs into account whenever you bring in a new technology. Next, we don't want our users to think that OpenStack is too hard, so being easy to consume is another key for our services. The majority of requests coming from our customers regarding support of GPU workloads came in through support; such workloads were going to be used mostly for AI training and machine learning. However, we got some other requests as well, to provide support for GPU-powered desktops, which are Windows-based, obviously, so we had to add that to the list as well. Next, please.
Yeah, so, now that I've briefly explained our use case, I want to take a brief look at the market options, let's call it that, and see what we have today. So, there are two columns, as you may see. The left one contains technologies that can be used for providing GPUs, and the Cyborg folks already touched this ground. And on the right, we see the bigger hardware vendors that can be used to provide GPUs, basically. So, let's start with PCI passthrough. It's quite basic, almost easy to operate from my perspective, and it's been supported for ages now in Nova. So, you're basically taking the GPU's PCI device and passing it to the guest using the device address. To the guest, it then looks just like a bare-metal host, and you can even use open source drivers to interact with it. However, since the user gets full access to the device, you need to ensure that the firmware state is good and not compromised once the device is released, and before allowing it to be reused, basically. So, depending on the GPU cards, there are options for adding an external root of trust, or for locking the ability to upgrade or rewrite the firmware. This can hardly be fully automated, basically, so it's quite not that easy to operate, and operational costs may rise at scale if you go down this path. It might not be an issue for a private cloud at all, or for clouds with a single customer, but I mean, ensuring that hardware is safe to reuse is quite a big deal for public clouds. Also, if you're not using PCI Express over Ethernet and don't have a standalone rack full of GPUs, then you will end up with two GPU VMs per compute host at best, because you are limited by PCI Express slots and usually you cannot place more hardware GPUs into the same server, so it's quite a limitation. vGPU, from the other side, can help with both of these issues. So, you don't need to worry about the firmware state, as the GPUs passed to domains are virtualized. Well, you can use vGPU to make one big virtual GPU, just for security reasons or to ease your operational life.
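For context on what providing a vGPU means mechanically: vGPUs are exposed through the standard Linux mediated-device (mdev) framework, and creating one on the hypervisor is a sysfs write. A hedged sketch; the PCI address and mdev type name below are placeholders, and the available types depend on your card and driver:

```shell
# Create one mediated device (vGPU) of a given type on a given GPU.
# 0000:84:00.0 and nvidia-35 are placeholders; list
# /sys/class/mdev_bus/<addr>/mdev_supported_types to see yours.
UUID=$(uuidgen)
echo "$UUID" > /sys/class/mdev_bus/0000:84:00.0/mdev_supported_types/nvidia-35/create
```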
It also enables you to split a GPU into multiple smaller ones. Depending on the GPU model, the license, or how crazy you are, you can get up to 32 virtual GPUs from a single physical one. So, you can utilize hardware in a more efficient way using vGPU, but it's a bit harder to operate, because you need proprietary drivers on the hypervisors that allow you to slice the GPU, and then you should also take care of creating these virtual devices and configuring Nova and Placement accordingly, unless you use Cyborg for that, obviously. But when talking about hardware vendors, NVIDIA looks like the most common pick across the community, and it's the most battle-tested, with operators around who can give you advice on how to do things, how to avoid some bugs, or how to configure things in a better or more optimal way. From the other side, you should never forget about the inconveniences that proprietary drivers bring with them, and the extra costs of licensing if you choose to use vGPU. AMD, on the other hand, provides vGPU kind of free of charge, and overall the situation looks a bit smoother on paper, but eventually, I personally don't know anybody who runs AMD in production, so it's a bit hard with community support and with knowledge in the community. Some people say that it's a bit tricky to get their drivers, but to be frank, it wasn't super easy for us to get drivers from NVIDIA either, so I would say they're kind of equal here. But the main reason why AMD was a no-go for us is the absent CUDA support, which is another proprietary NVIDIA technology. I honestly wish people were using OpenCL a little more, but we have what we have. Next, please. As a result, we picked NVIDIA GPUs and decided to go with vGPU, but found ourselves at yet another crossroads.
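One practical detail about this crossroads: on GPUs that support both modes, switching between conventional vGPU and MIG mode is a single nvidia-smi invocation on the hypervisor. The `-i 0` index below is illustrative (it selects the first GPU):

```shell
# Enable MIG mode on GPU 0 (a GPU reset or reboot may be required to apply)
nvidia-smi -i 0 -mig 1
# Switch back to conventional, time-sliced vGPU mode
nvidia-smi -i 0 -mig 0
```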
Since the Ampere architecture, NVIDIA has introduced MIG mode in addition to the more, I would say, conventional vGPUs. So you can easily switch between vGPU and MIG mode on supported GPUs; it's basically a single command for nvidia-smi. But you should also keep in mind that not all Ampere-series cards support MIG. The main key difference is that in MIG mode the computational cores are statically allocated, so as a user you are better protected from neighbors that share the same hardware, and this also gives quite a significant improvement in performance for computational workloads. However, if workloads also involve storage operations or utilize the CPU as well, which is kind of common, vGPU might give slightly better performance at the end of the day. It's also preferable to use vGPU when you want more than four vGPUs, or four partitions, on the same physical card, since in MIG mode you can create fewer partitions, or do less slicing, from the same physical card than with vGPU. Next. Licensing is the toughest and most complicated topic from our journey, I would say. There are a couple of license types available; basically, you can separate the ones that are designed for virtual desktops, the ones for computational workloads, and those that support both. All licenses except the ones designed for computing, which are named vCS, are paid on a per-user basis. vCS, in its turn, is, at least in NVIDIA's documents, said to be paid per GPU, and in fact, yes, you pay per physical device, but at the end of the day the license must be applied on the client side. So in fact, with a vCS license, what you are receiving is just a set of guest licenses; well, it depends on the GPU model and how many vGPUs you can get, but in general it's somewhere around eight licenses. Licensing on the client side also means that the guest VM must be provided with the license file and should be able to communicate with the license server and transfer some data to it, and that is performed by the proprietary drivers that must be installed for each
client. In case the license cannot be verified in time, your virtual GPU will be throttled: after 20 minutes without a license you get throttled FPS and CUDA operations, and after 24 hours CUDA is fully disabled and you're locked to, if I recall correctly, something like 5 frames per second, so it's basically unusable for the client. On top of that, the client can't fetch or update to the latest version of the driver on their own, because vGPU drivers are simply hidden behind an enterprise portal which is available only to service providers; so you need an account that is not easy to get, and it's not really easy to register outside of asking for a demo account, for example. And eventually, you might want, or you might need, to have an on-premises license server, because these are proprietary drivers and your client is connecting somewhere, so you might want to control where it's connecting, and you don't want some data leaving the premises, for example. Next slide, please. Yes, so how do we basically do that? Well, we had to write a diskimage-builder element, and we are building images for GPU-powered VMs where we pre-install the drivers and place a license token. We also test these built images with a Tempest plugin to check at least the basic functionality of the provided drivers. Since we don't use Cyborg and use just Nova and Placement, we've created a systemd service that creates a bunch of mediated devices, or vGPUs in other words, that are picked up by Nova whenever needed. And of course we have our on-premises DLS server; it's firewalled and accepts connections only from public networks that we own. But it's still far from being a perfect solution, because in order to create a GPU-powered VM that will be properly licensed, you need to use a specific combination of image and flavor, but as of today it's not really possible in OpenStack to limit the selection of images for a given flavor. Another thing is that the versions of the drivers installed on the hypervisor and the client must be compatible, so upgrading drivers is quite a
tricky process and might become a real issue in day-two operations. Speaking about drivers: while they're packaged and you can install them with your preferred package manager, they contain the version in the package name, so when you expect the package to be upgraded, it just errors out with a conflict, and you must explicitly uninstall the old version of the drivers and install the newer one. And of course, doing that on the hypervisor requires removing all created vGPUs. It is already possible to live-migrate VMs with vGPUs on newer QEMU versions, but if you run some older software, it might still not be possible for you. Also, if only one vGPU type is configured, so basically you want to use only one profile for all GPUs and you are not using Cyborg, you will need to add a fake second device type to the Nova configuration, because otherwise the per-device config sections are simply ignored when just one type is provided; the code literally checks the length of the list of available types. Yes, and eventually, the NVIDIA drivers that were released, I can't recall if it was December or January this year, basically enable you to allocate multiple vGPUs from the same physical GPU; it was a limitation by NVIDIA previously. But we still struggle a bit to implement that and make it work, maybe because of Placement limitations, I'm not really sure at the moment of speaking, but we are trying to get this working now. Thank you. Thanks, Dmitriy. One last thing: I was talking about the diskimage-builder element and the Tempest plugin; we have open-sourced them, you can check the code here and use them. Hopefully they will save you some time and be useful, so feel free to use them and report bugs. Perfect. So I have a question: the licensing seems a bit restrictive; do you think GPU licensing is going to loosen long-term, or is that something that's here to stay? So, given we are talking about NVIDIA, I won't expect it to go anywhere. However, to be frank, licensing is quite unique to NVIDIA, so basically, if we
well, Intel cards are yet to come, so we are expecting them, but basically neither Intel nor AMD has licensing. But I think as long as developers use CUDA instead of OpenCL or more free and open alternatives, NVIDIA will have the power to enforce licensing, basically. Interesting. So let's bring everybody else back on; I think we have Brin and Alex and Li and Xieping, so we'll wait for them to turn on their cameras. I'll go ahead and throw out the first question: does Cyborg do the MIG configuration, or is that done manually outside of OpenStack? I think it's outside of OpenStack right now, but what I heard is that the Inspur folks have implemented some APIs to help with the MIG configuration, and they will be contributing those APIs back to the community; that's what I heard. Awesome, okay, great. Let's see, next question: do they need at least Ussuri on the host for Nova? Is that a Cyborg-specific question? I believe so. Yes, it is at least the Ussuri release, because starting from the Ussuri release Nova supports the Cyborg feature, and we can interact with Nova to manage vGPU instances. Okay, I can't actually recall regarding just vGPU and Nova, but I think it was implemented not that long ago, so maybe before Ussuri only PCI passthrough was available with just Nova; but to be frank, we run a way newer release, so I don't really know. Well, if you want to use Cyborg before Ussuri, I think maybe you can pick the Nova-specific changes and then use Cyborg with them; we used to have some adapters for it, but that would be another story. Okay, I think that is all the questions we have from the audience at this time. I don't know if you all have any closing remarks; if not, we will close it out until next time. And I think you missed one. Oh, did we? Yeah, there was one from Dewey Nguyen: do you need an additional license from NVIDIA to split a GPU into vGPUs? That one, yes, sorry, I didn't cover that one, because Dmitriy pretty much covered it on his slide; but Dmitriy, do you have anything
else you wanted to add to that? No, I guess not, except, actually, yes: you do need to have licensing to use vGPUs. If you are deploying and you want to test the functionality, you kind of can omit licensing for a while, because it only kicks in after 20 minutes, so basically you have some time to test whether the VM is working, whether it's responding, whether everything looks good; but at the end of the day, yes, you will need to license them to run properly. Okay, and we just had one more question in the comments, from Graham Moss: does Cyborg also help with SR-IOV networking, with MIG and passthrough hosts? I have no idea what any of that means, so I'm going to throw it to someone over there. Brin, can you help with this? We were trying to do it this way; it still works, but it's not friendly for vGPU instances, because if we want to live-migrate an instance, we have to detach the MIG GPU and then reattach it on another server. If we use passthrough, that's not okay for the instance to keep running, but it is supported; MIG GPUs and vGPUs both support SR-IOV networking. So I think what Brin mentions is that since MIG already supports SR-IOV pretty well, Cyborg does support that; but if you do migration, he doesn't suggest that you use passthrough mode; SR-IOV mode is recommended, if I understand correctly. We've got one more coming in here, from Andrew: I wonder if or when developers will move forward from CUDA to OpenCL, for AMD and Intel. Yeah, I think it would be great, but at the same time, as far as I know, NVIDIA does quite a good job of motivating developers to stay with them, including giving away free cards for development purposes, and doing trainings, learning materials, and good marketing, basically. So I'm not sure this is going to happen; at least I don't see any tendency toward moving forward. Maybe when Intel becomes
production-ready, things will change; I hope for that, but who knows. All right, any other questions for each other? Anything for the Cyborg team while they're here? I don't think I have any questions. Okay, well, we'll close it down. I would like to thank everyone for showing up today, and thank you for the presentations; it was incredibly informative. If you ever wanted a deep dive into Cyborg, or to hear the trials and tribulations of NVIDIA GPUs, you came to the right place. So thank you to all of our speakers, we appreciate you joining us, and thanks to the audience for some great questions. Stay tuned to OpenInfra Live for announcements on the next episode. In the meantime, please don't forget to join us June 13th through 15th for the OpenInfra Summit in Vancouver; registration is live, and early bird pricing ends today. If you're interested in sponsoring, you can talk to me, jimmy@openinfra.dev. And remember, if you have an idea for a future episode, we want to hear from you; submit your ideas for OpenInfra Live, and perhaps we'll see you on a future show. And finally, I'd like to thank the OpenInfra Foundation members that make all of this possible. If your organization would like to join the OpenInfra Foundation, take a look at openinfra.dev/join or give me a call. Thank you very much, and have a great day.