Hello everyone, my name is Liang Yan. I'm here to give you a session about vGPU in Firecracker, and I'm really excited to do this presentation here. Enjoy.

Before we start, a quick introduction to myself. I'm a virtualization engineer from SUSE. My work is mainly focused on I/O virtualization, GPU and networking; architecture-wise I've mostly been on ARM64 and s390x rather than x86. As you may know, SUSE dropped OpenStack Cloud last year and acquired Rancher this year, so we had some transition, and that's when I started working on Firecracker, and I like this stuff. I've also been working from home for four years now; I know most of you are working from home too because of the coronavirus pandemic, and I hope you can enjoy your work-from-home time. I live in Louisville, Kentucky, in the United States. Usually people don't know it. I say Kentucky is the birthplace of Abraham Lincoln; people still don't know it. Then Muhammad Ali; maybe still nothing. Then I say: Kentucky Fried Chicken, KFC. Now they know it. So it sounds like people enjoy food more than sports, and sports more than politics. Anyway, that's just a joke.

So let's get back to today's topic. Today I'll give you some quick background on microVMs, an introduction to GPU virtualization, especially the latest updates, and the motivation: why do we want to do this, how do we do it, and what should we do? Since this is an exploration journey, I'll share some thoughts after that. The last part is Q&A; you can type your questions into the chat platform and I'll be glad to answer.

Let's have a quick look at microVMs. I think this is nothing new for you guys, and I've followed the topic for a couple of years myself. There have been a lot of options here: QEMU-lite, NEMU, or QEMU's microvm machine type from last year, which are QEMU based, and Firecracker, which is rust-vmm based, although it's actually earlier than rust-vmm. There are also process-level, API-based approaches like gVisor, but that's not my focus here, so let's skip it. If you look at these microVMs, you see something in common: they're not treated as traditional VMs but more like containers, and they also work together with containers. There's usually just one workload inside and a very short life cycle, and what matters is that it's safe, the architecture is light, and it runs fast. Here's the architecture of Firecracker; you can see it's highly integrated with containers, and I think that's where many of the use cases for microVMs come from.

Now a quick look at the GPU side. Everyone is using GPUs today, but if you look at GPUs you see two main uses: graphics and computing. Graphics means things like game streaming and 3D rendering; computing is more popular today because of AI and machine learning. And as you know there are two different phases in machine learning: training, where you build a model from huge data sets, and inference, where you already have the trained model and just use it to make judgments.

Then GPU virtualization. A lot of people have already worked on this topic at the low level. There's GPU passthrough, which is the basic one. Then there's full GPU virtualization, and there are two ways to do it: the soft way, mdev (mediated devices), which Intel and NVIDIA are doing.
Then there's the hard way, SR-IOV, which AMD did, and NVIDIA is jumping into it too. You can see the A100 here, announced this May; its vGPU model is actually SR-IOV based. One interesting part is that it can also be used by a CUDA application directly, and the use case is that you can use it in a container. So it looks like everything is about containers today. Arm is pretty new here; they announced their GPU virtualization solution just last month, I think. The interesting part is the use case: it's mostly used for autonomous driving and automotive enhancement, which is a new use case because of the training and inference scenarios there, and that makes things interesting. Based on the architecture, it also looks hardware based. There are other GPUs as well; I've been working very closely with all of these vendors, NVIDIA, Arm, AMD and Intel, and even some other hardware accelerators like FPGAs and neural processing units (NPUs). But we're talking about GPUs here, so let's stay focused.

Now comes the motivation: why do we want to do this? Generally, because people want it. It's quite popular for AI and machine learning today, and some people just want more from Firecracker: yes, you are wonderful, but why don't you provide this too? The other side is GPU virtualization itself. I/O virtualization has become more powerful now: devices have their own virtualization capability, independent of the general VMM. We just saw it with the A100: you can use its vGPU directly from an application. That's a good case. People are also talking about it for Firecracker; there are a couple of GitHub issues. So generally people want it, but there's a conflict, because Firecracker has a specific purpose, mainly serverless computing. In that case you may need to run tons of workloads on one host, so you want oversubscription, and you also want good performance across workload switches. vGPU virtualization may not look great for that, but anyway, let me show you how to do it.

Like I said, I'm new to Firecracker and the rust-vmm stuff, but I know vGPU and I know QEMU. The good part is that most of the work is still reused. On the host side, mdev or SR-IOV creates the vGPU, and we share that. On the KVM side, Firecracker still uses KVM, so for that part we don't need to care either. We only care about the VMM part, as if we were doing VFIO PCI passthrough: basically, the VMM just needs to create a PCI device based on the host information, accessed through the VFIO interfaces. The other part is that Cloud Hypervisor, a project based on rust-vmm, has implemented vfio-bindings and vfio-ioctls, and I think that makes the whole process much easier and doable. vfio-bindings is a set of FFI bindings automatically generated by bindgen, so you can see the function and structure definitions coming straight from the kernel headers, and vfio-ioctls is the main implementation on top of them.
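To give a rough idea of what "automatically generated by bindgen" means, here is a minimal build-script sketch. This is illustrative only: the real vfio-bindings crate ships pre-generated files, and the wrapper header name below is hypothetical.

```rust
// build.rs -- a minimal sketch of generating Rust FFI bindings from the
// kernel's VFIO header with bindgen, roughly what vfio-bindings provides.
fn main() {
    let bindings = bindgen::Builder::default()
        // "vfio_wrapper.h" is a hypothetical one-line header that just
        // does `#include <linux/vfio.h>`.
        .header("vfio_wrapper.h")
        .generate()
        .expect("unable to generate VFIO bindings");

    let out = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap());
    bindings
        .write_to_file(out.join("vfio_bindings.rs"))
        .expect("unable to write VFIO bindings");
}
```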
So now things get clearer and easier. What I did here: as I said earlier, I work on GPU virtualization, so I have a lot of fancy hardware around, including a vGPU-capable Intel GPU and an old AMD GPU. I'm working on top of SLE 15 SP2 as the host OS and running the VMM there. My thinking was that I should first run through the whole thing with Cloud Hypervisor; that makes things easier. If it works, great; if not, what's the problem? After that I can backport or refactor it into Firecracker. So the plan is pretty clear.

During my try with Cloud Hypervisor, I found most of it worked, except the vGPU part: once I passed the mdev device in, it couldn't get through, so something looked wrong with the GPU driver during my debugging. I'll follow up on that later. Anyway, I mainly use the Intel vGPU here because it's open source on both sides.

Now back to the porting. I checked how Cloud Hypervisor makes it work, and for that I need these crates, vfio-bindings and vfio-ioctls; those are the two main pieces of the implementation. I also need PCI: as you know, Firecracker only has an MMIO bus, so we need to implement a PCI bus there. Then there's the rest of the VMM plumbing: we need to follow the whole process and implement it in Firecracker, plus some dependencies like vm-device, the device manager, vm-allocator, and the full PCI implementation. I probably moved much more code than necessary, because as a first step I just wanted to make it work; after that I can go through all the code carefully.

There are some other things too. Because of the GPU, some firmware pieces need to be set up for the guest, otherwise there will be driver issues later. And there's the kernel configuration: Firecracker's original guest config is quite simple, but we need to bring in extra modules just to enable the graphics driver; I'll show a sketch of that below. I planned to put all the code here, but that's really not necessary. If you do this yourself, you may notice another benefit of Rust: the whole device creation is much easier than the QOM model in QEMU.

So what I do in Firecracker is this. First I parse the parameters and catch the device passed in as the passthrough device: if you use mdev, it comes as a UUID; if it's just a virtual function, it's a plain PCI address. Then during VM initialization I create the PCI bus, and then I use VFIO to create the PCI devices. I access the host device by its device path or ID, which goes through the host sysfs filesystem to get the IOMMU group and the device information from the host. You then use that information to create your VFIO PCI device, and after that you put it on the PCI bus, and from there it will be detected by the guest. A rough sketch of this flow follows below.

So far, even though I said I moved more code than necessary, it works with some basic functions, and the device is detected by the guest kernel. The guest driver still doesn't work well, though: there are a lot of issues, like tables the driver expects not being available, a lot of DRM function failures, and the DMA mapping is kind of a mess. I still need to go through all of that.
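On the guest kernel side, this is roughly the kind of addition I mean on top of Firecracker's minimal config. The exact options depend on your GPU; the i915 entries below are my assumption of a minimal set for the Intel vGPU case, not a verified configuration.

```
# Sketch of guest kernel additions on top of Firecracker's minimal config.
# PCI support, since the vGPU shows up as a PCI device:
CONFIG_PCI=y
# DRM core plus the vendor graphics driver (i915 for the Intel case):
CONFIG_DRM=y
CONFIG_DRM_I915=y
```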
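And here is a minimal sketch of the device-creation flow described above. The `PassthroughDevice` type and its helpers are mine, purely for illustration; the sysfs layout is the real one, but the actual VFIO container/device creation and the PCI wiring happen in the vfio-ioctls and PCI crates, whose APIs I'm not reproducing here.

```rust
use std::path::PathBuf;

/// How the passthrough argument is interpreted in my prototype:
/// an mdev vGPU is named by its UUID, a plain VF/PF by its PCI address.
/// (Hypothetical type, not Firecracker's actual API.)
enum PassthroughDevice {
    Mdev { uuid: String }, // e.g. created via mdev on the host
    Pci { bdf: String },   // e.g. "0000:00:02.0"
}

impl PassthroughDevice {
    /// Resolve the host sysfs path for the device.
    fn sysfs_path(&self) -> PathBuf {
        match self {
            PassthroughDevice::Mdev { uuid } => {
                PathBuf::from("/sys/bus/mdev/devices").join(uuid)
            }
            PassthroughDevice::Pci { bdf } => {
                PathBuf::from("/sys/bus/pci/devices").join(bdf)
            }
        }
    }

    /// The IOMMU group is a symlink under the device's sysfs node; its
    /// target name is the group number used to open /dev/vfio/<group>.
    fn iommu_group(&self) -> std::io::Result<String> {
        let link = std::fs::read_link(self.sysfs_path().join("iommu_group"))?;
        Ok(link
            .file_name()
            .map(|n| n.to_string_lossy().into_owned())
            .unwrap_or_default())
    }
}

fn main() -> std::io::Result<()> {
    // Hypothetical example: an Intel vGPU created on the host via mdev.
    let dev = PassthroughDevice::Mdev {
        uuid: "a297db4a-f4c2-11e6-90f6-d3b88d6c9525".into(),
    };
    println!("sysfs path:  {}", dev.sysfs_path().display());
    println!("iommu group: {}", dev.iommu_group()?);
    // From here the VMM would create the VFIO device from the group and
    // attach the resulting PCI device to the PCI bus for the guest to find.
    Ok(())
}
```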
Anyway, like I said, this is an exploration journey, so let me share some thoughts from along the way. The backporting work itself is fine; I like reading code. But during all of it there's one question I keep asking: is this a good fit? First, the more I backport, the more it needs: I had to backport this, then that, and even after that the results look quite different. The boot time and the memory footprint got bigger, and I must have done something wrong. Also, the issues from the earlier discussion still exist. The design philosophy is serverless computing: there will be a lot of workloads on one host, so you want oversubscription, and you want workload switching to be fast and efficient. The vGPU scheduler would probably be a concern there; I didn't test that, because I only tested one VM with a simple workload.

So with this confusion, I keep asking: is it a good fit? Let's step back: what do we want in the first place? Firecracker is for serverless computing, for sure. We don't want graphics; we just want computing. Even within computing we don't want training, because it runs too long; we're really thinking about inference. And we still want it safe, light and fast. Now look back at the workloads at the platform level: they're mainly CUDA or OpenCL. Even for Arm, Arm NN has its own platform, but eventually it still uses the OpenCL driver on the GPU side. So maybe things get easier if we think about it another way: we create a virtual CUDA or virtual OpenCL, or whatever we call it, a virtual ML device, only for general compute, designed around the driver API, and we use virtio rings to transport the calls to the host side. Then we don't have those concerns anymore, like memory pinning and the huge, complicated machine and kernel configuration. I'll probably work on that; the initial work has already started, and there's a sketch of the idea right after this section.

Another thought that occurred to me is about the A100's MIG, which can be used directly by containers. I did some research there and thought maybe I could reuse ideas from the NVIDIA container runtime. But no: the NVIDIA container runtime basically mounts the host driver into the container image, the container root filesystem, so it's a different scenario from a virtual machine. Maybe it's still worth thinking about another device type, something like a VFIO mdev variant; I don't know, I'll keep thinking.

For this whole journey, the questions I end up with are: could we do it? Sure. But should we? That depends. A vGPU in Firecracker may not be a great fit, because of what the typical Firecracker use case looks like. But if we jump out of that scenario and think about Arm and autonomous driving, that's a different scenario: the workload may not be that heavy, but we still want some GPU computing, and we still want it safe, light and fast. That may be a good match. So it totally depends on how you want to use it. I will still implement this eventually, even if not for Firecracker; maybe I'll design a new VMM based on rust-vmm, just for the vGPU or AI/machine-learning scenario, and that would be fine, but I'd really need to optimize the code carefully. Or, as discussed earlier, use a different device model. I think that's doable; I did some research, and some academic papers have already implemented it in a different way.
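To make the virtual-ML-device idea above a bit more concrete, here's a purely hypothetical sketch of what the guest-to-host commands might look like. Nothing here is an existing API; it just illustrates "design around the driver API, transport over virtio rings".

```rust
// Purely hypothetical sketch of a paravirtual "ML device" protocol:
// the guest turns compute-API calls (CUDA/OpenCL level) into commands
// and transports them to the host over a virtio ring; the host side
// replays them against the real driver.

/// Commands a guest-side driver might place on the virtqueue.
#[derive(Debug)]
enum MlCommand {
    MemAlloc { size: u64 },                       // host returns a buffer handle
    MemWrite { handle: u32, gpa: u64, len: u64 }, // guest memory -> device buffer
    Launch { kernel: u32, args: Vec<u8> },        // run a precompiled kernel
    MemRead { handle: u32, gpa: u64, len: u64 },  // device buffer -> guest memory
}

fn main() {
    // What a single inference call could decompose into on the ring:
    let cmds = vec![
        MlCommand::MemAlloc { size: 4096 },
        MlCommand::MemWrite { handle: 1, gpa: 0x8000_0000, len: 4096 },
        MlCommand::Launch { kernel: 7, args: vec![] },
        MlCommand::MemRead { handle: 1, gpa: 0x8000_1000, len: 4096 },
    ];
    for c in &cmds {
        // A real device would serialize these into virtio descriptors.
        println!("{:?}", c);
    }
}
```

The nice property is that the guest would only need this one simple device and no GPU driver at all; the memory pinning and the vendor complexity stay on the host side.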
Here we would just need to do it with Rust and KVM. One thing I think needs real care is the API dependency; we don't want to end up in API hell. And if we end up with both approaches, doing an overall comparison would be interesting; maybe next year I'll share some updates, even with upstream. The last thing: I like rust-vmm very much, especially for creating devices. It's much easier than QOM, if you know what I mean. And I think that's it. If you have questions, type them in the chat channel or email me. Anyway, thank you so much for your time. It's really exciting, and it's my honor to present here. Thanks.