Hi everybody, my name is Bobby Eschelman and I'm a kernel engineer on the systems technology and engineering team at ByteDance. I'm going to present to you VDUSE: supporting vDPA devices in user space.

Here's the agenda for the talk. We're going to first define the problem and the solution, then give a little intro to vDPA and VDUSE, followed by an explanation of why user-space emulation matters. After that comes a design and implementation deep dive into VDUSE, and finally we'll look at a use case.

So, first, defining the problem. As we all know, modern systems are typically deployed in hybrid environments of both containers and VMs. Many applications also have specific I/O requirements: for example, direct access to block devices or NICs, or hardware accelerators such as GPUs, ASICs, and FPGAs. And yet container I/O interfaces and VM I/O interfaces are not the same.

Take supplying block devices to a VM, for example: you just use virtio. You provision your VM with a virtio-blk device, the guest kernel detects it and presents a block device to user space via the virtio-blk driver, and the back end is emulated transparently.

For containers, the story is different. We have block devices, but they're just real block devices; the kernel doesn't have any capability for virtualizing a block device. The container, or bare metal really, just has direct access to whatever block devices are on the system. You can kind of work around this using NBD, where your container accesses a network block device and the network block device server is just in user space, and you can modify that server to act somewhat like a virtual back end. But it's not really designed for that, so there are major performance and system-complexity costs. The main point I'm trying to drive at here is that non-generic interfaces with non-generic back ends come at a development and maintenance cost.

So now let's take a look at the proposed solution. First, we want uniform interfaces for both VMs and containers. We don't want applications to have to interact with different interfaces, and we don't want to have to provision systems in specifically different ways to provide the services we want to provide to our applications. Well, vDPA does this. vDPA provides a way to give virtio devices to VMs (which isn't really dependent on vDPA; you can do that without it), but the real value-add of vDPA is that it lets you provide virtio devices to applications running in your host user space. We also want the virtio device back ends to be uniform, and we want them to be in user space, which is where VDUSE comes in.

When you use VDUSE and vDPA, you get a couple of benefits. One is greater portability for both applications and back ends. Another is easier maintenance of deployed systems, which we'll talk about a little bit. And then better performance compared to some alternatives: between zero-copy, polling, and minimal context switching, VDUSE is very performant, and we have compared it to some alternative setups.

So let's look at vDPA specifically. vDPA stands for virtio data path acceleration. For those not familiar with virtualization, virtio is a specification for paravirtual devices.
It describes how a virtual machine's drivers talk to an emulated backing device. vDPA, then, specifies hardware devices that understand the virtio data path. So in vDPA we first have the hardware devices themselves, then the vDPA bus, which is in the kernel now, and then the vDPA virtio drivers.

The benefits of vDPA are obviously the performance gains, because the data path now completely bypasses the host software. You don't have to trap out into host user space, run your device emulator there, and then make system calls into the host kernel to, for example, persist a block through a file-backed scheme by writing through a file system. And you don't have to trap back into the host kernel, as you would if you're using vhost, for example. With vDPA, the virtual machine's virtqueues are shared directly with the vDPA device, so there is no need to involve the host.

vDPA also seeks to minimize development cost and complexity for hardware vendors by limiting the amount of extra work they need to do to make a vDPA-compliant device. This mostly comes in the form of only requiring implementation of the virtio data path, not the control path. Device implementers have a set of vDPA bus operations they need to implement in order to plug their device into the vDPA bus, but the actual control portion of the device is vendor specific. Importantly for VDUSE, vDPA provides a uniform interface to bare-metal and container applications and to applications in VM user space: you can have a virtio device presented to your host user space as well as a virtio device presented to your VM user space. That's very important; it simplifies things.

So here's a really, really simplified look at the vDPA data path. First, looking at host user-space applications, which can be container or bare metal: what they see is a normal virtio device, and this virtio device is actually presented by the virtio-vdpa bus driver. The virtqueue owned by that driver is shared directly with the vDPA device. Guest applications in your VM also see a normal virtio device. There, the back end is a vhost device presented by the vhost-vdpa driver, and again the virtqueue is shared directly with the vDPA device. That's where the performance benefit of vDPA in hardware comes from. And just to kind of double-click on the earlier point: the control path is vendor specific.

The way VDUSE connects to vDPA is that we essentially take what would be the hardware device and bring it up into user space. The device is exported into host user space, and your host and guest applications still see a normal virtio device, but the VDUSE vDPA device may be emulated with all the flexibility and utility of a user-space application. And to point out: that emulator still has full access to the virtio virtqueues, which are accessed via shared memory or a bounce buffer; we'll go into that a little bit later.

So why do we care about user-space emulation? Why not just use an in-kernel emulator, like vhost, or go down that path? One reason is faster build cycles compared to in-kernel development; your project can move quickly when you have an aggressive schedule, and user-space applications can just be developed very quickly. There's also easier maintenance, and this is really in reference to live upgrades: they're just easier to do for a user-space application than for a kernel module. And there's more flexibility: you can use user-space libraries, and importantly you can reuse QEMU code and rust-vmm code.
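Before moving on to VDUSE itself, here is a small, hypothetical sketch in C of the vhost-vdpa handshake described above, i.e., how a VMM-style process might attach to a vDPA device so that the guest's virtqueues can be shared directly with it. It assumes the vhost-vdpa ioctls from <linux/vhost.h> (VHOST_GET_FEATURES, VHOST_SET_FEATURES, VHOST_VDPA_GET_DEVICE_ID) and a device node name like /dev/vhost-vdpa-0, which depends on how the device was bound on a given system; treat it as an illustration rather than what any particular VMM actually does.

```c
/*
 * Minimal sketch: attaching to a vDPA device through the vhost-vdpa
 * character device. Illustrative only; see include/uapi/linux/vhost.h
 * for the authoritative ioctl definitions.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vhost.h>

int main(void)
{
    /* The node name is an assumption; it depends on how the vDPA device
     * was bound to the vhost-vdpa bus driver on this system. */
    int fd = open("/dev/vhost-vdpa-0", O_RDWR);
    if (fd < 0) {
        perror("open /dev/vhost-vdpa-0");
        return 1;
    }

    /* Ask which virtio device type (block, net, ...) sits behind this node. */
    uint32_t device_id = 0;
    if (ioctl(fd, VHOST_VDPA_GET_DEVICE_ID, &device_id) == 0)
        printf("virtio device id: %u\n", device_id);

    /* Standard vhost feature negotiation: read what the device offers,
     * then acknowledge the subset the driver wants to use. */
    uint64_t features = 0;
    if (ioctl(fd, VHOST_GET_FEATURES, &features) == 0) {
        printf("device features: 0x%llx\n", (unsigned long long)features);
        ioctl(fd, VHOST_SET_FEATURES, &features);
    }

    /* A real VMM would now program memory mappings and virtqueue addresses
     * (VHOST_SET_VRING_ADDR and friends) so the guest's virtqueues are
     * shared directly with the vDPA device, as described above. */
    close(fd);
    return 0;
}
```

The point of the sketch is just that the interface the VMM touches is the generic vhost one, while the data path itself goes straight to the vDPA device.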
So now let's look into the design and implementation of VDUSE. Here's a high-level overview of the VDUSE architecture. VDUSE consists of two components: one is the VDUSE daemon, which is in user space, and the other is the kernel module, which is obviously in kernel space. The daemon has two primary responsibilities: device emulation and implementing the virtio data plane.

The daemon uses three mechanisms to achieve those goals. The first is ioctl on the character device that the kernel module provides to user space; ioctl is used for device initialization and configuration. The second is reading and writing that character device, which is for passing and receiving control messages. And the third is mmap, for mapping in shared memory so the daemon has access to the DMA buffers of the devices that are in a VM.

Now I'm going to look a little more deeply into the data path and the control path. virtio-vdpa here really refers to the bus driver that is used when your container or bare-metal host user-space application is accessing a virtio-vdpa device; this is the container case, not VMs, just to clarify. That path uses a bounce buffer. What the bounce buffer does is perform a copy of the DMA buffer into a buffer that is mapped into the daemon. One of the reasons that was done is that buffer writes may not be page aligned, so simply mapping a DMA buffer's page directly into user space may expose kernel data that the application doesn't have any business having access to. The bounce-buffer mechanism avoids that vulnerability by only copying the data that should actually be accessible to the daemon. That's implemented using an MMU-based software IOTLB. That example was for containers; for VMs, vhost-vdpa is used, and that's just straight-up shared memory.

For the control path, as I mentioned before, ioctl is for things like initializing virtio features, updating the config space of the device, and querying virtqueue information. Forwarding control messages from the vDPA bus to user space happens through the reads and writes to that character device; this is used for setting device status, getting virtqueue state, and so on and so forth.
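To make that control path a bit more concrete, here is a rough, hypothetical sketch in C of the daemon-side calls. It assumes the VDUSE UAPI as merged in Linux 5.15: a /dev/vduse/control node, a VDUSE_CREATE_DEV ioctl taking a struct vduse_dev_config, and a per-device node whose reads and writes carry control messages. Field and constant names here are best-effort and should be checked against include/uapi/linux/vduse.h; a real daemon would also set up virtqueues (VDUSE_VQ_SETUP) and the data path (for example VDUSE_IOTLB_GET_FD plus mmap), which is omitted.

```c
/*
 * Sketch of a VDUSE daemon's setup and control-message loop.
 * Illustrative only; see include/uapi/linux/vduse.h and
 * Documentation/userspace-api/vduse.rst for the authoritative interface.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vduse.h>
#include <linux/virtio_config.h>
#include <linux/virtio_ids.h>

int main(void)
{
    /* Device creation and destruction go through the control node. */
    int ctrl = open("/dev/vduse/control", O_RDWR);
    if (ctrl < 0) { perror("open /dev/vduse/control"); return 1; }

    uint64_t api = 0;
    if (ioctl(ctrl, VDUSE_SET_API_VERSION, &api)) { perror("set api version"); return 1; }

    /* Describe the emulated device; a real virtio-blk emulation would fill
     * the trailing config[] with a struct virtio_blk_config. */
    size_t cfg_space = 64;                       /* placeholder config-space size */
    struct vduse_dev_config *cfg = calloc(1, sizeof(*cfg) + cfg_space);
    strncpy(cfg->name, "vduse-blk0", sizeof(cfg->name) - 1);
    cfg->device_id = VIRTIO_ID_BLOCK;
    cfg->vendor_id = 0;
    /* VDUSE devices are documented to need VIRTIO_F_ACCESS_PLATFORM (and
     * VIRTIO_F_VERSION_1) among the offered feature bits. */
    cfg->features = (1ULL << VIRTIO_F_VERSION_1) | (1ULL << VIRTIO_F_ACCESS_PLATFORM);
    cfg->vq_num = 1;                             /* one request virtqueue */
    cfg->vq_align = 4096;
    cfg->config_size = cfg_space;

    if (ioctl(ctrl, VDUSE_CREATE_DEV, cfg)) { perror("VDUSE_CREATE_DEV"); return 1; }

    /* Per-device node: ioctls configure virtqueues, reads deliver control
     * messages (e.g. VDUSE_SET_STATUS, VDUSE_GET_VQ_STATE) forwarded from
     * the vDPA bus, and writes send the responses back. */
    int dev = open("/dev/vduse/vduse-blk0", O_RDWR);
    if (dev < 0) { perror("open /dev/vduse/vduse-blk0"); return 1; }

    struct vduse_dev_request req;
    while (read(dev, &req, sizeof(req)) == sizeof(req)) {
        struct vduse_dev_response resp = {
            .request_id = req.request_id,
            .result = VDUSE_REQ_RESULT_OK,
        };
        /* A real daemon would act on req.type here before acknowledging. */
        if (write(dev, &resp, sizeof(resp)) != sizeof(resp))
            break;
    }

    close(dev);
    close(ctrl);
    free(cfg);
    return 0;
}
```

A daemon for the block use case described next would additionally run the virtqueue processing loop that actually services I/O requests.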
So now, a look at a specific use case for VDUSE: a distributed storage use case. We have a storage service, and internally there's a storage SDK that hooks up to all of the remote storage services we provide to our applications. The storage daemon first of all needs to emulate vDPA block devices, just as any VDUSE daemon does. I/O requests come in from the various kernels. In this case we have our host kernel: you can think of a request as being routed from the container's accesses, through the virtio and vDPA infrastructure, through the virtqueue and the software IOTLB mechanism, and then up to be handled by the storage daemon. Or it comes in through the shared memory with the VM. So the block I/O requests come in from those directions, and then the requests are fulfilled using the remote storage back end, which in this case is a well-maintained, well-documented, well-used storage SDK. And that can easily be used because the daemon exists in user space. From the container's perspective, it just sees a plain virtio-blk block device. And from the VM's perspective, the same.

The result of this, from a performance perspective, is that we see about one and a half times better performance in both IOPS and average latency over the alternative setup that we had prior, which used NBD: we provided our containers access to a network block device that was serviced by a server application in user space. This is one and a half times the performance of that. And again, we still have the flexibility and maintainability that we were looking for.

Okay, thanks, that is my talk. I appreciate the time.