Hello, everyone. My name is Changpeng Liu. I'm a cloud software engineer from Intel. Today, my colleague Xiaodong Liu and I will deliver this topic together. The title of this presentation is virtio device emulation in SPDK based on the vfio-user protocol.

Here is today's agenda. First, we will give a brief introduction to the vfio-user components. Second, we will introduce the implementation of the emulated virtio devices based on the third-party libvfio-user library. Then we will compare the performance numbers of vfio-user versus vhost-user in a single VM. Finally, we will list our future development plans and the patch links mentioned in this presentation.

Okay, let's go through the first part. First, let me give a brief introduction to what vfio-user is. vfio-user is a protocol that allows a device to be emulated in a separate process outside of a VMM. The vfio-user specification is largely based on the Linux VFIO ioctl interface, implementing it as messages sent over a Unix domain socket. Here, the VMMs include Cloud Hypervisor and QEMU, and SPDK also includes full support for the vfio-user client. The second component is the vfio-user server, which is used to emulate a PCI device in a separate process.

Then let's see the vfio-user client support in SPDK. The diagram on the right gives a brief view of it. We have a vfio-user PCI device abstraction library which provides low-level PCI device access APIs. This library accepts a Unix domain socket path address as an input parameter and associates it with a PCI device. On top of that, the SPDK virtio client library provides transport-independent abstractions, so we can add a new vfio-user transport layer that forwards PCI BAR accesses from the virtio library to the remote target via the vfio-user client PCI device access APIs. Based on the client virtio library, SPDK provides the common block device layer on top of it, so users can create a block device over a specified Unix domain socket address. Currently, the device type can be virtio-blk, virtio-scsi or NVMe. Finally, users can use the vfio-user client to build their own applications.
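To make the transport idea concrete, here is a minimal C sketch of what "forwarding a BAR access as a message over a Unix domain socket" looks like. The message layout, field names and socket path below are invented for illustration; they are not the real vfio-user wire format (see the vfio-user specification for that), they only show the pattern of turning a register access into a request/reply exchange with the device-emulation process.

```c
/*
 * Illustrative only: a PCI BAR register access turned into a request
 * message sent to the emulation process over a Unix domain socket.
 * The struct below is a made-up layout, NOT the vfio-user wire format.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

struct bar_access_req {          /* hypothetical request layout    */
    uint32_t region;             /* which PCI BAR / region index   */
    uint64_t offset;             /* offset inside the region       */
    uint32_t count;              /* access size in bytes           */
    uint8_t  is_write;           /* 0 = read, 1 = write            */
    uint8_t  data[8];            /* payload for writes             */
};

static int connect_uds(const char *path)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

int main(void)
{
    /* "/var/tmp/vfu.sock" is just an example socket path. */
    int fd = connect_uds("/var/tmp/vfu.sock");
    if (fd < 0) {
        perror("connect");
        return 1;
    }

    /* Read 4 bytes at offset 0 of region 4 (virtio common config). */
    struct bar_access_req req = {
        .region = 4, .offset = 0, .count = 4, .is_write = 0,
    };
    write(fd, &req, sizeof(req));

    uint32_t value = 0;
    read(fd, &value, sizeof(value));   /* server replies with the data */
    printf("device_feature_select = 0x%x\n", value);

    close(fd);
    return 0;
}
```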
Then let's see the device emulation support on the server side. Device emulation on the server side is based on a third-party library called libvfio-user. At last year's KVM Forum, we already presented the emulation of an NVMe device based on the libvfio-user library. In this presentation, we will demo the emulation of virtio devices. The emulated virtio device library responds to PCI BAR access messages from the client side (the client could be a VMM or the SPDK client) and processes the virtqueues. On top of the emulated virtio device library, we provide virtio-blk and virtio-scsi device emulation. They respond to device configuration accesses based on the virtio-blk and virtio-scsi specifications, and parse the block or SCSI requests from the virtqueues. Finally, these block or SCSI requests are processed via the block device layer APIs in SPDK as the offload engine.

Okay, since both vfio-user and vhost-user can provide virtio-blk and virtio-scsi device emulation and access, what do the two solutions have in common and where do they differ? Let's look from the client side first. The SPDK virtio library has a very good abstraction layer, so the client-side library can support both vfio-user and vhost-user as the communication channel with the remote emulation process. You can also run it inside a VM, where it acts as a poll mode driver via the PCI transport.

Now let's look from the server side, as listed in the table below. The thread model is the same for vhost-user and vfio-user. For virtio support, vhost-user SCSI currently cannot support packed virtqueues; this is one difference. Both packed and split virtqueues are supported for virtio-blk in the vfio-user solution and in the vhost-user solution. Live migration, multiqueue and interrupt mode are only supported in the vhost-user solution now; all of these features are in the development plan for vfio-user.

Okay, since vhost-user and vfio-user have so much in common, and vhost-user can cover all these usage scenarios, people may ask why vfio-user is still being developed. One reason is to simplify the development and maintenance effort on the client and server side for device emulation. On the client side, users can use one vfio-user client driver to connect to the remote device emulation process, and the device types could be virtio devices, NVMe devices or even NIC devices. There is vfio-user client driver support in Cloud Hypervisor and SPDK too, and the upstreaming of the vfio-user client driver in QEMU is work in progress. Another reason is that a unified live migration framework becomes possible; for example, developers don't need to design another software framework to enable NVMe devices. Finally, the vhost-user solution is designed only for virtio devices and cannot cover NVMe devices, while the vfio-user model can cover both virtio and NVMe devices. And compared with vhost-user, the PCI device emulation is also in the remote process, which makes the vfio-user client in the VMM much thinner than vhost-user. I will talk about this a bit more in the implementation slides later.

Okay, let's see the detailed implementation of the emulated virtio-blk and virtio-scsi devices in SPDK. As we already know, vfio-user needs to emulate the PCI device in the remote process, so the vfio-user server process needs to define the virtio device layout as the first step. First, we define the MSI-X capability to use region 1 as the MSI-X table and region 2 as the MSI-X pending bit array; each of these two areas takes one page. Then, according to the virtio specification, we define vendor-specific capabilities for each virtio device configuration section. Here, the first three sections are defined as MMIO access areas: when client drivers access these regions, the accesses are forwarded via the Unix domain socket as messages. For the last part, the notification area, we support dual mode in practice: it is up to the user's configuration whether this area is defined as memory mappable or not. If we define this area as a memory-mappable area, then poll mode is used in the target.

Then we need to set callback functions for each vfio region. The most important one is region 4, which is mapped to PCI BAR 4 in practice. The callback function is called with an offset, a length and a read/write flag. We use the offset parameter to determine which virtio device configuration section is being accessed; for example, if the offset is less than 4 kilobytes, it is an access to the virtio common configuration section. Based on the offset and the value, we do feature negotiation on the server side first, then map the virtqueues on the server side, and finally start the device. Currently, virtio-blk and virtio-scsi devices are added in SPDK, but we provide a common abstraction layer to allow users to add other types of virtio devices. Another important virtio device configuration section is the device-specific configuration area. This section is device-type specific. For example, for the virtio-blk device, the capacity and block size attributes are stored in this section, and for the virtio-scsi device, the number of queues attribute is stored in this section.
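As an illustration of that region callback, here is a small, self-contained C sketch of dispatching BAR 4 accesses by offset. The section offsets, structure names and the example capacity are assumptions made up for this sketch (only struct virtio_blk_config up to blk_size follows the virtio specification); the real layout is whatever the server advertises through its virtio PCI capabilities, and the real SPDK implementation is considerably more involved.

```c
/* Sketch of a BAR4 access callback for an emulated virtio-blk device.
 * Offsets below are an example layout: one 4 KiB page per section. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum {
    COMMON_CFG_OFFSET = 0x0000,   /* virtio common configuration     */
    ISR_OFFSET        = 0x1000,   /* interrupt status                */
    DEVICE_CFG_OFFSET = 0x2000,   /* device-specific configuration   */
    NOTIFY_OFFSET     = 0x3000,   /* queue notification              */
    SECTION_SIZE      = 0x1000,
};

/* Device-specific config for virtio-blk (virtio spec layout, truncated). */
struct virtio_blk_config {
    uint64_t capacity;            /* number of 512-byte sectors      */
    uint32_t size_max;
    uint32_t seg_max;
    struct {
        uint16_t cylinders;
        uint8_t  heads;
        uint8_t  sectors;
    } geometry;
    uint32_t blk_size;            /* logical block size              */
} __attribute__((packed));

struct vfu_virtio_blk_dev {       /* hypothetical device state        */
    struct virtio_blk_config blk_cfg;
    /* common config state, virtqueues, ... omitted */
};

/* Called for every client access to region 4 (mapped to PCI BAR 4). */
static ssize_t
bar4_access(struct vfu_virtio_blk_dev *dev, char *buf, size_t count,
            uint64_t offset, bool is_write)
{
    if (offset < COMMON_CFG_OFFSET + SECTION_SIZE) {
        /* Feature negotiation, queue addresses, device status, ...   */
        printf("common cfg %s at 0x%lx\n", is_write ? "write" : "read",
               (unsigned long)offset);
        return count;
    }
    if (offset >= NOTIFY_OFFSET && offset < NOTIFY_OFFSET + SECTION_SIZE) {
        /* Queue kick: only seen as a message when the notify area is
         * not memory-mapped; with mmap the target polls the rings.   */
        return count;
    }
    if (offset >= DEVICE_CFG_OFFSET &&
        offset < DEVICE_CFG_OFFSET + SECTION_SIZE) {
        uint64_t off = offset - DEVICE_CFG_OFFSET;
        if (is_write || off + count > sizeof(dev->blk_cfg))
            return -1;            /* treat device config as read-only */
        memcpy(buf, (char *)&dev->blk_cfg + off, count);
        return count;
    }
    return -1;                    /* ISR section omitted in this sketch */
}

int main(void)
{
    struct vfu_virtio_blk_dev dev = {
        .blk_cfg = { .capacity = 2097152, .blk_size = 512 }, /* 1 GiB */
    };
    uint64_t capacity;
    bar4_access(&dev, (char *)&capacity, sizeof(capacity),
                DEVICE_CFG_OFFSET + 0, false);
    printf("capacity = %llu sectors\n", (unsigned long long)capacity);
    return 0;
}
```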
Here is the thread model in SPDK for virtio-scsi; virtio-blk uses the same one. Users need to specify the CPU core mask when starting a virtio-scsi device. Then an accept poller is started to listen for incoming socket connections. When QEMU connects to the Unix domain socket, the accept poller starts a socket message poller on the same thread. After the connection is created, we could unregister the accept poller as an optimization, but this is not implemented yet. The socket message poller receives all socket messages from QEMU, then delivers them to the vfio region access callback functions based on the offset and the region. When we start the device on the target side, we finally start a ring poller to poll all the virtqueues; this poller runs in the same thread and processes the block I/O requests.

After we start the ring poller to process the commands sent from the VM or the vfio-user client, we need to do the actual command processing. For the virtio-blk device, the command set defined in the virtio-blk specification is very limited and all of the commands are supported; the SPDK block layer already has the block APIs to support them. For the virtio-scsi device, as we know, SCSI is a very large industry specification. Fortunately, SPDK already has a SCSI library which provides the mandatory SPC and SBC command support, and this library is already used in the existing SPDK targets, so here we can reuse it to process the SCSI commands from the virtqueues.

Finally, there is common work that translates virtqueue descriptors into I/O vectors and translates guest physical memory addresses into host virtual memory addresses. This work is common to both vhost-user and vfio-user, so we can unify it in the future.
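To show what that common translation step involves, here is a hedged C sketch. The vring_desc layout follows the virtio split-ring specification, but mem_region, the function names and the linear region lookup are simplifications invented for this example; a real target uses the memory table registered by the VMM and does much more validation before handing the I/O vectors to the SPDK bdev readv/writev APIs.

```c
/* Sketch: walk a split-ring descriptor chain, translate guest physical
 * addresses to host virtual addresses, and build an iovec array.      */
#include <stdint.h>
#include <stdio.h>
#include <sys/uio.h>

#define VRING_DESC_F_NEXT 1        /* chain continues in 'next'        */

struct vring_desc {                /* split-ring descriptor (virtio spec) */
    uint64_t addr;                 /* guest physical address            */
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

struct mem_region {                /* one VMM-registered memory region  */
    uint64_t gpa_start;
    uint64_t size;
    void    *hva_start;            /* mmap()ed into the target process  */
};

/* Translate a guest physical address to a host virtual address. */
static void *gpa_to_hva(struct mem_region *regions, int nr, uint64_t gpa)
{
    for (int i = 0; i < nr; i++) {
        if (gpa >= regions[i].gpa_start &&
            gpa < regions[i].gpa_start + regions[i].size)
            return (char *)regions[i].hva_start +
                   (gpa - regions[i].gpa_start);
    }
    return NULL;                   /* not covered by any region         */
}

/* Walk one descriptor chain starting at 'head', filling up to 'max' iovecs. */
static int desc_chain_to_iov(struct vring_desc *table, uint16_t head,
                             struct mem_region *regions, int nr_regions,
                             struct iovec *iov, int max)
{
    int cnt = 0;
    uint16_t idx = head;

    for (;;) {
        if (cnt == max)
            return -1;             /* chain longer than the iovec array */
        void *hva = gpa_to_hva(regions, nr_regions, table[idx].addr);
        if (hva == NULL)
            return -1;
        iov[cnt].iov_base = hva;
        iov[cnt].iov_len  = table[idx].len;
        cnt++;
        if (!(table[idx].flags & VRING_DESC_F_NEXT))
            break;
        idx = table[idx].next;
    }
    return cnt;                    /* ready for the block layer readv/writev */
}

int main(void)
{
    static char guest_ram[8192];   /* stand-in for guest memory         */
    struct mem_region regions[1] = {
        { .gpa_start = 0x1000, .size = sizeof(guest_ram),
          .hva_start = guest_ram },
    };
    /* Two chained descriptors pointing into the "guest" memory. */
    struct vring_desc table[2] = {
        { .addr = 0x1000, .len = 512, .flags = VRING_DESC_F_NEXT, .next = 1 },
        { .addr = 0x1200, .len = 512, .flags = 0 },
    };
    struct iovec iov[4];
    int cnt = desc_chain_to_iov(table, 0, regions, 1, iov, 4);
    printf("built %d iovecs, first len %zu\n", cnt, iov[0].iov_len);
    return 0;
}
```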
Now, let's move to the performance section. From the implementation and thread model, vfio-user and vhost-user for virtio devices have much in common, but we still collected some performance data as a comparison for the audience's reference. In this test, we start one VM with two virtio-blk controllers: one controller is provided by vfio-user, the other by vhost-user. Inside the target, we use one Intel P5800X Optane drive as the storage backend, split into two logical parts, one for the vhost-user controller and the other for the vfio-user controller. We also use four I/O queues on both the client and server side, the size of each queue is 128, and the packed virtqueue feature is enabled. We run fio inside the VM; here are the fio parameters used in the VM. The difference between the runs is the I/O queue depth, which goes from 1 to 16.

This is our test case 1. It shows the test results for the virtio-blk configuration in the VM and in SPDK from I/O queue depth 1 to 16. The performance numbers are almost the same for the two solutions. From the previous slides, we know that two additional pollers are used in the vfio-user solution, but from the performance numbers of test case 1, we don't see any impact from those two pollers.

Then we ran another test, test case 2. We replace the physical SSD with two null block devices. The purpose is to test the I/O path efficiency and the virtualization overhead. All other parameters are the same as in test case 1, except the I/O queue depth and the read/write mode: randread is used instead of randwrite. We can see that even with a higher queue depth, the two solutions are still almost the same. The two additional pollers have almost no impact on the result; the maximum IOPS one host core can provide is limited by the CPU capability.

Here is another, different test case. Its performance data are collected using the SPDK virtio client. On the server side, using the same configuration as test case 2, two null-type bdevs are created: one is exported via vfio-user and one via vhost-user. The SPDK virtio client library is the same for the vhost-user and vfio-user transports, and the same parameters are applied to bdevperf on each type of virtio bdev. When running in this scenario, both client and server run in polling mode. From the table, the performance numbers are still almost the same.

Okay, let's give a summary of this presentation. As we already mentioned in the previous slides, the SPDK virtio client library can be used for both vhost-user and vfio-user. But on the server side we currently have two independent implementations. For vhost-user, QEMU emulates the PCI device part and SPDK mainly does the virtqueue processing part, including the dequeue and enqueue processing and translating the descriptors into I/O vectors. The virtqueue processing parts are mostly the same for vhost-user and vfio-user, so we will abstract this part into a common library in the future. Another planned feature is interrupt mode support. In the thread model slide, we said vfio-user has two additional pollers, which can be optimized. Although we don't see a performance drop with these two pollers, we can still switch them to interrupt mode after the device is started, or enable interrupt mode for all pollers. This is useful when running multiple VMs on a single CPU core.

Here are all the patches used in this presentation. Some of the patches are still under code review in SPDK, and the vfio-user client driver is also under review in QEMU. Some code for the vfio-user client side is already in the main branch; Cloud Hypervisor and SPDK already have full support for it. That's all for today's presentation. Thank you.