Hello, my name is Yan Vugenfirer, I'm from Daynix. The topic of my presentation today is Receive Side Scaling (RSS) with eBPF in QEMU and virtio-net. This presentation is based on work done by Andrew Melnychenko and Yuri Benditovich in QEMU, the Linux kernel, and libvirt. So what are we going to discuss today? We will talk about what RSS, Receive Side Scaling, is and how it benefits us. We'll talk about some history of RSS and its implementation in virtio-net devices and virtio-net drivers. And we'll talk about what eBPF is and how we can use it for packet steering in virtio-net.

So first of all, what is RSS? Receive Side Scaling is one of the mechanisms to improve network device performance. Performance is improved in two ways. First, packet processing is distributed among different CPUs, so the processing happens in parallel. Second, we get cache locality for the networking traffic of a specific application running in the VM, because the packets sent by the application and the packets received by the application can be handled on the same CPU.

How does the mechanism work? When the NIC receives a packet, it classifies it using a hash function, uses the hash value as an index into the indirection table, and the entry in the indirection table selects the specific queue the packet should be processed on, and thus the CPU tied to that queue. And if you have MSI or MSI-X support, the interrupt can also be delivered to the CPU associated with that specific queue. In this picture we can see a full implementation of the RSS mechanism: there is packet classification, the packets are steered to different hardware queues, each hardware queue is tied to a specific CPU, and the driver then handles the Rx processing on that specific CPU. Let's dive into the classification itself (there is a small sketch of the lookup step right after this part). Depending on the packet type, the hash function runs over packet header fields, for example the destination and source addresses, and the resulting value is used as an index into the indirection table.

So let's start with the history of RSS in virtio-net. Why do we want to use RSS in virtio-net? A few reasons. First of all, as I already mentioned, this is a mechanism to improve network performance, and we want our device to perform as well as possible. The other thing is that we were actually forced by Microsoft to start supporting RSS in our virtio-net Windows driver, because for high-speed devices, meaning devices that support more than 10 gigabit per second, we have to support RSS. So what happened is that before there was a multiqueue implementation in virtio-net, we had a software implementation of RSS in the Windows guest driver. It's very similar to RFS in Linux, and you can see in this diagram how it was implemented. We had only one Rx virtqueue; it was interrupting the guest, and the interrupts went to only one CPU. The packet classification actually happened in the device driver, not in hardware as previously discussed, and then we rescheduled the actual packet completion to the needed CPU. At some point virtio-net became a multiqueue device, but then we had a problem: Windows actually supplies us the indirection table, so the automatic steering that is currently implemented in virtio-net with multiqueue was not good enough for us, because it was completely ignoring the settings from the guest.
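To make the classification step concrete, here is a minimal sketch of the indirection-table lookup described above. It is purely illustrative: the structure and function names are made up for this example, and a real NIC or driver computes a Toeplitz hash over the configured header fields rather than receiving it ready-made.

    #include <stdint.h>

    #define RSS_TABLE_SIZE 128  /* indirection table length, a power of two here */

    struct rss_state {
        /* Each entry holds the Rx queue index for the hashes that map to it. */
        uint16_t indirection_table[RSS_TABLE_SIZE];
    };

    /* 'hash' is typically a Toeplitz hash over source/destination addresses
     * and ports; its low bits pick an entry, and the entry picks the queue. */
    static uint16_t rss_select_queue(const struct rss_state *rss, uint32_t hash)
    {
        return rss->indirection_table[hash & (RSS_TABLE_SIZE - 1)];
    }

The interrupt of the selected queue is then routed to the CPU tied to it, which is where the parallelism and cache-locality benefits come from.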
There were also cases where the automatic steering worked quite well, especially with TCP, but sometimes it completely missed the queue the guest expected for the flow. So what we had to do was implement a kind of hybrid model. On one hand, multiqueue automatic steering was distributing packets between the queues, but then on the guest we calculated the hash again and checked whether the packet had arrived on the correct queue. If it was not the correct queue, we had to reschedule the processing to the correct queue on the guest. We also support legacy RSS; it's not so interesting today, but it was one of the things we did in our implementation. So here is a simplified diagram of what was going on in the Windows guest driver before eBPF and before the additions to the virtio-net spec. I put only two vCPUs here so that the picture is not too convoluted, but what we can see is that although we have several Rx virtqueues, we are still doing the packet classification in the guest, and we still might reschedule the packet processing to another CPU.

So the next step was to make the virtio-net device aware of the RSS settings. That's where the virtio-net spec changes were proposed and accepted: to be able to set the steering mode, to be able to pass the indirection table from the driver to the device, and we also want the hash value calculated on the host side to be reported in the virtio-net header. This will eventually allow us to avoid any inter-processor interrupts due to rescheduling, and the vision is that at some point no extra calculation will be needed, neither on the guest side nor on the host side, because the hardware will do all the heavy work. What implementations do we have for this? First, we implemented a software-only proof of concept in QEMU, and the second step was to implement the steering with eBPF.

Before we jump to eBPF, let's go over the virtio-net spec changes, because they are very important for understanding how RSS in virtio-net works. First of all, we added a capability flag, VIRTIO_NET_F_RSS. This flag requires multiqueue to be enabled as well. We have changes in the device configuration: if you look at the virtio-net config space, there are three fields related to the indirection table size, the hash key size, and the supported hash types, and there is also a control queue command that sets the RSS configuration. So the RSS configuration can be set at any point during the lifetime of the device and the driver; the guest can set it at any moment. Another thing we changed is the virtio-net header: it can now report the hash value to the guest.

So now we can go and discuss eBPF. What is eBPF? eBPF is the ability to run sandboxed code inside the Linux kernel. This code can be loaded at runtime. The nice thing about this is that you can change the code without changing the Linux kernel itself, and it helps you implement additional functionality, or build a proof of concept of new functionality that might otherwise take quite a long time to get into the Linux kernel. So how can eBPF help us? We want two things. First, we want the RSS hash calculation and the selection of the queue index for each incoming packet to be done by eBPF, and we also want the hash value in the virtio-net header to be populated from the eBPF program; that part is still work in progress. So what's the magic? We load the eBPF program from QEMU using a specific ioctl on the TUN/TAP device, as in the sketch below.
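As far as I can tell, the ioctl in question is TUNSETSTEERINGEBPF from the TUN/TAP interface. Here is a rough sketch of the attach step; opening the tap device and loading the program (to obtain prog_fd) are omitted, and the function name is just for illustration.

    #include <sys/ioctl.h>
    #include <linux/if_tun.h>
    #include <stdio.h>

    /* Attach an already loaded eBPF program as the steering program of a
     * tap device; the kernel will then call it to pick the Rx queue.
     * Passing prog_fd == -1 detaches the current steering program. */
    static int attach_steering_prog(int tap_fd, int prog_fd)
    {
        if (ioctl(tap_fd, TUNSETSTEERINGEBPF, &prog_fd) != 0) {
            perror("TUNSETSTEERINGEBPF");
            return -1;
        }
        return 0;
    }

Once the program is attached, queue selection for that tap device is handed over to it instead of the default automatic steering.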
Then in the Linux kernel, in the TUN device, the tun structure has a steering_prog field: this is the loaded eBPF program. If it is loaded, the TUN select-queue path will use it to calculate the queue index. For the hash population, we want to populate the hash from eBPF, but here we have some specific issues. We added additional fields to the virtio-net header, but that's not enough: a lot of the time, when we calculate the hash, the header does not exist yet, so we have to keep the hash somewhere, probably in the skb, and copy it into the virtio-net header later. We also have to enlarge the virtio-net header in all the kernel modules involved. This is still work in progress: the initial set of patches was sent to the Linux kernel mailing list, we got some reviews, and we are still working on the feedback.

You can see the eBPF program source in QEMU under tools/ebpf; a heavily simplified sketch of what such a steering program looks like follows at the end of this part. We use clang to compile it; during the compilation a skeleton header file is generated, and then during the QEMU compilation the eBPF program becomes part of QEMU. There are also some helpers to initialize the maps. Maps are a mechanism to share data between the eBPF program, the kernel, and user space. Due to the libvirt integration, the implementation that is already in the QEMU main branch is a little bit different from what is in the patches recently sent to the mailing list: the accepted implementation has three maps, while the pending patches have only one map, and we will discuss why. The configuration map holds all the parameters the eBPF program needs to calculate the right flow: the supported hash flows, the indirection table, the default queue, the hash key, et cetera.

How do we load the eBPF program? There are two mechanisms. One: QEMU loads the program from the skeleton file that was built during the QEMU build, using the bpf() syscall. The other way is to use the eBPF helper program, which was created for use by libvirt; in this case QEMU gets file descriptors from libvirt with the eBPF program already loaded, along with its maps. This part is still under review. Loading eBPF is actually a tricky thing, and it can fail for different reasons. First of all, kernel support: the current solution requires kernel 5.8. If you are loading from QEMU, the QEMU process needs the CAP_BPF and CAP_NET_ADMIN capabilities. The administrator can also disable loading of eBPF from user-space programs. And we rely on the libbpf library to overcome some of these issues. In the case of helper usage, the problem can be a mismatch between the helper and QEMU. Why? Because we need to be sure the eBPF maps are the same and loaded from the same file, so we use a stamp, a hash of the skeleton include file, and QEMU verifies that the helper being used is the one built with this specific QEMU version.

What happens if for some reason we cannot use eBPF? There will be a fallback to the built-in in-QEMU RSS steering. There are several triggers other than a failed load: it can also be triggered by live migration, and currently, if hash population is enabled on the QEMU command line, we again fall back to the built-in in-QEMU RSS steering instead of eBPF, because that part of the eBPF functionality is still not implemented.
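As promised, here is a heavily simplified sketch of the shape of a TUN steering program. It is not the actual program from QEMU's tools/ebpf: the real one parses the packet, computes a Toeplitz hash over the configured fields, and reads the full RSS configuration from its map(s). This sketch only shows the overall structure, a map shared with user space plus a program whose return value is the chosen queue index; the section name, map layout, and use of skb->hash are illustrative assumptions.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* Indirection table shared with user space (user space fills it in). */
    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 128);
        __type(key, __u32);
        __type(value, __u16);
    } indirection_table SEC(".maps");

    SEC("socket")
    int steer(struct __sk_buff *skb)
    {
        /* A real steering program computes a Toeplitz hash over the packet
         * headers itself; skb->hash is used here only to keep the sketch short. */
        __u32 key = skb->hash % 128;
        __u16 *queue = bpf_map_lookup_elem(&indirection_table, &key);

        /* The return value is used by the TUN driver as the Rx queue index. */
        return queue ? *queue : 0;
    }

    char _license[] SEC("license") = "GPL";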
So, some issues with live migration. First of all, we now also have a dependency on the kernel version: if we migrate to a host with an old kernel version, we will not be able to load eBPF. There are some other reasons, as we already saw, why eBPF might fail, and in those cases we fall back to the in-QEMU RSS steering.

Some changes were introduced to the QEMU command line in order to enable RSS steering and eBPF usage. Other than the changes for the device, we need to be sure that multiqueue is enabled for the device, and of course that we have a sufficient number of virtual CPUs for the queues. The parameters are rss=on, in which case QEMU will automatically try to load the eBPF program and, if that fails, fall back to the built-in RSS steering, and hash=on, which will populate the hash in the virtio-net header. We can also provide file descriptors from the helper, via libvirt. An illustrative invocation is shown at the very end.

Some points about the libvirt integration. When libvirt runs QEMU, QEMU runs with the least possible privileges, and in order to load eBPF we need CAP_NET_ADMIN, which will not be available. In this case we need the eBPF helper, which prepares and loads the eBPF program and prepares the maps; then libvirt can pass the file descriptors to QEMU, and QEMU can use the eBPF program. Those patches are still under review, both in QEMU and mainly in libvirt.

So what's the current status? Initial support was merged into QEMU. As I mentioned, the libvirt integration patches are under review, and the hash population by the eBPF program, the patches that were sent to the Linux kernel, are pending some additional work for the next set of patches. Here you can find the links to the patches. What's in the future? There are probably two things we'll look at next: packet filtering for vhost, and some other security features that could be implemented in eBPF. Thank you very much. I'm here to answer your questions.
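For reference, here is an illustrative QEMU invocation combining the command-line options mentioned above. The specific values are only examples (vectors=10 follows the usual 2*queues+2 rule of thumb for multiqueue), and vhost=off is shown because the in-QEMU hash population needs the packets to pass through QEMU; treat this as a sketch, not a recommended configuration.

    qemu-system-x86_64 [other VM options] \
        -netdev tap,id=net0,queues=4,vhost=off \
        -device virtio-net-pci,netdev=net0,mq=on,vectors=10,rss=on,hash=on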