Okay, so now you're running, good, that's easier. Is it in any way possible that this is just turned off? I don't know, it looks turned off, doesn't it? It's completely off. I'll touch this but I won't change anything, I just want to look at it. So VGA, okay, VGA it is, so probably VGA should work. Yeah, it does. So I think we actually just needed to switch this, but we are not allowed to do that. Okay, just run it from your laptop. Andrea, did that display just come on on its own? No, I touched it, but only to wake it up; I didn't touch any of the things I'm not supposed to. On the right side it looks like VGA and it just magically works, right? Yeah, I think it's just configured to use VGA. Switching it to HDMI would work, but we are not going to do that because we are not allowed to. And the VGA is just sealed now. Andrea? Yes? Oh, filming. Sorry. No, there's no sound. Okay. And now we're going to turn this on so I can hear it in the corner. Okay, so we just... What cover is that? Yeah. Okay. It doesn't speak to you for some reason, but it's there if we need it. Awesome. If you could set your external display to 4:3 resolution, not revolution. I know how to do that. Okay. Sorry. Yeah, there's a slide about it. Did you get the clicker? Yeah. For the clicker, you just need to plug this in, into USB, and it will go through the slides. Okay. So I guess we are one minute late. You can unmute and start. Do you want me to introduce you, or will you do it yourself? Maybe you can do it yourself if you are short on time. Just unmute and get started. Okay.

Okay, so... Hello, my name is Maxime Coquelin, and I am speaking together with Victor today. We will talk about how virtual machine networking can benefit from DPDK. First, we will start with an overview of DPDK, virtio and vhost-user. Then we will talk about the challenges we face when implementing virtio support in DPDK. And finally, we will talk about the new and upcoming features that are being introduced to address these challenges.

So let's start with the overview. DPDK is a set of user space libraries whose goal is to achieve fast packet processing; the name stands for Data Plane Development Kit. The idea is to benefit from the flexibility brought by software in order to achieve performance close to dedicated hardware solutions like ASICs. The project is released under the BSD license. It supports multiple architectures: Intel x86, IBM POWER8, Tilera TILE-Gx, and more recently ARMv7 and ARMv8. Most server-grade NICs from the main vendors are supported. It also now supports other kinds of hardware, like crypto devices, through a new crypto framework. And some parts of the DPDK project are being reused by a storage project called SPDK, for example to implement a vhost-user storage backend. The talk will be about Linux, but DPDK is also supported on FreeBSD.

I drew this timeline to show the main milestones. In 2012 Intel released the first version of DPDK; it was hosted on their website. Multiple companies started to fork the project, which created a kind of fragmentation and caused some problems. The year after, in 2013, a French company called 6WIND initiated the dpdk.org community.
They put in place servers, mailing lists and git repositories so that other companies could start to contribute back to the project. In 2014 DPDK was packaged for the first time into a Linux distribution, which was Fedora. Before 2015 only the Intel architecture was supported, but hardware diversification started in 2015 with the IBM and Tilera architectures being introduced. ARM followed the year after, in 2016, along with the crypto framework I mentioned on the previous slide. 2017 brings a milestone that is not technical: since the project now supports multiple architectures and is no longer an Intel-only project, the decision was made to move it under the Linux Foundation umbrella. I listed the versions below, so I won't go through all of them, but I wanted to highlight that since last year the numbering scheme changed to a year dot month format to give more predictability, and starting with version 16.11 the decision was made to move to four releases a year because there are more and more contributions. In the last version, the project represented about 750,000 lines of code, split across 6,000 commits from more than 300 contributors.

The idea of this slide is to give a simplified comparison of how an application based on DPDK compares to a traditional application relying on the kernel networking stack. The idea is to bypass the kernel and do everything in user space. To access the NIC registers directly we rely on VFIO, so that the application can write straight into the NIC registers. Also, the user space pages storing the packets are directly referenced by the NIC descriptors, so there is no copy of the packets between user space and the kernel.

To achieve the best performance, DPDK relies on a number of system features. It relies on CPU isolation, and soon CPU partitioning: the idea is to have dedicated physical cores that poll the NIC. To do this polling, as I said earlier, it relies on VFIO; we could also use UIO, but for security reasons we advise using VFIO. For the best performance we also rely on NUMA awareness: all the resources, the memory and the PCI devices, have to be on the same NUMA node as the physical CPU running the PMD threads. It also uses huge pages, which relaxes the pressure on the TLB and has the added advantage of avoiding swapping, which we cannot afford. The idea is to avoid interrupt handling and use polling mode to get the best latencies, and, as I said earlier, to avoid switching between user and kernel mode. For example, measurements made by Jesper Brouer showed that the time budget needed to perform a syscall is higher than the time budget we have to transmit a 64-byte packet at 10-gigabit line rate.

This slide shows the different libraries the DPDK project is composed of. We have a first set of core libraries that provide the environment abstraction layer, the packet representation and the memory pools. The main part is the PMDs, the poll mode drivers for the NICs. They can be physical, like Intel, Cisco or Mellanox NICs, or, in the case that interests us today, DPDK supports the virtio and vhost PMDs. The vhost PMD relies on the vhost library, so applications can either use the Ethernet API, to use the vhost device as if it were a physical NIC, or use the vhost API directly, which may provide some more flexibility. Victor will now talk about that. Thank you, Maxime.
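To make that Ethernet API path concrete, here is a minimal sketch, our illustration rather than code from the talk, of a DPDK application polling one port through the generic ethdev calls. The port number, pool sizing and queue depths are assumptions; error handling is kept minimal.

    #include <stdint.h>
    #include <rte_eal.h>
    #include <rte_lcore.h>
    #include <rte_mbuf.h>
    #include <rte_ethdev.h>

    #define BURST_SIZE 32

    int main(int argc, char **argv)
    {
        /* Initialize the Environment Abstraction Layer (huge pages, cores, devices). */
        if (rte_eal_init(argc, argv) < 0)
            return -1;

        /* Pool of mbufs (the DPDK packet structure) backing the RX descriptors. */
        struct rte_mempool *pool = rte_pktmbuf_pool_create("mbuf_pool", 8191, 256, 0,
                                                           RTE_MBUF_DEFAULT_BUF_SIZE,
                                                           rte_socket_id());
        if (pool == NULL)
            return -1;

        uint16_t port = 0;                    /* assumed: first probed port */
        struct rte_eth_conf conf = { 0 };     /* default port configuration */

        rte_eth_dev_configure(port, 1, 1, &conf);
        rte_eth_rx_queue_setup(port, 0, 512, rte_socket_id(), NULL, pool);
        rte_eth_tx_queue_setup(port, 0, 512, rte_socket_id(), NULL);
        rte_eth_dev_start(port);

        struct rte_mbuf *pkts[BURST_SIZE];
        for (;;) {
            /* Poll the port: no interrupts and no syscalls on the data path. */
            uint16_t nb_rx = rte_eth_rx_burst(port, 0, pkts, BURST_SIZE);
            if (nb_rx == 0)
                continue;

            /* Echo the burst back out; free whatever the port did not accept. */
            uint16_t nb_tx = rte_eth_tx_burst(port, 0, pkts, nb_rx);
            for (uint16_t i = nb_tx; i < nb_rx; i++)
                rte_pktmbuf_free(pkts[i]);
        }
    }

The point of the abstraction is that this loop does not change when port 0 happens to be a virtio device or a vhost PMD port instead of a physical NIC.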
So we have the DPDK library, which is all about high performance. In general it is not related to virtual machines or virtual machine networking; it is a way to let a user space application on the host do very fast networking by implementing the network stack in user space. It just happens that DPDK also supports some kinds of virtual devices. But let's see how we can achieve this kind of high performance, and let me give you some idea of the speeds, bandwidths and latencies we are talking about. We are talking about sending and receiving 10 million packets per second, and we are talking about the smallest packets. We want to achieve bandwidth on the order of magnitude of 10 gigabits per second. At this speed we have about 200 nanoseconds to handle each packet, and that budget also has to cover copying and all the other needed work.

But before we see how DPDK fits into virtual networking in VMs, I want to give you an overview of how regular networking is done from virtual machines. We have different options. One of them is device emulation, where we just emulate some simple networking card. This approach works well, but the speed we can achieve is about 50 megabits per second. Another approach is direct assignment of a networking card into the VM, letting the network stack in the kernel of the virtual machine handle the traffic. This way we achieve about 500 megabits per second, and the main overhead is in the kernel of the guest.

Another possibility, which is much faster, is using a virtual device. Virtio is a virtual device that is supported by KVM and QEMU, and modern kernels, both Windows and Linux, have drivers for virtio devices. QEMU and KVM support these kinds of devices through vhost. And what is vhost? Instead of implementing the network card in QEMU, in user space, we have in the kernel of the host an entity we call vhost. It is just a thread that runs and communicates directly from the host kernel with the VM, usually with the kernel of the VM, and it can place packets and buffers to and from the memory of the guest operating system. In this setup the network stack still runs on the host machine, so there is quite a large overhead, but we eliminate the additional overhead related to the virtual machine layers. So we achieve the same speeds of about 500 megabits per second with this setup, and to go to much higher speeds we have to use solutions like DPDK, which give the user application some kind of direct access.

So how does it work? We have this vhost entity on the host, which can directly access the physical memory belonging to the virtual machine, and we have a way to send and receive signals about new buffers arriving to and from the VM. In one direction we achieve this with ioeventfd, the usual eventfd mechanism in the Linux kernel, and to signal the guest operating system about new buffers we use the irqfd mechanism. This in-kernel vhost is really the main mode used today when you run a virtual machine with virtio drivers; it is what we usually mean when we say we have virtio emulation on the host operating system. QEMU handles all the control and negotiation and everything that is not directly related to the data path, while the data and the buffers themselves go directly from the kernel of the host operating system to the guest. This way we bypass QEMU and anything else that lowers the speed.
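To make the ioeventfd and irqfd signalling paths more concrete, here is a hedged sketch, not QEMU's actual code, of how a VMM-like process could wire them up for one virtqueue. The descriptors vm_fd (the KVM VM) and vhost_fd (the /dev/vhost-net device), the notify register address and the guest interrupt number are assumptions for illustration, and error handling is omitted.

    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>
    #include <linux/vhost.h>

    static void wire_virtqueue(int vm_fd, int vhost_fd, unsigned int vq_index,
                               uint64_t notify_addr, uint32_t gsi)
    {
        int kick_fd = eventfd(0, 0);   /* guest -> vhost: "new buffers available" */
        int call_fd = eventfd(0, 0);   /* vhost -> guest: "buffers completed" */

        /* ioeventfd: a guest write of vq_index to the notify register signals
         * kick_fd directly from KVM, without an exit to QEMU user space. */
        struct kvm_ioeventfd ioev = {
            .addr = notify_addr,
            .len = 2,
            .fd = kick_fd,
            .datamatch = vq_index,
            .flags = KVM_IOEVENTFD_FLAG_DATAMATCH,
        };
        ioctl(vm_fd, KVM_IOEVENTFD, &ioev);

        /* irqfd: signalling call_fd injects interrupt 'gsi' into the guest. */
        struct kvm_irqfd irq = { .fd = call_fd, .gsi = gsi };
        ioctl(vm_fd, KVM_IRQFD, &irq);

        /* Hand both eventfds to the in-kernel vhost worker for this virtqueue. */
        struct vhost_vring_file kick = { .index = vq_index, .fd = kick_fd };
        struct vhost_vring_file call = { .index = vq_index, .fd = call_fd };
        ioctl(vhost_fd, VHOST_SET_VRING_KICK, &kick);
        ioctl(vhost_fd, VHOST_SET_VRING_CALL, &call);
    }

Once this is set up, the kick and call paths never leave the kernel, which is what removes QEMU from the data path.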
And in the kernel we can just pass the buffers between the guest and a tap device, or directly to the NIC, and all the usual switching possibilities exist. So that was an overview of what virtual networking is, and later I will explain how we want to bring the very high speeds of DPDK into virtual machine networking.

Another existing possibility, which is very useful and is gaining a lot of acceptance in the enterprise and in the industry, is to let third parties supply their own implementation of this vhost entity. We call this vhost-user, and QEMU has an interface that enables user space applications from third parties to supply this backend and this functionality. DPDK has a vhost implementation in its library to support this, and this enables applications like OVS and other software switches. It works very similarly to vhost in the kernel, except that instead of using irqfds and ioeventfds, all the signalling goes through a Unix socket. So it is slightly slower, but the main data path still goes directly from the user space application to the guest: all the buffers are just mapped as shared memory and everything works smoothly. Maxime will now talk about some challenges that we currently see, and later we will explain how we are going to address them.

So yes, we have a number of challenges to address. The first and most important one is performance, because DPDK is all about performance; this is what we want to achieve. We can see it as a trade-off between multiple aspects, and three of these aspects are bandwidth, latency and CPU utilization. For bandwidth, the idea is that we want to achieve line rate even for small packets, something that is not possible with the kernel today. We want latency that is as low as possible; for example with the 5G standard that is coming, this is becoming really important. And of course CPU utilization matters, because the more CPUs you need to achieve the best bandwidth and latency, the higher the cost will be. Today we invest mostly in bandwidth and latency, because missing the required performance could be a showstopper, and we do it at the expense of CPU utilization. To do this we need to take the hardware architecture of the CPUs into account as much as possible, but the problem is that we have multiple architectures to support, and that is a challenge, because an optimization you make for one architecture, like Intel, could cause regressions on another, like ARM.

The second challenge is reliability. Especially for NFV use cases, we cannot afford packet loss. But this is hard to achieve, especially when we want to reach maximum performance, because virtio relies only on the CPU. It is really CPU intensive, and any scheduling glitch can cause packet drops, so you need to have your PMD threads isolated from any possible perturbation as much as possible.

The other challenge is migration. When we want to migrate a VM from one host to another, QEMU saves its state and restores it. But in the case of vhost-user we have a third party, which could be Open vSwitch or VPP, and we need to ensure that the migration will be successful. If you migrate from a new host to an old host that does not have the same version of OVS, for example, you could have started on the first host with a feature set that is not available on the destination, and that would break the migration.
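One piece of that puzzle can be sketched in code. A backend built on DPDK's vhost library can restrict the virtio features it advertises, so that every host in the cluster negotiates the same set and a feature mismatch cannot break migration. This is a minimal sketch, assuming the per-socket API of more recent DPDK releases (rte_vhost_driver_register and rte_vhost_driver_disable_features); the socket path and the choice of masked feature are illustrative.

    #include <stdint.h>
    #include <linux/virtio_net.h>
    #include <rte_vhost.h>

    int setup_vhost_socket(void)
    {
        const char *path = "/var/run/vhost-user/vu0";   /* assumed socket path */

        if (rte_vhost_driver_register(path, 0) != 0)
            return -1;

        /* Mask features that older hosts in the cluster may not offer, for
         * example mergeable RX buffers, so all hosts advertise the same set. */
        uint64_t mask = 1ULL << VIRTIO_NET_F_MRG_RXBUF;
        return rte_vhost_driver_disable_features(path, mask);
    }

In practice, deciding which features to mask is a policy question that belongs in the management layer, which is exactly the ongoing work mentioned next.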
To avoid this, and this is ongoing work, we need to rely on the management tool to ensure that when we start the vhost-user backend we start it with the right configuration, so that migration to and from the other hosts remains possible. Victor will now talk about the security challenges.

Generally, security and high performance is a big trade-off, since to achieve high performance you need some kind of direct assignment of the hardware to user space, and you lose security. And VMs are used mainly by applications that want this isolation and security between VMs and between processes. In the context of NFV, if you want every network function to run in a different virtual machine, we need to provide both high-performance networking between these virtual machines and strong isolation, and that is quite hard to achieve. All current implementations of networking between VMs require some kind of mediator on the path, between the VMs or between a VM and the network card, and this mediator mainly just copies buffers from the memory of one virtual machine to the memory of another, and along the way checks whether it is allowed to read or write those memory locations. To achieve zero copy we need something else, since zero copy is required for high bandwidth but is problematic from a security point of view, and later we will see how we propose to address this point.

Okay, so now we will talk about the new and upcoming features that we are working on to address these challenges. First, mergeable receive buffers. This is not a new feature, it has been here for quite some time, but I will refer to it in the next slides, so I wanted to give a brief overview. The idea of mergeable receive buffers is to make it possible to receive packets that are larger than the virtqueue descriptor buffer size. For example, here descriptor zero carries a packet that fits in a single buffer, so its num_buffers field equals one; to pass a packet after this one that needs three descriptors, we set the num_buffers field of the virtio-net header to three. So this allows large packets, but it has one disadvantage: it introduces an extra cache miss, because even if you only ever send small packets, you still have to parse this num_buffers field, even though we know it will always be equal to one.

The second feature is indirect descriptors, which was introduced in 16.11. With direct descriptor chaining, if the VM wants to transmit a large packet, it uses multiple descriptors that are chained together; for example, here we chain from descriptor two to descriptor zero, and so on. The problem is that if every packet you send takes four descriptors, you divide your ring capacity by four. With indirect descriptors we add a separate table: whatever the packet size, we fill this table with the number of descriptors needed, and the ring capacity stays the same. So, as I said, the advantage is that the ring capacity is increased, and we see some improvement when transmitting a large number of large packets. But we also noticed that it improves zero-packet-loss behaviour when the system is not finely tuned: as I said earlier about the reliability challenges, small scheduling glitches can cause packet drops, and even for small packets we always use two descriptors today.
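For reference, these are the ring structures the two features above operate on, restated here as a self-contained sketch. The layouts follow the virtio specification (the kernel provides them in linux/virtio_ring.h and linux/virtio_net.h); only the comments are ours.

    #include <stdint.h>

    #define VRING_DESC_F_NEXT      1   /* chained: 'next' is valid */
    #define VRING_DESC_F_WRITE     2   /* device writes into this buffer (RX) */
    #define VRING_DESC_F_INDIRECT  4   /* 'addr' points to a table of descriptors */

    /* One slot of the descriptor ring. With plain chaining, a four-buffer packet
     * consumes four slots; with the indirect feature, 'addr' points to a separate
     * table of descriptors, so the packet takes a single slot of the ring
     * whatever its size. */
    struct vring_desc {
        uint64_t addr;    /* guest-physical address of the buffer (or indirect table) */
        uint32_t len;     /* length in bytes */
        uint16_t flags;   /* combination of the VRING_DESC_F_* bits above */
        uint16_t next;    /* index of the next descriptor in the chain */
    };

    /* Header prepended to every packet when mergeable RX buffers are negotiated.
     * num_buffers is the field that costs the extra cache miss: the backend has
     * to read it even when every packet fits in one buffer and it is always 1. */
    struct virtio_net_hdr {
        uint8_t  flags;
        uint8_t  gso_type;
        uint16_t hdr_len;
        uint16_t gso_size;
        uint16_t csum_start;
        uint16_t csum_offset;
    };

    struct virtio_net_hdr_mrg_rxbuf {
        struct virtio_net_hdr hdr;
        uint16_t num_buffers;   /* how many descriptors this packet spans */
    };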
One of those descriptors is for the virtio-net header and one is for the buffer itself, so by using the indirect descriptors feature we in fact double the ring capacity. But it has a small cost, because you have one more level of indirection. So if you do not care about packet loss, you will notice a small performance degradation, around three percent in my measurements.

Another feature that was also introduced in 16.11 is vhost dequeue zero-copy. By default, when we receive on the vhost side a descriptor from the guest, we perform a copy from the descriptor's buffer into the mbuf buffer, the mbuf being the packet structure in DPDK. The idea is simply to reference this descriptor buffer directly from the mbuf. For VM-to-VM traffic we see a big performance improvement for large and standard-size packets. But it has a very limited scope today, and it is a little hacky, because the mbuf structure is not designed today to point to external buffers. It only works for VM-to-VM today, because when we transmit to a NIC, the NIC does not release the packets immediately, it releases them in batches, so sometimes we can run out of descriptors in the virtqueue. There is some work going on to improve this situation.

Another feature is the MTU feature. The idea is for the host to share the maximum supported MTU. It can be used, of course, to set the MTU value consistently across all your infrastructure, but it can also be used to avoid the extra cache miss I talked about earlier when mergeable receive buffers are enabled: if we know that the host will never send packets larger than the descriptor buffer size, we can simply disable the feature at initialization time, and so we save this extra cache miss.

Another feature that is being reviewed on the QEMU side is vhost-pci. Maybe it will be in DPDK 17.05, but it is not here yet. The idea, simplified, is this: in the traditional way of transmitting a packet from one VM to another, the packet goes to the host vswitch, and the vswitch transmits it to the second VM. With vhost-pci, the vhost-pci device maps the second VM's virtqueues directly. It is a little more complex than this, because you have to set it up, and it requires the introduction of a new protocol, the vhost-pci protocol: QEMU runs a server for the first VM and the second VM connects to it. The advantage is that it improves performance, of course, because the two virtual machines share a single virtqueue, so the packet does not have to be copied multiple times, and no change is needed in the guest virtio drivers, because of its design. But the first version that is being reviewed today has some disadvantages. One is a security challenge, because the first VM maps the entire memory space of the second VM, so if you do not trust the second VM, this is something you cannot use today; this could be solved with the IOTLB features Victor will talk about afterwards. There are also some concerns about migration, because the two QEMU instances are connected through a socket, so it may add some complexity to perform a migration.

Okay, so now I will let Victor talk about the security topics. So, to achieve high performance, we have several solutions. One of them is vhost-pci, which maps part of the memory of one guest directly into another guest and implements in the DPDK library the functions to access the virtqueues. Another solution is to enable networking from the VM to host user space.
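Coming back to the dequeue zero-copy feature described a moment ago, an application opts into it per vhost-user socket when it registers the socket. A minimal sketch, assuming the RTE_VHOST_USER_DEQUEUE_ZERO_COPY flag introduced in 16.11 and the header name used by more recent DPDK releases; the socket path is illustrative.

    #include <rte_vhost.h>

    /* Register a vhost-user socket with dequeue zero-copy enabled: instead of
     * copying guest buffers into mbufs, the mbufs reference the guest memory
     * directly (useful for VM-to-VM today, as explained above). */
    int register_zero_copy_socket(void)
    {
        const char *path = "/var/run/vhost-user/vu1";   /* assumed socket path */
        return rte_vhost_driver_register(path, RTE_VHOST_USER_DEQUEUE_ZERO_COPY);
    }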
That second solution is vhost-user. But again, to achieve high performance, we map the whole physical address space of the guest into the user process, so it can access any buffer in any place. It is not secure and it is not isolated. If you think about two VMs, you cannot trust the other VM to be free of bugs or of malicious software. When you talk to some process on the host, in theory it is better, you can consider it trusted, but still, if it is some third-party user process, it can have bugs, and it is not healthy to give it access to the whole physical memory of the guest.

So how can we fix this? On the host we have the notion of an IOMMU. One of its features is to restrict access to the physical memory of the computer from devices, from physical devices. A physical device today can have firmware, it can have bugs, and if it is at fault it can overwrite places in the physical memory of the host. Modern processors, both AMD and Intel, support mechanisms that make every access from the PCI bus undergo an address translation: that is the IOMMU, and it is implemented in real CPUs. When we talk about virtual machines, today there is no support for translating guest I/O virtual addresses to guest physical addresses, and if we want to supply such a feature, we have to implement IOMMU emulation in QEMU and KVM. This feature is going to be added both to KVM and to DPDK.

First, I would like to explain how it is going to work in the case of vhost sitting in the kernel of the host. The basic idea is quite simple. QEMU presents the machine to the guest operating system as if it supports an IOMMU, and the VM can then start using it by means of its IOMMU driver. From that point on, every descriptor passed to the network card will contain guest I/O virtual addresses instead of guest physical addresses. That means that when we work with the buffers, every such address needs to be translated from a guest I/O virtual address to a guest physical address, which corresponds to a host user space address. The knowledge of this translation table sits in QEMU, since QEMU handles the IOTLB requests, while the actual accesses are performed by vhost, which sits in the kernel. So each time the kernel sees a new address that needs to be translated, it sends a question to QEMU about how to translate it, and for the following accesses it just caches this translation information. Another mechanism is needed when the VM unmaps the DMA memory: QEMU sends an invalidation request to vhost, just to let vhost know that this address is no longer in use, and the next time vhost sees the same address it needs to ask again. So that is the basic idea, which is probably going to be included in the next release of QEMU and KVM.

To support this for vhost-user backends, the basic idea is the same. The difference is the means by which we communicate the IOTLB translations and the cache invalidations. Again the backend has to keep some kind of IOTLB cache, and each time it sees a new I/O virtual address that it does not know how to translate, it just asks QEMU, but this time by means of the Unix socket; the IOTLB cache invalidations are also sent over the Unix socket. It is slightly slower, but it is still not on the main data path, so it does not affect performance much.
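To make the IOTLB cache just described more concrete, here is a deliberately simplified, hedged sketch of the lookup, update and invalidation steps a vhost-user backend performs before touching a descriptor's buffer. The types and the miss handler are illustrative, not DPDK's actual implementation; a real backend would send the miss request to QEMU over the Unix socket and wait for the reply.

    #include <stdint.h>
    #include <stddef.h>

    /* One cached translation: a guest I/O virtual address range and the host
     * virtual address the corresponding guest memory is mapped at. */
    struct iotlb_entry {
        uint64_t iova;      /* start of the guest I/O virtual address range */
        uint64_t size;      /* length of the mapping */
        void    *host_va;   /* where that range lives in the backend process */
    };

    #define IOTLB_ENTRIES 64
    static struct iotlb_entry cache[IOTLB_ENTRIES];
    static size_t used;

    /* Illustrative miss handler: a real backend asks QEMU over the socket for
     * the missing translation and retries; here we just report failure. */
    static void *iotlb_miss(uint64_t iova)
    {
        (void)iova;
        return NULL;
    }

    /* Update from QEMU (sent in reply to a miss): remember the translation. */
    void iotlb_insert(uint64_t iova, uint64_t size, void *host_va)
    {
        if (used < IOTLB_ENTRIES)
            cache[used++] = (struct iotlb_entry){ iova, size, host_va };
    }

    /* Translate a descriptor address before reading or writing the buffer. */
    void *iotlb_translate(uint64_t iova)
    {
        for (size_t i = 0; i < used; i++) {
            if (iova >= cache[i].iova && iova < cache[i].iova + cache[i].size)
                return (uint8_t *)cache[i].host_va + (iova - cache[i].iova);
        }
        return iotlb_miss(iova);   /* not cached: ask QEMU, then retry */
    }

    /* Invalidation from QEMU: drop every cached entry overlapping the range,
     * so the next access goes back to QEMU for a fresh translation. */
    void iotlb_invalidate(uint64_t iova, uint64_t size)
    {
        for (size_t i = 0; i < used; ) {
            if (cache[i].iova < iova + size && iova < cache[i].iova + cache[i].size)
                cache[i] = cache[--used];   /* swap-remove */
            else
                i++;
        }
    }

The cost of a miss is paid only the first time an address is seen, which is why the socket round trip stays off the hot path.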
So, as a conclusion, DPDK is now under active development to support various features for virtual machines. We think that 10 million packets per second is really achievable with DPDK running in the VM and on the host to provide VM-to-VM communication. And we think these features are crucial to support the industry transition to NFV and SDN networks, since both security and high performance are very important there. Thank you very much, and now we have time for questions and answers.

Was it interesting? Now we have time for questions. Please repeat the question. So the question was whether we could use some copy-on-write mechanism to eliminate the copying of buffers that way. I think Maxime could explain the zero-copy feature, which I think does pretty much the same. Actually, this is something we have not considered yet, so maybe if you have good ideas we could discuss this.

And I got the same numbers, so that showed me that vhost-user itself seems to be fast enough, but something else, like vhost-pci, is either not working or not fast enough. So the question was that the current vhost implementation is actually quite efficient, since you can achieve about 3 million packets per second. It is still lower than what we want to reach: we want to reach 8 or 10 million packets per second, and we are looking at the possibilities, since your measurements and our measurements show that 8 to 10 million packets per second are achievable when you use some shared memory. And that is with DPDK on both sides, right? So that is exactly the point. I think you measured only 3 million packets per second VM to VM because all current implementations still need some mediator, such as OVS, in the path. What we call dequeue zero-copy eliminates only the copy from the virtqueues to the mbufs, but there is still one copy from one virtqueue to another virtqueue, and this is what lowers the performance. We are going to address this with vhost-pci devices, which will let us map the ring buffers of one guest directly into another guest, and this way we eliminate this last copy and the detour through OVS. In reality it is much more complicated, since if we still want to have some switching capabilities, we need to implement some small switch inside DPDK that will either reroute this traffic directly to the buffers of another VM or hand it down to the full OVS on the host. But it is the way forward, because if we want to have NFV and SDN using virtualization and virtual machines, we have to supply them with this kind of performance. Until now there has only been a trade-off between the flexibility that NFV requires and the high performance that only bare-metal solutions could supply. We are not there yet, but we are going in that direction: to supply bare-metal performance with the flexibility and security of VMs.

Any other questions? Please. I don't know; I think Jason is working on it, so you should ask Jason. Okay, the question was whether the current vhost implementation supports busy polling. And if there are no more questions, thank you very much for your interest.