Okay. Welcome. Let's get started. Today's topic is hardware acceleration for edge networking. My name is Fu Shengjiao, and I am from the Intel Open Source Technology Center.

This is today's agenda. I will first introduce edge computing and its demand for end-to-end low latency. Then I will talk briefly about the costs in traditional Linux kernel networking processing, and how these costs are eliminated by the kernel bypass approach. After that, I will walk through a range of hardware acceleration technologies that speed up network packet processing, one by one.

The first page is about why edge computing. As we know, in the context of the central cloud, the latency can reach around 25 milliseconds to 200 milliseconds. Nowadays there are many emerging use cases demanding lower latency and accelerated data processing at the edge, for example NFV edge infrastructure, autonomous driving, AR/VR, and industrial IoT. They all demand a latency of less than 20 milliseconds.

On the next page, I will talk about end-to-end low latency in edge computing. End-to-end low latency means there should be very low latency in every step of data processing and data transmission. The first step is low-latency data acquisition on the device. The second step requires low-latency data transmission between the device and the edge cloud. The third step requires high-performance data processing on the edge cloud. The second step, the data transmission between device and edge cloud, can be further split into three parts: TX/RX on the device, transmission over the air, and TX/RX on the edge cloud side. The TX/RX on the edge cloud side is highlighted, and that is the focus of this talk.

On the next slide, let's have a quick look at Linux kernel networking processing. After a data packet arrives at the NIC, it is copied into the socket buffers in kernel space by DMA. The NIC then generates an interrupt to the CPU. The device driver manages the socket buffers and feeds the data into the networking stack. Finally, the data is copied to the user application.

As we can see, there are various costs in Linux kernel networking processing. The first cost is related to IRQ handling: when a huge number of packets are received on the NIC, there is the cost of handling a huge number of IRQs. Another cost is the overhead of context switches, and there are two types of context switch here: the first is between the IRQ context and the kernel thread, and the second is between the kernel and user space. We can also see the cost of memory copies: the first copy is from the NIC to the kernel, and the second copy is from the kernel to user space. Finally, the user application needs to invoke system calls to retrieve the data, so there is also the cost of system calls.

Let's come to the next page. In this diagram, the kernel bypass approach specifically refers to DPDK. The kernel bypass approach can eliminate the costs in Linux kernel networking. DPDK runs in polling mode, so there is no cost of IRQ handling anymore.
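To make the polling mode concrete, here is a minimal sketch of a DPDK receive loop in C. It is only an illustration, not the speaker's exact code: it assumes the EAL, the mbuf memory pool, and the port and RX queue have already been set up elsewhere, and the queue index and BURST_SIZE are placeholders.

```c
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Minimal poll-mode receive loop: the CPU core spins on the RX queue
 * instead of waiting for an interrupt from the NIC. */
static void rx_loop(uint16_t port_id)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Poll up to BURST_SIZE packets from RX queue 0; this returns
         * immediately even when no packets are available. */
        uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);

        for (uint16_t i = 0; i < nb_rx; i++) {
            /* ... process the packet directly in user space ... */
            rte_pktmbuf_free(bufs[i]);
        }
    }
}
```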
The overhead of the two types of context switches is also eliminated. We can also see that the data packet is copied from the NIC to the user application directly, so there is only one data copy. The last one is the cost of system calls: the user application does not need to make any system call to retrieve the data from the kernel, so with the kernel bypass approach there is no system call cost either.

Next, let's have a look at huge pages. In this diagram, we can see that there are four levels of page table for the 4 KB page size, three levels for the 2 MB page size, and only two levels for the 1 GB page size. The conclusion is that a larger page size means fewer page table levels and fewer TLB entries, which finally reduces TLB misses.

The next page is about DDIO; the full name is Data Direct I/O. Let's first look at the picture on the left side, which is without DDIO: the data is copied from the NIC to memory by DMA, and when the CPU accesses the data in memory, it first needs to be loaded into the last-level cache. Now look at the picture on the right side: with DDIO, the data is copied from the NIC into the last-level cache directly, and the CPU can access it there directly.

This page is about Receive Side Scaling and CPU affinity. Receive Side Scaling runs on the NIC and maintains a hash function, and the resulting hash value can be mapped to a CPU core. Let me give a simple example: the hash of the source IP address, source TCP port, destination IP address, and destination TCP port can map a TCP connection to a dedicated CPU core. CPU affinity can then pin a process or thread to a CPU core, so one process or thread runs on a dedicated core and processes all the packets of one dedicated TCP connection. In this way, it can avoid cache misses effectively.
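As a rough illustration of the affinity part, here is a small C sketch that pins the calling thread to one core using pthread_setaffinity_np. The core number is just a placeholder; in practice it would be the core that the NIC's RSS hash directs this connection's queue to.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Pin the calling thread to one dedicated CPU core so it keeps
 * processing the packets of the same connection on the same core. */
static int pin_to_core(int core_id)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(core_id, &set);

    /* Returns 0 on success, an error number otherwise. */
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
    int core_id = 2;   /* placeholder: the core chosen by RSS for this flow */
    int err = pin_to_core(core_id);

    if (err != 0) {
        fprintf(stderr, "pin_to_core: %s\n", strerror(err));
        return 1;
    }
    printf("worker pinned to core %d\n", core_id);
    /* ... the packet processing loop for this connection would run here ... */
    return 0;
}
```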
On this page, let's have a look at the relay architecture of the vhost I/O interface between VM and host. In this diagram, the front end is virtio and the back end is vhost-user running with DPDK. The data flow is from the NIC to DPDK, then to Open vSwitch, then to the VM. The pro of this relay architecture is that it is quite easy to support VM live migration. The con is the memory copy overhead: the data first needs to be copied from the NIC to DPDK, and a second copy goes from DPDK to the VM.

The next page shows the passthrough architecture of the VM I/O interface. In this architecture, a VF on the NIC can be assigned to the VM directly. The pro of this architecture is that it can achieve the same performance as bare metal, because there is no relay between the NIC and the VM. The con is that it is quite hardware dependent, and in many cases it is not possible to support VM live migration.

Here there is a table comparing the pros and cons of the relay and passthrough architectures. So we come to a question: can we have the high performance of passthrough together with the live migration support of relay? Here comes the solution. The solution is called vDPA; the full name is vhost Data Path Acceleration. The configuration path goes the relay way, while the data transmission path goes the passthrough way. In detail, the CSRs, the device configuration, and the memory mapping are handled in the relay way, but the ring buffers can be assigned to the VM in the passthrough way. So vDPA can achieve both objectives: on one hand it can achieve performance close to passthrough, and on the other hand it can support VM live migration.

The next page is about SmartNICs. As we know, there are several kinds of SmartNICs; on this page I will only talk about one type, the kind that can do OVS offloading. All the OVS logic is offloaded from the host to the SmartNIC, and the data processing and forwarding on the SmartNIC can be accelerated by an FPGA or an ASIC.

For the last page, I will talk about Intel QuickAssist Technology, or QAT. Encryption and decryption algorithms can be offloaded to QAT, including asymmetric PKE offload, symmetric chained cipher offload, and pseudo-random function offload. Now QAT is supported in OpenSSL with the QAT engine. Currently, as we know, QAT can be assigned to a VM by PCI passthrough. I think that in the near future, QAT may be managed by SSL bulk; I noticed that there is a technical session about SSL bulk, and its content includes something related to QAT, so QAT can be managed by SSL bulk. I think I have finished my content. Do you have any questions?